KIDNEYS, URETERS, BLADDER, RETROPERITONEUM


Deep learning approach for differentiating indeterminate adrenal masses using CT imaging

Yashbir Singh1 · Zachary S. Kelm1 · Shahriar Faghani1 · Dana Erickson2 · Tal Yalon3 · Irina Bancos2 · Bradley J. Erickson1

Received: 8 March 2023 / Revised: 12 June 2023 / Accepted: 13 June 2023 / Published online: 27 June 2023 © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023

Abstract

Purpose Distinguishing stage 1-2 adrenocortical carcinoma (ACC) and large, lipid poor adrenal adenoma (LPAA) via imaging is challenging due to overlapping imaging characteristics. This study investigated the ability of deep learning to distinguish ACC and LPAA on single time-point CT images.

Methods This was a retrospective cohort study spanning 1994 to 2022. Imaging studies of patients with adrenal masses who had adequate CT studies available and histology as the reference standard, obtained by adrenal biopsy and/or adrenalectomy, were included, as well as four patients with LPAA determined by stability or regression on follow-up imaging. Forty-eight (48) subjects with pathology-proven, stage 1-2 ACC and 43 subjects with adrenal adenoma > 3 cm in size demonstrating a mean non-contrast CT attenuation > 20 Hounsfield Units centrally were included. We used annotated single time-point contrast-enhanced CT images of these adrenal masses as input to a 3D Densenet121 model for classifying as ACC or LPAA with five-fold cross-validation. For each fold, two checkpoints were reported: highest accuracy with highest sensitivity (accuracy-focused) and highest sensitivity with the highest accuracy (sensitivity-focused).

Results We trained a deep learning model (3D Densenet121) to predict ACC versus LPAA. The sensitivity-focused model achieved mean accuracy: 87.2% and mean sensitivity: 100%. The accuracy-focused model achieved mean accuracy: 91% and mean sensitivity: 96%.

Conclusion Deep learning demonstrates promising results distinguishing between ACC and large LPAA using single time-point CT images. Before being widely adopted in clinical practice, multicentric and external validation are needed.

Keywords Adrenocortical carcinoma · Lipid poor adrenal adenoma · Computed tomography · Deep learning

Introduction

Adrenal masses (> 1 cm in size) are incidentally discovered on CT imaging in approximately 2-7% [1, 2] of patients, and the prevalence increases with age (up to 10% in adults age > 70 years) [1, 3]. The vast majority of adrenal masses represent benign adenomas, whether discovered incidentally or in the setting of known extra-adrenal primary malignancy. Prior research has demonstrated characteristic imaging findings of non-contrast-enhanced CT attenuation and contrast washout assessment to have high sensitivity and specificity in distinguishing between benign adrenal adenomas and hypovascular metastases in the adrenal gland when they exhibit benign imaging characteristics of homogeneity and mean radiodensity < 10 HU. However, a substantial proportion (> 20-30%) of eventually proven benign masses do not meet these imaging criteria [4-7]. In fact, some of them can be heterogeneous, without low CT density components, and have characteristics which are difficult to distinguish from adrenocortical carcinoma (ACC). This primary malignancy often demonstrates variable size from 3 to 25 cm (mean 9 cm), areas of internal necrosis and/or hemorrhage, heterogeneous enhancement, calcification [8, 9], and high pre-contrast density on CT. ACCs in adults also can present with minimal or no symptoms of hormonal excess [10], limiting detection and characterization through laboratory tests [11]. While imaging evidence of local invasion and/or metastases aids the diagnosis of stage 3 and 4 ACCs, distinguishing between stage 1 and 2 ACCs and the subset of large heterogeneous, benign adrenal adenomas has been challenging in clinical practice [12]. In fact, in a recent study of 705 adrenal tumors > 4 cm, 31% were benign adenomas and surgical procedures were performed in 55% of these even without presence of hormonal excess, consistent with others finding similar results [13]. Confident identification of ACC through adrenal biopsy has limitations, and concern about needle track seeding has been raised [14]; adrenalectomy therefore becomes necessary. The therapeutic approach might be altered by additional information supporting a benign etiology for the mass. Additionally, both adrenal mass biopsy and adrenalectomy carry procedure-related risks and are associated with significant financial expense. In light of the complexity of these factors, in clinical practice it can be challenging to determine which large adrenal lesions should be followed with imaging, biopsied, or surgically excised.

✉ Bradley J. Erickson, bje@mayo.edu

1 Department of Radiology, Mayo Clinic, Rochester, MN, USA

2 Division of Endocrinology, Metabolism and Nutrition, Mayo Clinic, Rochester, MN, USA

3 Department of General Surgery, Mayo Clinic, La Crosse, WI, USA

Convolutional neural networks (CNNs) are a state-of-the-art Deep Learning (DL) technique for medical image analysis. These networks “learn” to predict a diagnosis or state by finding low-level image features (e.g., edges and curves) and then combining these into higher-level features (e.g., structures) [15, 16]. A CNN that can process 3D input data is called a 3D CNN. Although it has the same structure as a 2D CNN, its 3D convolutions require more memory and processing time [17]. To our knowledge, no study has yet reported the application of an imaging DL model to distinguish benign and malignant large indeterminate adrenocortical tumors. We aimed to develop a CT-based 3D CNN for differentiating benign versus malignant large adrenocortical tumors. We hypothesized that a CNN can be used to accurately differentiate between stage 1-2 ACC and large, lipid poor adrenal adenoma (LPAA) using CT images.

Materials and methods

Patients and data architecture

The Mayo Clinic Institutional Review Board approved this study protocol. Patients with pathology-proven ACCs were identified using adrenal lesion biopsy and/or excision data at our institution between 1994 and 2022. The pathology reports and imaging studies were reviewed to limit the cohort to patients with stage 1-2 ACC at the time of initial diagnosis. Forty-eight of these patients were found to have adequate portal venous phase contrast-enhanced CT imaging, with 13 tumors (27%) being classified as stage 1 ACC and 35 tumors (73%) being stage 2 ACC. Additionally, patients with adrenal adenoma tumors > 3 cm in size with both noncontrast and portal venous phase contrast-enhanced CT imaging were identified. The mean radiodensity of the adenomas was measured by radiologists experienced in adrenal imaging. Ultimately 43 patients with adrenal adenomas > 3 cm in maximum dimension and > 20 Hounsfield unit (HU) density (on non-contrast imaging) were included in the study. For the 43 identified adenomas, histologic proof was available for 39 of the patients via biopsy and/or excision. The remaining 4 adenomas were determined to be benign through imaging or clinical follow-up, with one tumor significantly decreasing in size by 10.5 months, two demonstrating stability after 3.5 or 5.5 years, and one patient demonstrating no evidence of disease after 18 years of clinical follow-up. Each of the tumors was segmented using a tight lasso on each slice of the tumor by a board-certified radiologist.

For the 43 patients with LPAA (23 male, 20 female), the median age was 64.3 years (range 31.5-87.6 years) with median maximum diameter of the mass of 40.5 mm (range 30-100 mm). Pathology was determined from biopsy specimens for 13 of the LPAA cases, 26 patients had surgical excision specimens, and the remaining 4 were determined to have benign adenomas through imaging or clinical follow-up. Among the 13 LPAA biopsy cases, 4 of the patients went on to have their adrenal gland resected. For the 48 patients with ACC (23 male, 25 female), the median age was 53.4 years (range 22.1-77.7 years), and the median maximum diameter of the mass was 76.5 mm (range 29-240 mm). All of the patients with ACC underwent resection of their adrenal tumor (Table 1).

From a hormonal standpoint, 17 (39.5%) of LPAA vs 9 (18%) of ACC exhibited mild autonomous glucocorticoid overproduction (defined as A.M. serum cortisol > 1.8 mcg/dl following 1 mg dexamethasone suppression without clinical signs of Cushing syndrome and normal 24-h urinary cortisol when measured), 1 patient with LPAA had Cushing syndrome (clinical signs and elevated 24-h urinary cortisol) vs 9 (18%) with ACC, while 14 (32.5%) LPAA were nonfunctioning masses vs 15 (31%) ACC masses. Three patients with ACC had pure androgen overproduction, while an additional 3 ACC had a combination of androgen overproduction and mild autonomous glucocorticoid overproduction. Preoperative hormonal information was lacking in 9 (18.7%) patients with ACC and 11 (25.5%) with LPAA.

Model, initialization, and training

Using the Scikit-learn toolkit, we split our dataset into five folds stratified by class (ACC and LPAA) at the patient level (to prevent information leakage) [19]. We used five-fold cross-validation to evaluate the model’s robustness to perturbations in the data [19]. CT images were resampled to 1 × 1 × 1 mm voxel dimensions using trilinear interpolation.
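The patient-level, class-stratified split described above can be illustrated with a minimal standard-library sketch. This is not the Scikit-learn routine used in the study; the function name, the round-robin assignment, the random seed, and the patient identifiers are all hypothetical. Only the class counts (48 ACC, 43 LPAA) and the number of folds come from the text.

```python
import random
from collections import defaultdict

def stratified_patient_folds(labels_by_patient, n_folds=5, seed=0):
    """Assign each patient ID to one fold, keeping class proportions similar.

    labels_by_patient: dict mapping a patient ID to its class label.
    Returns a dict mapping patient ID -> fold index in [0, n_folds).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for patient_id, label in labels_by_patient.items():
        by_class[label].append(patient_id)

    fold_of = {}
    offset = 0
    for label in sorted(by_class):
        ids = sorted(by_class[label])
        rng.shuffle(ids)
        # Deal patients round-robin so each fold gets a near-equal share of the class.
        for i, patient_id in enumerate(ids):
            fold_of[patient_id] = (i + offset) % n_folds
        offset += len(ids)
    return fold_of

# Hypothetical patient IDs; only the class counts come from the study.
labels = {f"ACC_{i:02d}": "ACC" for i in range(48)}
labels.update({f"LPAA_{i:02d}": "LPAA" for i in range(43)})
folds = stratified_patient_folds(labels)
```

Because each patient ID maps to exactly one fold, all images from one patient stay on the same side of every train/test boundary, which is the leakage safeguard the text refers to.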

Table 1 Demographic information of the data
Parameter	Statistic	LPAA	ACC
Sex-Male	Total	23 patients	23 patients
Sex-Female	Total	20 patients	25 patients
Age	Mean +/- SD	60.3 +/- 15.0 years	52.2 +/- 15.4 years
Age	Median	64.3 years	53.4 years
Age	25-75%	50.2-72.9 years	40.1-65.8 years
Lesion max diameter	Mean +/- SD	4.65 +/- 1.63 cm	8.66 +/- 4.89 cm
Lesion max diameter	Median	4.05 cm	7.65 cm
Lesion max diameter	25-75%	3.44-5.55 cm	4.85-11.90 cm

Table 2 The data augmentation techniques and their parameters
Technique	Parameters
Random flip	Probability = 0.5
Random translation	Translate range = (15, 15, 10)
Random scaling	Scale range = (0.05, 0.05, 0.05)
Random rotation	Rotation range = (pi/8, pi/8, pi/8)
Random Gaussian noise addition	(mean = 0.0, std = 0.2)
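One way to read the settings in Table 2 is sketched below: each call draws a fresh set of augmentation parameters. The table does not state how each range is interpreted, so treating every range as symmetric about zero (with scaling applied as 1 plus the drawn value) is an assumption, as are the function names and the translation unit (voxels).

```python
import math
import random

# Values from Table 2; treating each range as symmetric about zero is an assumption.
FLIP_PROB = 0.5
TRANSLATE_RANGE = (15, 15, 10)        # assumed to be voxels per axis
SCALE_RANGE = (0.05, 0.05, 0.05)      # fractional size change per axis
ROTATE_RANGE = (math.pi / 8,) * 3     # radians per axis
NOISE_MEAN, NOISE_STD = 0.0, 0.2

def sample_augmentation(rng):
    """Draw one random set of augmentation parameters."""
    return {
        "flip": rng.random() < FLIP_PROB,
        "translate": tuple(rng.uniform(-t, t) for t in TRANSLATE_RANGE),
        "scale": tuple(1.0 + rng.uniform(-s, s) for s in SCALE_RANGE),
        "rotate": tuple(rng.uniform(-r, r) for r in ROTATE_RANGE),
    }

def add_gaussian_noise(values, rng):
    """Add independent Gaussian noise to a flat list of normalized intensities."""
    return [v + rng.gauss(NOISE_MEAN, NOISE_STD) for v in values]

rng = random.Random(0)
params = sample_augmentation(rng)
noisy = add_gaussian_noise([0.0, 1.0, -1.0], rng)
```

A noise standard deviation of 0.2 is only meaningful relative to the intensity scale, which is why the sketch applies it to values that have already been normalized to unit standard deviation.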

Then, guided by the lesion segmentation, a 3D bounding box was used to crop the tumors out of CT images. Voxels were normalized with zero mean and unit standard deviation, and ultimately images were zero-padded to 210 × 240 × 280 voxels (the size of the largest lesion). To decrease the risk of overfitting and make the model more generalizable, the following data augmentation techniques were used (Table 2).
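The cropping, normalization, and padding steps can be sketched with plain Python as follows. The helper names, the example mask coordinates, and the example intensities are hypothetical, and centering the crop inside the padded volume is an assumption, since the text does not say where the zeros were added.

```python
from statistics import mean, pstdev

TARGET_SHAPE = (210, 240, 280)  # padded size reported in the text

def bounding_box(mask_voxels):
    """Return (min_corner, max_corner) enclosing a list of (x, y, z) mask voxels."""
    mins = tuple(min(v[a] for v in mask_voxels) for a in range(3))
    maxs = tuple(max(v[a] for v in mask_voxels) for a in range(3))
    return mins, maxs

def zscore(values):
    """Rescale intensities to zero mean and unit standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def pad_widths(shape, target=TARGET_SHAPE):
    """Zero-padding (before, after) per axis; centering the crop is an assumption."""
    widths = []
    for size, goal in zip(shape, target):
        total = goal - size
        if total < 0:
            raise ValueError("crop is larger than the target shape")
        widths.append((total // 2, total - total // 2))
    return widths

# Hypothetical lesion mask voxels and intensities, for illustration only.
mins, maxs = bounding_box([(10, 20, 30), (49, 80, 90), (25, 50, 60)])
crop_shape = tuple(hi - lo + 1 for lo, hi in zip(mins, maxs))
normalized = zscore([30.0, 45.0, 60.0, 75.0])
padding = pad_widths(crop_shape)
```

Padding every crop to the size of the largest lesion gives the network a fixed input shape without rescaling the lesions, so physical size remains visible to the model.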

To distinguish between ACC and LPAA images, we utilized a 3D-DenseNet-121 [18] classifier from the MONAI package [20, 21]. DenseNet is a CNN that uses convolutions to extract meaningful information while connecting each layer to every other layer in a feed-forward manner. Before each convolution layer in DenseNet, a 1 × 1 convolution is added as a bottleneck layer to reduce the number of feature mappings. The bottleneck structure and dimension reduction in transition layers are contained in each dense block, increasing parameter efficiency and decreasing model complexity [18], thus reducing the likelihood of overfitting (Fig. 1).
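The effect of dense connectivity and transition layers on feature-map counts can be traced with simple arithmetic. The sketch below uses the standard DenseNet-121 settings from Huang et al. [18] (growth rate 32, 64 initial features, block sizes 6, 12, 24, 16, channel halving in transitions); whether the study altered any of these defaults is not stated, so they are assumptions here, and the function name is hypothetical.

```python
# Standard DenseNet-121 settings from Huang et al.; whether the study changed
# any of them is not stated, so treat these as assumptions.
GROWTH_RATE = 32
INIT_FEATURES = 64
BLOCK_CONFIG = (6, 12, 24, 16)
BOTTLENECK_WIDTH = 4 * GROWTH_RATE  # output channels of each 1x1 bottleneck conv

def densenet_channels(init=INIT_FEATURES, growth=GROWTH_RATE, blocks=BLOCK_CONFIG):
    """Trace channel counts through the dense blocks and transition layers."""
    trace = []
    channels = init
    for index, num_layers in enumerate(blocks):
        # Each dense layer sees all earlier feature maps and appends `growth` new ones.
        channels += num_layers * growth
        trace.append(("dense_block", index + 1, channels))
        if index < len(blocks) - 1:
            # Transition layers halve the channel count between blocks.
            channels //= 2
            trace.append(("transition", index + 1, channels))
    return trace

trace = densenet_channels()
final_channels = trace[-1][2]
# Stem conv + two convs (1x1 bottleneck, then 3x3x3 in 3D) per dense layer + transitions.
conv_layers = 1 + 2 * sum(BLOCK_CONFIG) + (len(BLOCK_CONFIG) - 1)
depth = conv_layers + 1  # plus the final classification layer
```

The layer count works out to 121, which is where the model name comes from, and the bottleneck keeps each 3 × 3 × 3 convolution operating on a fixed 128 channels no matter how many feature maps have accumulated.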

With 91 patients in our dataset, the AdamW optimizer was employed with a batch size of 4, and a cosine annealing learning rate scheduler was utilized with an initial learning rate of 1 × 10⁻³ for 100 epochs. We used weighted cross-entropy as the loss function with 1 and 10 as weights for benign and malignant classes, respectively, to increase the sensitivity of our model. For each fold, two checkpoints were reported: highest accuracy with highest sensitivity (accuracy-focused) and highest sensitivity with the highest accuracy (sensitivity-focused). We used a cluster of four NVIDIA A100 GPUs to train our model. Given that there was no major class imbalance between the ACC and LPAA cases, we did not use any sampling method to address imbalance. For each fold we computed the accuracy and sensitivity.
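The learning-rate schedule and the class-weighted loss can be written out in a few lines. The initial learning rate, epoch count, and class weights come from the text; the minimum learning rate of zero, the single annealing cycle spanning all 100 epochs, and normalization of the loss by the summed weights are assumptions, and the predicted probabilities are hypothetical.

```python
import math

INITIAL_LR = 1e-3
EPOCHS = 100
CLASS_WEIGHTS = {"LPAA": 1.0, "ACC": 10.0}  # benign, malignant

def cosine_annealed_lr(epoch, initial_lr=INITIAL_LR, total_epochs=EPOCHS, min_lr=0.0):
    """Cosine annealing; min_lr=0 and a single cycle over all epochs are assumptions."""
    cosine = math.cos(math.pi * epoch / total_epochs)
    return min_lr + 0.5 * (initial_lr - min_lr) * (1.0 + cosine)

def weighted_cross_entropy(probs, labels, weights=CLASS_WEIGHTS):
    """Weighted mean of -log p(true class), normalized by the summed weights."""
    weighted_loss = sum(-weights[y] * math.log(p[y]) for p, y in zip(probs, labels))
    return weighted_loss / sum(weights[y] for y in labels)

# Hypothetical predictions: each true class below is scored at probability 0.4.
probs = [{"ACC": 0.6, "LPAA": 0.4}, {"ACC": 0.4, "LPAA": 0.6}]
missed_acc = -CLASS_WEIGHTS["ACC"] * math.log(probs[1]["ACC"])     # true ACC scored 0.4
missed_lpaa = -CLASS_WEIGHTS["LPAA"] * math.log(probs[0]["LPAA"])  # true LPAA scored 0.4
```

With these weights, an under-scored ACC case contributes ten times the penalty of an equally under-scored LPAA case, which pushes the decision boundary toward fewer missed carcinomas at the cost of more false positives.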

Fig. 1 Brief schema of our developed pipeline for distinguishing stage 1-2 adrenocortical carcinoma (ACC) and large, lipid poor adrenal adenoma (LPAA). Panels: input image; radiologist annotation (image annotation); creating the bounding box around the annotation; cropping to the 3D bounding box; 3D DenseNet-121 with 5-fold cross-validation; ACC vs LPAA output

Results

In this study, a CNN was employed to predict ACC versus LPAA. Two sets of results are presented to evaluate the performance of the CNN model: sensitivity-focused and accuracy-focused analyses.
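The two analyses correspond to the two checkpoints kept for each fold. One plausible reading of that rule is a lexicographic tie-break, sketched below; the function name and the per-epoch metrics are hypothetical and do not come from the study's training logs.

```python
def pick_checkpoints(history):
    """Select two epochs from per-epoch validation metrics.

    history: list of dicts with keys "epoch", "accuracy", "sensitivity".
    Accuracy-focused: best accuracy, ties broken by sensitivity.
    Sensitivity-focused: best sensitivity, ties broken by accuracy.
    """
    accuracy_focused = max(history, key=lambda m: (m["accuracy"], m["sensitivity"]))
    sensitivity_focused = max(history, key=lambda m: (m["sensitivity"], m["accuracy"]))
    return accuracy_focused, sensitivity_focused

# Hypothetical per-epoch metrics, not taken from the study's training logs.
history = [
    {"epoch": 1, "accuracy": 0.64, "sensitivity": 1.00},
    {"epoch": 2, "accuracy": 0.91, "sensitivity": 0.80},
    {"epoch": 3, "accuracy": 0.73, "sensitivity": 1.00},
    {"epoch": 4, "accuracy": 0.82, "sensitivity": 0.80},
]
acc_ckpt, sens_ckpt = pick_checkpoints(history)
```

In this toy history the accuracy-focused rule picks epoch 2 and the sensitivity-focused rule picks epoch 3, showing how the same training run can yield two different operating points.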

In the sensitivity-focused analysis, the CNN achieved a sensitivity of 100% for detecting ACC. Sensitivity represents the ability of the model to correctly identify positive instances (in this case, ACC cases) out of all the actual positive instances. This high sensitivity indicates that the CNN is effective in correctly classifying ACC cases.

The mean accuracy across the cross-validation folds in the sensitivity-focused analysis was 87%, with a standard deviation of 8.13 percentage points. Accuracy measures the overall correctness of the model’s predictions. An accuracy of 87% suggests that the CNN is successful in accurately classifying ACC and LPAA cases, although some misclassifications may occur.
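The metrics reported in Table 3 all follow from a 2 × 2 confusion matrix with ACC as the positive class. The sketch below uses hypothetical counts: per-fold counts are not reported in the paper, and these were chosen only because they reproduce the ratios listed for fold 1.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return standard binary metrics with ACC as the positive class."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Hypothetical counts: they are not reported in the paper, but they reproduce
# the ratios listed for fold 1 in Table 3.
metrics = classification_metrics(tp=6, fp=1, tn=4, fn=0)
rounded = {name: round(value, 3) for name, value in metrics.items()}
```

With no false negatives, sensitivity and negative predictive value are both 1.0, while the single false positive lowers specificity and positive predictive value.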

In the accuracy-focused analysis, the CNN was optimized for achieving the highest accuracy. The mean accuracy in this case was 91%, with a standard deviation of 0. The optimization for accuracy resulted in a slightly higher mean accuracy compared to the sensitivity-focused analysis. However, it’s worth noting that the model’s sensitivity dropped slightly to a mean of 96%. This indicates that the model may sacrifice a small sensitivity to achieve higher overall accuracy. Overall, these results indicate that the CNN model shows promising performance in predicting ACC versus LPAA. It exhibits high sensitivity, ensuring that most ACC cases are correctly identified. The accuracy, though slightly lower, remains relatively high. These findings suggest that the CNN has the potential to be a valuable tool for accurate classification in this specific medical domain (Table 3).
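The fold-level summaries can be recomputed from the per-fold entries of Table 3. Because the table entries are already rounded, the recomputed values land close to, but not exactly on, the reported 87.2% and 8.13; using the sample standard deviation (n − 1) is an assumption, since the text does not say which form was used.

```python
from statistics import mean, stdev

# Per-fold values copied from Table 3 (already rounded in the source).
sens_focused_acc = [0.909, 0.72, 0.909, 0.909, 0.909]
acc_focused_acc = [0.909, 0.909, 0.909, 0.909, 0.909]
acc_focused_sens = [1.0, 0.80, 1.0, 1.0, 1.0]

summary = {
    "sens_focused_mean_acc": mean(sens_focused_acc),
    "sens_focused_sd_acc": stdev(sens_focused_acc),  # sample SD (n - 1), an assumption
    "acc_focused_mean_acc": mean(acc_focused_acc),
    "acc_focused_sd_acc": stdev(acc_focused_acc),
    "acc_focused_mean_sens": mean(acc_focused_sens),
}
```

The zero standard deviation in the accuracy-focused analysis follows directly from all five folds reporting the same accuracy, and the 96% mean sensitivity reflects the one fold with sensitivity 0.80.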

Discussion

For this study, we developed a deep learning model using a CNN to differentiate LPAA and stage 1-2 ACC, targeting a challenging problem that clinicians face. When adrenal adenomas become large, they can often demonstrate a heterogeneous appearance on imaging studies due to degeneration, hemorrhage, and/or calcification, which makes them difficult to separate from ACC. While it might seem prudent to always resect such masses, this incurs potentially avoidable cost and morbidity. A tool that could confidently exclude malignancy in these cases could be valuable for patient management. As such, the value of a high sensitivity prediction algorithm is high. The current tool using deep learning in our study was able to achieve 100% sensitivity with a mean accuracy across folds of 87.2%, and the accuracy-focused model achieved a mean accuracy of 91% with a mean sensitivity of 96%.

In this study, we specifically included clinically challenging cases and excluded those which were more obviously benign (such as lipid-rich adenomas) or more obviously malignant (stage 3-4 ACC). This study supports the hypothesis that there are likely additional imaging features that could be used for non-invasively distinguishing between benign and malignant entities. Given the gravity of a diagnosis of ACC, we emphasized maximizing sensitivity to minimize the risk of underdiagnosis of a potentially life-threatening condition. The current model considered only the CT images. It is possible that the performance could be improved by adding clinical information such as age and gender. Of course, adding more variables increases the chance of spurious associations that give the appearance of good performance but do not generalize well to a larger population.

Table 3 Values for metrics obtained during a 5-fold stratified cross-validation evaluation of the 3D Densenet121 classifier. For part (i), we optimized for sensitivity, as detection of ACC is thought to be clinically more important. For part (ii), we optimized for accuracy.
Fold	Accuracy	Sensitivity	Specificity	Positive predictive value (PPV)	Negative predictive value (NPV)
(i) Sensitivity-focused results
1	0.909	1.0	0.8	0.857	1.0
2	0.72	1.0	0.5	0.625	1.0
3	0.909	1.0	0.833	0.833	1.0
4	0.909	1.0	0.833	0.833	1.0
5	0.909	1.0	0.833	0.833	1.0
(ii) Accuracy-focused results
1	0.909	1.0	0.8	0.857	1.0
2	0.909	0.80	1.0	1.0	0.857
3	0.909	1.0	0.833	0.833	1.0
4	0.909	1.0	0.833	0.833	1.0
5	0.909	1.0	0.833	0.833	1.0

Other methods of texture assessment such as radiomics have been studied for added value in distinguishing adrenal tumors. A systematic review of the diagnostic accuracy of CT texture analysis by Crimi et al. reported on 9 studies with 20-356 (average 126) patients. The texture analysis software included various methods such as TexRAD, PyRadiomics, and explicit texture calculations. In 6 studies, histopathological examination was the gold standard, and many of the tumors included were functional tumors (pheochromocytomas), metastatic tumors to adrenal glands, or not specified. While half of the studies included the whole adrenal tumor for analysis, half focused on a single region of interest (ROI). Various features of texture were reported to be significantly different between benign and malignant lesions (metastases and ACCs). The ROC curves were calculated for the majority of the studies, with a pooled median AUC of 0.85 (0.67-0.89) to discriminate between benign and malignant lesions. Two of the studies evaluated a population defined like our study (benign adenomas vs ACC) and included either 19 or 54 patients with histologically proven disease. One applied manual segmentation, while in the other an ROI was obtained [22, 23].

While radiomics makes pre-specified texture measurements and those measurements are provided to a traditional machine learning method (e.g., support vector machine or decision tree), CNNs adjust kernel weights to learn the textures and also learn the weights. As such, CNNs have the potential to learn textures that are not prospectively defined, but they also require more examples to learn the textures. In our case, we started with weights learned from ImageNet (https://www.image-net.org/). Although these weights were derived from a collection of photographs and not specifically optimized for medical images, they were likely beneficial regardless.

Eighteen of the LPAA cases had available washout imaging. Only 4 of these 18 cases demonstrated the characteristic absolute washout of > 60% considered typical of adenomas. Fifteen of the ACC cases had available washout imaging. While the majority of these cases demonstrated a low washout %, there were still 4 of the 15 ACC cases which demonstrated absolute washout of > 60%. This indicates that quantitative washout imaging is not sufficiently reliable in differentiating large LPAA from ACC. This is compatible with recent publications reporting CT washout analysis to have a suboptimal performance in a systematic review and meta-analysis of accuracy of imaging in diagnosing malignancy, with limited data available, and sensitivity of only 16% (95% CI, 3-40) and specificity of 86% (95% CI, 64-97) in diagnosing malignancy [24, 25]. Given scarce data, reported suboptimal accuracy, and associated cost, CT washout analysis was not recommended as second-line testing of adrenal masses by the most recent guidelines for practicing endocrinologists [15]; therefore, although reported, it is not used that frequently by practitioners when they consider recommendations regarding surgical options.
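For readers unfamiliar with the measurement, absolute washout is conventionally computed from unenhanced, contrast-enhanced, and delayed attenuation values. The paper does not spell out the formula it used, so the conventional definition below is an assumption, and the attenuation values and function name are hypothetical.

```python
def absolute_washout(unenhanced_hu, enhanced_hu, delayed_hu):
    """Absolute percentage washout from three CT attenuation measurements (HU)."""
    enhancement = enhanced_hu - unenhanced_hu
    if enhancement <= 0:
        raise ValueError("lesion must enhance for washout to be defined")
    return 100.0 * (enhanced_hu - delayed_hu) / enhancement

# Hypothetical attenuation values for illustration only.
washout = absolute_washout(unenhanced_hu=25.0, enhanced_hu=85.0, delayed_hu=55.0)
looks_like_adenoma = washout > 60.0
```

In this example the lesion loses half of its enhancement on the delayed scan, giving a washout of 50%, which falls below the > 60% threshold described above as typical of adenomas.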

Biochemical markers in the setting of large adrenal masses, such as the urine steroid metabolome using mass spectrometry, have recently been studied in a large group of cases in the EURINE-ACT study to facilitate a personalized approach to patient management [26]. A certain moderate and high malignant steroid fingerprint was able to predict ACC in 98% of patients; however, the PPV in ‘moderate risk fingerprint’ for ACC was only 17%. A much better predictive value was found when certain imaging phenotype and steroid metabolomics were combined. We also note that the steroid metabolomics biochemical test is not widely available, is expensive, and takes weeks to obtain results.

Limitations

A significant limitation to this study is the relatively small number of cases, but this study was designed to investigate a clinically relevant diagnostic dilemma involving a rare entity. CT postcontrast washout characteristics of masses were not assessed in this study because this would have limited the number of available cases even further. Clinical information could have been integrated [27] but this would also have increased the feature to case ratio, increasing the chance of overfitting. The images were obtained using CT machines from different manufacturers using varying acquisition protocols over a large time period. However, the images used were limited to the portal venous phase, rather than angiographic or delayed phases, which tends to cause less contrast timing variability. Additionally, the variability of the acquisition protocols within the study group, including variation in slice thickness, may have a positive effect on the generalizability of the classification algorithm. Finally, many of the biopsied and/or excised benign adenomas were found in patients with a separate primary malignancy. This likely introduced an element of selection bias as the heightened suspicion for a metastatic lesion may have influenced the decision to pursue histologic confirmation over observation.

Conclusion

The DL model we developed can aid in the non-invasive differentiation of adrenocortical carcinoma and large adrenal adenomas with indeterminate imaging characteristics. This can aid in selecting tumors likely to represent malignant disease requiring meticulous surgical preparation versus those that might represent a benign process and thus could be followed by serial imaging. To further improve the model’s classification performance, we are looking into incorporating demographic data, multiphasic imaging, and non-imaging clinical data with consideration of a larger multiinstitutional dataset.

Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/s00261-023-03988-w.

Acknowledgements The authors are thankful to Mayo AI lab, Radiology, Mayo Clinic, Rochester, MN, USA.

Funding None

Declarations

Conflict of interest The authors declare that they have no conflict of interest.

References

1. Sherlock, M., Scarsbrook, A., Abbas, A., Fraser, S., Limumpornpetch, P., Dineen, R., & Stewart, P. M. (2020). Adrenal incidentaloma. Endocrine Reviews, 41(6), 775-820.

2. Reimondo, G., Castellano, E., Grosso, M., Priotto, R., Puglisi, S., Pia, A., … & Terzolo, M. (2020). Adrenal incidentalomas are tied to increased risk of diabetes: findings from a prospective study. The Journal of Clinical Endocrinology & Metabolism, 105(4), e973-e981.

3. Bovio, S., Cataldi, A., Reimondo, G., Sperone, P., Novello, S., Berruti, A., … & Terzolo, M. (2006). Prevalence of adrenal incidentaloma in a contemporary computerized tomography series. Journal of endocrinological investigation, 29, 298-302.

4. Boland, G. W., Lee, M., Gazelle, G. S., Halpern, E. F., McNicholas, M. M., & Mueller, P. R. (1998). Characterization of adrenal masses using unenhanced CT: an analysis of the CT literature. AJR. American journal of roentgenology, 171(1), 201-204.

5. Boland, G. W., Blake, M. A., Hahn, P. F., & Mayo-Smith, W. W. (2008). Incidental adrenal lesions: principles, techniques, and algorithms for imaging characterization. Radiology, 249(3), 756-775.

6. Seo, J. M., Park, B. K., Park, S. Y., & Kim, C. K. (2014). Characterization of lipid-poor adrenal adenoma: chemical-shift MRI and washout CT. American Journal of Roentgenology, 202(5), 1043-1050.

7. Bancos, I., Taylor, A. E., Chortis, V., Sitch, A. J., Lang, K., Prete, A., … & Arlt, W. (2020). Urine metabolomic phenotyping for detection of adrenocortical carcinoma: still a long way to go-Authors’ reply. The Lancet Diabetes & Endocrinology, 8(11), 877-878.

8. Fishman, E. K., Deutch, B. M., Hartman, D. S., Goldman, S. M., Zerhouni, E. A., & Siegelman, S. S. (1987). Primary adrenocortical carcinoma: CT evaluation with clinical correlation. American Journal of Roentgenology, 148(3), 531-535.

9. Bharwani, N., Rockall, A. G., Sahdev, A., Gueorguiev, M., Drake, W., Grossman, A. B., & Reznek, R. H. (2011). Adrenocortical carcinoma: the range of appearances on CT and MRI. American journal of roentgenology, 196(6), W706-W714.

10. Vanbrabant, T., Fassnacht, M., Assie, G., & Dekkers, O. M. (2018). Influence of hormonal functional status on survival in adrenocortical carcinoma: systematic review and meta-analysis. European journal of endocrinology, 179(6), 429-436.

11. Nader, S., Hickey, R. C., Sellin, R. V., & Samaan, N. A. (1983). Adrenal cortical carcinoma a study of 77 cases. Cancer, 52(4), 707-711.

12. Newhouse, J. H., Heffess, C. S., Wagner, B. J., Imray, T. J., Adair, C. F., & Davidson, A. J. (1999). Large degenerated adrenal adenomas: radiologic-pathologic correlation. Radiology, 210(2), 385-391.

13. Fassnacht, M., Arlt, W., Bancos, I., Dralle, H., Newell-Price, J., Sahdev, A., … & Dekkers, O. M. (2016). Management of adrenal incidentalomas: European society of endocrinology clinical practice guideline in collaboration with the European network for the study of adrenal tumors. European journal of endocrinology, 175(2), G1-G34.

14. Lau, S. K., & Weiss, L. M. (2009). The Weiss system for evaluating adrenocortical neoplasms: 25 years later. Human pathology, 40(6), 757-768.

15. Erickson, B. J., Korfiatis, P., Kline, T. L., Akkus, Z., Philbrick, K., & Weston, A. D. (2018). Deep learning in radiology: does one size fit all? Journal of the American College of Radiology, 15(3), 521-526.

16. Indolia, S., Goswami, A. K., Mishra, S. P., & Asopa, P. (2018). Conceptual understanding of convolutional neural network-a deep learning approach. Procedia computer science, 132, 679-688.

17. Islam, J., & Zhang, Y. (2019). Understanding 3D CNN behavior for Alzheimer’s disease diagnosis from brain PET scan. arXiv preprint arXiv:1912.04563.

18. Huang, G., Liu, Z., Van Der Maaten, L., & Weinberger, K. Q. (2017). Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4700-4708).

19. Rouzrokh, P., Khosravi, B., Faghani, S., Moassefi, M., Vera Garcia, D. V., Singh, Y., … & Erickson, B. J. (2022). Mitigating bias in radiology machine learning: 1. Data handling. Radiology: Artificial Intelligence, 4(5), e210290.

20. Moassefi, M., Faghani, S., Conte, G. M., Kowalchuk, R. O., Vahdati, S., Crompton, D. J., … & Erickson, B. J. (2022). A deep learning model for discriminating true progression from pseudoprogression in glioblastoma patients. Journal of neuro-oncology, 159(2), 447-455.

21. The MONAI Consortium (2020) Project MONAI. https://zenodo.org/record/4323059

22. Torresan, F., Crimì, F., Ceccato, F., Zavan, F., Barbot, M., Lacognata, C., … & Iacobone, M. (2021). Radiomics: a new tool to differentiate adrenocortical adenoma from carcinoma. BJS open, 5(1), zraa061.

23. Elmohr, M. M., Fuentes, D., Habra, M. A., Bhosale, P. R., Qayyum, A. A., Gates, E., … & Elsayes, K. M. (2019). Machine learning-based texture analysis for differentiation of large adrenal cortical tumours on CT. Clinical radiology, 74(10), 818-e1.

24. Bancos, I., & Prete, A. (2021). Approach to the patient with adrenal incidentaloma. The Journal of Clinical Endocrinology & Metabolism, 106(11), 3331-3353.

25. Dinnes, J., Bancos, I., Ferrante di Ruffano, L., Chortis, V., Davenport, C., Bayliss, S., … & Arlt, W. (2016). Management of endocrine disease: imaging for the diagnosis of malignancy in incidentally discovered adrenal masses: a systematic review and meta-analysis. European journal of endocrinology, 175(2), R51-R64.

26. Bancos, I., Taylor, A. E., Chortis, V., Sitch, A. J., Jenkinson, C., Davidge-Pitts, C. J., … & Young Jr, W. F. (2020). Urine steroid metabolomics for the differential diagnosis of adrenal incidentalomas in the EURINE-ACT study: a prospective test validation study. The lancet Diabetes & endocrinology, 8(9), 773-781.

27. Espinasse, M., Pitre-Champagnat, S., Charmettant, B., Bidault, F., Volk, A., Balleyguier, C., … & Caramella, C. (2020). CT texture analysis challenges: influence of acquisition and reconstruction parameters: a comprehensive review. Diagnostics, 10(5), 258.

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
