Chest radiograph data
All procedures in this study were approved by the Institutional Review Board of the University of California, San Francisco Medical Center, California, USA, and were performed in accordance with relevant guidelines and regulations. The same board waived the need for informed consent for the retrospective use of the chest radiograph (CXR) data. All participants were non-obstetric adult patients who presented to the emergency department (ED) between July 1, 2012 and November 30, 2016 and received a chest radiograph at the ED or at an outpatient facility on the day of ED presentation. We initially identified 34,743 frontal chest radiographs belonging to 30,823 patients, each paired with the patient’s age, sex, ZIP code, and cost at UCSF Medical Center within the subsequent 1, 3, or 5 years. At our institution, frontal chest radiographs for adult patients are typically obtained at 100–120 kVp with automatic exposure control. The source-to-detector distance is set at 72 inches unless modified for special reasons. Lateral chest radiographs were excluded. More than 90% of the images were acquired on GE and Philips hardware.
Healthcare spending data
Healthcare spending data were obtained from the cost accounting unit of the institution’s hospital financial department. Total healthcare spending was the sum of direct and indirect expenses attributed to each patient’s hospital stays, pharmacy, laboratory, imaging, surgeries, and medical consultations over the period during which the patient was included in the study. As the outcome, we selected total healthcare expenditure over the subsequent 1, 3, and 5 years.
Of the 34,743 chest radiographs, 12,869 (37.0%) were excluded during data processing because of missing patient information (Suppl. Fig. S1). Of the excluded radiographs, 11,857 (92.1%) had no healthcare spending information. The remaining 1012 (7.9%) comprised 8 with no associated sex, 128 with no associated ZIP code, and 876 whose ZIP code could not be matched to a median income.
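The exclusion step can be illustrated with a minimal sketch. The record layout, the field names, and the order in which the criteria are checked are all assumptions made for illustration; the source does not describe how the filtering was implemented.

```python
from collections import Counter

# Hypothetical records; field names and values are illustrative only.
records = [
    {"id": 1, "spending": 1200.0, "sex": "F", "zip": "94110", "zip_income": 85000},
    {"id": 2, "spending": None, "sex": "M", "zip": "94110", "zip_income": 85000},
    {"id": 3, "spending": 300.0, "sex": None, "zip": "94117", "zip_income": 120000},
    {"id": 4, "spending": 950.0, "sex": "M", "zip": None, "zip_income": None},
    {"id": 5, "spending": 4100.0, "sex": "F", "zip": "99999", "zip_income": None},
]


def exclusion_reason(rec):
    """Return the first reason a record is excluded, or None if it is kept.

    The order of the checks is an assumption, not taken from the study.
    """
    if rec["spending"] is None:
        return "no spending data"
    if rec["sex"] is None:
        return "no sex"
    if rec["zip"] is None:
        return "no ZIP code"
    if rec["zip_income"] is None:
        return "ZIP not matched to median income"
    return None


reasons = [exclusion_reason(r) for r in records]
included = [r for r, why in zip(records, reasons) if why is None]
excluded_counts = Counter(why for why in reasons if why is not None)
```

Tallying a single reason per record keeps the exclusion categories mutually exclusive, which is what allows the per-reason counts to sum to the total number of excluded radiographs.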
Exploratory data analysis
Pairwise chi-squared tests were performed between the cost groups (patients above versus below the median expenditure) and patient demographic variables (age group, sex, geographic area, and race). The effect of demographic variables (sex, age, race, and median income per ZIP code area) and their second-order interaction terms on log10-transformed costs was analyzed with multi-factor ANOVA. A missing-data analysis compared the 1004 excluded CXRs (not including the 8 with unknown sex) with the 21,872 included CXRs. We analyzed the association of missingness with demographic variables (sex, geographic area, and race) and with top-spender versus bottom-spender status using chi-squared tests, and its association with age using a t-test (Suppl. Table S1). Among the 21,872 chest radiographs with 1-year expenditure, 9477 (43.3%) were missing expenditure amounts for 3 years or longer and 20,073 (91.8%) were missing expenditure amounts for 5 years. To investigate the contribution of survivorship bias (i.e., patients who live longer may have lower medical costs because they were healthier or did not incur end-of-life care costs), we compared median expenses between patients who dropped out before the 3rd and 5th year after the CXR was taken and those who remained.
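As a rough illustration of the chi-squared comparison between cost groups and a demographic variable, the sketch below uses synthetic data and `scipy.stats.chi2_contingency`. The four age groups, the sample size, and the cost distribution are assumptions; the source does not state which software was used for these tests.

```python
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(0)

# Synthetic stand-in data: a 4-level age group and a 1-year cost per radiograph.
n = 2000
age_group = rng.integers(0, 4, size=n)
cost = rng.lognormal(mean=8.0 + 0.3 * age_group, sigma=1.0)

# Cost group: 1 if above the median expenditure, 0 otherwise.
high_cost = (cost > np.median(cost)).astype(int)

# 2 x 4 contingency table of cost group by age group.
table = np.zeros((2, 4), dtype=int)
np.add.at(table, (high_cost, age_group), 1)

# Chi-squared test of independence between cost group and age group.
chi2, p_value, dof, expected = chi2_contingency(table)

# The ANOVA described in the text uses log10-transformed cost as the response.
log_cost = np.log10(cost)
```

The same pattern would be repeated for each demographic variable, with one contingency table per pair.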
Regression models were developed to predict healthcare expenditures, and binary classification models were developed to predict whether a participant’s healthcare expenditure was in the top 50%. Both regression and classification models were developed in four versions: (T, for “Tabular data”) a baseline model that takes only patient sex, age, and ZIP code median income as input; (X, for “X-ray”) a ResNet [11] with only CXR images as input; (TX1) separately trained T and X models combined at the final stage; and (TX2) a modified ResNet trained end-to-end with CXR images, age, sex, and per-ZIP code median income as input. The baseline (T) regression and classification models were a gradient boosting regressor [12,13] and an AdaBoost classifier [14], respectively, both implemented in the Python scikit-learn package with default parameters. The CXR-only (X) regression and classification models were a modified ResNet18 and a modified ResNet50, respectively. For the combined model TX1 (Suppl. Fig. S2), the raw softmax score (classification) or final output (regression) of model X was concatenated with the categorical data and then processed with the model T approach to produce the output. For the combined model TX2 (Suppl. Fig. S3), the neural network architectures of model X were modified after the final convolutional layers so that the categorical data could be concatenated into the network and trained end-to-end. See Supplemental Figs. S2, S3 and eAppendix for implementation details.
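The T and TX1 variants can be sketched with scikit-learn as follows. This is a minimal illustration on synthetic data, not the study's code: the feature encoding, the use of log10 cost as the regression target, and the simulated model X outputs (`x_reg_out`, `x_cls_score`) are all assumptions.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 400

# Synthetic tabular inputs: sex (0/1), age in years, ZIP-level median income.
sex = rng.integers(0, 2, size=n)
age = rng.uniform(18, 90, size=n)
income = rng.uniform(30_000, 150_000, size=n)
X_tab = np.column_stack([sex, age, income])

# Synthetic targets (assumed): log10 cost and a top-50% label.
log_cost = 3.0 + 0.01 * age + rng.normal(0.0, 0.3, size=n)
top_half = (log_cost > np.median(log_cost)).astype(int)

# Model T: tabular-only baselines with scikit-learn default parameters.
reg_T = GradientBoostingRegressor().fit(X_tab, log_cost)
clf_T = AdaBoostClassifier().fit(X_tab, top_half)

# Simulated stand-ins for the outputs of the CXR-only model X.
x_reg_out = log_cost + rng.normal(0.0, 0.5, size=n)
logit = 4.0 * (log_cost - np.median(log_cost)) + rng.normal(0.0, 1.0, size=n)
x_cls_score = 1.0 / (1.0 + np.exp(-logit))

# Model TX1: append the model X output as one extra input column,
# then fit the same model family as model T.
X_tx1_reg = np.column_stack([X_tab, x_reg_out])
X_tx1_cls = np.column_stack([X_tab, x_cls_score])
reg_TX1 = GradientBoostingRegressor().fit(X_tx1_reg, log_cost)
clf_TX1 = AdaBoostClassifier().fit(X_tx1_cls, top_half)
```

In a real pipeline the model X outputs fed to TX1 would need to come from predictions on data that model X was not trained on; otherwise the second-stage model would see overly optimistic image scores.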
All versions of ResNet were initialized with weights pre-trained on ImageNet [15,16]. For all models, hyperparameters such as the learning rate, linear layer dimension, and number of linear layers were empirically optimized via random search [17]. After hyperparameter tuning and training, the models were evaluated on the pre-split test set [18]. The training, validation, and test sets were split by patient identification number so that no patient’s CXRs appeared in more than one set. Classification models were evaluated using the area under the receiver operating characteristic curve (ROC-AUC) and the F1 score. The ROC-AUCs of the classification models were compared pairwise using the DeLong method [19]. Regression models were evaluated using Pearson’s R and Spearman’s ρ. Confidence intervals (95%) were computed for all statistics. Training and evaluation were performed separately for 1-year (21,872 CXRs), 3-year (12,395 CXRs), and 5-year (1779 CXRs) expenditure. Because the 1-year expenditure data were the most complete, all subsequent analyses are based on 1-year expenditure unless stated otherwise.
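A patient-level split of the kind described above can be sketched in plain Python. The function name `split_by_patient`, the 80/10/10 fractions, and the seed are hypothetical; the source does not report the split ratios or the code used.

```python
import random


def split_by_patient(patient_ids, fractions=(0.8, 0.1, 0.1), seed=0):
    """Assign each radiograph index to train/val/test via its patient ID.

    All radiographs of one patient land in the same set, so no patient
    is shared between sets. The fractions are illustrative assumptions.
    """
    unique_ids = sorted(set(patient_ids))
    random.Random(seed).shuffle(unique_ids)

    n_train = int(fractions[0] * len(unique_ids))
    n_val = int(fractions[1] * len(unique_ids))

    # Map every patient to exactly one set.
    assignment = {}
    for i, pid in enumerate(unique_ids):
        if i < n_train:
            assignment[pid] = "train"
        elif i < n_train + n_val:
            assignment[pid] = "val"
        else:
            assignment[pid] = "test"

    # Route each radiograph index to its patient's set.
    splits = {"train": [], "val": [], "test": []}
    for idx, pid in enumerate(patient_ids):
        splits[assignment[pid]].append(idx)
    return splits


# Toy example: 20 patients with 2 radiographs each.
patient_ids = ["p%02d" % (i // 2) for i in range(40)]
splits = split_by_patient(patient_ids)
```

Splitting on the patient identifier rather than on the radiograph avoids leakage: a model cannot score well on the test set simply by recognizing a patient it saw during training.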
For error analysis, we examined whether the discrepancy between the true and predicted cost was associated with any patient demographic factor. A linear model was fitted with the percentage difference (|true cost − predicted cost|/true cost) as the dependent variable and patient sex, race, age, median income, and overall true cost as covariates.
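The error analysis can be illustrated with an ordinary least-squares fit in NumPy. The data are synthetic, and the three-level race coding, the covariate rescaling, and the helper name `percentage_error` are assumptions for illustration; the source does not specify how the linear model was fitted.

```python
import numpy as np


def percentage_error(true_cost, pred_cost):
    """Return |true cost - predicted cost| / true cost, elementwise."""
    return np.abs(true_cost - pred_cost) / true_cost


rng = np.random.default_rng(0)
n = 500

# Synthetic true and predicted costs (strictly positive).
true_cost = rng.lognormal(8.0, 1.0, size=n)
pred_cost = true_cost * rng.lognormal(0.0, 0.3, size=n)
pct_error = percentage_error(true_cost, pred_cost)

# Synthetic covariates; race is dummy-coded with level 0 as the reference.
sex = rng.integers(0, 2, size=n).astype(float)
age = rng.uniform(18, 90, size=n)
income = rng.uniform(30_000, 150_000, size=n)
race = rng.integers(0, 3, size=n)
race_dummies = np.eye(3)[race][:, 1:]

# Design matrix: intercept, sex, age, income, true cost, race dummies.
# Income and cost are rescaled only to keep the columns on similar scales.
design = np.column_stack(
    [np.ones(n), sex, age, income / 1e4, true_cost / 1e4, race_dummies]
)

# Ordinary least-squares fit of percentage error on the covariates.
coef, _, rank, _ = np.linalg.lstsq(design, pct_error, rcond=None)
```

Each entry of `coef` estimates how the percentage error changes with one covariate while the others are held fixed; significance testing would require standard errors, which this sketch does not compute.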