Machine Learning Principles Can Improve Hip Fracture Prediction
2018 Mar 09
Abstract: The aim was to apply machine learning principles to predict
hip fractures and estimate predictor importance in
Dual-energy X-ray absorptiometry (DXA)-scanned men
and women. Dual-energy X-ray absorptiometry data from
two Danish regions between 1996 and 2006 were combined
with national Danish patient data to comprise 4722
women and 717 men with 5 years of follow-up time (original
cohort n=6606 men and women). Twenty-four statistical
models were built on 75% of data points through 5-fold (k = 5),
5-repeat cross-validation, and then validated on the remaining
25% of data points to calculate area under the curve
(AUC) and calibrate probability estimates. The best models
were retrained with restricted predictor subsets to estimate
the best subsets. For women, bootstrap aggregated flexible
discriminant analysis (“bagFDA”) performed best with
a test AUC of 0.92 [0.89; 0.94] and well-calibrated probabilities
following Naïve Bayes adjustments. A “bagFDA”
model limited to 11 predictors (among them bone mineral
densities (BMD), biochemical glucose measurements,
general practitioner and dentist use) achieved a test AUC
of 0.91 [0.88; 0.93]. For men, eXtreme Gradient Boosting
(“xgbTree”) performed best with a test AUC of 0.89 [0.82;
0.95], but with poor calibration in higher probabilities. A
ten predictor subset (BMD, biochemical cholesterol and
liver function tests, penicillin use and osteoarthritis diagnoses)
achieved a test AUC of 0.86 [0.78; 0.94] using an
“xgbTree” model. Machine learning can improve hip fracture
prediction beyond logistic regression using ensemble
models. Compiling data from international cohorts of
longer follow-up and performing similar machine learning
procedures has the potential to further improve discrimination
and calibration.
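
The validation scheme described in the abstract (a 75%/25% split, 5-fold cross-validation repeated 5 times on the training part, and AUC on the held-out part) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' pipeline: the function names, the seeds, and the toy labels and scores are assumptions made for the example, and the split here is a plain random split without stratification.

```python
import random


def train_test_split_indices(n, test_fraction=0.25, seed=0):
    """Shuffle indices 0..n-1 and hold out `test_fraction` of them for testing."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]


def repeated_kfold(indices, k=5, repeats=5, seed=0):
    """Yield (train, validation) index lists for k-fold CV repeated `repeats` times."""
    rng = random.Random(seed)
    for _ in range(repeats):
        shuffled = indices[:]
        rng.shuffle(shuffled)  # a fresh shuffle for every repeat
        folds = [shuffled[i::k] for i in range(k)]
        for i in range(k):
            validation = folds[i]
            train = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield train, validation


def auc(labels, scores):
    """AUC as the probability that a random positive case outscores a random
    negative case, with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy usage with 100 hypothetical subjects (not study data).
train_idx, test_idx = train_test_split_indices(100)
splits = list(repeated_kfold(train_idx, k=5, repeats=5))
toy_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75
```

In a real analysis each candidate model would be fitted on every `train` list and scored on the matching `validation` list, and only the selected model would be evaluated once on `test_idx`.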
Conclusion: We conclude that hip fracture risk can be modelled with
high discriminative performance for men (Test AUC of
0.89 [0.82; 0.95], sensitivity 100%, specificity 69% at the
Youden probability cut-off) and particularly for women
(Test AUC 0.91 [0.88; 0.94], sensitivity 88%, specificity
81% at the Youden probability cut-off) using advanced predictive
models. Ensemble models using bootstrap aggregation
and boosting performed best in both cohorts, and
probabilities can generally be calibrated well with a Naïve
Bayes approach, although calibration is poor for high probability
estimates in men. Models of 11 predictors for women and 9 for
men with combinations of DXA BMD measurements and
primary sector use achieved the highest numerical AUC
values. Further improvements in predictive capability are
likely possible with compilations of more data points and
longer observation periods. We strongly suggest the use of
machine learning principles to model hip fracture risk, and
we welcome an effort to compile existing datasets and perform
advanced predictive modelling.
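
The Youden probability cut-off referred to in the conclusion is the threshold that maximises J = sensitivity + specificity - 1. A minimal sketch of how such a cut-off can be located is given below. The function name and the toy labels and probabilities are assumptions for illustration only, and the convention that a case is predicted positive when its probability is at or above the cut-off is a choice made here, not taken from the paper.

```python
def youden_cutoff(labels, probs):
    """Return (cutoff, sensitivity, specificity) maximising
    J = sensitivity + specificity - 1.

    A case is predicted positive when its probability is >= cutoff.
    If several cut-offs tie on J, the lowest one is returned.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")
    best = None
    for c in sorted(set(probs)):  # every observed probability is a candidate
        tp = sum(1 for y, p in zip(labels, probs) if y == 1 and p >= c)
        tn = sum(1 for y, p in zip(labels, probs) if y == 0 and p < c)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, c, sens, spec)
    return best[1], best[2], best[3]


# Toy data (hypothetical, not from the study): 1 = hip fracture within follow-up.
toy_labels = [0, 0, 0, 0, 1, 0, 1, 1]
toy_probs = [0.05, 0.10, 0.20, 0.30, 0.35, 0.60, 0.70, 0.90]
cutoff, sensitivity, specificity = youden_cutoff(toy_labels, toy_probs)
# cutoff is 0.35, giving sensitivity 1.0 and specificity 0.8 on this toy data
```

Because the cut-off weights sensitivity and specificity equally, it can land on a low probability when fractures are rare, which is one way a model can reach very high sensitivity at the cost of moderate specificity.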