Prediction of unsuccessful endometrial ablation: Random Forest vs
Logistic Regression
Abstract
Objective: To develop a random forest (RF) model that predicts surgical
re-intervention within two years after endometrial ablation (EA), and to
compare its performance with that of a previously published
multivariate logistic regression (LR) model (1). Design: Retrospective
cohort study. Setting: Data from two non-university teaching hospitals
in the Netherlands were used. Population: 446 pre-menopausal women who
underwent EA for heavy menstrual bleeding between January 2004 and
April 2013. Methods: The RF model was trained in MATLAB (2018b) using
the TreeBagger function in the Statistics and Machine Learning Toolbox.
Main outcome measures: The performance of the two models was compared
using the area under the receiver operating characteristic curve
(AUROC). Measurements and Main Results: The LR model had an AUROC of 0.71
(95% CI 0.64-0.78). The RF model had an AUROC of 0.63 (95% CI
0.54-0.71), and an AUROC of 0.65 (95% CI 0.56-0.74) after hyperparameter
optimization. Conclusion: The RF model is not superior to the
LR model in predicting the outcome of surgical re-intervention within
two years after EA. Machine learning techniques are gaining popularity
in the development of clinical prediction tools, but they are not
necessarily superior to traditional statistical techniques such as
logistic regression. Model performance is influenced by the sample size,
the number of features, hyperparameter tuning, and the linearity of
associations. Both techniques should be considered when developing a
prediction model.
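
The AUROC comparison described above can be illustrated with a minimal Python sketch. It is not the study's MATLAB code. The labels and scores are made-up toy values, the names `auroc` and `hanley_mcneil_ci` are hypothetical helpers, and the Hanley-McNeil approximation for the 95% CI is an assumption, since the abstract does not state how the intervals were computed.

```python
import math


def auroc(labels, scores):
    """AUROC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def hanley_mcneil_ci(auc, n_pos, n_neg, z=1.96):
    """Approximate CI for an AUROC using the Hanley-McNeil standard error.
    Assumption: the paper may have used a different method."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    var = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    se = math.sqrt(max(var, 0.0))
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


# Toy data only: 1 = re-intervention, 0 = no re-intervention.
labels = [1, 0, 1, 0, 0, 1, 0, 0]
scores_a = [0.9, 0.2, 0.7, 0.4, 0.1, 0.35, 0.3, 0.5]  # hypothetical model A
scores_b = [0.8, 0.3, 0.4, 0.6, 0.2, 0.5, 0.1, 0.7]   # hypothetical model B

n_pos = sum(labels)
n_neg = len(labels) - n_pos
auc_a = auroc(labels, scores_a)
auc_b = auroc(labels, scores_b)
ci_a = hanley_mcneil_ci(auc_a, n_pos, n_neg)
ci_b = hanley_mcneil_ci(auc_b, n_pos, n_neg)

print(f"model A: AUROC {auc_a:.2f} (95% CI {ci_a[0]:.2f}-{ci_a[1]:.2f})")
print(f"model B: AUROC {auc_b:.2f} (95% CI {ci_b[0]:.2f}-{ci_b[1]:.2f})")
```

With so few cases the intervals are very wide and overlap heavily, which shows why sample size matters when judging whether one model outperforms another.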