Sarcomatoid renal cell carcinoma prognosis prediction based on the
machine learning algorithm
Abstract
Background There is currently no robust prognostic model for
sarcomatoid renal cell carcinoma (sRCC) that could help physicians
make better decisions.

Objectives To build an accurate predictive model for patients with sRCC
by identifying the key characteristics that influence overall survival.

Design and Methods The Surveillance, Epidemiology, and End Results
(SEER) database
of the U.S. National Cancer Institute was used to assemble the dataset
of sRCC patients. After preprocessing, the data were split into a
training set and a test set in an 8:2 ratio. The Mann-Whitney U test and
the Chi-square test were used to check that variables were similarly
distributed in the two sets. Univariate Cox proportional hazards models,
Kaplan-Meier analysis, and machine learning (ML) algorithms were used to
identify risk features for overall survival (OS). Ten reliable features
were selected to construct six ML models. Model performance, predictive
accuracy, and clinical benefit were evaluated with receiver operating
characteristic (ROC) curves, calibration plots, and decision curve
analysis (DCA), respectively.

Results After data preprocessing, 692
patients with sRCC from 1975 to 2019 were included in this study. Ten
variables were selected as reliable features for machine learning model
training: stage group, T stage, M stage, age, surgery, N stage, tumor
size, chemotherapy, histological grade, and radiotherapy. All models
showed good predictive performance; XGBoost had the best predictive
accuracy and stability. The DCA showed that all models except AdaBoost
could be used to support clinical decision-making for 90-day and 1-, 2-,
3-, and 5-year OS.

Conclusions Six machine
learning models were developed to predict 90-day and 1-, 2-, 3-, and
5-year overall survival in patients with sRCC. Model evaluations showed
that the XGBoost model had the best predictive accuracy and clinical net
benefit. These models can help inform treatment decisions for patients
with sRCC.
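
The clinical net benefit reported by a decision curve analysis can be illustrated with a minimal sketch. It assumes a binary outcome at a fixed time horizon with no censoring, which the abstract does not specify. The function names and the toy data are hypothetical and are not taken from the study. At a threshold probability pt, net benefit is the true-positive rate per patient minus the false-positive rate per patient weighted by the odds pt / (1 - pt).

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on patients whose predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    pairs = list(zip(y_true, y_prob))
    tp = sum(1 for y, p in pairs if p >= threshold and y == 1)
    fp = sum(1 for y, p in pairs if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * (threshold / (1.0 - threshold))


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: act on every patient regardless of predicted risk."""
    return net_benefit(y_true, [1.0] * len(y_true), threshold)


# Hypothetical toy data (assumption): 1 = died within the horizon, 0 = survived.
y_true = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.8, 0.6, 0.4, 0.3, 0.2, 0.1, 0.7, 0.55, 0.05]

for pt in (0.2, 0.5):
    model_nb = net_benefit(y_true, y_prob, pt)
    all_nb = net_benefit_treat_all(y_true, pt)
    print(f"pt={pt}: model={model_nb:.3f}, treat-all={all_nb:.3f}")
```

A model is usually considered clinically useful at a given threshold when its net benefit exceeds both the treat-all strategy and the treat-none strategy, whose net benefit is zero by definition.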