
Prediction of Lung Cancer Metastasis Using Machine Learning Models Based on Clinical Data
Chao Du, Chongqing Medical University
Qi Liu, Chongqing University Fuling Hospital
Yuanyuan Guo, Chongqing General Hospital
Jun Gong, University-Town Hospital of Chongqing Medical University
Ling Yan, Chongqing General Hospital
Zhijie Li, Chongqing General Hospital
Changchun Niu, Chongqing Medical University

Corresponding Author: [email protected]

Abstract

Background: Clinical laboratory data can indicate tumor cell growth and metabolic activity, so they warrant investigation as potential predictors of lung cancer metastasis.
Aims: To develop predictive models for regional lymph node involvement and skip metastasis in lung cancer by applying machine learning methods to clinical laboratory data and patient characteristics.
Methods: Data from patients with lung cancer at Chongqing University Fuling Hospital between January 2020 and December 2022 were analyzed retrospectively. Based on TNM staging criteria, patients were assigned to an N group (prediction of regional lymph node involvement) and an M group (prediction of skip metastasis). Predictive factors were identified through univariate analysis and LASSO regression. Machine learning algorithms were then used to build the predictive models.
Results: Of the 1629 cases analyzed, 861 were in the N group and 519 were in the M group. Univariate analysis identified 40 parameters in the N group and 27 in the M group with statistically significant differences (p < 0.05). LASSO regression selected 13 characteristic factors for the N group and 12 for the M group. In the N group, these were tumor size, prothrombin time (PT), mean platelet volume, fibrinogen, platelet count, procalcitonin, carbohydrate antigen 15-3 (CA 15-3), carcinoembryonic antigen (CEA), adenosine deaminase, red blood cell distribution width, thrombin time, smoking history, and drinking history. In the M group, they were cytokeratin 19 fragment, tumor size, CEA, CA 15-3, squamous cell carcinoma-related antigen, alkaline phosphatase, fibrinogen, hemoglobin, calcium, albumin, PT, and absolute monocyte count. On the test set, the logistic regression model performed best in both groups, with AUCs of 0.888 (N group) and 0.875 (M group).
Conclusion: This study demonstrates the potential of using machine learning algorithms alongside clinical characteristics and laboratory data to predict regional lymph node involvement and skip metastasis in lung cancer.
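The abstract names two quantitative steps, LASSO-based selection of predictive factors and AUC-based model comparison, but does not give the LASSO variant, the penalty tuning, or the software used. The following is a minimal, pure-Python sketch of those two steps under stated assumptions. It uses a least-squares LASSO fitted by cyclic coordinate descent on standardized predictors, with a fixed, hand-picked penalty `lam`. The authors may instead have used a logistic-loss LASSO with a cross-validated penalty. The sketch then computes AUC in its rank (Mann-Whitney) form. All function names and the toy data are hypothetical and are not the study's code or data. The logistic regression model itself is not reimplemented here.

```python
import math

# Hypothetical helpers for illustration only; not the study's code or data.


def standardize(X):
    """Center each column to mean 0 and scale it to unit (population) variance."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    sds = []
    for j in range(p):
        var = sum((row[j] - means[j]) ** 2 for row in X) / n
        sds.append(math.sqrt(var) if var > 0 else 1.0)  # guard constant columns
    return [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]


def soft_threshold(z, gamma):
    """Shrink z toward 0 by gamma; values within [-gamma, gamma] become exactly 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Least-squares LASSO on standardized X via cyclic coordinate descent.

    Minimizes (1/2n) * ||y - mean(y) - X b||^2 + lam * ||b||_1.
    Features whose coefficient stays exactly 0 are treated as "not selected".
    """
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    r = [yi - y_mean for yi in y]  # residual for b = 0 (intercept = mean(y))
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Column j has unit variance, so the 1-D update is a soft threshold of
            # (1/n) * x_j . (r + x_j * b_j), i.e. (1/n) * x_j . r + b_j.
            rho = sum(X[i][j] * r[i] for i in range(n)) / n + beta[j]
            new_bj = soft_threshold(rho, lam)
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * delta  # keep residual in sync with beta
                beta[j] = new_bj
    return beta


def roc_auc(y_true, scores):
    """AUC = P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy usage (hypothetical data): column 0 tracks the label, while column 1 is
# constructed to be uncorrelated with both the label and column 0.
X = [[1, 1], [2, -1], [3, 0], [4, 0], [5, -1], [6, 1]]
y = [0, 0, 0, 1, 1, 1]
beta = lasso_coordinate_descent(standardize(X), y, lam=0.1)
selected = [j for j, b in enumerate(beta) if b != 0.0]
auc = roc_auc(y, [row[0] for row in X])
print("LASSO coefficients:", beta)
print("selected feature indices:", selected)
print("AUC of feature 0 as a score:", auc)
```

AUC depends only on how the scores rank the cases. Any monotone transform of a score therefore gives the same AUC: raw values, standardized values, or a logistic model's predicted probabilities. In the study's setting, the scores would be model outputs on the held-out test set. The abstract does not describe how that split was made, so none is assumed above.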