Healthcare Cost Patterns and Prediction: Investigating Personal Datasets
using Data Analytics
Abstract
The present study introduces a health insurance prediction system that
leverages machine learning methodologies. Research on this problem has
increased notably, as the significance of health insurance as a research
topic has escalated markedly since the pandemic. The dataset employed in
this research comprises 1338 observations and 7 columns and corresponds to
individual medical expenditures in the United States, available on the
Kaggle platform. The dataset encompasses a variety of variables utilized
in the prediction of insurance prices, including age, gender, BMI,
smoking status, and number of children. The researchers used machine
learning techniques, including neural networks, explainable AI (XAI), and
auto modeling, to determine the correlation between pricing and the
attributes. The
training process involved partitioning the dataset into an 80-20 ratio
for training and evaluation. The system initially achieved an accuracy
of 97% with Gradient Boosting, which was corrected to 92% with the
Gradient Boosting Regressor after encoding and hyperparameter tuning.
Among the predictive machine learning models, Random Forest had the best
accuracy, at 83.44%.
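
The 80-20 partition and the encoding step mentioned above can be illustrated with a minimal sketch. It is not the study's actual pipeline: the records are synthetic, the field names (`age`, `gender`, `bmi`, `smoker`, `children`, `charges`) and the helper functions are hypothetical, and the binary 0/1 mapping is only one possible way to encode the categorical attributes.

```python
import random

def make_synthetic_rows(n, seed=0):
    """Build n made-up records; values are random, not real insurance data."""
    rng = random.Random(seed)
    return [
        {
            "age": rng.randint(18, 64),
            "gender": rng.choice(["male", "female"]),
            "bmi": round(rng.uniform(16.0, 50.0), 1),
            "smoker": rng.choice(["yes", "no"]),
            "children": rng.randint(0, 5),
            "charges": round(rng.uniform(1000.0, 60000.0), 2),  # hypothetical target
        }
        for _ in range(n)
    ]

def split_80_20(rows, seed=0):
    """Shuffle a copy of the rows and split it 80/20 into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 8 // 10  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]

def encode(row):
    """Turn one record into a numeric feature vector (assumed 0/1 encoding)."""
    return [
        row["age"],
        1 if row["gender"] == "male" else 0,
        row["bmi"],
        1 if row["smoker"] == "yes" else 0,
        row["children"],
    ]

rows = make_synthetic_rows(1338)  # same row count as the dataset described above
train, test = split_80_20(rows)
X_train = [encode(r) for r in train]
y_train = [r["charges"] for r in train]
print(len(train), len(test))  # prints: 1070 268
```

With 1338 observations, an 80-20 split leaves 1070 rows for training and 268 for evaluation. The encoded feature vectors could then be passed to any regressor, for example a gradient boosting or random forest implementation.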