Enhancing Ensemble Model Accuracy and Interpretability: A Framework
Integrating Rough Set Theory and Recursive Feature Elimination for
Feature Selection and Interpretability with Association Rule Analysis in
Ensemble Models
Abstract
In machine learning, feature selection is central to improving the
predictive performance of ensemble models. This paper presents a hybrid
feature selection framework for ensemble models that combines Rough Set
Theory (RST) with Recursive Feature Elimination (RFE) and uses
Association Rule Mining to enhance interpretability. The proposed method
considerably improves the predictive accuracy and comprehensibility of
ensemble models, particularly Random Forests and Gradient Boosting
Machines. The framework first applies RFE, which iteratively eliminates
the least influential features, and then applies RST to remove redundant
features from the remaining set. This two-phase approach yields a
compact feature set that retains the most influential features. Applied
to ensemble models, the hybrid method significantly improves predictive
accuracy on three diverse datasets: a cancer dataset, the Pima Indians
Diabetes dataset, and a weather dataset from Weather Underground. The
achieved accuracies of 0.9663, 0.8793, and 0.8427, respectively,
highlight the effectiveness of the proposed approach. This article also
proposes using association rule mining to analyze model outcomes. The
mined rules make the models easier to understand by exposing
relationships and patterns in their behavior, which addresses the
difficulty of interpreting complex ensemble models. Our empirical
analysis confirms the effectiveness of the proposed hybrid feature
selection model, representing a significant advancement in the field.
Integrating RFE and RST optimizes the feature selection process and
narrows the interpretability gap, offering robust solutions for
applications where both accuracy and an understanding of model
decisions are crucial.
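The two-phase selection described above (RFE first, then RST redundancy removal) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the RFE estimator, how many features RFE keeps, the discretization, or the reduct-search algorithm. The sketch therefore assumes a random forest inside RFE, three quantile bins per feature (rough sets operate on discrete attributes), and a greedy QuickReduct search driven by the rough-set dependency degree. scikit-learn's bundled Breast Cancer Wisconsin data stands in for the cancer dataset, and the helper names `dependency_degree`, `quick_reduct`, and `run_pipeline` are hypothetical.

```python
# Assumptions (not from the paper): RF-based RFE keeping 10 features,
# 3 quantile bins per feature, greedy QuickReduct, breast-cancer stand-in data.
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score


def dependency_degree(data, attrs, decision):
    """Rough-set dependency gamma_B(D): share of objects in the positive region,
    i.e. objects whose B-equivalence class maps to a single decision value."""
    if not attrs:
        # With no attributes, every object is indiscernible from every other.
        return 1.0 if data[decision].nunique() == 1 else 0.0
    # Rows with identical values on `attrs` form one equivalence class.
    consistent = data.groupby(attrs)[decision].transform("nunique") == 1
    return float(consistent.mean())


def quick_reduct(data, conditions, decision):
    """Greedy QuickReduct: keep adding the attribute that most increases gamma
    until the subset discerns the decision as well as all of `conditions`."""
    target = dependency_degree(data, conditions, decision)
    reduct, current = [], dependency_degree(data, [], decision)
    while current < target:
        candidates = [a for a in conditions if a not in reduct]
        best = max(candidates,
                   key=lambda a: dependency_degree(data, reduct + [a], decision))
        reduct.append(best)
        current = dependency_degree(data, reduct, decision)
    return reduct


def run_pipeline(n_rfe=10, n_bins=3):
    X, y = load_breast_cancer(return_X_y=True, as_frame=True)

    # Phase 1 (RFE): repeatedly drop the least important feature according
    # to a random forest's impurity-based importances.
    rfe = RFE(RandomForestClassifier(n_estimators=50, random_state=0),
              n_features_to_select=n_rfe, step=1)
    rfe.fit(X, y)
    rfe_features = list(X.columns[rfe.get_support()])

    # Phase 2 (RST): discretize, then keep a reduct that preserves the
    # dependency degree of the whole RFE subset (drops redundant features).
    table = X[rfe_features].apply(
        lambda col: pd.qcut(col, q=n_bins, labels=False, duplicates="drop"))
    table["target"] = y.values
    reduct = quick_reduct(table, rfe_features, "target")
    selected = reduct if reduct else rfe_features  # guard against an empty reduct

    # Evaluate an ensemble on the selected (original, continuous) features.
    score = cross_val_score(GradientBoostingClassifier(random_state=0),
                            X[selected], y, cv=5).mean()
    return rfe_features, selected, score


if __name__ == "__main__":
    rfe_features, selected, score = run_pipeline()
    print("RFE kept:", rfe_features)
    print("RST reduct:", selected)
    print(f"5-fold CV accuracy (GBM): {score:.4f}")
```

Because QuickReduct is greedy, the returned subset preserves the dependency degree of the RFE features but is not guaranteed to be a minimal reduct. The printed accuracy depends on the assumptions above and is not expected to match the figures reported in the abstract.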
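The association-rule analysis of model outcomes can be illustrated with a small brute-force miner that searches for rules of the form {feature=bin, ...} → predicted class, scored by support and confidence. The abstract does not state which algorithm (for example Apriori or FP-Growth), thresholds, or item encoding the authors use; the function name `rules_for_predictions`, the default thresholds, and the limit of two items per antecedent are illustrative assumptions.

```python
# Hypothetical helper; thresholds and antecedent length are illustrative assumptions.
from itertools import combinations

import pandas as pd


def rules_for_predictions(items, predictions, min_support=0.1,
                          min_confidence=0.8, max_len=2):
    """Mine rules {feature=value, ...} -> prediction from a discretized table.

    support(A -> c)    = P(A and c)
    confidence(A -> c) = P(A and c) / P(A)
    """
    # One boolean "item" column per (feature, value) pair, e.g. "glucose=2".
    onehot = pd.get_dummies(items.astype(str), prefix_sep="=").astype(bool)
    preds = pd.Series(predictions, index=items.index)
    n = len(items)
    rules = []
    for k in range(1, max_len + 1):
        for antecedent in combinations(onehot.columns, k):
            mask = onehot[list(antecedent)].all(axis=1)
            if mask.sum() / n < min_support:
                continue  # antecedent too rare, so every rule from it is too
            for c in preds[mask].unique():
                both = (mask & (preds == c)).sum()
                support, confidence = both / n, both / mask.sum()
                if support >= min_support and confidence >= min_confidence:
                    rules.append((antecedent, c, support, confidence))
    # Strongest rules first: highest confidence, then highest support.
    return sorted(rules, key=lambda r: (-r[3], -r[2]))


if __name__ == "__main__":
    demo = pd.DataFrame({"a": [0, 0, 0, 1, 1, 1], "b": [0, 1, 0, 1, 0, 1]})
    for rule in rules_for_predictions(demo, [1, 1, 1, 0, 0, 0]):
        print(rule)
```

In practice, `items` would be a discretized copy of the selected features and `predictions` the output of a fitted ensemble's `predict` method, so each rule summarizes a region of feature space in which the model consistently makes one prediction. Exhaustive enumeration grows combinatorially with the number of items; Apriori-style pruning is the usual remedy for larger tables.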