Enhancing Ensemble Model Accuracy and Interpretability: A Framework Integrating Rough Set Theory and Recursive Feature Elimination for Feature Selection and Interpretability with Association Rule Analysis in Ensemble Models.

Isaac Kega; Lawrence Nderu; Ronald Mwangi; Dennis Njagi

doi:10.22541/au.170993286.68523250/v1


Abstract

Feature selection strongly affects the predictive performance of ensemble models. This paper presents a hybrid feature selection framework for ensemble models that combines Rough Set Theory (RST) with Recursive Feature Elimination (RFE), complemented by association rule mining to enhance interpretability. The proposed method improves both the predictive accuracy and the interpretability of ensemble models, particularly Random Forests and Gradient Boosting Machines. The framework starts with RFE, which iteratively eliminates the least influential features, and then applies RST to refine the remaining feature set by removing redundant features. This two-phase approach yields a compact feature set that retains the most influential features. Applied to ensemble models, the hybrid method improves predictive accuracy on three diverse datasets: cancer, Pima Indians Diabetes, and a weather dataset from Underground. The accuracies achieved on these datasets were 0.9663, 0.8793, and 0.8427, respectively. The paper also proposes using association rule mining to analyze model outcomes. This makes the models easier to understand by exposing connections and patterns in the model outcomes, addressing the difficulty of interpreting complex ensemble models. The empirical analysis supports the effectiveness of the proposed hybrid feature selection model. Integrating RFE and RST streamlines feature selection and narrows the interpretability gap, offering a robust approach for applications where both accuracy and an understanding of model decisions are crucial.
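
The second phase described in the abstract, in which RST removes redundant features from the subset that survives RFE, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the features are already discretized, uses the standard rough set dependency degree (the share of records whose decision is fully determined by the chosen features), and drops features with a simple greedy backward pass. The toy decision table, the feature names, and the function names `dependency_degree` and `drop_redundant` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def dependency_degree(rows, features, decision):
    """Share of rows whose decision value is fully determined by `features`.

    Rows are grouped into equivalence classes by their values on `features`;
    a class belongs to the positive region when all of its rows share one
    decision value.
    """
    labels = defaultdict(set)
    sizes = defaultdict(int)
    for row in rows:
        key = tuple(row[f] for f in features)
        labels[key].add(row[decision])
        sizes[key] += 1
    consistent = sum(sizes[k] for k, seen in labels.items() if len(seen) == 1)
    return consistent / len(rows)


def drop_redundant(rows, features, decision):
    """Greedy backward pass: drop a feature if the dependency degree is unchanged."""
    target = dependency_degree(rows, features, decision)
    kept = list(features)
    for f in features:
        trial = [g for g in kept if g != f]
        if trial and dependency_degree(rows, trial, decision) == target:
            kept = trial
    return kept


# Hypothetical discretized table; `bmi_band` duplicates `bmi` on purpose.
rows = [
    {"glucose": "high", "bmi": "high", "bmi_band": "high", "age": "young", "label": 1},
    {"glucose": "high", "bmi": "low", "bmi_band": "low", "age": "young", "label": 0},
    {"glucose": "low", "bmi": "high", "bmi_band": "high", "age": "old", "label": 0},
    {"glucose": "low", "bmi": "low", "bmi_band": "low", "age": "old", "label": 0},
    {"glucose": "high", "bmi": "high", "bmi_band": "high", "age": "old", "label": 1},
    {"glucose": "low", "bmi": "high", "bmi_band": "high", "age": "young", "label": 0},
]
features = ["glucose", "bmi", "bmi_band", "age"]

reduct = drop_redundant(rows, features, "label")
print(reduct)  # ['glucose', 'bmi_band']
```

In this toy table, one of the two duplicated BMI columns and the uninformative age column are removed, while the remaining features still determine every label. A greedy pass depends on the order in which features are tried and does not guarantee a minimal reduct; it is used here only to keep the sketch short.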