Optimizing ROC Curve for Ensemble Models through Pareto Front Analysis
of the Decision Space
Abstract
The Receiver Operating Characteristic (ROC) curve is commonly used to
evaluate the performance of machine learning ensemble classification
models, which combine multiple classifiers and use a voting procedure to
determine the final classification. Although such ensembles have many
parameters, their ROC curves are usually obtained by sweeping only the
voting threshold, limiting their potential for improvement. In this paper
we propose a new method, ROC mapping, which improves model performance by
redefining the ROC curve as the Pareto front of a multi-objective
optimization problem that maps the multidimensional space of all
parameters of the ensemble classifier (the Decision space) into the
Objective space, defined on the two-dimensional unit interval. We use an
algorithm based on NSGA-II
to explore the Decision space and validate the proposal on two different
classification problems: (1) predicting car insurance claims on a highly
imbalanced dataset (Insurance dataset), and (2) predicting obesity risk
with a balanced clinical dataset (GenObIA dataset). We compare our
method with alternative ensemble optimization methods, using visual
assessment, the Area Under the Curve (AUC), and the Youden Index as
figures of merit. On the Insurance dataset, our method shows an average
improvement of 46.4% in AUC and 26.1% in the Youden Index, both
calculated relative to the maximum achievable improvement. On the GenObIA
dataset, we achieve an average increase of 29.7% in AUC and 11.9% in the
Youden Index, again relative to the maximum possible improvement. The ROC
mapping
approach provides a comprehensive and adaptable ROC curve, demonstrating
its effectiveness in improving classification performance across
different applications.
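
To make the mapping concrete, the following is a minimal sketch of the idea, not the paper's implementation: it assumes pymoo as the NSGA-II implementation, a synthetic dataset standing in for the Insurance and GenObIA data, and a hypothetical Decision space consisting of three soft-voting weights plus the voting threshold. The Objective space is (FPR, 1 - TPR), so the resulting Pareto front is read back as ROC points.

# Hedged sketch of ROC mapping: NSGA-II over an ensemble's decision space.
# Assumptions (not from the paper): a 3-member soft-voting ensemble, pymoo,
# synthetic data, and a decision vector of per-classifier weights + threshold.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from pymoo.core.problem import ElementwiseProblem
from pymoo.algorithms.moo.nsga2 import NSGA2
from pymoo.optimize import minimize

# Toy data standing in for the real datasets.
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.8, 0.2], random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3,
                                            stratify=y, random_state=0)

# Fixed base classifiers; only the combination rule is optimized here.
members = [LogisticRegression(max_iter=1000).fit(X_tr, y_tr),
           DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_tr, y_tr),
           GaussianNB().fit(X_tr, y_tr)]
probas = np.column_stack([m.predict_proba(X_val)[:, 1] for m in members])

class ROCMapping(ElementwiseProblem):
    # Decision space: 3 voting weights + 1 threshold, all in [0, 1].
    # Objective space: (FPR, 1 - TPR), both minimized, so the Pareto front
    # corresponds to the upper-left envelope of the ROC plane.
    def __init__(self):
        super().__init__(n_var=4, n_obj=2, xl=np.zeros(4), xu=np.ones(4))

    def _evaluate(self, x, out, *args, **kwargs):
        w, thr = x[:3], x[3]
        w = w / w.sum() if w.sum() > 0 else np.full(3, 1 / 3)
        y_hat = (probas @ w >= thr).astype(int)
        tp = np.sum((y_hat == 1) & (y_val == 1))
        fp = np.sum((y_hat == 1) & (y_val == 0))
        fn = np.sum((y_hat == 0) & (y_val == 1))
        tn = np.sum((y_hat == 0) & (y_val == 0))
        tpr = tp / (tp + fn) if tp + fn else 0.0
        fpr = fp / (fp + tn) if fp + tn else 0.0
        out["F"] = [fpr, 1.0 - tpr]

res = minimize(ROCMapping(), NSGA2(pop_size=60), ("n_gen", 40),
               seed=1, verbose=False)

# Re-express the non-dominated set as (FPR, TPR) points of the mapped ROC,
# anchored at (0, 0) and (1, 1) so the area is well defined.
F = np.atleast_2d(res.F)
pts = sorted((float(f[0]), 1.0 - float(f[1])) for f in F)
front = np.vstack([[0.0, 0.0], np.array(pts), [1.0, 1.0]])
auc = np.trapz(front[:, 1], front[:, 0])        # area under the mapped curve
youden = float(np.max(front[:, 1] - front[:, 0]))  # J = TPR - FPR
print(f"Pareto-front AUC ~ {auc:.3f}, best Youden index ~ {youden:.3f}")

Because every candidate solution found this way is compared on (FPR, 1 - TPR), each non-dominated point lies on or above the ROC curve obtained by sweeping the voting threshold alone, which is why a Pareto front over the full Decision space can only match or improve the single-threshold baseline.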