Optimizing ROC Curve for Ensemble Models through Pareto Front Analysis of the Decision Space
  • Alberto Gutierrez-Gallego,
  • Oscar Garnica,
  • Daniel Parra,
  • J. Manuel Velasco,
  • J. Ignacio Hidalgo
Alberto Gutierrez-Gallego
Universidad Complutense de Madrid Departamento de Arquitectura de Computadores y Automatica

Corresponding Author: [email protected]

Oscar Garnica
Universidad Complutense de Madrid Departamento de Arquitectura de Computadores y Automatica
Daniel Parra
Universidad Complutense de Madrid Departamento de Arquitectura de Computadores y Automatica
J. Manuel Velasco
Universidad Complutense de Madrid Departamento de Arquitectura de Computadores y Automatica
J. Ignacio Hidalgo
Universidad Complutense de Madrid Departamento de Arquitectura de Computadores y Automatica

Abstract

The Receiver Operating Characteristic (ROC) curve is commonly used to evaluate the performance of machine learning ensemble classification models, which combine multiple classifiers and use a voting procedure to determine the final classification. Although these models have many parameters, their ROC curves usually explore only the voting threshold, which limits their potential for improvement. In this paper, we propose a new method, ROC mapping, that improves model performance by redefining the ROC curve as the Pareto front of a multi-objective optimization problem. This problem maps the multidimensional space of all parameters of the ensemble classifier (the Decision space) into the Objective space, defined on the two-dimensional unit square. We use an algorithm based on NSGA-II to explore the Decision space and validate the proposal on two classification problems: (1) predicting car insurance claims from a highly imbalanced dataset (Insurance dataset), and (2) predicting obesity risk from a balanced clinical dataset (GenObIA dataset). We compare our method with alternative ensemble optimization methods using visual assessment, the Area Under the Curve, and the Youden Index as figures of merit. On the Insurance dataset, our method shows an average improvement of 46.4% in Area Under the Curve and 26.1% in the Youden Index, both calculated relative to the maximum achievable improvement. On the GenObIA dataset, we achieve an average increase of 29.7% in Area Under the Curve and 11.9% in the Youden Index, again relative to the maximum possible improvement. The ROC mapping approach provides a comprehensive and adaptable ROC curve, demonstrating its effectiveness in improving classification performance across different applications.
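To make the idea of an ROC curve as a Pareto front concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes each candidate ensemble configuration has already been evaluated to a (false positive rate, true positive rate) point, and it replaces the NSGA-II search with a plain non-dominated filter over a fixed set of candidates. The function names, the toy candidate values, and the baseline of 0.70 are all hypothetical. The `relative_improvement` formula is one plausible reading of "relative to the maximum achievable improvement" (gain divided by the headroom up to the ideal value of 1); the abstract does not state the exact definition.

```python
import numpy as np


def pareto_front(fpr, tpr):
    """Return the non-dominated (FPR, TPR) points, sorted by increasing FPR.

    A point is dominated if another point has FPR <= its FPR and
    TPR >= its TPR, with at least one strict inequality.
    """
    pts = np.column_stack([fpr, tpr]).astype(float)
    # Sort by FPR ascending; ties are broken by TPR descending.
    order = np.lexsort((-pts[:, 1], pts[:, 0]))
    front, best_tpr = [], -np.inf
    for f, t in pts[order]:
        # Keep a point only if it beats every TPR seen at lower or equal FPR.
        if t > best_tpr:
            front.append((f, t))
            best_tpr = t
    return np.array(front)


def auc_and_youden(front):
    """Trapezoidal AUC and Youden index (max of TPR - FPR) of a front.

    The trivial classifiers (0, 0) and (1, 1) are added as end points.
    """
    x = np.concatenate([[0.0], front[:, 0], [1.0]])
    y = np.concatenate([[0.0], front[:, 1], [1.0]])
    auc = float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))
    youden = float(np.max(y - x))
    return auc, youden


def relative_improvement(baseline, improved, best=1.0):
    """Assumed definition: gain as a fraction of the remaining headroom."""
    return (improved - baseline) / (best - baseline)


# Toy (FPR, TPR) points for five hypothetical ensemble configurations.
cand_fpr = [0.1, 0.2, 0.3, 0.5, 0.6]
cand_tpr = [0.5, 0.4, 0.8, 0.7, 0.9]

front = pareto_front(cand_fpr, cand_tpr)
auc, youden = auc_and_youden(front)
print("Pareto front:\n", front)
print("AUC:", auc, "Youden index:", youden)
# Hypothetical baseline AUC of 0.70 for a threshold-only ROC curve.
print("Relative AUC improvement:", relative_improvement(0.70, auc))
```

In this toy set, the points (0.2, 0.4) and (0.5, 0.7) are dominated and drop out, so the resulting curve is built only from configurations that cannot be improved in one objective without worsening the other. An evolutionary search such as NSGA-II would generate and refine the candidates instead of taking them as given.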