Machine learning algorithm to predict allergy: first results of a
nationwide Allergen Chip Challenge
Abstract
Background: Serum allergen-specific immunoglobulin E (IgE)
plays a key role in allergy diagnosis along with clinical history and
physical examination. Nowadays, allergen multiplex assays allow complex
polyallergic cases to be solved as they assess up to 300
allergen-specific IgE. Recently, machine learning has emerged as a
trending tool in medicine. The aim was to build a nationwide,
open-access database to create an algorithm that could predict allergy
diagnosis, severity, category (airborne, food, venom) and culprit
allergens. Methods: A retrospective national database was
created by the French Society of Allergology in collaboration with
AllergoBioNet and the Health Data Hub. The collected data were de-identified
patient profiles comprising five demographic items, twenty clinical items and
the specific IgE (sIgE) results of one allergen multiplex assay. An international
crowdsourced machine learning competition was hosted by the Trustii.io
platform. Algorithms were evaluated on the F-score (a measure of a
model's accuracy on a dataset that combines precision and recall) and on
external validation with patient profiles outside the database (80%-20%,
respectively).
Results: Data were collected from 4271 patient files. Two
hundred and ninety-two data scientists competed with 3135 algorithms.
The best F-scores ranged from 78% to 80%. Models
associated with the highest F-scores used gradient boosting classifiers
such as LightGBM, CatBoost and XGBoost, which are well suited to tabular
datasets with categorical features. Conclusions: We report here the first
artificial intelligence models applied to the interpretation of allergen
multiplex arrays in a nationwide real-world database built to be open
access. With F-scores close to 80%, the French Allergen Chip Challenge
paves the way for a diagnostic prediction tool for practicing
allergists.
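
To illustrate the F-score used as the evaluation criterion above, the following minimal sketch computes a per-class F-score and its unweighted (macro) average for a multi-class prediction. The macro averaging, the function names and the toy labels are assumptions made for illustration only; the abstract does not state which averaging scheme the challenge used, and the labels are not taken from the database.

```python
from collections import Counter


def f1_per_class(y_true, y_pred):
    """Return {label: F1}, where F1 = 2*TP / (2*TP + FP + FN) for each label."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class was wrong
            fn[truth] += 1   # true class was missed
    scores = {}
    for label in set(y_true) | set(y_pred):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumed averaging scheme)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical allergy categories, for illustration only.
y_true = ["airborne", "food", "venom", "food", "airborne", "airborne"]
y_pred = ["airborne", "food", "food", "food", "airborne", "venom"]

per_class = f1_per_class(y_true, y_pred)
overall = macro_f1(y_true, y_pred)
print(per_class, round(overall, 3))
```

In this toy example the rare class that is never predicted correctly pulls the macro average down sharply, which is why the choice of averaging matters when categories are imbalanced.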