HCM-AF-Risk Model to Identify Cases and Predictors of Atrial
Fibrillation in Hypertrophic Cardiomyopathy
Abstract
Background Atrial fibrillation (AF) in hypertrophic cardiomyopathy (HCM)
is associated with high stroke risk despite low CHA2DS2-VASc scores.
Hence, there is a need to understand AF pathophysiology and to predict
AF in HCM. We develop and apply a data-driven, machine learning-based
method to identify AF cases and clinical features associated with AF in
HCM, using electronic health record (EHR) data.
Methods Patients with documented paroxysmal, persistent, or permanent AF
(n=191) were considered AF cases, and the remaining patients in sinus
rhythm (n=640) were tagged as No-AF. We evaluated 93 clinical variables;
the most informative variables for distinguishing AF from No-AF cases
were selected based on the 2-sample t-test and the information gain
criterion.
Results We identified 18 highly informative clinical
variables: 11 are positively associated (e.g., LA diameter, LV diastolic
dysfunction, LV-LGE) and 7 are negatively associated (e.g., several
exercise parameters) with AF in HCM. Next, patient records were
represented via these 18 variables. Data imbalance resulting from the
relatively low number of AF cases was addressed via a combination of
over- and under-sampling strategies. We trained and tested multiple
classifiers under this sampling approach. Of these, an ensemble of
logistic regression and naïve Bayes classifiers, trained on the 18
variables and corrected for data imbalance, proved most effective for
separating AF from No-AF
cases (sensitivity=0.74, specificity=0.72, C-index=0.80).
Conclusions Our model (HCM-AF-Risk Model), the first machine
learning-based method for identification of AF cases in HCM,
demonstrates good performance and suggests that AF is associated with a
more severe cardiac HCM phenotype.
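
The two data-preparation steps named in the Methods and Results (ranking variables by a 2-sample t-test and by information gain, then correcting class imbalance by combined over- and under-sampling) can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: the Welch (unequal-variance) form of the t statistic, a median split to discretize each variable before computing information gain, plain random resampling (the abstract does not name the specific sampling strategies), and the hypothetical per_class parameter. The data are synthetic, not patient records.

```python
import math
import random
from collections import Counter
from statistics import mean, median, variance


def welch_t(a, b):
    """Two-sample t statistic (Welch form, an assumption of this sketch)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy reduction from splitting one variable at its median
    (the median split is an illustrative discretization choice)."""
    cut = median(values)
    low = [y for v, y in zip(values, labels) if v <= cut]
    high = [y for v, y in zip(values, labels) if v > cut]
    n = len(labels)
    remainder = sum(len(p) / n * entropy(p) for p in (low, high) if p)
    return entropy(labels) - remainder


def rebalance(rows, labels, per_class, seed=0):
    """Random over-sampling (with replacement) of classes smaller than
    per_class and random under-sampling (without replacement) of larger
    ones, so that every class ends up with per_class rows."""
    rng = random.Random(seed)
    out_rows, out_labels = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        if len(idx) >= per_class:
            chosen = rng.sample(idx, per_class)
        else:
            chosen = idx + rng.choices(idx, k=per_class - len(idx))
        out_rows += [rows[i] for i in chosen]
        out_labels += [cls] * per_class
    return out_rows, out_labels


# Synthetic toy data: one informative variable and one pure-noise variable.
rng = random.Random(42)
labels = [1] * 30 + [0] * 100  # 1 = AF, 0 = No-AF (imbalanced on purpose)
informative = [rng.gauss(45 if y else 38, 4) for y in labels]
noise = [rng.gauss(0, 1) for _ in labels]

# Score each variable with both criteria.
for name, col in (("informative", informative), ("noise", noise)):
    af = [v for v, y in zip(col, labels) if y == 1]
    no_af = [v for v, y in zip(col, labels) if y == 0]
    print(name, round(welch_t(af, no_af), 2),
          round(information_gain(col, labels), 3))

# Rebalance to 60 rows per class before any classifier is trained.
rows = list(zip(informative, noise))
bal_rows, bal_labels = rebalance(rows, labels, per_class=60)
print(Counter(bal_labels))
```

In a full pipeline, resampling would normally be applied to the training split only, so that sensitivity, specificity, and C-index are estimated on records with the original class proportions.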