Although several machine learning (ML) driven solutions are considered effective at detecting data breaches, the recent proliferation of data breach incidents resulting from cyber attacks demands an updated, thorough analysis of ML-based data breach countermeasures to identify research gaps and guide future studies. This study therefore takes a systematic approach and draws insight from 81 research articles to classify ML-based data breach countermeasures using six criteria, including learning tasks, learning classifiers, proactive learning strategies, feature engineering methods and multimodal approaches. In classifying the studies, we: (a) propose a taxonomy of feature extraction and representation that classifies studies using ten sub-criteria; (b) identify proactive learning techniques and categorise studies using four sub-criteria, namely self-labelling, data augmentation, automated feature extraction and re-training; and (c) classify the multimodal machine learning approaches used in the studies into three fusion sub-criteria, namely early fusion, intermediate fusion and late fusion. To aid literature identification, we analyse forty recent incidents to obtain the prevalent cyber attack vectors of data breaches, which we present as a general workflow for data breaches due to cyber attacks. Finally, we highlight the research issues associated with existing ML-based data breach countermeasures and recommend future research directions.