Recent advances in telecommunication and machine learning (ML) have enabled new smart and autonomous vehicle applications that improve road safety, environmental conditions, and traffic management through Vehicle-to-Vehicle (V2V) or Vehicle-to-Infrastructure (V2I) communication. However, with the rise of advanced cyber-attacks, the authenticity of a message guarantees its source but not its correctness. To mitigate these sophisticated attacks, new Misbehavior Detection Systems (MDS), which use machine learning algorithms to detect misbehaving vehicles, have been proposed. This work first provides a comprehensive review of recent developments in ML-based MDS technology within a Vehicular Ad-Hoc Network (VANET) context, covering data collection, feature selection, model training, model evaluation, and deployment. We survey useful public datasets and summarize recent studies. For every work, we highlight the dataset considered for ML training, list the selected ML models, indicate the feature selection and dimensionality reduction techniques, recapitulate the main results, report the performance metrics, and mention the deployment guidelines when applicable. We then compare the surveyed studies, discussing not only their strengths but also their limitations. One key observation from the surveyed works is the absence of a quantitative analysis of the proposed models’ execution time, which is a crucial performance metric given the limited on-board and edge computing resources. Developing a feasible ML-based MDS for Vehicle-to-Everything (V2X) communication requires addressing this issue and proposing a deployment strategy that optimizes the resources allocated to this technology; achieving this remains a challenge. Moreover, because data generation and analysis are critical phases in this technology and simulation has many advantages over real data collection, we provide a tutorial on how to build a useful dataset with popular open-source tools while considering exemplar types of attacks. Last, we demonstrate through the tutorial the use of the collected dataset for ML-based MDS model selection and training. An open-source GitHub repository is provided to reproduce all the described scenarios and to adapt them to a given research question.