Osama Maqbool - 21DOCS Test Area

Synthetic data is an indispensable supplement to real data, which is difficult to acquire, in meeting the substantial data demand of machine-learning-based systems. Because data plays the key role in machine learning models, objective and maintainable metrics of data quality are vital for the quality assurance of the whole system. This paper introduces a systematic and domain-neutral methodology for generating synthetic data, based on formalized scenario variation and experimentable digital twins. The methodology uses human-readable scenarios and semantically meaningful parameter variations to describe the possible entities, actions and events to be simulated, whereas experimentable digital twins bring the scenarios to life by integrating the various domains of a system, such as mechanics, sensors, actuators and communication, on one platform that can be simulated as a whole. Scenario description and digital twin simulation are carried out iteratively to derive the optimal distribution of synthetic data. Together, scenarios and experimentable digital twins can thus serve as a means to systematically cover diverse application scenarios, test dangerous situations and find faults within a system.
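
As a rough illustration of scenario variation, the following minimal sketch expands one human-readable scenario into concrete parameter combinations and passes each to a simulation step. Everything in it is an assumption made for illustration and is not taken from the paper: the `Scenario` class, the `expand` and `simulate` functions, the pedestrian-crossing example and its parameter names are all hypothetical, and `simulate` is only a placeholder for a digital twin simulation run. The iterative refinement of the data distribution is not modeled here.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Scenario:
    """Hypothetical scenario: a name plus semantically named parameter ranges."""
    name: str
    parameters: dict  # parameter name -> list of candidate values


def expand(scenario):
    """Return every concrete parameter combination of a scenario."""
    names = sorted(scenario.parameters)
    value_lists = [scenario.parameters[n] for n in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


def simulate(variant):
    """Placeholder for a digital twin simulation run (assumption, not a real API).

    It simply echoes the variant and uses one parameter as a stand-in label.
    """
    return {"input": variant, "label": variant["weather"]}


# Illustrative scenario; the domain and parameter names are made up.
scenario = Scenario(
    name="pedestrian_crossing",
    parameters={
        "weather": ["clear", "rain", "fog"],
        "pedestrian_speed_mps": [0.5, 1.5],
    },
)

variants = expand(scenario)
dataset = [simulate(v) for v in variants]
print(len(dataset))
```

A full Cartesian product is used only because it is the simplest variation strategy to show; sampling or coverage-driven selection would fit into the same `expand` step.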