Ye Mao

and 7 more

Data-driven approaches for molecular diagnostics are emerging as an accurate and inexpensive alternative for multi-pathogen detection. A novel technique called Amplification Curve Analysis (ACA) has recently been developed by coupling machine learning with real-time Polymerase Chain Reaction (qPCR) to enable the simultaneous detection of multiple targets in a single reaction well. However, target classification that relies purely on amplification curve shapes currently faces several challenges, such as distribution discrepancies between data sources, namely synthetic DNA used for training and clinical samples used for testing. Computational models must be optimised to reduce those discrepancies and thereby achieve higher ACA classification performance in multiplex qPCR. Here, we propose a novel transformer-based conditional domain adversarial network (T-CDAN) to eliminate data distribution differences between the source domain (synthetic DNA data) and the target domain (clinical isolate data). The labelled training data from the source domain and the unlabelled testing data from the target domain are fed into the T-CDAN, which learns information from both domains simultaneously. After mapping the inputs into a domain-irrelevant space, T-CDAN removes the feature distribution differences and provides a clearer decision boundary for the classifier, resulting in more accurate pathogen identification. Evaluation on 198 clinical isolates containing three types of carbapenem-resistant genes (blaNDM, blaIMP and blaOXA-48) yields a curve-level accuracy of 93.1% and a sample-level accuracy of 97.0% using T-CDAN, an accuracy improvement of 20.9% and 4.9%, respectively, compared with previous methods.
This research emphasises the importance of deep domain adaptation for enabling high-level multiplexing in a single qPCR reaction, providing a solid approach to extend the capabilities of qPCR instruments without hardware modification in real-world clinical applications.
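
To make the conditional adversarial idea described above more concrete, the following is a minimal sketch of the conditioning step used in the general conditional domain adversarial network (CDAN) formulation, where the domain discriminator receives the outer product of a feature vector and the classifier's class probabilities. It is not the authors' implementation: the transformer feature extractor is omitted, all function names and numeric values are hypothetical, and whether T-CDAN uses exactly this form of conditioning is an assumption.

```python
import math


def softmax(logits):
    """Convert classifier logits into class probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def multilinear_map(features, class_probs):
    """Flattened outer product of features and class probabilities.

    In the general CDAN formulation this joint vector, rather than the
    features alone, is the input to the domain discriminator.
    """
    return [f * p for f in features for p in class_probs]


def domain_bce(domain_prob, is_source):
    """Binary cross-entropy on the discriminator's 'source' probability."""
    eps = 1e-12
    p = min(max(domain_prob, eps), 1.0 - eps)
    return -math.log(p) if is_source else -math.log(1.0 - p)


# Hypothetical values: a 4-dimensional curve embedding and 3 target classes
# (e.g. three resistance genes). Real embeddings would come from the encoder.
features = [0.5, -1.0, 2.0, 0.1]
probs = softmax([2.0, 0.5, -1.0])
joint = multilinear_map(features, probs)

# A discriminator that cannot tell the domains apart outputs 0.5 everywhere.
confused_loss = domain_bce(0.5, is_source=True)
```

In adversarial training, the discriminator would minimise this loss while the feature extractor is updated to maximise it, which pushes source and target features towards a shared, domain-irrelevant representation.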