Objective: To develop a multimodal artificial intelligence model that combines medical records, imaging data, and laboratory test indicators to assist in the diagnosis of Kawasaki disease.

Methods: Using our hospital's Kawasaki disease database, we retrospectively collected medical information from 500 children, including both children with Kawasaki disease and healthy children. We designed a Chinese-BERT-Base module, a ResNet module, and a fully connected layer module to process the medical records, image data, and laboratory test results, respectively. The resulting feature vectors were then concatenated through early fusion and fed into a classifier that outputs the classification result. To assess the effectiveness of the model, we built unimodal models and traditional machine learning models for comparison. To evaluate interpretability, we analyzed the attention that each module paid to its raw input data. Additionally, we collected data from another 100 children at a peer hospital as an external validation group. In a double-blind setting, three senior doctors and the model classified cases simultaneously in a human-machine comparison experiment.

Results: The multimodal model achieved an accuracy of 93% and a specificity of 93%, a significant improvement over the unimodal models and the traditional machine learning models. Interpretability analysis showed that the attention of each module in the multimodal model largely aligned with the reasoning of human doctors. In the human-machine comparison experiment, the model's classification performance (87%) still fell notably short of that of the human doctors (98%).

Conclusion: The multimodal model developed in this study makes effective use of clinical data, achieves good diagnostic performance, and provides a relatively reliable tool to assist clinicians in diagnosing Kawasaki disease.
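The early-fusion step described in Methods can be sketched as follows. This is a minimal, dependency-free Python illustration under stated assumptions, not the authors' implementation: the three encoders are replaced by plain feature vectors, the sizes are hypothetical (768 matches the hidden size of a BERT-Base encoder; 512 for the ResNet features and 64 for the laboratory-test layer are illustrative choices), and the classifier is assumed to be a single linear layer followed by softmax, because the abstract does not specify its structure.

```python
import math
import random
from typing import List, Sequence

# Hypothetical feature sizes; the abstract does not report them.
TEXT_DIM = 768   # hidden size of a BERT-Base encoder (pooled text vector)
IMAGE_DIM = 512  # assumed ResNet feature size (e.g. ResNet-18/34 after pooling)
LAB_DIM = 64     # assumed output size of the lab-test fully connected module
NUM_CLASSES = 2  # Kawasaki disease vs. healthy


def early_fusion(text_vec: Sequence[float], image_vec: Sequence[float],
                 lab_vec: Sequence[float]) -> List[float]:
    """Early fusion: concatenate the per-modality vectors into one vector."""
    return list(text_vec) + list(image_vec) + list(lab_vec)


def linear(x: Sequence[float], weights: Sequence[Sequence[float]],
           bias: Sequence[float]) -> List[float]:
    """Fully connected layer: one logit per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(text_vec: Sequence[float], image_vec: Sequence[float],
             lab_vec: Sequence[float], weights: Sequence[Sequence[float]],
             bias: Sequence[float]) -> List[float]:
    """Fuse the three modality vectors, then apply a linear classifier + softmax."""
    fused = early_fusion(text_vec, image_vec, lab_vec)
    if any(len(row) != len(fused) for row in weights):
        raise ValueError(f"weight rows must have length {len(fused)}")
    return softmax(linear(fused, weights, bias))


if __name__ == "__main__":
    rng = random.Random(0)
    # Random stand-ins for encoder outputs and classifier weights (untrained; shapes only).
    text = [rng.gauss(0.0, 1.0) for _ in range(TEXT_DIM)]
    image = [rng.gauss(0.0, 1.0) for _ in range(IMAGE_DIM)]
    labs = [rng.gauss(0.0, 1.0) for _ in range(LAB_DIM)]
    fused_dim = TEXT_DIM + IMAGE_DIM + LAB_DIM
    weights = [[rng.gauss(0.0, 0.01) for _ in range(fused_dim)] for _ in range(NUM_CLASSES)]
    bias = [0.0] * NUM_CLASSES
    probs = classify(text, image, labs, weights, bias)
    print(f"fused dim = {fused_dim}, class probabilities = {probs}")
```

Because fusion happens by concatenation, the classifier sees all modalities in a single vector whose length is the sum of the per-modality sizes (1,344 under these assumed sizes). The random weights only demonstrate the data flow and carry no diagnostic meaning.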