Vision-language models (VLMs) have demonstrated significant potential across tasks spanning natural language processing and computer vision. However, their application to medical imaging, particularly X-ray diagnosis, remains underexplored. This review examines the current state of VLMs for X-ray diagnosis, focusing on their effectiveness, the challenges they face, and their potential for expansion. We discuss representative models, including Knowledge-enhanced Auto Diagnosis (KAD) and BERTHop, and their performance on different datasets. We also examine the methods used to integrate domain-specific knowledge into these models and the implications for clinical practice. The review concludes with a discussion of future directions and the potential for VLMs to revolutionize medical imaging diagnostics.
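
To make the general mechanism concrete, the sketch below illustrates the contrastive image-text alignment that underlies CLIP-style VLMs such as KAD: an X-ray image embedding is compared against text embeddings of candidate diagnostic prompts, and the highest-similarity prompt is taken as the zero-shot prediction. This is a minimal illustrative sketch, not the implementation of any model covered in this review; the stand-in encoders, the `encode_image`/`encode_text` names, and the toy prompts are assumptions introduced purely for illustration.

```python
import numpy as np

# Illustrative sketch of CLIP-style zero-shot X-ray classification.
# The "encoders" below are random linear projections standing in for
# real pretrained image/text encoders; in practice these would be the
# vision and text towers of a VLM (e.g., a KAD-style model).

rng = np.random.default_rng(0)
EMBED_DIM = 128

# Stand-in projections into a shared image-text embedding space.
image_proj = rng.normal(size=(224 * 224, EMBED_DIM))
text_proj = rng.normal(size=(512, EMBED_DIM))


def l2_normalize(v: np.ndarray) -> np.ndarray:
    return v / np.linalg.norm(v)


def encode_image(pixels: np.ndarray) -> np.ndarray:
    # Flatten the image and project into the shared space (toy stand-in).
    return l2_normalize(pixels.reshape(-1) @ image_proj)


def encode_text(token_ids: np.ndarray) -> np.ndarray:
    # Map a bag of toy token ids into the shared space (toy stand-in).
    bag = np.bincount(token_ids, minlength=512).astype(float)
    return l2_normalize(bag @ text_proj)


# Candidate diagnostic prompts, each represented here by toy token ids.
prompts = {
    "pneumonia": rng.integers(0, 512, size=16),
    "cardiomegaly": rng.integers(0, 512, size=16),
    "no finding": rng.integers(0, 512, size=16),
}

xray = rng.random((224, 224))  # placeholder for a preprocessed chest X-ray
image_emb = encode_image(xray)

# Zero-shot prediction: pick the prompt whose embedding is most similar
# to the image embedding (cosine similarity, since both are normalized).
scores = {label: float(image_emb @ encode_text(ids)) for label, ids in prompts.items()}
prediction = max(scores, key=scores.get)
print(scores, "->", prediction)
```

Knowledge-enhanced approaches such as KAD extend this basic alignment by injecting structured medical knowledge into the text side, a theme discussed throughout the review.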