This research investigates the application of machine learning to diagnosing COVID-19 from chest X-rays. We analyze several popular architectures, including efficient neural networks (EfficientNet), multiscale vision transformers (MViT), efficient vision transformers (EfficientViT), and vision transformers (ViT), on a dataset with four categories: COVID, lung opacity, normal, and viral pneumonia. While the multiscale models tend to overfit, our proposed fine-tuned ViT model achieves high accuracy, reaching 95.79% in four-class classification, 99.57% in a clinically relevant three-class grouping, and similarly high performance in binary classification. Quantitative metrics and visualizations support the model's effectiveness, and a comparative analysis indicates that our approach outperforms the alternatives considered. Overall, these findings demonstrate the potential of ViT for accurate COVID-19 diagnosis from chest X-rays and contribute to the advancement of medical image analysis.
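To make the relationship between the four-class, three-class, and binary settings concrete, the following minimal sketch shows how accuracy on a coarser grouping could be computed from four-class labels. The abstract does not state which classes are merged, so both mappings below are assumptions chosen purely for illustration, as are the toy labels and the helper names `regroup` and `accuracy`. The reported figures may also come from models trained separately for each setting rather than from regrouped predictions.

```python
from typing import Dict, List

# The four categories named in the abstract.
FOUR_CLASSES = ["COVID", "lung opacity", "normal", "viral pneumonia"]

# ASSUMPTION: hypothetical three-class grouping (not specified in the source).
THREE_CLASS_MAP: Dict[str, str] = {
    "COVID": "COVID",
    "lung opacity": "other abnormality",
    "viral pneumonia": "other abnormality",
    "normal": "normal",
}

# ASSUMPTION: hypothetical binary grouping (not specified in the source).
BINARY_MAP: Dict[str, str] = {
    "COVID": "COVID",
    "lung opacity": "non-COVID",
    "viral pneumonia": "non-COVID",
    "normal": "non-COVID",
}


def regroup(labels: List[str], mapping: Dict[str, str]) -> List[str]:
    """Map fine-grained labels onto a coarser label set."""
    return [mapping[label] for label in labels]


def accuracy(y_true: List[str], y_pred: List[str]) -> float:
    """Fraction of predictions that match the ground truth."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Toy labels for illustration only; these are not results from the paper.
y_true = ["COVID", "lung opacity", "normal", "viral pneumonia", "lung opacity", "COVID"]
y_pred = ["COVID", "viral pneumonia", "normal", "viral pneumonia", "lung opacity", "normal"]

acc_four = accuracy(y_true, y_pred)
acc_three = accuracy(regroup(y_true, THREE_CLASS_MAP), regroup(y_pred, THREE_CLASS_MAP))
acc_binary = accuracy(regroup(y_true, BINARY_MAP), regroup(y_pred, BINARY_MAP))

print(f"four-class:  {acc_four:.4f}")
print(f"three-class: {acc_three:.4f}")
print(f"binary:      {acc_binary:.4f}")
```

When predictions are regrouped in this way, merging classes can only preserve or raise accuracy, because every prediction that was correct at the finer level remains correct after merging, while confusions between merged classes stop counting as errors.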