Objective: The rapid growth of the COVID-19 pandemic overwhelmed healthcare systems worldwide with an unprecedented influx of patients, highlighting the urgent need for diagnostic solutions capable of handling this increased patient stream. This study explores the application of Vision Transformers to diagnosing COVID-19 from chest X-rays and demonstrates their advantages over traditional Convolutional Neural Networks.

Methods and procedures: Artificial Intelligence, and Vision Transformers in particular, has shown significant promise in medical diagnostics, offering higher accuracy and efficiency in interpreting complex medical images such as chest X-rays. A combination of the COVOICE-19 and COVIDx datasets is used to train several Vision Transformer variants. To address the common issue of data scarcity, the COVOICE-19 dataset, containing both chest X-rays and voice samples from infected and healthy patients, is made publicly available at github.com/RICS-Datalab/COVOICE-19-data.

Results: The models achieve accuracies of up to 93.78% when classifying chest X-rays as positive or negative for COVID-19. Furthermore, Explainable Artificial Intelligence techniques were applied to interpret the decision-making processes of the trained models, ensuring transparency and trustworthiness in automated diagnostics. The outputs of these techniques were validated by a radiology expert to confirm their practical relevance in aiding diagnosis.

Conclusion: The obtained results show the potential of Vision Transformers, compared with other Deep Learning models, to enhance the diagnostic process for COVID-19 from chest X-rays.