The rapid advancement of Artificial Intelligence (AI) models, particularly Generative Adversarial Networks (GANs), has led to remarkable success in image synthesis. GAN-generated images are now widely circulated on the Internet and have become increasingly natural and photorealistic. While this progress benefits digital media and content creation, it also poses risks to security, legitimacy, and authenticity, raising growing concern about the misuse of such images to spread misinformation and create deepfakes. Detecting fake or AI-generated images has therefore become an important challenge in maintaining the integrity of digital media. In this research, we explore the application of the Vision Transformer (ViT) model to detecting AI-generated images, using a Kaggle dataset consisting of a balanced collection of real and AI-generated images. The Vision Transformer treats images as sequences of patches and excels at identifying long-range dependencies and complex patterns within images, which makes it well suited to the task of detecting fake images. We fine-tune the ViT model on the dataset, applying data augmentation techniques and leveraging pretrained weights to boost performance. The results demonstrate that the ViT model attains high accuracy in differentiating between real and AI-generated images, outperforming traditional CNN-based approaches. Beyond performance evaluation, we conduct an ablation study to examine the impact of various components of the ViT model, including the number of attention heads, the patch size, the depth of layers, and the effect of data augmentation. The findings indicate that the ViT model not only excels in accuracy but also provides a robust framework for detecting AI-generated images across diverse scenarios. Our study demonstrates the strength of transformer-based models in addressing the growing challenge of AI-generated image detection and lays a foundation for future research in this critical area. In particular, when fine-tuned with appropriate data augmentation, the ViT model achieves state-of-the-art performance in AI-generated image detection, underscoring its potential for real-world applications.
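To make the fine-tuning setup concrete, the sketch below shows one way such a pipeline could look in PyTorch with torchvision. The backbone choice (ViT-B/16 with ImageNet-pretrained weights), the augmentation choices, the hyperparameters, and the `data/train` directory layout are illustrative assumptions, not details taken from the paper.

```python
# Minimal fine-tuning sketch. Assumptions: PyTorch + torchvision are installed,
# and the Kaggle dataset is arranged locally as ImageFolder directories
# data/train/{real,fake} (this directory layout is hypothetical).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
from torchvision.models import vit_b_16, ViT_B_16_Weights

device = "cuda" if torch.cuda.is_available() else "cpu"

# Data augmentation during fine-tuning (random crops, flips, color jitter),
# plus ImageNet normalization to match the pretrained weights.
train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
train_set = datasets.ImageFolder("data/train", transform=train_tf)  # hypothetical path
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)

# Load ViT-B/16 with pretrained weights and replace the classification head
# with a 2-way (real vs. AI-generated) linear layer.
model = vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_V1)
model.heads.head = nn.Linear(model.heads.head.in_features, 2)
model = model.to(device)

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5, weight_decay=0.01)
criterion = nn.CrossEntropyLoss()

# Standard fine-tuning loop; the epoch count is illustrative.
for epoch in range(5):
    model.train()
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch + 1}: last-batch loss {loss.item():.4f}")
```

The ablation axes mentioned above (number of attention heads, patch size, layer depth) would correspond here to swapping the backbone for other ViT variants or constructing `torchvision.models.VisionTransformer` directly with different architectural arguments.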