This paper describes our submission to the Capsule Vision 2024 Challenge, in which we use FasterViT, a high-throughput variant of the Vision Transformer (ViT), for efficient image classification under resource constraints. We adapt FasterViT through targeted data augmentation, fine-tuning, and optimization strategies, and obtain higher accuracy and better computational efficiency than standard baselines. Our results demonstrate the potential of efficient transformers for real-world visual recognition applications.