Vision-based Transformers have emerged as a revolutionary architecture in the field of robotics, offering significant advances in perception, decision-making, and control. This survey provides a comprehensive review of the application of Vision-based Transformers in robotic systems. We explore the fundamental principles of Transformers, highlighting their ability to capture long-range dependencies and contextual information, both of which are critical for complex robotic tasks. The paper delves into various implementations of Vision-based Transformers across robotic domains, including object detection, autonomous navigation, manipulation, and human-robot interaction. We also discuss the challenges and limitations of integrating Transformers into robotic systems, such as computational demands and real-time processing constraints. Furthermore, we present a comparative analysis of Vision-based Transformers against traditional convolutional neural networks and other state-of-the-art approaches, underscoring their unique advantages. Finally, the survey identifies promising research directions and potential future applications, aiming to guide and inspire further innovation in this rapidly evolving field.