Detecting the maturity of fruits and vegetables, especially avocados, is a critical task in modern agriculture and supply chain management. Accurate maturity assessment can optimize harvesting time and ensure consistent quality for consumers throughout the supply chain. A key approach to achieving this is the non-destructive estimation of produce quality. Vision-Based Tactile Sensing (VBTS) technologies, which mimic human tactile perception, offer a novel approach to this challenge. This paper focuses on two notable VBTS technologies: GelSight and Facebook's DIGIT sensor. Using these sensors, we developed two new datasets for assessing avocado maturity with transformer models, a novel contribution in this area. We adapted several transformer architectures to the task, conducting experiments on both image classification and regression to estimate avocado firmness. Among the variants tested, PoolFormer achieved notable results, reaching 92% accuracy in detecting avocado maturity level from tactile data. The datasets and code used in this study will be shared at this URL.

Index Terms: Vision-based tactile sensors (VBTS), Vision Transformer (ViT), self-attention block, maturity classification.
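To make the classification setup concrete, the following is a minimal sketch of adapting a pretrained PoolFormer backbone to avocado-maturity classification via the `timm` library. This is not the authors' released code; the model variant (PoolFormer-S12), the number of maturity classes, and the input resolution are illustrative assumptions.

```python
# Minimal sketch (not the paper's released code) of fine-tuning a
# PoolFormer for avocado maturity classification with `timm`.
import torch
import timm

NUM_MATURITY_LEVELS = 3  # hypothetical, e.g., underripe / ripe / overripe

# PoolFormer-S12 from timm, with the classification head replaced
# to match our assumed number of maturity classes.
model = timm.create_model(
    "poolformer_s12",
    pretrained=True,
    num_classes=NUM_MATURITY_LEVELS,
)

# Tactile images from GelSight/DIGIT sensors are treated as ordinary
# RGB inputs; 224x224 is an assumed input resolution.
dummy_batch = torch.randn(8, 3, 224, 224)  # (batch, channels, H, W)
logits = model(dummy_batch)                # shape: (8, NUM_MATURITY_LEVELS)
print(logits.shape)
```

For the regression variant (firmness estimation), the same backbone could be created with `num_classes=1` and trained with a mean-squared-error loss, though the paper's exact training configuration is not stated in the abstract.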