Detecting and segmenting cracks in infrastructure such as roads and buildings is crucial for safety and cost-effective maintenance. Despite the potential of deep learning, achieving precise results and handling diverse crack types remain challenging. This study proposes Hybrid-Segmentor, a novel approach that combines a convolutional neural network path, well suited to extracting fine-grained local features, with a transformer path that extracts global features capturing the overall crack structure. This hybrid design makes the model more generalizable across crack shapes, surfaces, and sizes. To balance computational cost, the transformer path uses efficient self-attention, and the decoder is kept simple relative to the two encoder paths, so that both global and local features are extracted while computation stays efficient. The model was trained with a combined binary cross-entropy and Dice loss function on a large refined dataset of 12,000 crack images generated from 13 publicly available datasets. Our studies demonstrate that the model efficiently uses convolutional layers and transformers to extract local and global features. Hybrid-Segmentor outperforms existing benchmark models across five quantitative metrics (accuracy 0.971, precision 0.804, recall 0.744, F1-score 0.770, and IoU 0.630), achieving state-of-the-art performance. Finally, careful qualitative analysis shows that the model addresses discontinuities, detects small non-crack regions, handles low-quality images, and detects crack contours more accurately than existing models. With the proposed dataset and model, we aim to enhance crack detection and infrastructure maintenance.
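
To make the combined binary cross-entropy and Dice loss mentioned above concrete, the following is a minimal sketch rather than the training code used in this study. The function name `bce_dice_loss`, the equal weighting (`bce_weight=0.5`), and the smoothing constants `smooth` and `eps` are illustrative assumptions; the abstract does not state how the two terms are weighted or smoothed.

```python
import math


def bce_dice_loss(probs, targets, bce_weight=0.5, smooth=1.0, eps=1e-7):
    """Weighted sum of binary cross-entropy and Dice loss over flat pixel lists.

    probs   -- predicted crack probabilities in [0, 1], one per pixel
    targets -- ground-truth labels (0 = background, 1 = crack), same length
    bce_weight, smooth, eps -- hypothetical defaults, not taken from the paper
    """
    if not probs or len(probs) != len(targets):
        raise ValueError("probs and targets must be non-empty and equally long")

    # Binary cross-entropy, averaged over pixels. Probabilities are clamped
    # away from 0 and 1 so that log() stays finite.
    clipped = [min(max(p, eps), 1.0 - eps) for p in probs]
    bce = -sum(
        t * math.log(p) + (1 - t) * math.log(1 - p)
        for p, t in zip(clipped, targets)
    ) / len(clipped)

    # Soft Dice loss: one minus the overlap ratio between prediction and mask.
    intersection = sum(p * t for p, t in zip(probs, targets))
    dice = (2 * intersection + smooth) / (sum(probs) + sum(targets) + smooth)
    dice_loss = 1 - dice

    return bce_weight * bce + (1 - bce_weight) * dice_loss


if __name__ == "__main__":
    mask = [1, 0, 1, 0]
    print(bce_dice_loss([0.9, 0.1, 0.8, 0.2], mask))  # confident, mostly right
    print(bce_dice_loss([0.5, 0.5, 0.5, 0.5], mask))  # uninformative prediction
```

The cross-entropy term scores each pixel independently, while the Dice term scores the overlap between the predicted and true crack regions as a whole. Because crack pixels are usually a small fraction of an image, an overlap-based term is commonly paired with cross-entropy for this kind of class imbalance.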