Over the past decade, computer vision has advanced significantly, driven primarily by deep learning-based object detection algorithms. However, these models often require large amounts of labeled data, and their performance degrades on tasks with limited datasets, particularly in scenarios involving moving objects. For instance, real-time detection of humans in agricultural settings poses challenges that demand sophisticated vision algorithms. To address this issue, we propose SB-YOLO-V8, an optimized YOLO-based Convolutional Neural Network (CNN) designed specifically for real-time human detection in citrus farms. The proposed model is trained on images and videos of human workers captured by autonomous farm equipment. The preprocessing stage applies data augmentation and the Synthetic Minority Over-sampling Technique (SMOTE) to enhance object detection performance and mitigate overfitting. SB-YOLO-V8 incorporates Binary ALO optimization for improved feature extraction, yielding high-quality features for classification. The architecture comprises the YOLO-based CNN for classification and an aggregator module for feedback. Evaluation metrics, including frames per second (FPS), model performance, and efficiency, show that the proposed model outperforms YOLO variants such as YOLO-V8, YOLO-V7, YOLO-V6, YOLO-V4, and YOLO-V3, with an average FPS of 13.63 and a precision of 91%. The proposed SB-YOLO-V8 thus presents an efficient solution for real-time human detection in challenging visual scenarios.
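
The two headline metrics, average FPS and precision, can be made concrete with a minimal sketch. It assumes that average FPS is the number of processed frames divided by the total processing time, and that precision is the standard ratio of true positives to all detections. The function names and the sample values in the usage note are hypothetical and are not taken from the reported experiments.

```python
from typing import Sequence


def average_fps(frame_times_s: Sequence[float]) -> float:
    """Average frames per second: frames processed / total processing time.

    frame_times_s holds the per-frame processing time in seconds
    (an assumed measurement protocol, not necessarily the one used here).
    """
    total = sum(frame_times_s)
    if not frame_times_s or total <= 0:
        raise ValueError("need at least one positive frame time")
    return len(frame_times_s) / total


def precision(true_positives: int, false_positives: int) -> float:
    """Precision = TP / (TP + FP); returns 0.0 when there are no detections."""
    detections = true_positives + false_positives
    return true_positives / detections if detections else 0.0
```

As an illustration with made-up counts, `precision(91, 9)` evaluates to 0.91, and four frames processed in a total of 0.3 s give an average of about 13.3 FPS.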