Existing strawberry ripeness detection algorithms suffer from low detection accuracy and high error rates. To address these problems, we propose an improved method based on YOLOv5. First, the feature extraction network is reconfigured by replacing ordinary convolution with hybrid depth deformable convolution. Second, a double cooperative attention mechanism is constructed to improve the representation of strawberry features in complex environments. Finally, cross-scale feature fusion is proposed to fully integrate multiscale target features. Tested on the strawberry ripeness dataset, the method reached an mAP of 95.6%, a speed of 76 FPS, and a model size of 7.44M. Compared with the baseline network, the mAP is 8.4 percentage points higher, the FPS is higher by 1.3, and the model size is reduced by 6.28M. The method outperforms many state-of-the-art algorithms in both detection speed and accuracy. It can accurately identify the ripeness of strawberries in complex environments and could provide technical support for automated picking robots.
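The reported gains can be turned into implied baseline figures with simple arithmetic. The sketch below does this for mAP and model size only. The baseline values it prints are derived from the numbers quoted above, not reported in the text, and the names (`implied_baseline` and the constants) are hypothetical. FPS is left out because the unit of the 1.3 improvement is not clear from the text.

```python
# Figures quoted in the abstract (proposed model and its gains over the baseline).
PROPOSED_MAP = 95.6        # mAP, percent
MAP_GAIN = 8.4             # percentage points over the baseline
PROPOSED_SIZE_M = 7.44     # model size, in the abstract's "M" unit
SIZE_REDUCTION_M = 6.28    # reduction relative to the baseline, same unit


def implied_baseline(proposed_map, map_gain, proposed_size, size_reduction):
    """Back out baseline figures; these are derived, not reported, values."""
    baseline_map = round(proposed_map - map_gain, 1)
    baseline_size = round(proposed_size + size_reduction, 2)
    # Share of the baseline model size that was removed, in percent.
    relative_reduction = round(100 * size_reduction / baseline_size, 1)
    return baseline_map, baseline_size, relative_reduction


baseline_map, baseline_size, relative_reduction = implied_baseline(
    PROPOSED_MAP, MAP_GAIN, PROPOSED_SIZE_M, SIZE_REDUCTION_M
)
print(f"Implied baseline mAP:  {baseline_map}%")
print(f"Implied baseline size: {baseline_size}M")
print(f"Relative size cut:     {relative_reduction}%")
```

Under these assumptions the baseline works out to an mAP of 87.2% and a model size of 13.72M, so the proposed model is roughly 45.8% smaller than the baseline.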