Abstract: To address the challenges of pedestrian detection in dense scenes, including high crowd density, severe occlusion, and overlapping individuals, an improved YOLO-based algorithm is proposed. First, deformable convolutions replace standard convolutions, enhancing the model's adaptability to variations in shape and appearance under occlusion. Second, a multi-dimensional attention module is designed to emphasize critical local regions and extract more precise feature information. Finally, a diagonal-difference intersection-over-union loss function is introduced, which incorporates the Euclidean distances between the main-diagonal corner points of the predicted and ground-truth bounding boxes, improving detection accuracy and regression performance. Experimental results show that the improved algorithm achieves an mAP50 of 75.1% on the public dense-pedestrian dataset WiderPerson, 1.8 percentage points higher than the original YOLOv5 model, demonstrating superior detection performance.
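The abstract does not give the exact formulation of the diagonal-difference IoU loss, but the description (a penalty based on the Euclidean distances between the main-diagonal corner points of the predicted and ground-truth boxes) can be illustrated with a minimal sketch. The normalization by the enclosing-box diagonal below is an assumption, borrowed from DIoU-style losses, not the paper's stated formula:

```python
import math

def diagonal_diff_iou_loss(pred, gt):
    """Illustrative sketch of a diagonal-difference IoU loss (assumed form).

    pred, gt: boxes as (x1, y1, x2, y2) with (x1, y1) the top-left and
    (x2, y2) the bottom-right corner, i.e. the two main-diagonal points.
    """
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt

    # Standard IoU: intersection over union of the two boxes
    ix1, iy1 = max(px1, gx1), max(py1, gy1)
    ix2, iy2 = min(px2, gx2), min(py2, gy2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (px2 - px1) * (py2 - py1) + (gx2 - gx1) * (gy2 - gy1) - inter
    iou = inter / union if union > 0 else 0.0

    # Euclidean distances between corresponding main-diagonal corners
    d_tl = math.hypot(px1 - gx1, py1 - gy1)  # top-left corners
    d_br = math.hypot(px2 - gx2, py2 - gy2)  # bottom-right corners

    # Normalizer: diagonal of the smallest enclosing box (an assumption)
    ex1, ey1 = min(px1, gx1), min(py1, gy1)
    ex2, ey2 = max(px2, gx2), max(py2, gy2)
    c = math.hypot(ex2 - ex1, ey2 - ey1)

    penalty = (d_tl + d_br) / c if c > 0 else 0.0
    return 1.0 - iou + penalty
```

For perfectly overlapping boxes the loss is 0; as the predicted box drifts, both the IoU term and the corner-distance penalty grow, which is the regression behavior the abstract attributes to the proposed loss.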