Abstract: This paper proposes YOLOv5-CJ, an end-to-end defect detection model that addresses missed detections and low accuracy in vision-based steel surface defect detection. The C3_MSBlock module, which is designed on the basis of Res2Net and captures multi-scale information, is incorporated into the backbone network so that the model represents multi-scale features at a finer granularity and has a larger receptive field. In the detection head, the DyHead attention mechanism, which integrates scale awareness, spatial awareness, and task awareness, is introduced to help the model cope with complex industrial scenes characterized by strong background interference, large variations in defect scale, and easily confused defect categories. Soft NMS is employed to improve detection in regions where defects overlap. Experimental results show that YOLOv5-CJ outperforms YOLOv5s, improving mean average precision (mAP0.5 and mAP0.5:0.95) by 1.9% and 7.1% on the NEU-DET dataset and by 5.3% and 4.3% on the GC10-DET dataset, respectively, which supports the feasibility and effectiveness of the proposed model.
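
To illustrate how Soft NMS differs from hard suppression in overlapping regions, the following is a minimal NumPy sketch of the Gaussian variant. The abstract does not say which Soft NMS variant or which hyperparameters YOLOv5-CJ uses, so the Gaussian decay, the values of `sigma` and `score_thresh`, and the function names `iou_one_to_many` and `soft_nms` are all assumptions made for illustration only.

```python
import numpy as np


def iou_one_to_many(box, boxes):
    """IoU between one box and an array of boxes, all in (x1, y1, x2, y2) format."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter + 1e-9)


def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft NMS (illustrative; sigma and score_thresh are assumed values).

    Instead of discarding every box that overlaps the current best box,
    each overlapping box has its score decayed by exp(-IoU^2 / sigma).
    Returns the kept indices in selection order and their final scores.
    """
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float).copy()
    remaining = np.arange(len(scores))
    keep, kept_scores = [], []

    while remaining.size > 0:
        # Select the highest-scoring box among those still under consideration.
        best = int(np.argmax(scores[remaining]))
        i = remaining[best]
        keep.append(int(i))
        kept_scores.append(float(scores[i]))
        remaining = np.delete(remaining, best)
        if remaining.size == 0:
            break

        # Decay the scores of the other boxes according to their overlap.
        ious = iou_one_to_many(boxes[i], boxes[remaining])
        scores[remaining] *= np.exp(-(ious ** 2) / sigma)

        # Drop boxes whose decayed score has fallen below the threshold.
        remaining = remaining[scores[remaining] > score_thresh]

    return keep, kept_scores


# Two heavily overlapping boxes and one isolated box (made-up coordinates).
demo_boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]]
demo_scores = [0.9, 0.8, 0.7]
keep, kept_scores = soft_nms(demo_boxes, demo_scores)
print(keep, kept_scores)
```

In this toy input, the second box overlaps the first heavily; hard NMS with a typical IoU threshold would remove it outright, whereas here it is retained with a reduced score and ranked after the isolated box. This retention of down-weighted neighbours is the behaviour that can help when distinct defects genuinely overlap.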