Abstract: Remote sensing images are captured from high altitude looking down; they contain complex spatial scenes and many types of targets. Object detection on large-scale remote sensing images therefore suffers from small target sizes and dense target distributions. This paper proposes an improved remote sensing image detection model based on You Only Look Once version 7 (YOLOv7). First, a small-scale detection layer is added so that detection boxes are regenerated at a finer scale, improving the network's ability to recognize small targets. Next, Bottleneck Transformers are fused into the backbone to exploit a combined convolutional neural network (CNN) + Transformer architecture and strengthen feature extraction. The convolutional block attention module (CBAM) is then added to the head to further improve detection of small-scale targets. Finally, the non-maximum suppression (NMS) step of YOLOv7 is replaced with distance intersection over union non-maximum suppression (DIoU-NMS) to improve the detection of overlapping targets. Tested on the NWPU-VHR10 and DOTA1.0 datasets, the improved model raises the detection rate of small targets in remote sensing images, effectively handles highly overlapping targets, and improves accuracy by 6.3% and 4.2%, respectively, over the standard YOLOv7.
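To make the last modification concrete, below is a minimal sketch of DIoU-NMS (not the authors' implementation; the box layout, function name, and threshold value are assumptions for illustration). A candidate box is suppressed only when its IoU with a higher-scoring box, minus a normalized center-distance penalty, exceeds the threshold, so overlapping boxes whose centers are far apart can both be kept.

```python
# Hypothetical sketch of DIoU-NMS, assuming boxes as (x1, y1, x2, y2) rows.
import numpy as np

def diou_nms(boxes: np.ndarray, scores: np.ndarray, thresh: float = 0.5) -> list:
    order = scores.argsort()[::-1]          # process boxes from highest score down
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # intersection-over-union with the remaining boxes
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter + 1e-9)
        # squared distance between box centers
        cx_i, cy_i = (boxes[i, 0] + boxes[i, 2]) / 2, (boxes[i, 1] + boxes[i, 3]) / 2
        cx_r, cy_r = (boxes[rest, 0] + boxes[rest, 2]) / 2, (boxes[rest, 1] + boxes[rest, 3]) / 2
        center_dist = (cx_i - cx_r) ** 2 + (cy_i - cy_r) ** 2
        # squared diagonal of the smallest box enclosing both boxes
        ex1 = np.minimum(boxes[i, 0], boxes[rest, 0])
        ey1 = np.minimum(boxes[i, 1], boxes[rest, 1])
        ex2 = np.maximum(boxes[i, 2], boxes[rest, 2])
        ey2 = np.maximum(boxes[i, 3], boxes[rest, 3])
        diag = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + 1e-9
        diou = iou - center_dist / diag     # DIoU = IoU minus center-distance penalty
        order = rest[diou <= thresh]        # suppress only boxes whose DIoU exceeds the threshold
    return keep
```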