Abstract:
Aiming at the accuracy challenge in obstacle detection for autonomous driving, we propose an improved You Only Look Once X-S (YOLOX-S) model based on Swin Transformer-Tiny, termed ST-YOLOX-S, which detects multiple target classes, including people, cars, bicycles, motorcycles, and buses. Our method comprises two main aspects. First, to strengthen local feature extraction and thereby obtain more accurate obstacle detection under real-world driving conditions, the original backbone of YOLOX-S is replaced with the Swin Transformer-Tiny backbone. Second, we reduce the feature channels passed from the Swin Transformer to the path aggregation feature pyramid network (PA-FPN) from [96, 192, 384, 768] to [192, 384, 768], which lowers the computational cost and makes the Swin Transformer-Tiny more compatible with the PA-FPN. On the COCO dataset, the proposed ST-YOLOX-S improves detection mean average precision (mAP) by 6.1% compared with YOLOX-S. On the five obstacle categories that appear under simulated real-vehicle conditions, ST-YOLOX-S also outperforms YOLOX-S. Furthermore, our method significantly outperforms YOLOv3 on obstacle detection, demonstrating the effectiveness of the proposed algorithm.
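
The following is a minimal sketch, not the authors' implementation, of the channel-reduction idea described above: the first Swin-Tiny stage (96 channels) is discarded and only the [192, 384, 768] feature maps are fed to a PA-FPN-style neck. The `TinyPAFPN` class, the common width of 256 channels, the 640x640 input size, and the dummy backbone tensors are illustrative assumptions; the bottom-up aggregation branch of the PA-FPN is omitted for brevity.

```python
# Illustrative sketch only: keeps the last three Swin-Tiny stage outputs
# ([192, 384, 768]) and fuses them in a simplified top-down neck.
import torch
import torch.nn as nn


class TinyPAFPN(nn.Module):
    """Very small PA-FPN-like neck accepting three feature levels (assumed design)."""

    def __init__(self, in_channels=(192, 384, 768), out_channels=256):
        super().__init__()
        # 1x1 convs align each backbone level to a common width.
        self.lateral = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels
        )
        self.upsample = nn.Upsample(scale_factor=2, mode="nearest")

    def forward(self, feats):
        # feats: [P3 (stride 8), P4 (stride 16), P5 (stride 32)]
        p3, p4, p5 = (lat(f) for lat, f in zip(self.lateral, feats))
        # Top-down pathway; the bottom-up PA branch is omitted here.
        p4 = p4 + self.upsample(p5)
        p3 = p3 + self.upsample(p4)
        return p3, p4, p5


if __name__ == "__main__":
    n = 2  # batch size
    # Dummy Swin-Tiny stage outputs for a 640x640 input (strides 4/8/16/32).
    swin_feats = [
        torch.randn(n, 96, 160, 160),   # stage 1 -- discarded
        torch.randn(n, 192, 80, 80),    # stage 2
        torch.randn(n, 384, 40, 40),    # stage 3
        torch.randn(n, 768, 20, 20),    # stage 4
    ]
    neck = TinyPAFPN()
    outs = neck(swin_feats[1:])  # keep only the [192, 384, 768] levels
    print([tuple(o.shape) for o in outs])
```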