Abstract:
To balance speed and accuracy in the semantic segmentation of urban street images for autonomous driving, we propose an improved U-Net network. First, to improve the representation capability of the model, the improved U-Net is structured into three parts: a shallow layer, an intermediate layer, and a deep layer. A different attention mechanism is applied to each part according to its feature extraction characteristics: a spatial attention module in the shallow network, a dual attention module in the intermediate network, and a channel attention module in the deep network. In addition, the traditional convolutions in all three parts are replaced by depthwise separable convolutions, which greatly reduces the number of network parameters and significantly speeds up network operation. Experimental results on three datasets show that the improved U-Net semantic segmentation model for street images achieves better results in both segmentation accuracy and speed. The average mean intersection over union (MIoU) is 68.8%, an increase of 9.2%, and the processing time is about 38 ms per frame, i.e., about 27 frames per second, which meets the real-time processing and accuracy requirements for semantic segmentation of urban street images.
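As a rough illustration of the parameter-saving idea mentioned above, the following is a minimal sketch, assuming a PyTorch implementation, of how a depthwise separable convolution block could stand in for a standard 3x3 convolution in a U-Net encoder or decoder stage; the class and variable names are illustrative and are not taken from the authors' code.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise separable convolution: a per-channel (depthwise) 3x3 conv
    followed by a 1x1 pointwise conv, which uses far fewer parameters than
    a standard 3x3 convolution with the same input/output channels."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size=3,
                                   padding=1, groups=in_channels, bias=False)
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1,
                                   bias=False)
        self.bn = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.bn(self.pointwise(self.depthwise(x))))

# Example: a hypothetical 64->128 channel stage on a 256x256 feature map.
x = torch.randn(1, 64, 256, 256)
block = DepthwiseSeparableConv(64, 128)
print(block(x).shape)  # torch.Size([1, 128, 256, 256])
```

For a 3x3 kernel with 64 input and 128 output channels, the standard convolution needs 3*3*64*128 = 73,728 weights, whereas the depthwise plus pointwise pair needs 3*3*64 + 64*128 = 8,768, which is the source of the parameter and speed savings the abstract refers to.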