Abstract:
Existing semantic segmentation methods lose pixel features and carry too many parameters, which leads to unsatisfactory segmentation results and excessive time cost. To address these problems, this paper proposes a lightweight semantic segmentation algorithm based on the fusion of multiple modules, built on the pyramid scene parsing network (PSPNet). First, MobileNetV2 is chosen as the feature extraction network to construct a lightweight network structure. During training, a freeze-then-unfreeze strategy is used, and the Focal Loss function is added to balance the proportion of positive and negative samples. Next, spatial and channel reconstruction convolution (SCConv) is introduced into the pyramid pooling module to reduce the computational cost that redundant feature extraction adds to the segmentation task. Finally, coordinate attention (CA) and the efficient channel attention network (ECA-Net) are incorporated so that the modules complement one another, enhancing salient features and improving segmentation accuracy. In ablation and comparison experiments, the mean pixel accuracy on the PASCAL VOC 2012 dataset reaches 85.23%, the computational cost is reduced by 45.79%, and the training speed is improved by 68.69%. On the Cityscapes dataset, the mean pixel accuracy reaches 86.75% and the mean intersection over union (mIoU) reaches 73.86%. The interaction of the modules improves and optimizes the algorithm, effectively addressing its low segmentation accuracy and slow training speed. The resulting model has a significant accuracy advantage among lightweight models and can improve the efficiency of image semantic segmentation.
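The pixel accuracy and intersection-over-union figures reported above are averages over classes. The abstract does not state the formulas, so the following is only a minimal sketch of the commonly used definitions, computed from a confusion matrix. The function names and the toy labels are illustrative assumptions, not the authors' evaluation code, and details such as ignored "void" pixels are left out.

```python
def confusion_matrix(truth, pred, num_classes):
    """Count pixels; rows index the true class, columns the predicted class."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(truth, pred):
        matrix[t][p] += 1
    return matrix


def mean_pixel_accuracy(matrix):
    """Average over classes of (correctly labeled pixels) / (true pixels of that class)."""
    scores = []
    for c, row in enumerate(matrix):
        total = sum(row)
        if total:  # skip classes that never occur in the ground truth
            scores.append(row[c] / total)
    return sum(scores) / len(scores) if scores else 0.0


def mean_iou(matrix):
    """Average over classes of TP / (TP + FP + FN)."""
    n = len(matrix)
    scores = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                          # true class c, predicted otherwise
        fp = sum(matrix[r][c] for r in range(n)) - tp     # predicted c, true class differs
        union = tp + fp + fn
        if union:  # skip classes absent from both truth and prediction
            scores.append(tp / union)
    return sum(scores) / len(scores) if scores else 0.0


# Toy example: six pixels, three classes (hypothetical labels, flattened to 1-D).
truth = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(truth, pred, num_classes=3)
print(mean_pixel_accuracy(cm), mean_iou(cm))
```

On the toy labels the per-class accuracies are 1/2, 1, and 1/2, and the per-class IoU values are 1/3, 2/3, and 1/2. Mean IoU is the stricter of the two because it also penalizes false positives.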