D-DeepOCSORT: Multi-Object Tracking Algorithm Based on LiDAR and Monocular Camera
Affiliation:

Chang'an University; University of California, Berkeley

Fund Project:

The National Natural Science Foundation of China (General Program, Key Program, Major Research Plan)

    Abstract:

    Multi-object tracking is a fundamental problem in computer vision. Because DeepOCSORT relies solely on targets' 2D image-plane position information, it is prone to prediction drift. To improve the tracking stability of DeepOCSORT, this paper proposes a novel multi-object tracking method based on multi-sensor data fusion. Building on DeepOCSORT, we additionally integrate target velocity information measured directly by LiDAR. This velocity information is introduced from three perspectives: first, during data association, a penalty term built from the difference between target velocities constrains matching toward velocity-consistent pairs; second, the LiDAR-measured velocity is used to initialize and to update online the velocity state within each tracker, making tracking predictions more stable; third, the degree of reliance on the velocity information is controlled by adjusting the process noise covariance matrix. Evaluation on the KITTI dataset shows that, compared with the original DeepOCSORT, the proposed multi-source heterogeneous information fusion method significantly improves tracking performance, with maximum gains of 3.35, 3.26, and 3.71 on the HOTA, MOTA, and IDF1 metrics, respectively. This study provides an effective approach to building a more stable and accurate multi-object tracking system.
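    The first mechanism described above, a velocity-difference penalty added to the association cost, can be sketched as follows. This is an illustrative sketch only: the abstract does not give the paper's exact cost formulation, so the base IoU cost, the penalty weight `lambda_v`, and the function name are assumptions, not the authors' implementation.

```python
# Sketch of velocity-penalized data association (assumed formulation).
import numpy as np
from scipy.optimize import linear_sum_assignment


def associate_with_velocity_penalty(iou_cost, track_vel, det_vel, lambda_v=0.2):
    """Match tracks to detections with a velocity-consistency penalty.

    iou_cost : (T, D) base association cost, e.g. 1 - IoU
    track_vel: (T, 2) per-track velocity estimates (vx, vy)
    det_vel  : (D, 2) per-detection velocities measured by LiDAR
    lambda_v : penalty weight (hypothetical; controls velocity influence)
    """
    # Pairwise Euclidean distance between every track/detection velocity pair.
    diff = track_vel[:, None, :] - det_vel[None, :, :]
    vel_penalty = np.linalg.norm(diff, axis=-1)
    # Pairs with inconsistent velocities become more expensive to match.
    cost = iou_cost + lambda_v * vel_penalty
    rows, cols = linear_sum_assignment(cost)  # Hungarian matching
    return list(zip(rows.tolist(), cols.tolist()))
```

    Under this formulation, two detections that are equally close in the image plane are disambiguated by their LiDAR-measured velocities, which is the velocity-consistent matching behavior the abstract describes.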

History
  • Received: September 03, 2024
  • Revised: October 23, 2024
  • Accepted: December 11, 2024