Multi-level temporal feature fusion with feature exchange strategy for multiple object tracking
Author: GE Yisu, YE Wenjie, ZHANG Guodao, LIN Mengying
Affiliation:

1. College of Computer Science and Artificial Intelligence, Wenzhou University, Wenzhou 325035, China; 2. School of Computer Science and Technology, Zhejiang Normal University, Jinhua 321019, China; 3. Department of Digital Media Technology, Hangzhou Dianzi University, Hangzhou 310023, China; 4. College of Intelligent Manufacturing, Wenzhou Polytechnic, Wenzhou 325035, China

    Abstract:

    As neural network research has deepened, object detection has advanced rapidly in recent years, and video object detection methods have gradually attracted scholarly attention, especially frameworks that combine multiple object tracking with detection. Most current works build such joint tracking-and-detection paradigms through multi-task learning. Unlike these, this paper proposes a multi-level temporal feature fusion structure that improves the framework's performance by exploiting the temporal consistency constraint of video. To train the temporal network end-to-end, a feature exchange training strategy is put forward so that the temporal feature fusion structure can be trained efficiently. The proposed method is evaluated on several acknowledged benchmarks, and encouraging results are obtained in comparison with well-known joint detection and tracking frameworks. The ablation study identifies a suitable position for temporal feature fusion.
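    The abstract only names the two ingredients, so the following is a minimal illustrative sketch, not the authors' implementation: it assumes "multi-level temporal feature fusion" means blending same-level backbone feature maps from consecutive frames (here with a simple weighted sum), and it illustrates a "feature exchange" training strategy as randomly swapping the two frames' features at each level. All function names, the blend weights `alphas`, and the swap probability `p_swap` are hypothetical.

```python
import numpy as np

# Hypothetical sketch (not the paper's code): fuse per-level features of the
# current frame with those of the previous frame, and exchange features
# between frames at random levels during training.

rng = np.random.default_rng(0)

def fuse_level(curr, prev, alpha=0.5):
    """Blend same-level feature maps from consecutive frames (assumed weighted sum)."""
    return alpha * curr + (1.0 - alpha) * prev

def multi_level_fusion(curr_feats, prev_feats, alphas):
    """Apply temporal fusion independently at each backbone level."""
    return [fuse_level(c, p, a) for c, p, a in zip(curr_feats, prev_feats, alphas)]

def feature_exchange(curr_feats, prev_feats, p_swap=0.5):
    """Illustrative training-time strategy: randomly swap the two frames'
    features at a level so fusion sees both temporal orders."""
    out_c, out_p = [], []
    for c, p in zip(curr_feats, prev_feats):
        if rng.random() < p_swap:
            c, p = p, c
        out_c.append(c)
        out_p.append(p)
    return out_c, out_p

# Three pyramid levels with decreasing spatial size (channels, H, W).
sizes = [(8, 32, 32), (8, 16, 16), (8, 8, 8)]
curr = [rng.standard_normal(s) for s in sizes]
prev = [rng.standard_normal(s) for s in sizes]

curr_x, prev_x = feature_exchange(curr, prev)
fused = multi_level_fusion(curr_x, prev_x, alphas=[0.5, 0.5, 0.5])
print([f.shape for f in fused])
```

    Note that with `alpha=0.5` the blend is symmetric, so the exchange step changes nothing at inference; in training it would matter once the fusion weights per temporal direction are learned rather than fixed, which is presumably where an end-to-end training strategy such as the paper's becomes necessary.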

Citation

GE Yisu, YE Wenjie, ZHANG Guodao, LIN Mengying. Multi-level temporal feature fusion with feature exchange strategy for multiple object tracking[J]. Optoelectronics Letters,2024,20(8):505-511

History
  • Received: January 12, 2024
  • Revised: April 08, 2024
  • Online: July 24, 2024