Video-based body geometric aware network for 3D human pose estimation
Author: LI Chaonan, LIU Sheng, YAO Lu, ZOU Siyu
Affiliation:

College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China

    Abstract:

    Three-dimensional human pose estimation (3D HPE) has broad application prospects in trajectory prediction, posture tracking and action analysis. However, frequent self-occlusions and the substantial depth ambiguity of two-dimensional (2D) representations hinder further improvements in accuracy. In this paper, we propose a novel video-based human body geometric aware network to mitigate these problems. The network becomes implicitly aware of the geometric constraints of the human body by capturing spatial and temporal context information from 2D skeleton data. Specifically, a novel skeleton attention (SA) mechanism is proposed to model geometric context dependencies among different body joints, thereby improving the spatial feature representation ability of the network. To enhance temporal consistency, a novel multilayer perceptron (MLP)-Mixer based structure is exploited to comprehensively learn temporal context information from input sequences. We conduct experiments on publicly available challenging datasets to evaluate the proposed approach. It outperforms the previous best approach by 0.5 mm on the Human3.6M dataset and also demonstrates significant improvements on the HumanEva-I dataset.
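
The two ideas named in the abstract, attention across body joints and MLP-Mixer-style mixing across frames, can be illustrated with a minimal NumPy sketch. This is not the authors' network: the abstract does not specify the SA mechanism, so generic scaled dot-product self-attention over joints stands in for it, and the temporal block is a bare token-mixing MLP without the layer normalization used in MLP-Mixer. All names (`joint_self_attention`, `temporal_mixing`), the 9-frame window, the 17-joint skeleton, and the layer sizes are assumptions chosen for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def gelu(x):
    # tanh approximation of the GELU activation.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def joint_self_attention(x, wq, wk, wv):
    """Generic self-attention across joints (a stand-in, not the paper's SA).

    x: (T, J, C) per-frame joint features; wq, wk, wv: (C, D) projections.
    Returns (T, J, D) features and (T, J, J) attention weights.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)  # each joint attends to every joint
    return attn @ v, attn

def temporal_mixing(x, w1, b1, w2, b2):
    """MLP-Mixer-style token mixing in which the tokens are frames.

    x: (T, F), one feature vector per frame; w1: (T, H); w2: (H, T).
    The MLP acts along the time axis and is shared across feature channels.
    """
    y = x.T                          # (F, T): time becomes the last axis
    y = gelu(y @ w1 + b1) @ w2 + b2  # mix information across frames
    return x + y.T                   # residual connection

# Hypothetical input: 9 frames of a 17-joint 2D skeleton.
rng = np.random.default_rng(0)
T, J, C, D, H = 9, 17, 2, 16, 32
poses_2d = rng.standard_normal((T, J, C))
wq, wk, wv = (rng.standard_normal((C, D)) * 0.1 for _ in range(3))
feats, attn = joint_self_attention(poses_2d, wq, wk, wv)

frames = feats.reshape(T, J * D)     # flatten joints into one vector per frame
w1 = rng.standard_normal((T, H)) * 0.1
w2 = rng.standard_normal((H, T)) * 0.1
mixed = temporal_mixing(frames, w1, np.zeros(H), w2, np.zeros(T))
print(feats.shape, attn.shape, mixed.shape)
```

In this sketch the attention weights give every joint a view of all other joints within a frame, while the temporal block lets every frame's features draw on the whole input window. These are the spatial and temporal context roles the abstract assigns to the two components.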

Get Citation

LI Chaonan, LIU Sheng, YAO Lu, ZOU Siyu. Video-based body geometric aware network for 3D human pose estimation[J]. Optoelectronics Letters, 2022, 18(5): 313-320.

History
  • Received: February 3, 2022
  • Revised: March 10, 2022
  • Online: June 7, 2022