Semantics-aware transformer for 3D reconstruction from binocular images
Author: JIA Xin, YANG Shourui, GUAN Diyi

Affiliation: 1. The Engineering Research Center of Learning-Based Intelligent System and the Key Laboratory of Computer Vision and System of Ministry of Education, Tianjin University of Technology, Tianjin 300384, China; 2. Zhejiang University of Technology, Hangzhou 310014, China

    Abstract:

    Existing multi-view three-dimensional (3D) reconstruction methods capture only a single type of feature from each input view, failing to obtain the fine-grained semantics needed to reconstruct complex shapes. They also rarely explore the semantic association between input views, leading to rough 3D shapes. To address these challenges, we propose a semantics-aware transformer (SATF) for 3D reconstruction. It is composed of two parallel view transformer encoders and a point cloud transformer decoder; it takes two red, green and blue (RGB) images as input and outputs a dense point cloud with rich details. Each view transformer encoder learns multi-level features, facilitating the characterization of fine-grained semantics in its input view. The point cloud transformer decoder derives a semantically-associated feature by aligning the semantics of the two input views, thereby describing the semantic association between them. It then generates a sparse point cloud from this semantically-associated feature. Finally, the decoder enriches the sparse point cloud to produce a dense point cloud with richer details. Extensive experiments on the ShapeNet dataset show that our SATF outperforms state-of-the-art methods.
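
    The following is a minimal sketch, in PyTorch, of the pipeline the abstract describes: two parallel view transformer encoders that each expose multi-level features, and a point cloud transformer decoder that aligns the two views, generates a sparse cloud, and enriches it into a dense one. All concrete choices here (layer sizes, the patch embedding, the number of sparse points, the offset-based densification, and identifiers such as ViewEncoder and PointDecoder) are illustrative assumptions, not the paper's actual implementation.

    # Illustrative sketch only; sizes and modules are assumptions,
    # not the architecture published in the paper.
    import torch
    import torch.nn as nn

    class ViewEncoder(nn.Module):
        """View transformer encoder: returns one feature per level (multi-level)."""
        def __init__(self, dim=256, depth=4, patch=16):
            super().__init__()
            self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
            layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
            # nn.TransformerEncoder deep-copies `layer`, so levels do not share weights
            self.levels = nn.ModuleList([nn.TransformerEncoder(layer, 1) for _ in range(depth)])

        def forward(self, img):                               # img: (B, 3, H, W)
            x = self.embed(img).flatten(2).transpose(1, 2)    # patch tokens (B, N, dim)
            feats = []
            for level in self.levels:                         # collect a feature per level
                x = level(x)
                feats.append(x)
            return feats

    class PointDecoder(nn.Module):
        """Point cloud transformer decoder: aligns the two views, generates a
        sparse cloud, then enriches it into a dense cloud."""
        def __init__(self, dim=256, n_sparse=256, up=4):
            super().__init__()
            self.align = nn.MultiheadAttention(dim, 8, batch_first=True)
            self.generate = nn.MultiheadAttention(dim, 8, batch_first=True)
            self.queries = nn.Parameter(torch.randn(n_sparse, dim))  # learned point queries
            self.to_xyz = nn.Linear(dim, 3)
            self.refine = nn.Sequential(nn.Linear(dim + 3, dim), nn.ReLU(),
                                        nn.Linear(dim, 3 * up))
            self.up = up

        def forward(self, feats_a, feats_b):
            # Semantic alignment: tokens of view A attend to view B, yielding the
            # semantically-associated feature (only the top level is used here).
            assoc, _ = self.align(feats_a[-1], feats_b[-1], feats_b[-1])
            # Sparse generation: learned queries attend to the associated feature.
            q = self.queries.unsqueeze(0).expand(assoc.size(0), -1, -1)
            ctx, _ = self.generate(q, assoc, assoc)
            sparse = self.to_xyz(ctx)                         # (B, n_sparse, 3)
            # Densification: predict `up` offsets around every sparse point.
            off = self.refine(torch.cat([ctx, sparse], dim=-1))
            dense = sparse.unsqueeze(2) + off.view(*sparse.shape[:2], self.up, 3)
            return sparse, dense.flatten(1, 2)                # (B, n_sparse * up, 3)

    enc_a, enc_b, dec = ViewEncoder(), ViewEncoder(), PointDecoder()
    left = torch.randn(1, 3, 224, 224)                        # binocular image pair
    right = torch.randn(1, 3, 224, 224)
    sparse, dense = dec(enc_a(left), enc_b(right))
    print(sparse.shape, dense.shape)                          # (1, 256, 3) (1, 1024, 3)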

Get Citation

JIA Xin, YANG Shourui, GUAN Diyi. Semantics-aware transformer for 3D reconstruction from binocular images[J]. Optoelectronics Letters, 2022, 18(5):293-299.

Article Metrics
  • Abstract: 422
  • PDF: 406
  • HTML: 0
  • Cited by: 0
History
  • Received: April 04, 2022
  • Revised: April 14, 2022
  • Online: June 07, 2022