Abstract: Existing multi-view three-dimensional (3D) reconstruction methods can only capture a single type of feature from each input view, failing to obtain the fine-grained semantics needed to reconstruct complex shapes. They also rarely explore the semantic association between input views, leading to coarse 3D shapes. To address these challenges, we propose a semantics-aware transformer (SATF) for 3D reconstruction. It is composed of two parallel view transformer encoders and a point cloud transformer decoder; it takes two red, green and blue (RGB) images as input and outputs a dense point cloud with rich details. Each view transformer encoder learns multi-level features, which facilitates characterizing the fine-grained semantics of its input view. The point cloud transformer decoder explores a semantically associated feature by aligning the semantics of the two input views, thereby describing the semantic association between them. It then generates a sparse point cloud from this semantically associated feature. Finally, the decoder enriches the sparse point cloud to produce a dense point cloud with richer details. Extensive experiments on the ShapeNet dataset show that SATF outperforms state-of-the-art methods.
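
To make the described pipeline concrete, the following is a minimal sketch of the two-encoder, one-decoder structure, assuming a PyTorch-style implementation. All class names, layer sizes, point counts, and the specific alignment and upsampling choices are hypothetical illustrations, not the authors' actual configuration.

```python
# Hypothetical sketch of the SATF pipeline: two parallel view transformer
# encoders followed by a point cloud transformer decoder. Names and
# hyperparameters are assumptions for illustration only.
import torch
import torch.nn as nn


class ViewTransformerEncoder(nn.Module):
    """Encodes one RGB view into a sequence of per-patch features."""

    def __init__(self, patch_size=16, dim=256, depth=4, heads=8):
        super().__init__()
        # Patch embedding: split the image into patches and project each to `dim`.
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch_size, stride=patch_size)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, img):                          # img: (B, 3, H, W)
        tokens = self.patch_embed(img)               # (B, dim, H/ps, W/ps)
        tokens = tokens.flatten(2).transpose(1, 2)   # (B, N_patches, dim)
        return self.encoder(tokens)                  # per-view feature sequence


class PointCloudTransformerDecoder(nn.Module):
    """Aligns the two view features, generates a sparse point cloud,
    then enriches it into a dense point cloud."""

    def __init__(self, dim=256, n_sparse=256, upsample=8, heads=8, depth=4):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_sparse, dim))  # learnable point queries
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=depth)
        self.to_sparse = nn.Linear(dim, 3)                # sparse xyz coordinates
        self.to_dense = nn.Linear(dim, 3 * upsample)      # per-point offsets for densification
        self.upsample = upsample

    def forward(self, feat_a, feat_b):
        # Cross-view association: concatenate both view features as decoder
        # memory so the point queries attend to a jointly aligned feature.
        memory = torch.cat([feat_a, feat_b], dim=1)
        q = self.queries.unsqueeze(0).expand(feat_a.size(0), -1, -1)
        point_feat = self.decoder(q, memory)              # (B, n_sparse, dim)
        sparse = self.to_sparse(point_feat)               # (B, n_sparse, 3)
        # Enrich each sparse point with `upsample` local offsets -> dense cloud.
        offsets = self.to_dense(point_feat).view(sparse.size(0), -1, self.upsample, 3)
        dense = (sparse.unsqueeze(2) + offsets).reshape(sparse.size(0), -1, 3)
        return sparse, dense


class SATF(nn.Module):
    """Two parallel view encoders feeding one point cloud decoder."""

    def __init__(self):
        super().__init__()
        self.enc_a = ViewTransformerEncoder()
        self.enc_b = ViewTransformerEncoder()
        self.dec = PointCloudTransformerDecoder()

    def forward(self, view_a, view_b):
        return self.dec(self.enc_a(view_a), self.enc_b(view_b))


if __name__ == "__main__":
    model = SATF()
    a = torch.randn(2, 3, 224, 224)                  # first RGB view per sample
    b = torch.randn(2, 3, 224, 224)                  # second RGB view per sample
    sparse_pc, dense_pc = model(a, b)
    print(sparse_pc.shape, dense_pc.shape)           # (2, 256, 3) (2, 2048, 3)
```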