research-article

BridgeNet: A Joint Learning Network of Depth Map Super-Resolution and Monocular Depth Estimation

Authors:
Qi Tang

Institute of Information Science, Beijing Jiaotong University & Beijing Key Laboratory of Advanced Information Science and Network Technology, Beijing, China

Runmin Cong

Institute of Information Science, Beijing Jiaotong University, Beijing Key Laboratory of Advanced Information Science and Network Technology, & City University of Hong Kong, Beijing, China

Ronghui Sheng

Institute of Information Science, Beijing Jiaotong University & Beijing Key Laboratory of Advanced Information Science and Network Technology, Beijing, China

Lingzhi He

Institute of Information Science, Beijing Jiaotong University & Beijing Key Laboratory of Advanced Information Science and Network Technology, Beijing, China

Dan Zhang

UISEE Technology (Beijing) Co., Ltd., Beijing, China

Yao Zhao

Institute of Information Science, Beijing Jiaotong University & Beijing Key Laboratory of Advanced Information Science and Network Technology, Beijing, China

Sam Kwong

City University of Hong Kong, Hong Kong, China


MM '21: Proceedings of the 29th ACM International Conference on Multimedia, October 2021, Pages 2148–2157. https://doi.org/10.1145/3474085.3475373

Published: 17 October 2021

ABSTRACT

Depth map super-resolution is a task in high demand for practical industrial applications. Existing color-guided depth map super-resolution methods usually require an extra branch that extracts high-frequency detail from the RGB image to guide reconstruction of the low-resolution depth map. However, because the two modalities still differ, direct information transmission in the feature dimension or the edge map dimension cannot achieve satisfactory results, and may even trigger texture copying in areas where the structures of the RGB-D pair are inconsistent. Inspired by multi-task learning, we propose a joint learning network of depth map super-resolution (DSR) and monocular depth estimation (MDE) that does not introduce additional supervision labels. For the interaction between the two subnetworks, we adopt a differentiated guidance strategy and design two corresponding bridges. One is the high-frequency attention bridge (HABdg), designed for the feature encoding process, which learns the high-frequency information of the MDE task to guide the DSR task. The other is the content guidance bridge (CGBdg), designed for the depth map reconstruction process, which provides the content guidance learned from the DSR task to the MDE task. The entire network architecture is highly portable and can provide a paradigm for associating the DSR and MDE tasks. Extensive experiments on benchmark datasets demonstrate that our method achieves competitive performance. Our code and models are available at https://rmcong.github.io/proj_BridgeNet.html.
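The guidance idea behind the high-frequency attention bridge can be illustrated with a toy sketch. This is not the authors' HABdg; the abstract does not give its internals. The sketch assumes 1-D lists standing in for feature maps, a box blur as the low-pass filter, and a sigmoid gate, and the names `box_blur`, `high_frequency_attention`, and `guide` are hypothetical.

```python
import math


def box_blur(signal, radius=1):
    """Low-pass filter: mean over a sliding window, truncated at the borders."""
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        window = signal[lo:hi]
        out.append(sum(window) / len(window))
    return out


def high_frequency_attention(mde_feat, radius=1):
    """Weights in [0.5, 1) that grow where the MDE feature changes sharply.

    Assumption: high frequency = |feature - blurred feature|, gated by a sigmoid.
    """
    blurred = box_blur(mde_feat, radius)
    high_freq = [abs(x - b) for x, b in zip(mde_feat, blurred)]
    return [1.0 / (1.0 + math.exp(-h)) for h in high_freq]


def guide(dsr_feat, mde_feat, radius=1):
    """Residual re-weighting of the DSR feature: dsr + dsr * attention."""
    attn = high_frequency_attention(mde_feat, radius)
    return [d * (1.0 + a) for d, a in zip(dsr_feat, attn)]


# A step edge in the (toy) MDE feature between indices 2 and 3.
mde_feat = [0.0, 0.0, 0.0, 4.0, 4.0, 4.0]
dsr_feat = [1.0] * 6

attn = high_frequency_attention(mde_feat)
guided = guide(dsr_feat, mde_feat)
```

In this toy setting the attention is lowest in flat regions and highest next to the edge, so the DSR feature is emphasized where the MDE branch sees depth discontinuities. A real network would learn this mapping with convolutional layers rather than a fixed filter.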

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems

Published in
MM '21: Proceedings of the 29th ACM International Conference on Multimedia
October 2021
5796 pages
ISBN:9781450386517
DOI:10.1145/3474085
General Chairs:
Heng Tao Shen, University of Electronic Science and Technology of China, China
Yueting Zhuang, Zhejiang University, China
John R. Smith, IBM, USA
Program Chairs:
Yang Yang, University of Electronic Science and Technology of China, China
Pablo Cesar, CWI & TU Delft, The Netherlands
Florian Metze, FACEBOOK, Inc., USA
Balakrishnan Prabhakaran, University of Texas at Dallas, USA
Copyright © 2021 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
depth map
monocular depth estimation
multi-task learning
super-resolution
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 995 of 4,171 submissions, 24%