DOI: 10.1145/3474085.3475373
research-article

BridgeNet: A Joint Learning Network of Depth Map Super-Resolution and Monocular Depth Estimation

Published: 17 October 2021

ABSTRACT

Depth map super-resolution is a task in high practical demand in industry. Existing color-guided depth map super-resolution methods usually require an extra branch that extracts high-frequency detail from the RGB image to guide reconstruction of the low-resolution depth map. However, because the two modalities still differ, directly transferring information in the feature domain or the edge-map domain cannot achieve satisfactory results, and may even cause texture copying in areas where the structures of the RGB-D pair are inconsistent. Inspired by multi-task learning, we propose a joint learning network for depth map super-resolution (DSR) and monocular depth estimation (MDE) that introduces no additional supervision labels. For the interaction between the two subnetworks, we adopt a differentiated guidance strategy and design two corresponding bridges. The first is the high-frequency attention bridge (HABdg), designed for the feature encoding process, which learns high-frequency information from the MDE task to guide the DSR task. The second is the content guidance bridge (CGBdg), designed for the depth map reconstruction process, which passes content guidance learned from the DSR task to the MDE task. The entire network architecture is highly portable and can serve as a paradigm for associating the DSR and MDE tasks. Extensive experiments on benchmark datasets demonstrate that our method achieves competitive performance. Our code and models are available at https://rmcong.github.io/proj_BridgeNet.html.
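The abstract does not specify how the high-frequency attention bridge is built, so the following is only a toy illustration of the general idea of using high-frequency cues from one branch to re-weight features of another. It is not the authors' implementation. Every name here (`box_blur`, `high_frequency`, `attention_bridge`) is hypothetical, and the choices of a 3x3 mean filter as the low-pass step and `tanh` as the attention squashing function are assumptions made for the sketch.

```python
import math


def box_blur(feat):
    """3x3 mean filter with edge replication (an assumed stand-in low-pass filter)."""
    h, w = len(feat), len(feat[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp row index to the border
                    xx = min(max(x + dx, 0), w - 1)  # clamp column index to the border
                    total += feat[yy][xx]
            out[y][x] = total / 9.0
    return out


def high_frequency(feat):
    """High-frequency residual: the feature map minus its blurred version."""
    blurred = box_blur(feat)
    return [[f - b for f, b in zip(f_row, b_row)]
            for f_row, b_row in zip(feat, blurred)]


def attention_bridge(dsr_feat, mde_feat):
    """Hypothetical bridge: re-weight DSR features with attention derived
    from the high-frequency residual of the MDE features.

    The attention is tanh(|residual|), which is 0 in flat regions and grows
    near edges, so flat regions pass through unchanged (residual attention).
    """
    hf = high_frequency(mde_feat)
    attn = [[math.tanh(abs(v)) for v in row] for row in hf]
    return [[d * (1.0 + a) for d, a in zip(d_row, a_row)]
            for d_row, a_row in zip(dsr_feat, attn)]


# Toy single-channel maps: the MDE features contain one vertical step edge.
mde_feat = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(4)]
dsr_feat = [[1.0] * 6 for _ in range(4)]
guided = attention_bridge(dsr_feat, mde_feat)
```

In this toy setup the guided DSR features are left untouched away from the step edge and amplified in the two columns adjacent to it. A real network would learn the filtering and attention weights with convolutional layers rather than fixing them by hand.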


    • Published in

      MM '21: Proceedings of the 29th ACM International Conference on Multimedia
      October 2021
      5796 pages
      ISBN:9781450386517
      DOI:10.1145/3474085

      Copyright © 2021 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 995 of 4,171 submissions, 24%
