ABSTRACT
Depth map super-resolution is a task in strong practical demand in industry. Existing color-guided depth map super-resolution methods usually require an extra branch to extract high-frequency detail from the RGB image to guide reconstruction of the low-resolution depth map. However, because the two modalities still differ, directly transferring information at the feature level or the edge-map level cannot achieve satisfactory results, and may even trigger texture copying in areas where the structures of the RGB-D pair are inconsistent. Inspired by multi-task learning, we propose a joint learning network for depth map super-resolution (DSR) and monocular depth estimation (MDE) that introduces no additional supervision labels. For the interaction between the two subnetworks, we adopt a differentiated guidance strategy and design two corresponding bridges. One is the high-frequency attention bridge (HABdg), designed for the feature encoding process, which learns the high-frequency information of the MDE task to guide the DSR task. The other is the content guidance bridge (CGBdg), designed for the depth map reconstruction process, which provides the content guidance learned from the DSR task to the MDE task. The entire network architecture is highly portable and can provide a paradigm for associating the DSR and MDE tasks. Extensive experiments on benchmark datasets demonstrate that our method achieves competitive performance. Our code and models are available at https://rmcong.github.io/proj_BridgeNet.html.
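To make the idea of a high-frequency attention bridge more concrete, the following is a minimal NumPy sketch and not the authors' HABdg implementation. It assumes that "high-frequency information" can be approximated as a feature map minus its box-blurred copy, and that guidance is applied as a residual attention reweighting of the DSR features. The function names (`box_blur`, `high_frequency_attention`, `guide_dsr`), the 3 x 3 filter size, and the sigmoid gating are all hypothetical choices made for illustration only.

```python
import numpy as np


def box_blur(feat: np.ndarray, k: int = 3) -> np.ndarray:
    """Low-pass a (C, H, W) feature map with a k x k mean filter (edge padding)."""
    pad = k // 2
    padded = np.pad(feat, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
    out = np.zeros(feat.shape, dtype=np.float64)
    h, w = feat.shape[1:]
    # Sum the k*k shifted copies of the padded map, then average.
    for dy in range(k):
        for dx in range(k):
            out += padded[:, dy:dy + h, dx:dx + w]
    return out / (k * k)


def high_frequency_attention(mde_feat: np.ndarray) -> np.ndarray:
    """Hypothetical attention map of shape (1, H, W) with values in (0, 1).

    Assumption: high-frequency content is the absolute residual between
    the MDE features and their blurred copy, averaged over channels.
    """
    high_freq = np.abs(mde_feat - box_blur(mde_feat))
    energy = high_freq.mean(axis=0, keepdims=True)
    centered = energy - energy.mean()  # zero-centre before the sigmoid
    return 1.0 / (1.0 + np.exp(-centered))


def guide_dsr(dsr_feat: np.ndarray, mde_feat: np.ndarray) -> np.ndarray:
    """Residual reweighting: DSR features are emphasised where MDE detail is strong."""
    attn = high_frequency_attention(mde_feat)
    return dsr_feat + dsr_feat * attn


# Usage with random stand-in features (8 channels, 16 x 16 spatial size).
rng = np.random.default_rng(0)
mde_feat = rng.standard_normal((8, 16, 16))
dsr_feat = rng.standard_normal((8, 16, 16))
guided = guide_dsr(dsr_feat, mde_feat)
print(guided.shape)
```

In this sketch the attention is computed only from the MDE features, so the DSR branch receives a spatial weighting rather than raw RGB-derived features, which is one plausible way to limit texture copying. The actual bridge in the paper may use learned convolutions and a different fusion rule.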
Index Terms
- BridgeNet: A Joint Learning Network of Depth Map Super-Resolution and Monocular Depth Estimation