research-article

Cross-modality Discrepant Interaction Network for RGB-D Salient Object Detection

Authors:
Chen Zhang
Beijing Jiaotong University & Beijing Key Laboratory of Advanced Information Science and Network Technology, Beijing, China

Runmin Cong
Beijing Jiaotong University & Beijing Key Laboratory of Advanced Information Science and Network Technology & City University of Hong Kong, Beijing, Hong Kong, China

Qinwei Lin
Beijing Jiaotong University, Beijing, China

Lin Ma
Meituan, Beijing, China

Feng Li
Beijing Jiaotong University & Beijing Key Laboratory of Advanced Information Science and Network Technology, Beijing, China

Yao Zhao
Beijing Jiaotong University & Beijing Key Laboratory of Advanced Information Science and Network Technology, Beijing, China

Sam Kwong
City University of Hong Kong, Hong Kong, China

MM '21: Proceedings of the 29th ACM International Conference on Multimedia, October 2021, Pages 2094–2102. https://doi.org/10.1145/3474085.3475364

Published: 17 October 2021

ABSTRACT

The growing popularity of depth maps has brought new vitality to salient object detection (SOD), and many RGB-D SOD algorithms have been proposed, most of which concentrate on how to better integrate cross-modality features from the RGB image and the depth map. For cross-modality interaction in the feature encoder, existing methods either treat the RGB and depth modalities indiscriminately or habitually use depth cues only as auxiliary information for the RGB branch. In contrast, we reconsider the roles of the two modalities and propose a novel Cross-modality Discrepant Interaction Network (CDINet) for RGB-D SOD, which differentially models the dependence of the two modalities according to the feature representations of different layers. To this end, two components are designed to implement effective cross-modality interaction: 1) the RGB-induced Detail Enhancement (RDE) module leverages the RGB modality to enhance the details of the depth features in the low-level encoder stage; 2) the Depth-induced Semantic Enhancement (DSE) module transfers the object positioning and internal consistency of the depth features to the RGB branch in the high-level encoder stage. Furthermore, we design a Dense Decoding Reconstruction (DDR) structure, which constructs a semantic block by combining multi-level encoder features to upgrade the skip connections in feature decoding. Extensive experiments on five benchmark datasets demonstrate that our network outperforms 15 state-of-the-art methods both quantitatively and qualitatively. Our code is publicly available at https://rmcong.github.io/proj_CDINet.html.
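The direction-dependent interaction described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not specify the internals of RDE or DSE, so the channel-wise max attention, the residual form, the function names, and the `num_low_level` split are all assumptions made for illustration only. The sketch shows just the routing idea: RGB guides depth at low levels, and depth guides RGB at high levels.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def spatial_attention(guide):
    """Collapse a (C, H, W) guide feature into a (1, H, W) map in (0, 1).

    Assumption: a channel-wise max followed by a sigmoid stands in for
    whatever learned attention the real modules use.
    """
    return sigmoid(guide.max(axis=0, keepdims=True))


def rgb_to_depth_enhance(rgb_feat, depth_feat):
    """Low-level direction (RDE-like): RGB features refine depth features."""
    return depth_feat + depth_feat * spatial_attention(rgb_feat)


def depth_to_rgb_enhance(rgb_feat, depth_feat):
    """High-level direction (DSE-like): depth features refine RGB features."""
    return rgb_feat + rgb_feat * spatial_attention(depth_feat)


def discrepant_encoder(rgb_feats, depth_feats, num_low_level=2):
    """Choose the interaction direction by encoder level.

    rgb_feats / depth_feats are lists of per-level (C, H, W) arrays.
    num_low_level is a hypothetical split point, not taken from the paper.
    """
    out_rgb, out_depth = [], []
    for level, (r, d) in enumerate(zip(rgb_feats, depth_feats)):
        if level < num_low_level:
            d = rgb_to_depth_enhance(r, d)   # RGB -> depth
        else:
            r = depth_to_rgb_enhance(r, d)   # depth -> RGB
        out_rgb.append(r)
        out_depth.append(d)
    return out_rgb, out_depth


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shapes = [(8, 32, 32), (16, 16, 16), (32, 8, 8), (64, 4, 4)]
    rgb = [rng.standard_normal(s) for s in shapes]
    depth = [rng.standard_normal(s) for s in shapes]
    new_rgb, new_depth = discrepant_encoder(rgb, depth)
    print([f.shape for f in new_rgb])
```

In a real encoder the enhanced features of one stage would feed the next stage; here the per-level features are precomputed random arrays, so the sketch only demonstrates which modality is modified at which level.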

Supplemental Material

mfp1168aux.zip (2.3 MB)

The supplementary material is a LaTeX .zip file that contains additional theoretical explanations and ablation studies.

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Interest point and salient region detections

Published in
MM '21: Proceedings of the 29th ACM International Conference on Multimedia
October 2021
5796 pages
ISBN:9781450386517
DOI:10.1145/3474085
General Chairs:
Heng Tao Shen, University of Electronic Science and Technology of China, China
Yueting Zhuang, Zhejiang University, China
John R. Smith, IBM, USA
Program Chairs:
Yang Yang, University of Electronic Science and Technology of China, China
Pablo Cesar, CWI & TU Delft, The Netherlands
Florian Metze, FACEBOOK, Inc., USA
Balakrishnan Prabhakaran, University of Texas at Dallas, USA
Copyright © 2021 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
RGB-D images
dense decoding reconstruction
discrepant interaction
salient object detection
Qualifiers
- research-article