Multi-peak Graph-based Multi-instance Learning for Weakly Supervised Object Detection

Authors:
Ruyi Ji (ORCID: 0000-0001-8918-0981), Institute of Software, Chinese Academy of Sciences & University of Chinese Academy of Sciences, Zhong Guan Cun, Haidian, Beijing, China
Zeyu Liu, Department of Automation, China University of Petroleum, Beijing, Changping, Beijing, China
Libo Zhang, Institute of Software, Chinese Academy of Sciences, Zhong Guan Cun, Haidian, Beijing, China
Jianwei Liu, Department of Automation, China University of Petroleum, Beijing, Changping, Beijing, China
Xin Zuo, Department of Automation, China University of Petroleum, Beijing, Changping, Beijing, China
Yanjun Wu, Institute of Software, Chinese Academy of Sciences, Zhong Guan Cun, Haidian, Beijing, China
Chen Zhao, Institute of Software, Chinese Academy of Sciences, Zhong Guan Cun, Haidian, Beijing, China
Haofeng Wang, Beijing Institute of Computer Technology and Applications, Beijing, China
Lin Yang, Beijing Institute of Computer Technology and Applications, Beijing, China

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 17, Issue 2s, Article No. 70, pp. 1–21. https://doi.org/10.1145/3432861

Published: 14 June 2021

Abstract

Weakly supervised object detection (WSOD), which aims to detect objects using only image-level annotations, has become an active research topic in recent years. Much effort has been devoted to WSOD because of its simple yet effective architecture, and remarkable improvements have been achieved. Existing approaches based on multiple-instance learning (MIL) usually treat proposals individually and ignore the relations between them. Moreover, to obtain pseudo-ground-truth boxes, MIL-based methods tend to select the region with the highest confidence score and regard regions that overlap it only slightly as background, which leads to mislabeled instances. Both problems degrade the performance of WSOD. To tackle these issues, this article introduces a multi-peak graph-based model for WSOD. Specifically, we use an instance graph to model the relations between proposals, which reinforces the multiple-instance learning process. In addition, a multi-peak discovery strategy is designed to avoid mislabeling instances. The proposed model is trained end to end with a stochastic gradient descent optimizer using back-propagation. Extensive quantitative and qualitative evaluations on two challenging public benchmarks, PASCAL VOC 2007 and PASCAL VOC 2012, demonstrate the superiority and effectiveness of the proposed approach.
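
The mislabeling problem described in the abstract can be illustrated with a small sketch. The code below is not the authors' implementation. The function names, the IoU threshold of 0.5, the `score_ratio` parameter, and the toy boxes and scores are all assumptions made for illustration. `top1_pseudo_labels` follows the highest-score selection rule the abstract attributes to MIL-based methods, while `multi_seed_pseudo_labels` is a hypothetical multi-seed variant that only hints at the idea of keeping more than one peak. The article's actual multi-peak discovery strategy is not specified on this page.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def top1_pseudo_labels(boxes, scores, iou_thresh=0.5):
    """Label proposals from the single highest-scoring one (the seed).

    Proposals overlapping the seed by at least iou_thresh become
    foreground (1); everything else becomes background (0).
    """
    seed = max(range(len(boxes)), key=lambda i: scores[i])
    return [1 if iou(boxes[i], boxes[seed]) >= iou_thresh else 0
            for i in range(len(boxes))]


def multi_seed_pseudo_labels(boxes, scores, iou_thresh=0.5, score_ratio=0.8):
    """Hypothetical variant (an assumption, not the article's method):
    keep every proposal whose score is close to the best one and that
    does not overlap an already chosen seed, then label around all seeds.
    """
    best = max(scores)
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    seeds = []
    for i in order:
        if scores[i] < score_ratio * best:
            break  # remaining proposals score even lower
        if all(iou(boxes[i], boxes[s]) < iou_thresh for s in seeds):
            seeds.append(i)
    return [1 if any(iou(boxes[i], boxes[s]) >= iou_thresh for s in seeds) else 0
            for i in range(len(boxes))]


# Toy image with two objects of the same class:
# proposals 0 and 1 cover the first object, proposals 2 and 3 the second.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 10, 140, 50), (102, 12, 142, 52)]
scores = [0.90, 0.70, 0.85, 0.60]

labels_top1 = top1_pseudo_labels(boxes, scores)
labels_multi = multi_seed_pseudo_labels(boxes, scores)
print(labels_top1)   # [1, 1, 0, 0]: the second object is marked as background
print(labels_multi)  # [1, 1, 1, 1]: both objects are kept as foreground
```

In this toy case the single-seed rule assigns the second object's proposals to the background even though one of them scores 0.85, which is the kind of mislabeled instance the abstract refers to.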

Index Terms

Multi-peak Graph-based Multi-instance Learning for Weakly Supervised Object Detection
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Object detection
  2. Machine learning
    1. Learning paradigms
    2. Machine learning approaches
      1. Instance-based learning

Index terms have been assigned to the content through auto-classification.

Published in
ACM Transactions on Multimedia Computing, Communications, and Applications Volume 17, Issue 2s
June 2021
349 pages
ISSN:1551-6857
EISSN:1551-6865
DOI:10.1145/3465440
Editor:
Alberto Del Bimbo
University of Firenze, Italy
Copyright © 2021 Association for Computing Machinery.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 14 June 2021
- Revised: 1 October 2020
- Accepted: 1 October 2020
- Received: 1 July 2020
Author Tags
Weakly supervised object detection
multi-instance learning
context information
graph neural network
Qualifiers
- research-article
- Refereed
Article Metrics
- Total Citations: 10
- Total Downloads: 344
- Downloads (Last 12 months): 87
- Downloads (Last 6 weeks): 14