research-article
Open Access

Multi-peak Graph-based Multi-instance Learning for Weakly Supervised Object Detection

Published: 14 June 2021

Abstract

Weakly supervised object detection (WSOD), which aims to detect objects using only image-level annotations, has become an active research topic in recent years. Much recent effort has been devoted to WSOD methods built on multiple-instance learning (MIL) because of their simple yet effective architecture, and remarkable improvements have been achieved. However, existing MIL-based approaches usually treat proposals individually and ignore the relations between them. Moreover, to obtain pseudo-ground-truth boxes, MIL-based methods tend to select the region with the highest confidence score and regard proposals with small overlap with it as background, which leads to mislabeled instances (for example, a second object of the same class that does not overlap the top-scoring region is treated as background). Mislabeled instances and missing relations between proposals therefore degrade WSOD performance. To tackle these issues, this article introduces a multi-peak graph-based model for WSOD. Specifically, we use an instance graph to model the relations between proposals, which reinforces the multiple-instance learning process. In addition, a multi-peak discovery strategy is designed to avoid mislabeling instances. The proposed model is trained end to end with a stochastic gradient descent optimizer using back-propagation. Extensive quantitative and qualitative evaluations on two challenging public benchmarks, PASCAL VOC 2007 and PASCAL VOC 2012, demonstrate the effectiveness and superiority of the proposed approach.
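The abstract describes the two failure modes only in words. The Python sketch below is a hypothetical illustration, not the authors' model: it assumes proposals are connected in an instance graph whenever their IoU is at least 0.5, treats a "peak" as a proposal whose score is a local maximum among its graph neighbours, and uses made-up boxes and scores. All names (`iou`, `instance_graph`, `propagate`, `single_peak`, `multi_peak`, `assign_labels`) and thresholds are illustrative assumptions. It contrasts top-1 pseudo-label selection, which pushes a second object to the background, with a multi-peak selection that keeps both instances; `propagate` is a generic one-step mean aggregation standing in for graph-based relation modelling, not the paper's graph layer.

```python
# Hypothetical sketch (not the authors' implementation): contrast top-1
# pseudo-label mining with a simple multi-peak variant over an IoU graph.
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def instance_graph(boxes: List[Box], thr: float = 0.5) -> List[List[int]]:
    """Adjacency lists: connect proposals whose IoU is at least `thr` (assumed rule)."""
    adj: List[List[int]] = [[] for _ in boxes]
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if iou(boxes[i], boxes[j]) >= thr:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def propagate(scores: List[float], adj: List[List[int]]) -> List[float]:
    """One round of mean aggregation over each node and its neighbours."""
    return [(scores[i] + sum(scores[j] for j in adj[i])) / (1 + len(adj[i]))
            for i in range(len(scores))]


def single_peak(scores: List[float]) -> List[int]:
    """Baseline: keep only the highest-scoring proposal as pseudo ground truth."""
    return [max(range(len(scores)), key=lambda i: scores[i])]


def multi_peak(scores: List[float], adj: List[List[int]],
               min_score: float = 0.3) -> List[int]:
    """Keep every local maximum of the score over the graph (ties broken by index)."""
    peaks = []
    for i, s in enumerate(scores):
        if s < min_score:
            continue  # too weak to act as a pseudo ground-truth box
        if all(s > scores[j] or (s == scores[j] and i < j) for j in adj[i]):
            peaks.append(i)
    return peaks


def assign_labels(boxes: List[Box], seeds: List[int],
                  fg_thr: float = 0.5) -> List[str]:
    """Label a proposal 'fg' if it overlaps any seed by >= fg_thr, else 'bg'."""
    return ["fg" if any(iou(b, boxes[s]) >= fg_thr for s in seeds) else "bg"
            for b in boxes]


if __name__ == "__main__":
    # Toy image with two instances of the same class plus one clutter box.
    boxes = [(10, 10, 50, 50), (12, 12, 52, 52),          # instance A
             (100, 100, 140, 140), (102, 100, 142, 140),  # instance B
             (200, 10, 220, 30)]                          # clutter
    scores = [0.9, 0.7, 0.8, 0.6, 0.1]                    # made-up class scores
    adj = instance_graph(boxes)
    print(assign_labels(boxes, single_peak(scores)))      # → ['fg', 'fg', 'bg', 'bg', 'bg']
    print(assign_labels(boxes, multi_peak(scores, adj)))  # → ['fg', 'fg', 'fg', 'fg', 'bg']
```

With these toy values the top-1 rule marks both proposals on instance B as background, while the multi-peak rule keeps them as foreground and still rejects the clutter box. A real WSOD pipeline would obtain per-class scores from a trained network and feed the mined pseudo-labels back into instance refinement; those parts are outside this sketch.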

        • Published in

          ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 17, Issue 2s
          June 2021
          349 pages
          ISSN:1551-6857
          EISSN:1551-6865
          DOI:10.1145/3465440

          Copyright © 2021 Association for Computing Machinery.

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 14 June 2021
          • Revised: 1 October 2020
          • Accepted: 1 October 2020
          • Received: 1 July 2020
          Qualifiers

          • Refereed
