Abstract
Weakly supervised object detection (WSOD), which aims to detect objects using only image-level annotations, has become an active research topic over the past few years. Much recent effort has gone into simple yet effective WSOD architectures, and remarkable improvements have been achieved. Existing approaches based on multiple-instance learning (MIL) usually treat proposals individually and ignore the relations between them. Moreover, to obtain pseudo-ground-truth boxes for WSOD, MIL-based methods tend to select the region with the highest confidence score and assign regions that have small overlap with it to the background category, which leads to mislabeled instances. As a result, these methods suffer from both mislabeled instances and missing relations between proposals, which degrades WSOD performance. To tackle these issues, this article introduces a multi-peak graph-based model for WSOD. Specifically, we use an instance graph to model the relations between proposals, which reinforces the multiple-instance learning process. In addition, a multi-peak discovery strategy is designed to avoid mislabeling instances. The proposed model is trained end to end with a stochastic gradient descent optimizer using back-propagation. Extensive quantitative and qualitative evaluations on two challenging public benchmarks, PASCAL VOC 2007 and PASCAL VOC 2012, demonstrate the superiority and effectiveness of the proposed approach.
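The mislabeling problem described in the abstract can be illustrated with a small sketch. It shows only the baseline behavior the abstract criticizes (take the single highest-scoring proposal as the pseudo-ground truth and treat low-overlap proposals as background), not the proposed multi-peak strategy. The function names, the toy boxes and scores, and the 0.5 overlap threshold are all assumptions made for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def top1_pseudo_labels(boxes, scores, fg_iou=0.5):
    """Label proposals relative to the single highest-scoring proposal.

    Returns 1 (foreground) or 0 (background) for each proposal.
    fg_iou is an assumed threshold, not a value taken from the paper.
    """
    seed = max(range(len(boxes)), key=lambda i: scores[i])
    return [1 if iou(box, boxes[seed]) >= fg_iou else 0 for box in boxes]


# Toy image with two instances of the same class, far apart (hypothetical data).
boxes = [
    (10, 10, 110, 110),   # box on instance A
    (15, 12, 112, 108),   # near-duplicate box on instance A
    (300, 40, 400, 140),  # box on instance B
]
scores = [0.9, 0.6, 0.8]

labels = top1_pseudo_labels(boxes, scores)
print(labels)  # [1, 1, 0]
```

The box on instance B has a high score but no overlap with the top-scoring box, so it is labeled background even though it covers a real object. This is the kind of mislabeled instance that selecting more than one peak is meant to avoid.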
Index Terms
- Multi-peak Graph-based Multi-instance Learning for Weakly Supervised Object Detection