

Multiple Anchor Learning for Visual Object Detection [article]

Wei Ke and Tianliang Zhang and Zeyi Huang and Qixiang Ye and Jianzhuang Liu and Dong Huang
2019 arXiv   pre-print
In this paper, we propose a Multiple Instance Learning (MIL) approach that selects anchors and jointly optimizes the two modules of a CNN-based object detector.  ...  Classification and localization are two pillars of visual object detectors.  ...  Conclusion We have proposed an elegant and effective training approach, referred to as Multiple Anchor Learning (MAL), for visual object detection.  ... 
arXiv:1912.02252v1 fatcat:nwe35ue2nfhmjpbg772weinvtq

Multiple Anchor Learning for Visual Object Detection

Wei Ke, Tianliang Zhang, Zeyi Huang, Qixiang Ye, Jianzhuang Liu, Dong Huang
2020 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)  
In this paper, we propose a Multiple Instance Learning (MIL) approach that selects anchors and jointly optimizes the two modules of a CNN-based object detector.  ...  Classification and localization are two pillars of visual object detectors.  ...  Conclusion We have proposed an elegant and effective training approach, referred to as Multiple Anchor Learning (MAL), for visual object detection.  ... 
doi:10.1109/cvpr42600.2020.01022 dblp:conf/cvpr/KeZHYLH20 fatcat:xhsmmkxvlrdsfmwgt63cfw6olu

FreeAnchor: Learning to Match Anchors for Visual Object Detection [article]

Xiaosong Zhang, Fang Wan, Chang Liu, Rongrong Ji, Qixiang Ye
2019 arXiv   pre-print
Modern CNN-based object detectors assign anchors for ground-truth objects under the restriction of object-anchor Intersection-over-Union (IoU).  ...  In this study, we propose a learning-to-match approach to break IoU restriction, allowing objects to match anchors in a flexible manner.  ...  This provides a fresh insight for the visual object detection problem. Acknowledgement.  ... 
arXiv:1909.02466v2 fatcat:fj2mh5q2ize53kzmohrkrhusuq

NL-FCOS: Improving FCOS through Non-Local Modules for Object Detection [article]

Lukas Pavez, Jose M. Saavedra Rondo
2022 arXiv   pre-print
An object detection methodology closer to the natural model is anchor-free detection, where models like FCOS or Centernet have shown competitive results, but these have not yet exploited the concept of  ...  In addition, using anchors to fit bounding boxes seems far from how our visual system does the same visual task.  ...  Object detection is one of the computer vision tasks with multiple industry applications. Its goal is to localize and classify objects in an image or video.  ... 
arXiv:2203.15638v1 fatcat:zvxls4u3dffglhjzhb6vsfrw7a

Unifying Visual Perception by Dispersible Points Learning [article]

Jianming Liang, Guanglu Song, Biao Leng, Yu Liu
2022 arXiv   pre-print
We present a conceptually simple, flexible, and universal visual perception head for variant visual tasks, e.g., classification, object detection, instance segmentation and pose estimation, and different  ...  The method, called UniHead, views different visual perception tasks as the dispersible points learning via the transformer encoder architecture.  ...  Then, for an anchor point, UniHead obtains multiple points via dispersible points learning.  ... 
arXiv:2208.08630v2 fatcat:qztzwohhb5df5jhj7kfcmhy77i

Learning the semantic structure of objects from Web supervision [article]

David Novotny, Diane Larlus, Andrea Vedaldi
2021 arXiv   pre-print
Recognizing object parts and attributes has been extensively studied before, yet learning large space of such concepts remains elusive due to the high cost of providing detailed object annotations for  ...  We also show that the resulting embedding provides a visually-intuitive mechanism to navigate the learned concepts and their corresponding images.  ...  We are grateful for support by XRCE and ERC StG 638009-IDIU.  ... 
arXiv:1607.01205v2 fatcat:zepwfyx3krft5ag4lxkbw2vt6m

Online Descriptor Enhancement via Self-Labelling Triplets for Visual Data Association [article]

Yorai Shaoul, Katherine Liu, Kyel Ok, Nicholas Roy
2021 arXiv   pre-print
We propose a self-supervised method for incrementally refining visual descriptors to improve performance in the task of object-level visual data association.  ...  descriptors for the multi-object tracking task.  ...  INTRODUCTION We are interested in matching visual object detections across temporally separated frames - a fundamental capability for a wide range of applications in robotics and computer vision such as  ... 
arXiv:2011.10471v2 fatcat:eaqtsmj77zeexl3sozttukway4

Learning the Structure of Objects from Web Supervision [chapter]

David Novotny, Diane Larlus, Andrea Vedaldi
2016 Lecture Notes in Computer Science  
object annotations for supervision.  ...  We also show that the resulting embedding provides a visually-intuitive mechanism to navigate the learned concepts and their corresponding images.  ...  We would like to thank Xerox Research Center Europe and ERC 677195-IDIU for supporting this research.  ... 
doi:10.1007/978-3-319-49409-8_19 fatcat:rkpj4dmjdndfjg6ud5h44dsqke

re-OBJ: Jointly Learning the Foreground and Background for Object Instance Re-identification [chapter]

Vaibhav Bansal, Stuart James, Alessio Del Bue
2019 Lecture Notes in Computer Science  
However, learning appearances of the objects alone might fail when there are multiple objects with similar appearance or multiple instances of same object class present in the scene.  ...  We demonstrate the effectiveness of our joint visual feature in the re-identification of objects in the ScanNet dataset and show a relative improvement of around 28.25% in the rank-1 accuracy over the  ...  Object Visual Encoding For each object of the input images, we create two sets of images F = {I f , I b }.  ... 
doi:10.1007/978-3-030-30645-8_37 fatcat:5m6fe377wzcstmkfwlhcs445va

Scale-aware Anchor-free Object Detection via Curriculum Learning for Remote Sensing Images

Wandi Cai, Bo Zhang, Bin Wang
2021 IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing  
Index Terms: Remote sensing images, anchor-free object detection, feature pyramid structure, foreground attention, curriculum learning.  ...  In this paper, to address the above challenges, we propose a novel RSI anchor-free object detection framework that consists of two key components: a cross-channel feature pyramid network (CFPN) and multiple  ...  Besides, Fig. 10 shows the visual detection results of different methods on DIOR. It can be observed that our method shows better visualization results for the object detection task in RSIs.  ... 
doi:10.1109/jstars.2021.3115796 fatcat:2er7ee6wrncidj5i2nalqkleoq

Detect-and-describe: Joint learning framework for detection and description of objects

Adeel Zafar, Umar Khalid, W. Anggono
2019 MATEC Web of Conferences  
DaD is a deep learning-based approach that extends object detection to object attribute prediction as well. We train our model on aPascal train set and evaluate our approach on aPascal test set.  ...  We also show qualitative results for object attribute prediction on unseen objects, which demonstrate the effectiveness of our approach for describing unknown objects.  ...  Joint end-to-end learning has multiple advantages over distinct learning. Firstly, simultaneous detection and attribute inference provide additional information about the identified object.  ... 
doi:10.1051/matecconf/201927702028 fatcat:eiten2xplfg6dms7zvoqh5qgpa

Object Detection in Videos with Tubelet Proposal Networks

Kai Kang, Hongsheng Li, Tong Xiao, Wanli Ouyang, Junjie Yan, Xihui Liu, Xiaogang Wang
2017 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)  
Different from object detection in static images, temporal information in videos is vital for object detection.  ...  ) network that incorporates temporal information from tubelet proposals for achieving high object detection accuracy in videos.  ...  Object detection in videos. Since the introduction of the VID task by the ImageNet challenge, there have been multiple object detection systems for detecting objects in videos.  ... 
doi:10.1109/cvpr.2017.101 dblp:conf/cvpr/KangLXOYLW17 fatcat:rjjgoxmpfnejtc3llowizqw7qa

A Self Validation Network for Object-Level Human Attention Estimation [article]

Zehua Zhang, Chen Yu, David Crandall
2019 arXiv   pre-print
Due to the foveated nature of the human vision system, people can focus their visual attention on a small region of their visual field at a time, which usually contains only a single object.  ...  A straightforward solution for this problem is to pick the object whose bounding box is hit by the gaze, where eye gaze point estimation is obtained from a traditional eye gaze estimator and object candidates  ...  Research, the College of Arts and Sciences, and the School of Informatics, Computing, and Engineering through the Emerging Areas of Research Project "Learning: Brains, Machines, and Children."  ... 
arXiv:1910.14260v2 fatcat:dr2xqgxy75cgngjzqkpheorcfa

Classifying All Interacting Pairs in a Single Shot [article]

Sanaa Chafik and Astrid Orcesi and Romaric Audigier and Bertrand Luvison
2020 arXiv   pre-print
In detail, interaction classification is achieved on a dense grid of anchors thanks to a joint multi-task network that learns three complementary tasks simultaneously: (i) prediction of the types of interaction  ...  In this paper, we introduce a novel human interaction detection approach, based on CALIPSO (Classifying ALl Interacting Pairs in a Single shOt), a classifier of human-object interactions.  ...  [23] learn a visual relation representation combining compositional representation for subject, target and predicate with a visual phrase representation for HOI detection.  ... 
arXiv:2001.04360v1 fatcat:wcbtmeonbreftcvjzm4kz4vfgu

Classifying All Interacting Pairs in a Single Shot

Sanaa Chafik, Astrid Orcesi, Romaric Audigier, Bertrand Luvison
2020 2020 IEEE Winter Conference on Applications of Computer Vision (WACV)  
In detail, interaction classification is achieved on a dense grid of anchors thanks to a joint multi-task network that learns three complementary tasks simultaneously: (i) prediction of the types of interaction  ...  State-of-the-art approaches adopt a multi-shot strategy based on a pairwise estimate of interactions for a set of human-object candidate pairs, which leads to a complexity depending, at least, on the number  ...  [23] learn a visual relation representation combining compositional representation for subject, target and predicate with a visual phrase representation for HOI detection.  ... 
doi:10.1109/wacv45572.2020.9093509 dblp:conf/wacv/ChafikOAL20 fatcat:paiqazymsravlaioakbi3ew2sy