
Progressive Multi-stage Interactive Training in Mobile Network for Fine-grained Recognition [article]

Zhenxin Wu, Qingliang Chen, Yifeng Liu, Yinqi Zhang, Chengkai Zhu, Yang Yu
2021 arXiv   pre-print
In this paper, based on the lightweight MobilenetV2, we propose a Progressive Multi-Stage Interactive training method with a Recursive Mosaic Generator (RMG-PMSI).  ...  Fine-grained Visual Classification (FGVC) aims to identify objects from subcategories. It is a very challenging task because of the subtle inter-class differences.  ...  The key to improving performance on a lightweight mobile network is to take full  ...  compared our RMG with other data augmentation methods that are widely used in fine-grained classifications  ... 
arXiv:2112.04223v1 fatcat:sn2wcwqsgbgv7l7puhgqpapcum

Exploring Localization for Self-supervised Fine-grained Contrastive Learning [article]

Di Wu, Siyuan Li, Zelin Zang, Stan Z. Li
2022 arXiv   pre-print
Extensive experiments on both small- and large-scale fine-grained classification benchmarks show that CVSA significantly improves the learned representation.  ...  Despite their success in various downstream tasks such as image classification and object detection, self-supervised pre-training for fine-grained scenarios is not fully explored.  ...  Cross-view Attention. We seek to capitalize on the pixel-level foreground semantic interactions between the feature maps of two different augmented views.  ... 
arXiv:2106.15788v4 fatcat:phy35lt6zzdkrn4mkyqbpjizju

TransDARC: Transformer-based Driver Activity Recognition with Latent Space Feature Calibration [article]

Kunyu Peng, Alina Roitberg, Kailun Yang, Jiaming Zhang, Rainer Stiefelhagen
2022 arXiv   pre-print
In this work, we present a novel vision-based framework for recognizing secondary driver behaviours based on visual transformers and an additional augmented feature distribution calibration module.  ...  real-life deployment of data-driven models.  ...  For the fine-grained task, N_mine, δ, η and N_hard are set to 30, 1.2, 400 and 1 and the attention-based classification head is optimized for 1200 epochs.  ... 
arXiv:2203.00927v2 fatcat:tiaymojzjnc23bven2dc2pp3za

Dual Attention Networks for Few-Shot Fine-Grained Recognition

Shu-Lin Xu, Faen Zhang, Xiu-Shen Wei, Jianhua Wang
2022 Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence and the Twenty-Eighth Innovative Applications of Artificial Intelligence Conference  
In this paper, to generate fine-grained tailored representations for few-shot recognition, we propose a Dual Attention Network (Dual Att-Net) consisting of two dual branches of both hard- and soft-attentions  ...  Experiments on three popular fine-grained benchmark datasets show that our Dual Att-Net obviously outperforms other existing state-of-the-art methods.  ...  We also gratefully acknowledge the support of MindSpore, CANN (Compute Architecture for Neural Networks) and Ascend AI Processor used for this research.  ... 
doi:10.1609/aaai.v36i3.20196 fatcat:kwd5km6aj5bspg6p65mfz2yvau

A Multi-view Metric Learning Method for Few-shot Fine-grained Classification

Zhuang Miao, Xun Zhao, Jiabao Wang, Bo Xu, Yang Li, Hang Li
2022 IEEE Access  
To solve this problem, a Multi-view Metric Learning (MML) method is proposed, which is based on a new concept (View Bag) and its effective similarity measurement method to achieve better few-shot fine-grained  ...  Few-shot fine-grained image classification aims to solve the learning problem with few limited labeled examples.  ...  [2] proposed an Attentive Pairwise Interaction Network (API-NET) based on the principle that a person classifies fine-grained objects by comparing them in pairs.  ... 
doi:10.1109/access.2022.3175798 fatcat:an43dwb4qff57gzdotp27h75ky

Handwashing Action Detection System for an Autonomous Social Robot [article]

Sreejith Sasidharan, Pranav Prabha, Devasena Pasupuleti, Anand M Das, Chaitanya Kapoor, Gayathri Manikutty, Praveen Pankajakshan, Bhavani Rao
2022 arXiv   pre-print
A modified convolution neural network (CNN) architecture with Channel Spatial Attention Bilinear Pooling (CSAB) frame, with a VGG-16 architecture as the backbone is trained and validated on an augmented  ...  Our findings indicate that the approach can recognize even subtle hand movements in the video and can be used for gesture detection and classification in social robotics.  ...  Such classification problems are called Fine-Grained Image Classifications.  ... 
arXiv:2210.15804v1 fatcat:ygtx4tgkxnhrjir3mqqdydpgxu

DaViT: Dual Attention Vision Transformers [article]

Mingyu Ding, Bin Xiao, Noel Codella, Ping Luo, Jingdong Wang, Lu Yuan
2022 arXiv   pre-print
Without extra data, DaViT-Tiny, DaViT-Small, and DaViT-Base achieve 82.8%, 84.2%, and 84.6% top-1 accuracy on ImageNet-1K with 28.3M, 49.7M, and 87.9M parameters, respectively.  ...  and representations by taking all spatial positions into account when computing attention scores between channels; (ii) the spatial attention refines the local representations by performing fine-grained  ...  Furthermore, we analyze in detail how our dual attention obtains global interactions as well as fine-grained local features, showing its effectiveness in benefiting various tasks, e.g., classification,  ... 
arXiv:2204.03645v1 fatcat:vwyjmaj6uvg7xmk4pkggmxv64i

Interactive Dual-Conformer with Scene-Inspired Mask for Soft Sound Event Detection [article]

Han Yin, Jisheng Bai, Mou Wang, Dongyuan Shi, Woon-Seng Gan, Jianfeng Chen
2023 arXiv   pre-print
In this paper, we first propose an interactive dual-conformer (IDC) module, in which a cross-interaction mechanism is applied to effectively exploit the information from soft labels.  ...  Recently, a novel annotation workflow is proposed to generate fine-grained non-binary soft labels, resulting in a new real-life dataset named MAESTRO Real for SED.  ...  The overview of proposed interacted dual attention conformer with scene-based mask for sound event detection.  ... 
arXiv:2311.14068v2 fatcat:cygy4byg75bflitpx6yrbo7r7e

2021 Index IEEE Transactions on Image Processing Vol. 30

2021 IEEE Transactions on Image Processing  
., +, TIP 2021 6648-6658 AP-CNN: Weakly Supervised Attention Pyramid Convolutional Neural Network for Fine-Grained Visual Classification.  ...  ., +, TIP 2021 2810-2825 AP-CNN: Weakly Supervised Attention Pyramid Convolutional Neural Network for Fine-Grained Visual Classification.  ... 
doi:10.1109/tip.2022.3142569 fatcat:z26yhwuecbgrnb2czhwjlf73qu

Uni-Dual: A Generic Unified Dual-Task Medical Self-Supervised Learning Framework

Boxiang Yun, Xingran Xie, Qingli Li, Yan Wang
2023 Proceedings of the 31st ACM International Conference on Multimedia  
Experiments conducted on various downstream tasks with different modalities show the proposed Uni-Dual substantially outperforms other competitive SSL methods.  ...  Our Uni-Dual enjoys the following benefits: (1) A unified model which can be easily transferred to different downstream tasks on various modality combinations. (2) We consider multi-constituent and structured  ...  Discussions on Future Directions. Currently, our Uni-Dual is designed based on 2D networks.  ... 
doi:10.1145/3581783.3612335 fatcat:volhksmtq5fzvpw3igo3fhanm4

Towards Better Guided Attention and Human Knowledge Insertion in Deep Convolutional Neural Networks [article]

Ankit Gupta, Ida-Maria Sintorn
2022 arXiv   pre-print
We evaluate MSABN on benchmark image recognition and fine-grained recognition datasets where we observe MSABN outperforms ABN and baseline models.  ...  We also introduce a new data augmentation strategy utilizing the attention maps to incorporate human knowledge in the form of bounding box annotations of the objects of interest.  ...  We have evaluated the accuracy of MSABN for image recognition and fine-grained classification on multiple datasets and it was shown to outperform the ABN models.  ... 
arXiv:2210.11177v1 fatcat:slzkndeol5fixfuz76yupsfdna

VoLTA: Vision-Language Transformer with Weakly-Supervised Local-Feature Alignment [article]

Shraman Pramanick, Li Jing, Sayan Nag, Jiachen Zhu, Hardik Shah, Yann LeCun, Rama Chellappa
2023 arXiv   pre-print
However, most existing end-to-end VLP methods use high-resolution image-text box data to perform well on fine-grained region-level tasks, such as object detection, segmentation, and referring expression  ...  Extensive experiments on a wide range of vision- and vision-language downstream tasks demonstrate the effectiveness of VoLTA on fine-grained applications without compromising the coarse-grained downstream  ...  ., 2022a) fuses vision and language encoder backbones through merged co-attention which are then pre-trained on 4M data with two stage pre-training (coarse- and fine-grained).  ... 
arXiv:2210.04135v2 fatcat:5lmbfja4gzgrbd6qxckp5wko2y

Learning Cross-Image Object Semantic Relation in Transformer for Few-Shot Fine-Grained Image Classification [article]

Bo Zhang, Jiakang Yuan, Baopu Li, Tao Chen, Jiayuan Fan, Botian Shi
2022 arXiv   pre-print
Few-shot fine-grained learning aims to classify a query image into one of a set of support categories with fine-grained differences.  ...  Extensive experiments conducted on five public fine-grained benchmarks demonstrate that HelixFormer can effectively enhance the cross-image object semantic relation matching for recognizing fine-grained  ...  input images; 3) Data augmentation-based methods [6, 14, 26, 36, 54, 70] that produce new samples to enlarge the training set for model training.  ... 
arXiv:2207.00784v1 fatcat:w2r23r2kencvvn5fpsdr54m7aa

Divide-and-Conquer Dual-Architecture Convolutional Neural Network for Classification of Hyperspectral Images

Jie Feng, Lin Wang, Haipeng Yu, Licheng Jiao, Xiangrong Zhang
2019 Remote Sensing  
Convolutional neural network (CNN) is well-known for its powerful capability on image classification.  ...  For heterogeneous regions, a fine-grained CNN architecture with smaller spatial window inputs is constructed to learn hierarchical spectral features.  ...  The architecture of the fine-grained CNN network is shown in Figure 6. In the fine-grained CNN network, all the spectral bands are retained.  ... 
doi:10.3390/rs11050484 fatcat:y3en4igfibhazk3evbutzr43oe

Reciprocal Sequential Recommendation

Bowen Zheng, Yupeng Hou, Wayne Xin Zhao, Yang Song, Hengshu Zhu
2023 Proceedings of the 17th ACM Conference on Recommender Systems  
To capture dual-perspective matching, we propose to learn fine-grained sequence similarities by co-attention mechanism across different time steps.  ...  Further, to improve the inference efficiency, we introduce the self-distillation technique to distill knowledge from the fine-grained matching module into the more efficient student module.  ...  Specifically, we establish the behavior sequences in dual perspectives, and then conduct the two-sided matching by modeling fine-grained sequential semantic interactions.  ... 
doi:10.1145/3604915.3608798 fatcat:zltihebugnbxboss6vicf5rbhy