Dual attention interactive fine-grained classification network based on data augmentation.

In this paper, based on the lightweight MobilenetV2, we propose a Progressive Multi-Stage Interactive training method with a Recursive Mosaic Generator (RMG-PMSI). ... Fine-grained Visual Classification (FGVC) aims to identify objects from subcategories. It is a very challenging task because of the subtle inter-class differences. ... The key to improving performance on a lightweight mobile network is to take full ... compared our RMG with other data augmentation methods that are widely used in fine-grained classifications ...

arXiv:2112.04223v1 fatcat:sn2wcwqsgbgv7l7puhgqpapcum

Open Access

Extensive experiments on both small- and large-scale fine-grained classification benchmarks show that CVSA significantly improves the learned representation. ... Despite their success in various downstream tasks such as image classification and object detection, self-supervised pre-training for fine-grained scenarios is not fully explored. ... Cross-view Attention. We seek to capitalize on the pixel-level foreground semantic interactions between the feature maps of two different augmented views. ...

arXiv:2106.15788v4 fatcat:phy35lt6zzdkrn4mkyqbpjizju

Multiple Versions

In this work, we present a novel vision-based framework for recognizing secondary driver behaviours based on visual transformers and an additional augmented feature distribution calibration module. ... real-life deployment of data-driven models. ... For the fine-grained task, N_mine, δ, η and N_hard are set to 30, 1.2, 400 and 1 and the attention-based classification head is optimized for 1200 epochs. ...

arXiv:2203.00927v2 fatcat:tiaymojzjnc23bven2dc2pp3za

Multiple Versions

In this paper, to generate fine-grained tailored representations for few-shot recognition, we propose a Dual Attention Network (Dual Att-Net) consisting of two dual branches of both hard- and soft-attentions ... Experiments on three popular fine-grained benchmark datasets show that our Dual Att-Net obviously outperforms other existing state-of-the-art methods. ... We also gratefully acknowledge the support of MindSpore, CANN (Compute Architecture for Neural Networks) and Ascend AI Processor used for this research. ...

doi:10.1609/aaai.v36i3.20196 fatcat:kwd5km6aj5bspg6p65mfz2yvau

To solve this problem, a Multi-view Metric Learning (MML) method is proposed, which is based on a new concept (View Bag) and its effective similarity measurement method to achieve better few-shot fine-grained ... Few-shot fine-grained image classification aims to solve the learning problem with few limited labeled examples. ... [2] proposed an Attentive Pairwise Interaction Network (API-NET) based on the principle that a person classifies fine-grained objects by comparing them in pairs. ...

doi:10.1109/access.2022.3175798 fatcat:an43dwb4qff57gzdotp27h75ky

DOAJ

A modified convolution neural network (CNN) architecture with Channel Spatial Attention Bilinear Pooling (CSAB) frame, with a VGG-16 architecture as the backbone is trained and validated on an augmented ... Our findings indicate that the approach can recognize even subtle hand movements in the video and can be used for gesture detection and classification in social robotics. ... Such classification problems are called Fine-Grained Image Classifications. ...

arXiv:2210.15804v1 fatcat:ygtx4tgkxnhrjir3mqqdydpgxu

Open Access

Without extra data, DaViT-Tiny, DaViT-Small, and DaViT-Base achieve 82.8%, 84.2%, and 84.6% top-1 accuracy on ImageNet-1K with 28.3M, 49.7M, and 87.9M parameters, respectively. ... and representations by taking all spatial positions into account when computing attention scores between channels; (ii) the spatial attention refines the local representations by performing fine-grained ... Furthermore, we analyze in detail how our dual attention obtains global interactions as well as fine-grained local features, showing its effectiveness in benefiting various tasks, e.g., classification, ...

arXiv:2204.03645v1 fatcat:vwyjmaj6uvg7xmk4pkggmxv64i

In this paper, we first propose an interactive dual-conformer (IDC) module, in which a cross-interaction mechanism is applied to effectively exploit the information from soft labels. ... Recently, a novel annotation workflow is proposed to generate fine-grained non-binary soft labels, resulting in a new real-life dataset named MAESTRO Real for SED. ... Fig. 2. The overview of proposed interacted dual attention conformer with scene-based mask for sound event detection ...

arXiv:2311.14068v2 fatcat:cygy4byg75bflitpx6yrbo7r7e

Multiple Versions

., +, TIP 2021 6648-6658 AP-CNN: Weakly Supervised Attention Pyramid Convolutional Neural Network for Fine-Grained Visual Classification. ... ., +, TIP 2021 2810-2825 AP-CNN: Weakly Supervised Attention Pyramid Convolutional Neural Network for Fine-Grained Visual Classification. ...

doi:10.1109/tip.2022.3142569 fatcat:z26yhwuecbgrnb2czhwjlf73qu

Experiments conducted on various downstream tasks with different modalities show the proposed Uni-Dual substantially outperforms other competitive SSL methods. ... Our Uni-Dual enjoys the following benefits: (1) A unified model which can be easily transferred to different downstream tasks on various modality combinations. (2) We consider multi-constituent and structured ... Discussions on Future Directions. Currently, our Uni-Dual is designed based on 2D networks. ...

doi:10.1145/3581783.3612335 fatcat:volhksmtq5fzvpw3igo3fhanm4

We evaluate MSABN on benchmark image recognition and fine-grained recognition datasets where we observe MSABN outperforms ABN and baseline models. ... We also introduce a new data augmentation strategy utilizing the attention maps to incorporate human knowledge in the form of bounding box annotations of the objects of interest. ... We have evaluated the accuracy of MSABN for image recognition and fine-grained classification on multiple datasets and it was shown to outperform the ABN models. ...

arXiv:2210.11177v1 fatcat:slzkndeol5fixfuz76yupsfdna

Open Access

However, most existing end-to-end VLP methods use high-resolution image-text box data to perform well on fine-grained region-level tasks, such as object detection, segmentation, and referring expression ... Extensive experiments on a wide range of vision- and vision-language downstream tasks demonstrate the effectiveness of VoLTA on fine-grained applications without compromising the coarse-grained downstream ... ., 2022a) fuses vision and language encoder backbones through merged co-attention which are then pre-trained on 4M data with two stage pre-training (coarse-and fine-grained). ...

arXiv:2210.04135v2 fatcat:5lmbfja4gzgrbd6qxckp5wko2y

Open Access Multiple Versions

Few-shot fine-grained learning aims to classify a query image into one of a set of support categories with fine-grained differences. ... Extensive experiments conducted on five public fine-grained benchmarks demonstrate that HelixFormer can effectively enhance the cross-image object semantic relation matching for recognizing fine-grained ... input images; 3) Data augmentation-based methods [6, 14, 26, 36, 54, 70] that produce new samples to enlarge the training set for model training. ...

arXiv:2207.00784v1 fatcat:w2r23r2kencvvn5fpsdr54m7aa

Convolutional neural network (CNN) is well-known for its powerful capability on image classification. ... For heterogeneous regions, a fine-grained CNN architecture with smaller spatial window inputs is constructed to learn hierarchical spectral features. ... The architecture of the fine-grained CNN network is shown in Figure 6 . In the fine-grained CNN network, all the spectral bands are retained. ...

doi:10.3390/rs11050484 fatcat:y3en4igfibhazk3evbutzr43oe

DOAJ Szczepanski

To capture dual-perspective matching, we propose to learn fine-grained sequence similarities by co-attention mechanism across different time steps. ... Further, to improve the inference efficiency, we introduce the self-distillation technique to distill knowledge from the fine-grained matching module into the more efficient student module. ... Specifically, we establish the behavior sequences in dual perspectives, and then conduct the two-sided matching by modeling fine-grained sequential semantic interactions. ...

doi:10.1145/3604915.3608798 fatcat:zltihebugnbxboss6vicf5rbhy

Progressive Multi-stage Interactive Training in Mobile Network for Fine-grained Recognition [article]

Exploring Localization for Self-supervised Fine-grained Contrastive Learning [article]

TransDARC: Transformer-based Driver Activity Recognition with Latent Space Feature Calibration [article]

Dual Attention Networks for Few-Shot Fine-Grained Recognition

A Multi-view Metric Learning Method for Few-shot Fine-grained Classification

Handwashing Action Detection System for an Autonomous Social Robot [article]

DaViT: Dual Attention Vision Transformers [article]

Interactive Dual-Conformer with Scene-Inspired Mask for Soft Sound Event Detection [article]

2021 Index IEEE Transactions on Image Processing Vol. 30

Uni-Dual: A Generic Unified Dual-Task Medical Self-Supervised Learning Framework

Towards Better Guided Attention and Human Knowledge Insertion in Deep Convolutional Neural Networks [article]

VoLTA: Vision-Language Transformer with Weakly-Supervised Local-Feature Alignment [article]

Learning Cross-Image Object Semantic Relation in Transformer for Few-Shot Fine-Grained Image Classification [article]

Divide-and-Conquer Dual-Architecture Convolutional Neural Network for Classification of Hyperspectral Images

Reciprocal Sequential Recommendation
