Progressive Multi-stage Interactive Training in Mobile Network for Fine-grained Recognition
[article]
2021
arXiv
pre-print
In this paper, based on the lightweight MobilenetV2, we propose a Progressive Multi-Stage Interactive training method with a Recursive Mosaic Generator (RMG-PMSI). ...
Fine-grained Visual Classification (FGVC) aims to identify objects from subcategories. It is a very challenging task because of the subtle inter-class differences. ...
The key to improving performance on a lightweight mobile network is to take full ...
... compared our RMG with other data augmentation methods that are widely used in fine-grained classifications ...
arXiv:2112.04223v1
fatcat:sn2wcwqsgbgv7l7puhgqpapcum
Exploring Localization for Self-supervised Fine-grained Contrastive Learning
[article]
2022
arXiv
pre-print
Extensive experiments on both small- and large-scale fine-grained classification benchmarks show that CVSA significantly improves the learned representation. ...
Despite their success in various downstream tasks such as image classification and object detection, self-supervised pre-training for fine-grained scenarios is not fully explored. ...
Cross-view Attention. We seek to capitalize on the pixel-level foreground semantic interactions between the feature maps of two different augmented views. ...
arXiv:2106.15788v4
fatcat:phy35lt6zzdkrn4mkyqbpjizju
TransDARC: Transformer-based Driver Activity Recognition with Latent Space Feature Calibration
[article]
2022
arXiv
pre-print
In this work, we present a novel vision-based framework for recognizing secondary driver behaviours based on visual transformers and an additional augmented feature distribution calibration module. ...
real-life deployment of data-driven models. ...
For the fine-grained task, N_mine, δ, η and N_hard are set to 30, 1.2, 400 and 1, and the attention-based classification head is optimized for 1200 epochs. ...
arXiv:2203.00927v2
fatcat:tiaymojzjnc23bven2dc2pp3za
Dual Attention Networks for Few-Shot Fine-Grained Recognition
2022
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence and the Twenty-Eighth Innovative Applications of Artificial Intelligence Conference
In this paper, to generate fine-grained tailored representations for few-shot recognition, we propose a Dual Attention Network (Dual Att-Net) consisting of two dual branches of both hard- and soft-attentions ...
Experiments on three popular fine-grained benchmark datasets show that our Dual Att-Net obviously outperforms other existing state-of-the-art methods. ...
We also gratefully acknowledge the support of MindSpore, CANN (Compute Architecture for Neural Networks) and Ascend AI Processor used for this research. ...
doi:10.1609/aaai.v36i3.20196
fatcat:kwd5km6aj5bspg6p65mfz2yvau
A Multi-view Metric Learning Method for Few-shot Fine-grained Classification
2022
IEEE Access
To solve this problem, a Multi-view Metric Learning (MML) method is proposed, which is based on a new concept (View Bag) and its effective similarity measurement method to achieve better few-shot fine-grained ...
Few-shot fine-grained image classification aims to solve the learning problem with few limited labeled examples. ...
[2] proposed an Attentive Pairwise Interaction Network (API-NET) based on the principle that a person classifies fine-grained objects by comparing them in pairs. ...
doi:10.1109/access.2022.3175798
fatcat:an43dwb4qff57gzdotp27h75ky
Handwashing Action Detection System for an Autonomous Social Robot
[article]
2022
arXiv
pre-print
A modified convolution neural network (CNN) architecture with Channel Spatial Attention Bilinear Pooling (CSAB) frame, with a VGG-16 architecture as the backbone is trained and validated on an augmented ...
Our findings indicate that the approach can recognize even subtle hand movements in the video and can be used for gesture detection and classification in social robotics. ...
Such classification problems are called Fine-Grained Image Classifications. ...
arXiv:2210.15804v1
fatcat:ygtx4tgkxnhrjir3mqqdydpgxu
DaViT: Dual Attention Vision Transformers
[article]
2022
arXiv
pre-print
Without extra data, DaViT-Tiny, DaViT-Small, and DaViT-Base achieve 82.8%, 84.2%, and 84.6% top-1 accuracy on ImageNet-1K with 28.3M, 49.7M, and 87.9M parameters, respectively. ...
and representations by taking all spatial positions into account when computing attention scores between channels; (ii) the spatial attention refines the local representations by performing fine-grained ...
Furthermore, we analyze in detail how our dual attention obtains global interactions as well as fine-grained local features, showing its effectiveness in benefiting various tasks, e.g., classification, ...
arXiv:2204.03645v1
fatcat:vwyjmaj6uvg7xmk4pkggmxv64i
Interactive Dual-Conformer with Scene-Inspired Mask for Soft Sound Event Detection
[article]
2023
arXiv
pre-print
In this paper, we first propose an interactive dual-conformer (IDC) module, in which a cross-interaction mechanism is applied to effectively exploit the information from soft labels. ...
Recently, a novel annotation workflow is proposed to generate fine-grained non-binary soft labels, resulting in a new real-life dataset named MAESTRO Real for SED. ...
Fig. 2. Overview of the proposed interacted dual attention conformer with scene-based mask for sound event detection. ...
arXiv:2311.14068v2
fatcat:cygy4byg75bflitpx6yrbo7r7e
2021 Index IEEE Transactions on Image Processing Vol. 30
2021
IEEE Transactions on Image Processing
., +, TIP 2021 6648-6658 AP-CNN: Weakly Supervised Attention Pyramid Convolutional Neural Network for Fine-Grained Visual Classification. ...
., +, TIP 2021 2810-2825 AP-CNN: Weakly Supervised Attention Pyramid Convolutional Neural Network for Fine-Grained Visual Classification. ...
doi:10.1109/tip.2022.3142569
fatcat:z26yhwuecbgrnb2czhwjlf73qu
Uni-Dual: A Generic Unified Dual-Task Medical Self-Supervised Learning Framework
2023
Proceedings of the 31st ACM International Conference on Multimedia
Experiments conducted on various downstream tasks with different modalities show the proposed Uni-Dual substantially outperforms other competitive SSL methods. ...
Our Uni-Dual enjoys the following benefits: (1) A unified model which can be easily transferred to different downstream tasks on various modality combinations. (2) We consider multi-constituent and structured ...
Discussions on Future Directions. Currently, our Uni-Dual is designed based on 2D networks. ...
doi:10.1145/3581783.3612335
fatcat:volhksmtq5fzvpw3igo3fhanm4
Towards Better Guided Attention and Human Knowledge Insertion in Deep Convolutional Neural Networks
[article]
2022
arXiv
pre-print
We evaluate MSABN on benchmark image recognition and fine-grained recognition datasets where we observe MSABN outperforms ABN and baseline models. ...
We also introduce a new data augmentation strategy utilizing the attention maps to incorporate human knowledge in the form of bounding box annotations of the objects of interest. ...
We have evaluated the accuracy of MSABN for image recognition and fine-grained classification on multiple datasets and it was shown to outperform the ABN models. ...
arXiv:2210.11177v1
fatcat:slzkndeol5fixfuz76yupsfdna
VoLTA: Vision-Language Transformer with Weakly-Supervised Local-Feature Alignment
[article]
2023
arXiv
pre-print
However, most existing end-to-end VLP methods use high-resolution image-text box data to perform well on fine-grained region-level tasks, such as object detection, segmentation, and referring expression ...
Extensive experiments on a wide range of vision- and vision-language downstream tasks demonstrate the effectiveness of VoLTA on fine-grained applications without compromising the coarse-grained downstream ...
., 2022a) fuses vision and language encoder backbones through merged co-attention which are then pre-trained on 4M data with two stage pre-training (coarse- and fine-grained). ...
arXiv:2210.04135v2
fatcat:5lmbfja4gzgrbd6qxckp5wko2y
Learning Cross-Image Object Semantic Relation in Transformer for Few-Shot Fine-Grained Image Classification
[article]
2022
arXiv
pre-print
Few-shot fine-grained learning aims to classify a query image into one of a set of support categories with fine-grained differences. ...
Extensive experiments conducted on five public fine-grained benchmarks demonstrate that HelixFormer can effectively enhance the cross-image object semantic relation matching for recognizing fine-grained ...
input images; 3) Data augmentation-based methods [6, 14, 26, 36, 54, 70] that produce new samples to enlarge the training set for model training. ...
arXiv:2207.00784v1
fatcat:w2r23r2kencvvn5fpsdr54m7aa
Divide-and-Conquer Dual-Architecture Convolutional Neural Network for Classification of Hyperspectral Images
2019
Remote Sensing
Convolutional neural network (CNN) is well-known for its powerful capability on image classification. ...
For heterogeneous regions, a fine-grained CNN architecture with smaller spatial window inputs is constructed to learn hierarchical spectral features. ...
The architecture of the fine-grained CNN network is shown in Figure 6. In the fine-grained CNN network, all the spectral bands are retained. ...
doi:10.3390/rs11050484
fatcat:y3en4igfibhazk3evbutzr43oe
Reciprocal Sequential Recommendation
2023
Proceedings of the 17th ACM Conference on Recommender Systems
To capture dual-perspective matching, we propose to learn fine-grained sequence similarities via a co-attention mechanism across different time steps. ...
Further, to improve the inference efficiency, we introduce the self-distillation technique to distill knowledge from the fine-grained matching module into the more efficient student module. ...
Specifically, we establish the behavior sequences in dual perspectives, and then conduct the two-sided matching by modeling fine-grained sequential semantic interactions. ...
doi:10.1145/3604915.3608798
fatcat:zltihebugnbxboss6vicf5rbhy