Convolutional Receptive Field Dual Selection Mechanism for Acoustic Scene Classification.

In addition, a novel scene-inspired mask (SIM) based on soft labels is incorporated for more precise SED predictions. ... In this paper, we first propose an interactive dual-conformer (IDC) module, in which a cross-interaction mechanism is applied to effectively exploit the information from soft labels. ... Furthermore, the spectro-temporal receptive field is incorporated in convolutional layers to build a human auditory soft SED system [11]. ...

arXiv:2311.14068v2 fatcat:cygy4byg75bflitpx6yrbo7r7e

The Author Index contains the primary entry for each item, listed under the first author's name. ... ., +, TASLP 2021 684-698 Receptive Field Regularization Techniques for Audio Classification and Tagging With Deep Convolutional Neural Networks. ... Shen, X., +, TASLP 2021 575-584 Receptive Field Regularization Techniques for Audio Classification and Tagging With Deep Convolutional Neural Networks. ...

doi:10.1109/taslp.2022.3147096 fatcat:7nl52k7sjfalbhpxtum3y5nmje

The Author Index contains the primary entry for each item, listed under the first author's name. ... ., +, LSP 2021 1640-1644 Finger Vein Recognition Based on Multi-Receptive Field Bilinear Convolutional Neural Network. ... Zheng, M., +, LSP 2021 643-647 Finger Vein Recognition Based on Multi-Receptive Field Bilinear Convolutional Neural Network. ...

doi:10.1109/lsp.2022.3145253 fatcat:a3xqvok75vgepcckwnhh2mty74

The classification uncertainty of each region is then evaluated. ... We propose a weakly-supervised framework for the semantic segmentation of circular-scan synthetic-aperture-sonar (CSAS) imagery. ... by image-interpolation methods, since the flow fields only characterize motion, not changes in visual appearance. ...

arXiv:2401.11313v1 fatcat:dccckfscj5cnri2ftzlnbtyo5i

Our framework relies on a multi-branch, convolutional encoder-decoder network (MB-CEDN). The encoder portion of the MB-CEDN extracts visual contrast features from CSAS images. ... These features are fed into dual decoders that perform pixel-level segmentation to mask targets. Each decoder provides different perspectives as to what constitutes a salient target. ... resolution and large receptive fields. ...

arXiv:2101.03603v3 fatcat:plz7jnctrvcvjjmtovhoj6tjsq

Evin Receptive Field Regularization Techniques for Audio Classification and Tagging With Deep Convolutional Neural Networks ... Goldwater Audio-Based Piano Performance Evaluation for Beginners With Convolutional Neural Network and Attention Mechanism ...

doi:10.1109/taslp.2021.3137064 fatcat:rpka3f2bhjh37c7pkhiowyndhm

By comparing the advantages and disadvantages of GoogleNet, DenseNet, IX-ResNet, Res2Net, exploring the improvement of depth, width, downsampling mode, convolution order, attention mechanism, receptive ... field, and finally puts forward a wheat unsound kernel detection model based on Res24_D_CBAM_Atrous. ... of wheat unsound kernel, it is found that the depth, width, downsampling mode, attention mechanism, convolution mode and receptive field size of the model are closely related to the accuracy and prediction ...

doi:10.1109/access.2022.3147838 fatcat:efsnxtvzwjamdhb2wdhjus3asm

Then, in the supervised stage, we formulate a downstream task of multilabel urban sound classification on synthetic scenes. ... Machine listening systems for environmental acoustic monitoring face a shortage of expert annotations to be used as training data. ... ACKNOWLEDGMENTS This research was funded by the French National Agency for Research (Agence Nationale de la Recherche) Grant No. ANR-16-CE22-0012. ...

doi:10.1121/10.0005277 pmid:34241459 fatcat:55r5axrot5gfhngrthmopnlcxm

In this paper, we investigate how to learn rich and robust feature representations for audio classification from visual data and acoustic images, a novel audio data modality. ... , also known as acoustic images, where the visual and acoustic images are aligned in space and synchronized in time. ... Detection and Classification of Acoustic Scenes and Events (DCASE) [29] is a dataset consisting of recordings from various acoustic scenes. ...

arXiv:1904.07933v2 fatcat:wdxa3pcc75cfxdmzgtqm4szkpi

We select the Two-stage Transformer based Neural Network (TSTNN) as the baseline. ... Meanwhile, to improve the real-time performance, this study employs the lightweight convolution module (Depthwise Separable Convolution) to efficiency of VHF speech communication. ... [10] systematically aggregated the context to expand the receptive field through expanded convolution. Craig et al. ...

doi:10.1109/ojits.2022.3147816 fatcat:phflefl2y5czbjfvw75buv4bxy

of Convolutional Features for Scene Recognition. ... ., +, TMM Nov. 2020 2938-2949 MRFN: Multi-Receptive-Field Network for Fast and Accurate Single Image Super-Resolution. ... Image watermarking Blind Watermarking for 3-D Printed Objects by Locally Modifying Layer Thickness. 2780-2791 Low-Light Image Enhancement With Semi-Decoupled Decomposition. ...

doi:10.1109/tmm.2020.3047236 fatcat:llha6qbaandfvkhrzpe5gek6mq

We hope this framework could help evaluate new architectures in this field. For better reproducibility, the code is available on our GitHub repository. ... It achieves a macro-AUPRC of 0.82 / 0.62 for the coarse / fine classification on validation set. Moreover, it reaches accuracies of 89.7% and 85.41% respectively on ESC-50 and US8k datasets. ... This challenge also contributes to research by allowing participants to ... [Footnotes: 0 Work done as part of Multitel internship; 1 Sounds of New York City; 2 Detection and Classification of Acoustic Scenes and Events]

arXiv:2010.11805v1 fatcat:wdthiawidnbbpgso3wrd2e5wia

For each acoustic environment the following files are provided: HOA environment files: The recorded environments were decoded into 31 mixed-order HOA channels and saved as WAV-files with a sampling frequency ... Note: The sound recordings are only to be used for non-commercial personal, educational or research purposes. ... The authors are grateful to Greg Stewart and Barry Clinch for their ongoing technical support. Thanks to Savanna Jones for helping with subjective data transfer. ...

doi:10.5281/zenodo.2261632 fatcat:o5icysoqafhdbixbggpiszrbzi

Majidi Diverse Receptive Field Based Adversarial Concurrent Encoder Network for Image Inpainting ... Li Finger Vein Recognition Based on Multi-Receptive Field Bilinear Convolutional Neural Network ...

doi:10.1109/lsp.2021.3134549 fatcat:m6obtl7k7zdqvd62eo3c4tptfy

Solving this problem can help overcome various obstacles in urban planning, land cover classification, and environmental protection, paving the way for scene-level landscape pattern analysis and decision ... Encoder-decoder structures based on attention mechanisms have been frequently used for fine-resolution image segmentation. ... Dual attention network for scene segmentation. ...

doi:10.3390/rs14010215 fatcat:gfnv5kbdk5a4tepwcwd7tmiihe

Interactive Dual-Conformer with Scene-Inspired Mask for Soft Sound Event Detection [article]

2021 Index IEEE/ACM Transactions on Audio, Speech, and Language Processing Vol. 29

2021 Index IEEE Signal Processing Letters Vol. 28

Weakly-Supervised Semantic Segmentation of Circular-Scan, Synthetic-Aperture-Sonar Imagery [article]

Target Detection and Segmentation in Circular-Scan Synthetic-Aperture-Sonar Images using Semi-Supervised Convolutional Encoder-Decoders [article]

Table of Contents

Detection of Wheat Unsound Kernels Based on Improved ResNet

Polyphonic training set synthesis improves self-supervised urban sound classification

Audio-Visual Model Distillation Using Acoustic Images [article]

VHF Speech Enhancement Based on Transformer

2020 Index IEEE Transactions on Multimedia Vol. 22

Urban Sound Classification : striving towards a fair comparison [article]

Ambisonic Recordings of Typical Environments (ARTE) Database [article]

Table of Contents

FCAU-Net for the Semantic Segmentation of Fine-Resolution Remotely Sensed Images
