480 Hits in 4.6 sec

Interactive Dual-Conformer with Scene-Inspired Mask for Soft Sound Event Detection [article]

Han Yin, Jisheng Bai, Mou Wang, Dongyuan Shi, Woon-Seng Gan, Jianfeng Chen
2023 arXiv   pre-print
In addition, a novel scene-inspired mask (SIM) based on soft labels is incorporated for more precise SED predictions.  ...  In this paper, we first propose an interactive dual-conformer (IDC) module, in which a cross-interaction mechanism is applied to effectively exploit the information from soft labels.  ...  Furthermore, the spectro-temporal receptive field is incorporated in convolutional layers to build a human auditory soft SED system [11] .  ... 
arXiv:2311.14068v2 fatcat:cygy4byg75bflitpx6yrbo7r7e

2021 Index IEEE/ACM Transactions on Audio, Speech, and Language Processing Vol. 29

2021 IEEE/ACM Transactions on Audio Speech and Language Processing  
The Author Index contains the primary entry for each item, listed under the first author's name.  ...  ., +, TASLP 2021 684-698 Receptive Field Regularization Techniques for Audio Classification and Tagging With Deep Convolutional Neural Networks.  ...  Shen, X., +, TASLP 2021 575-584 Receptive Field Regularization Techniques for Audio Classification and Tagging With Deep Convolutional Neural Networks.  ... 
doi:10.1109/taslp.2022.3147096 fatcat:7nl52k7sjfalbhpxtum3y5nmje

2021 Index IEEE Signal Processing Letters Vol. 28

2021 IEEE Signal Processing Letters  
The Author Index contains the primary entry for each item, listed under the first author's name.  ...  ., +, LSP 2021 1640-1644 Finger Vein Recognition Based on Multi-Receptive Field Bilinear Convolutional Neural Network.  ...  Zheng, M., +, LSP 2021 643-647 Finger Vein Recognition Based on Multi-Receptive Field Bilinear Convolutional Neural Network.  ... 
doi:10.1109/lsp.2022.3145253 fatcat:a3xqvok75vgepcckwnhh2mty74

Weakly-Supervised Semantic Segmentation of Circular-Scan, Synthetic-Aperture-Sonar Imagery [article]

Isaac J. Sledge, Dominic M. Byrne, Jonathan L. King, Steven H. Ostertag, Denton L. Woods, James L. Prater, Jermaine L. Kennedy, Timothy M. Marston, Jose C. Principe
2024 arXiv   pre-print
The classification uncertainty of each region is then evaluated.  ...  We propose a weakly-supervised framework for the semantic segmentation of circular-scan synthetic-aperture-sonar (CSAS) imagery.  ...  by image-interpolation methods, since the flow fields only characterize motion, not changes in visual appearance.  ... 
arXiv:2401.11313v1 fatcat:dccckfscj5cnri2ftzlnbtyo5i

Target Detection and Segmentation in Circular-Scan Synthetic-Aperture-Sonar Images using Semi-Supervised Convolutional Encoder-Decoders [article]

Isaac J. Sledge, Matthew S. Emigh, Jonathan L. King, Denton L. Woods, J. Tory Cobb, Jose C. Principe
2021 arXiv   pre-print
Our framework relies on a multi-branch, convolutional encoder-decoder network (MB-CEDN). The encoder portion of the MB-CEDN extracts visual contrast features from CSAS images.  ...  These features are fed into dual decoders that perform pixel-level segmentation to mask targets. Each decoder provides different perspectives as to what constitutes a salient target.  ...  resolution and large receptive fields.  ... 
arXiv:2101.03603v3 fatcat:plz7jnctrvcvjjmtovhoj6tjsq

Table of Contents

2021 IEEE/ACM Transactions on Audio Speech and Language Processing  
Evin Receptive Field Regularization Techniques for Audio Classification and Tagging With Deep Convolutional Neural Networks  ...  Goldwater Audio-Based Piano Performance Evaluation for Beginners With Convolutional Neural Network and Attention Mechanism  ... 
doi:10.1109/taslp.2021.3137064 fatcat:rpka3f2bhjh37c7pkhiowyndhm

Detection of Wheat Unsound Kernels Based on Improved ResNet

Hui Gao, Tong Zhen, Zhihui Li
2022 IEEE Access  
By comparing the advantages and disadvantages of GoogleNet, DenseNet, IX-ResNet, Res2Net, exploring the improvement of depth, width, downsampling mode, convolution order, attention mechanism [18], receptive  ...  field, and finally puts forward a wheat unsound kernel detection model based on Res24_D_CBAM_Atrous.  ...  of wheat unsound kernel, it is found that the depth, width, downsampling mode, attention mechanism, convolution mode and receptive field size of the model are closely related to the accuracy and prediction  ... 
doi:10.1109/access.2022.3147838 fatcat:efsnxtvzwjamdhb2wdhjus3asm

Polyphonic training set synthesis improves self-supervised urban sound classification

Félix Gontier, Vincent Lostanlen, Mathieu Lagrange, Nicolas Fortin, Catherine Lavandier, Jean-François Petiot
2021 Journal of the Acoustical Society of America  
Then, in the supervised stage, we formulate a downstream task of multilabel urban sound classification on synthetic scenes.  ...  Machine listening systems for environmental acoustic monitoring face a shortage of expert annotations to be used as training data.  ...  ACKNOWLEDGMENTS This research was funded by the French National Agency for Research (Agence Nationale de la Recherche) Grant No. ANR-16-CE22-0012.  ... 
doi:10.1121/10.0005277 pmid:34241459 fatcat:55r5axrot5gfhngrthmopnlcxm

Audio-Visual Model Distillation Using Acoustic Images [article]

Andrés F. Pérez, Valentina Sanguineti, Pietro Morerio, Vittorio Murino
2020 arXiv   pre-print
In this paper, we investigate how to learn rich and robust feature representations for audio classification from visual data and acoustic images, a novel audio data modality.  ...  , also known as acoustic images, where the visual and acoustic images are aligned in space and synchronized in time.  ...  Detection and Classification of Acoustic Scenes and Events (DCASE) [29] is a dataset consisting of recordings from various acoustic scenes.  ... 
arXiv:1904.07933v2 fatcat:wdxa3pcc75cfxdmzgtqm4szkpi

VHF Speech Enhancement Based on Transformer

Xue Han, Mingyang Pan, Zhengzhong Li, Haipeng Ge, Zongying Liu
2022 IEEE Open Journal of Intelligent Transportation Systems  
We select the Two-stage Transformer based Neural Network (TSTNN) as the baseline.  ...  Meanwhile, to improve real-time performance, this study employs a lightweight convolution module (Depthwise Separable Convolution) to improve the efficiency of VHF speech communication.  ...  [10] systematically aggregated the context to expand the receptive field through expanded convolution. Craig et al.  ... 
doi:10.1109/ojits.2022.3147816 fatcat:phflefl2y5czbjfvw75buv4bxy

2020 Index IEEE Transactions on Multimedia Vol. 22

2020 IEEE transactions on multimedia  
of Convolutional Features for Scene Recognition.  ...  ., +, TMM Nov. 2020 2938-2949 MRFN: Multi-Receptive-Field Network for Fast and Accurate Single Image Super-Resolution.  ...  Image watermarking Blind Watermarking for 3-D Printed Objects by Locally Modifying Layer Thickness. 2780-2791 Low-Light Image Enhancement With Semi-Decoupled Decomposition.  ... 
doi:10.1109/tmm.2020.3047236 fatcat:llha6qbaandfvkhrzpe5gek6mq

Urban Sound Classification: striving towards a fair comparison [article]

Augustin Arnault, Baptiste Hanssens, Nicolas Riche
2020 arXiv   pre-print
We hope this framework could help evaluate new architectures in this field. For better reproducibility, the code is available on our GitHub repository.  ...  It achieves a macro-AUPRC of 0.82 / 0.62 for the coarse / fine classification on validation set. Moreover, it reaches accuracies of 89.7% and 85.41% respectively on ESC-50 and US8k datasets.  ...  This challenge also contributes to research by allowing participants to  ...  (Footnotes: work done as part of Multitel internship; Sounds of New York City; Detection and Classification of Acoustic Scenes and Events.)  ... 
arXiv:2010.11805v1 fatcat:wdthiawidnbbpgso3wrd2e5wia

Ambisonic Recordings of Typical Environments (ARTE) Database [article]

Joerg Matthias Buchholz, Adam Weisser
2019 Zenodo  
For each acoustic environment the following files are provided: HOA environment files: The recorded environments were decoded into 31 mixed-order HOA channels and saved as WAV-files with a sampling frequency  ...  Note: The sound recordings are only to be used for non-commercial personal, educational or research purposes.  ...  The authors are grateful to Greg Stewart and Barry Clinch for their ongoing technical support. Thanks to Savanna Jones for helping with subjective data transfer.  ... 
doi:10.5281/zenodo.2261632 fatcat:o5icysoqafhdbixbggpiszrbzi

Table of Contents

2021 IEEE Signal Processing Letters  
Majidi Diverse Receptive Field Based Adversarial Concurrent Encoder Network for Image Inpainting  ...  Li Finger Vein Recognition Based on Multi-Receptive Field Bilinear Convolutional Neural Network  ... 
doi:10.1109/lsp.2021.3134549 fatcat:m6obtl7k7zdqvd62eo3c4tptfy

FCAU-Net for the Semantic Segmentation of Fine-Resolution Remotely Sensed Images

Xuerui Niu, Qiaolin Zeng, Xiaobo Luo, Liangfu Chen
2022 Remote Sensing  
Solving this problem can help overcome various obstacles in urban planning, land cover classification, and environmental protection, paving the way for scene-level landscape pattern analysis and decision  ...  Encoder-decoder structures based on attention mechanisms have been frequently used for fine-resolution image segmentation.  ...  Dual attention network for scene segmentation.  ... 
doi:10.3390/rs14010215 fatcat:gfnv5kbdk5a4tepwcwd7tmiihe
Showing results 1–15 out of 480 results