42 Hits in 3.7 sec

Adversarial Speaker Verification

Zhong Meng, Yong Zhao, Jinyu Li, Yifan Gong
2019 ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)  
The use of deep networks to extract embeddings for speaker recognition has proven successful.  ...  In this work, we propose an adversarial speaker verification (ASV) scheme to learn the condition-invariant deep embedding via adversarial multi-task training.  ...  DEEP EMBEDDING FOR SPEAKER VERIFICATION Deep embedding has been widely used for speaker verification.  ... 
doi:10.1109/icassp.2019.8682488 dblp:conf/icassp/MengZLG19 fatcat:swecovr4u5g3zinxektm23yi7i
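The adversarial multi-task training this abstract describes is commonly realized with a gradient reversal layer between the embedding network and a condition classifier. Below is a minimal NumPy sketch of just the reversal step; the class name and the λ weighting are illustrative, not taken from the paper:

```python
import numpy as np

class GradientReversal:
    """Identity in the forward pass; negates (and scales) the gradient
    in the backward pass, so the embedding network is pushed to make
    the condition (e.g. noise type) unpredictable."""

    def __init__(self, lam=1.0):
        self.lam = lam  # trade-off weight for the adversarial branch

    def forward(self, x):
        return x  # embeddings pass through unchanged

    def backward(self, grad_from_condition_classifier):
        # The embedding network receives the *negated* gradient.
        return -self.lam * grad_from_condition_classifier

grl = GradientReversal(lam=0.5)
x = np.array([1.0, -2.0, 3.0])
assert np.allclose(grl.forward(x), x)
assert np.allclose(grl.backward(np.ones(3)), -0.5 * np.ones(3))
```

In a full system this layer sits between the shared embedding extractor and the condition classifier head, while the speaker classification head receives the unmodified gradient.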

Blind Speech Signal Quality Estimation for Speaker Verification Systems

Galina Lavrentyeva, Marina Volkova, Anastasia Avdeeva, Sergey Novoselov, Artem Gorlanov, Tseren Andzhukaev, Artem Ivanov, Alexander Kozlov
2020 Interspeech 2020  
This paper presents a neural network based approach for blind speech signal quality estimation in terms of signal-to-noise ratio (SNR) and reverberation time (RT60), which is able to classify the type of  ...  The present state-of-the-art deep speaker embedding models are domain-sensitive.  ...  System description The proposed models are trained in a multitask mode: one neural network is simultaneously trained to predict SNR, RT60 and background noise class.  ... 
doi:10.21437/interspeech.2020-1826 dblp:conf/interspeech/LavrentyevaVANG20 fatcat:a6am67f56ncm5djhcpbfwmkk6m
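For context on the regression targets named here: the SNR a blind estimator learns to predict is just the log power ratio of speech to noise, computable exactly when the two tracks are available separately during training. A small NumPy sketch of the oracle target (signal names and lengths are illustrative):

```python
import numpy as np

def snr_db(speech, noise):
    """Oracle signal-to-noise ratio in dB, given separate speech and
    noise tracks (the target a blind estimator is trained to predict)."""
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    return 10.0 * np.log10(p_speech / p_noise)

rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)        # 1 s of unit-power "speech" at 16 kHz
noise = 0.1 * rng.standard_normal(16000)   # noise at ~1/100 the power
print(snr_db(speech, noise))               # close to 20 dB
```

RT60 targets are obtained analogously from the known room impulse response at simulation time, so one network can regress both in a multitask fashion.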

Training Multi-Task Adversarial Network for Extracting Noise-Robust Speaker Embedding [article]

Jianfeng Zhou, Tao Jiang, Lin Li, Qingyang Hong, Zhe Wang, Bingyin Xia
2019 arXiv   pre-print
Motivated by the promising performance of multi-task training in a variety of image processing tasks, we explore the potential of multi-task adversarial training for learning a noise-robust speaker embedding  ...  Furthermore, experiments indicate that our method is also able to improve the speaker verification performance in the clean condition.  ...  MULTI-TASK ADVERSARIAL NETWORK CNN Based Embedding Learning CNN-based neural network architecture has proved its superior performance in speaker verification tasks [7, 12] .  ... 
arXiv:1811.09355v2 fatcat:zgguy2as4jbrxlrz7b6kpobhvu

Gradient Regularization for Noise-Robust Speaker Verification

Jianchen Li, Jiqing Han, Hongwei Song
2021 Conference of the International Speech Communication Association  
Noise robustness is a challenge for speaker recognition systems. To solve this problem, one of the most common approaches is to joint-train a model by using both clean and noisy utterances.  ...  However, the gradients calculated on noisy utterances generally contain speaker-irrelevant noisy components, resulting in overfitting for the seen noisy data and poor generalization for the unseen noisy  ...  For example, the multitask adversarial training framework was proposed for training noise-robust speaker models [12] , and the unsupervised adversarial invariance architecture was adopted to disentangle  ... 
doi:10.21437/interspeech.2021-1216 dblp:conf/interspeech/LiHS21 fatcat:3qtofwpmvrhn5c7k3wgbihmzju
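The abstract's observation that noisy-utterance gradients carry speaker-irrelevant components suggests filtering them against clean-utterance gradients. The sketch below is a hypothetical instantiation of that idea, not the paper's exact rule: keep only the part of the noisy gradient that agrees in direction with the clean gradient.

```python
import numpy as np

def filtered_update(g_clean, g_noisy):
    """Combine clean- and noisy-data gradients, discarding the noisy
    component when it opposes the clean-data direction (illustrative
    variant, not the exact regularizer from the paper)."""
    direction = g_clean / (np.linalg.norm(g_clean) + 1e-12)
    aligned = np.dot(g_noisy, direction)
    # Keep the noisy gradient's contribution only along the clean direction,
    # and only when it points the same way.
    g_noisy_kept = max(aligned, 0.0) * direction
    return g_clean + g_noisy_kept

g_c = np.array([1.0, 0.0])
g_n = np.array([0.5, 2.0])   # partly speaker-irrelevant
print(filtered_update(g_c, g_n))   # the [0, 2] noise component is dropped
```

Schemes in this family differ mainly in whether opposing components are zeroed, projected out per-layer, or penalized softly in the loss.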

Multi-Task Discriminative Training of Hybrid DNN-TVM Model for Speaker Verification with Noisy and Far-Field Speech

Arindam Jati, Raghuveer Peri, Monisankha Pal, Tae Jin Park, Naveen Kumar, Ruchir Travadi, Panayiotis Georgiou, Shrikanth Narayanan
2019 Interspeech 2019  
First, we adopt a newly proposed discriminative model that hybridizes Deep Neural Network (DNN) and Total Variability Model (TVM) with the goal of integrating their strengths.  ...  The paper aims to address the task of speaker verification with single-channel, noisy and far-field speech by learning an embedding or feature representation that is invariant to different acoustic environments  ...  [13] introduced x-vectors which employed different types of artificial augmentation to train a robust speaker embedding using a Time Delay Neural Network (TDNN)-based speaker classification model [  ... 
doi:10.21437/interspeech.2019-3010 dblp:conf/interspeech/JatiPPP0TGN19 fatcat:2y7smyvsgrg2fnpogf6vbro6aa

Multi-Task Learning for End-to-End Noise-Robust Bandwidth Extension

Nana Hou, Chenglin Xu, Joey Tianyi Zhou, Eng Siong Chng, Haizhou Li
2020 Interspeech 2020  
To alleviate this problem, we propose an end-to-end time-domain framework for noise-robust bandwidth extension that jointly optimizes a mask-based speech enhancement and an ideal bandwidth extension module  ...  Speech bandwidth extension methods, such as deep neural networks (DNN) [8, 9] , fully convolutional networks [10, 11] , generative adversarial networks (GAN) [12] , and wavenet [13] , mostly perform  ...  With the advent of deep learning, recent studies suggest [17] a unified approach that combines speech enhancement and bandwidth extension (UEE) in a jointly trained neural network.  ... 
doi:10.21437/interspeech.2020-2022 dblp:conf/interspeech/HouXZC020 fatcat:g3ishlqknvfjfdzg2ymsjclvxe

An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation [article]

Daniel Michelsanti, Zheng-Hua Tan, Shi-Xiong Zhang, Yong Xu, Meng Yu, Dong Yu, Jesper Jensen
2021 arXiv   pre-print
Since the visual aspect of speech is essentially unaffected by the acoustic environment, visual information from the target speakers, such as lip movements and facial expressions, has also been used for  ...  In addition, we review deep-learning-based methods for speech reconstruction from silent videos and audio-visual sound source separation for non-speech signals, since these methods can be more or less  ...  pure neural-network-based methods.  ... 
arXiv:2008.09586v2 fatcat:vgdadayysvazfna32f5s43nc6e

Bootstrap Equilibrium and Probabilistic Speaker Representation Learning for Self-supervised Speaker Verification [article]

Sung Hwan Mun, Min Hyun Han, Dongjune Lee, Jihwan Kim, Nam Soo Kim
2021 arXiv   pre-print
Also, we demonstrate that the integrated two-stage framework further improves the speaker verification performance on the VoxCeleb1 test set in terms of EER and MinDCF.  ...  In the back-end stage, the probabilistic speaker embeddings are estimated by maximizing the mutual likelihood score between the speech samples belonging to the same speaker, which provide not only speaker  ...  “Front-end factor analysis for speaker verification,” IEEE Transactions on Audio, Speech, and Language Processing  ...  neural networks toward unsupervised learning of speaker characteristics  ... 
arXiv:2112.08929v1 fatcat:cm4plnaw2ngtnk23s5pq3cmjhe

Preserving background sound in noise-robust voice conversion via multi-task learning [article]

Jixun Yao, Yi Lei, Qing Wang, Pengcheng Guo, Ziqian Ning, Lei Xie, Hai Li, Junhui Liu, Danming Xie
2022 arXiv   pre-print
The critical problem for preserving background sound in VC is inevitable speech distortion by the neural separation model and the cascade mismatch between the source separation model and the VC model.  ...  Experimental results demonstrate that our proposed framework significantly outperforms the baseline systems while achieving comparable quality and speaker similarity to the VC models trained with clean  ...  verification (ASV) [4] .  ... 
arXiv:2211.03036v1 fatcat:3ym2e4dy4vdthagtsdfjotvqtq

Deep Learning Approach in DOA Estimation: A Systematic Literature Review

Shengguo Ge, Kuo Li, Siti Nurulain Binti Mohd Rum, Sang-Bing Tsai
2021 Mobile Information Systems  
This study provides a systematic review of research on DOA estimation using deep neural network methods.  ...  Then, the DL technology used in DOA estimation is systematically analyzed, including the purpose of using DL in DOA estimation, various DL models (convolutional neural network, deep neural network, and  ...  Liu et al. [58] (ULA, DNN, linear): the network consists of two parts: one is a multitasking autoencoder and the other is a fully connected multilayer neural network.  ... 
doi:10.1155/2021/6392875 fatcat:jtmyuje6zff5bnonpui5qc2vym

MooseNet: A trainable metric for synthesized speech with PLDA backend [article]

Ondřej Plátek, Ondřej Dušek
2023 arXiv   pre-print
The first model is a Neural Network (NN). As a second model, we propose a PLDA generative model on the top layers of the first NN model, which improves the pure NN model.  ...  We report improvements to the challenge baselines using easy-to-use modeling techniques, which also scale to larger self-supervised learning (SSL) models. We present two models.  ...  PLDA  A PLDA is a well-known generative probabilistic classification model in the face recognition [14] and speaker verification [20] literature, valued for its robust likelihood estimates.  ... 
arXiv:2301.07087v1 fatcat:uyaawlr7ajf37gbqjdecm6p24y
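For readers unfamiliar with the PLDA backend mentioned here: its score is a log-likelihood ratio between the "same speaker" and "different speaker" hypotheses for a pair of embeddings. A toy scalar-embedding version is sketched below; the variances b and w are illustrative, and real PLDA operates on embedding vectors with full covariance matrices:

```python
import numpy as np

def plda_llr(x1, x2, b=1.0, w=0.5):
    """Log-likelihood ratio that scalar embeddings x1, x2 share a speaker,
    under a toy PLDA: speaker mean ~ N(0, b), within-speaker noise ~ N(0, w)."""
    tot = b + w
    # Same speaker: the shared latent mean correlates the two observations.
    c_same = np.array([[tot, b], [b, tot]])
    # Different speakers: independent observations.
    c_diff = np.array([[tot, 0.0], [0.0, tot]])
    x = np.array([x1, x2])

    def log_gauss(v, cov):
        _, logdet = np.linalg.slogdet(cov)
        return -0.5 * (len(v) * np.log(2 * np.pi) + logdet
                       + v @ np.linalg.inv(cov) @ v)

    return log_gauss(x, c_same) - log_gauss(x, c_diff)

assert plda_llr(1.0, 1.1) > plda_llr(1.0, -1.1)  # close pair scores higher
```

Fitting such a model on the top-layer activations of a neural metric, as the paper does, only requires estimating the between-class and within-class covariances from labeled data.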

Deep Spoken Keyword Spotting: An Overview

Ivan Lopez-Espejo, Zheng-Hua Tan, John Hansen, Jesper Jensen
2021 IEEE Access  
Le, “Sequence to sequence learning with neural networks,” in Proceedings of NIPS 2014  ...  robust keyword spotting and speaker verification using CTC-based soft  ...  “Deep convolutional spiking neural networks for keyword spotting,” [34] S.  ... 
doi:10.1109/access.2021.3139508 fatcat:i4pfpfxcpretlkbefp7owtxcti

A Framework for Unified Real-time Personalized and Non-Personalized Speech Enhancement [article]

Zhepei Wang, Ritwik Giri, Devansh Shah, Jean-Marc Valin, Michael M. Goodwin, Paris Smaragdis
2023 arXiv   pre-print
In this study, we present an approach to train a single speech enhancement network that can perform both personalized and non-personalized speech enhancement.  ...  that the proposed unified model obtains promising results on both personalized and non-personalized speech enhancement benchmarks and reaches similar performance to models that are trained specialized for  ...  While advanced deep neural network architectures have achieved state-of-the-art in offline speech enhancement tasks [1, 2] , recent advances in speech enhancement have been focused on efficient model  ... 
arXiv:2302.11768v1 fatcat:tjg7mvrw75ctjej3r2wuvbrpom

Deep Learning for Distant Speech Recognition [article]

Mirco Ravanelli
2017 arXiv   pre-print
of deep neural networks.  ...  We then investigate approaches for better exploiting speech contexts, proposing some original methodologies for both feed-forward and recurrent neural networks.  ... 
arXiv:1712.06086v1 fatcat:2b7ymqmihjan5nkxeqrxq52wki

Selective Listening by Synchronizing Speech with Lips [article]

Zexu Pan, Ruijie Tao, Chenglin Xu, Haizhou Li
2022 arXiv   pre-print
We transfer the knowledge from the pre-trained model to the attractor encoder of the speaker extraction network.  ...  Therefore, we propose a self-supervised pre-training strategy, to exploit the speech-lip synchronization cue for target speaker extraction, which allows us to leverage abundant unlabeled in-domain data  ...  The target-interference SNR is defined as the energy contrast between the target speaker and the interference speaker in terms of SNR.  ... 
arXiv:2106.07150v2 fatcat:ay6bzvmy4je3pexqykiwksmu5e
Showing results 1 — 15 out of 42 results