Enhanced Neural Beamformer with Spatial Information for Target Speech Extraction.

Neural beamformers, which integrate both pre-separation and beamforming modules, have demonstrated impressive effectiveness in target speech extraction. ... Initially, we employ the cross-attention mechanism in the time domain to extract crucial spatial information related to beamforming from the noisy covariance matrix. ... INTRODUCTION Neural beamformers have demonstrated exceptional capabilities in the realm of multi-channel target speech extraction [1]. ...

arXiv:2308.15990v2 fatcat:47v3t6d6j5d73eqllzzt2hz63y

Assuming an enrollment utterance of the target speaker is available, the so-called SpeakerBeam concept has been recently proposed to extract the target speaker from a speech mixture. ... In this contribution we investigate different approaches to exploit such spatial information. ... Computational resources were provided by the Paderborn Center for Parallel Computing. ...

doi:10.1007/978-3-030-31372-2_17 fatcat:36mscnxvnbhnzlqobfij5socoe

The spatial features and phase information of target speech are incorporated into the beamforming by neural network, and a neural network based single-channel postfiltering with the phase correction is ... With the development of microphone array signal processing technology and deep learning, the beamforming combined with neural network has provided a more diverse solution for this field. ... After the beamforming with spatial and phase information, this post-filtering can effectively enhance the speech. ...

doi:10.21437/interspeech.2020-0990 dblp:conf/interspeech/ChengB20 fatcat:3hjxi5b4indyheyv7ly6ayxzzy

In this paper, we introduce spatial attention for refining the information in multi-direction neural beamformer for far-field automatic speech recognition. ... However, the features extracted by such methods contain redundant information, as only the direction of the target speech is relevant. ... The spatial attention, computed from multi-directional features, indicates how informative each direction is for recognizing the target speech. ...

arXiv:1911.02115v2 fatcat:pqwcc4stwzaa5iviudac57zu2u

Recently, frequency domain all-neural beamforming methods have achieved remarkable progress for multichannel speech separation. ... This study proposes a novel all-neural beamforming method in time domain and makes an attempt to unify the all-neural beamforming pipelines for time domain and frequency domain multichannel speech separation ... Although the formulation of conventional MCWF only involves the target speech estimation at the reference channel (Eq. 23), the spatial information of the target speech will be neglected during the all-neural ...

arXiv:2212.08348v2 fatcat:xearmdcnq5bcrnggtgofdhblke

This study proposes a novel all-neural approach for multichannel speech enhancement, where robust speaker localization, acoustic beamforming, post-filtering and spatial filtering are all done using deep ... Next, the directional features are combined with the spectral features extracted from the beamformed signal to achieve further enhancement. ... With multiple microphones, spatial information can be exploited to complement spectral information for better de-noising and dereverberation. ...

doi:10.21437/interspeech.2018-1664 dblp:conf/interspeech/WangW18a fatcat:5ovmvjayuzcszhuxljeuggsdmq

For EM, instead of estimating spatial covariance matrix explicitly, the 3-D embedding tensor is learned with the network, where both spectral and spatial discriminative information can be represented. ... The spatial covariance matrix has been considered to be significant for beamformers. ... INTRODUCTION Speech enhancement (SE) attempts to extract the target speech from the mixture signals. ...

arXiv:2109.00265v2 fatcat:vq4ozejaszfotcm6sy3o2zu2ym

Simulation results show that the proposed neural beamformer is effective in enhancing speech signals, with speech quality well preserved. ... In this study, a neural beamformer consisting of a beamformer and a novel multi-channel DCCRN is proposed for speech enhancement and source localization. ... INTRODUCTION The goal of speech enhancement is to extract the target speech from the noisy signal. ...

arXiv:2206.09728v1 fatcat:xssjul7bfbhbvdz4krezdkc55m

After that, the neural spatial filter is learned by simultaneously modeling the spatial and spectral discriminability of the speech and the interference, so as to extract the desired speech coarsely in ... To handle these problems, this paper designs a causal neural beam filter that fully exploits the spatial-spectral information in the beam domain. ... In this paper, we design a neural beam filter for real-time multichannel speech enhancement. ...

arXiv:2202.02500v1 fatcat:677rkbgysvhovo67pzaaxxyxje

In this study, a pre-separation and all-neural beamformer framework is proposed for multi-channel speech separation without following the solutions of the conventional beamformers, such as the minimum ... Furthermore, this method can be used for symmetrical stereo speech. ... Introduction Speech separation can extract target speaker information from speech signals corrupted by interference and reverberation, and it can improve the quality of communication between people. ...

doi:10.3390/sym15020261 fatcat:6cwujw7i6rdxdnvbeh3kzzddha

Then a neural network is trained using all features with a target of the clean speech of the required speaker. ... In the proposed system, three different features are formed for each target speaker, namely, spectral, spatial, and angle features. ... Beamforming utilizes the spatial information collected from multiple microphones to enhance the target speech, while the neural networks learn the regularities in speech magnitude spectra to separate speakers ...

doi:10.1109/slt.2018.8639593 dblp:conf/slt/ChenXYELG18 fatcat:lwfz7dkatzejhmc72uhxj4aufy

This paper deals with multi-channel speech recognition in scenarios with multiple speakers. ... In this work we present two variants of speaker-aware neural networks, which exploit both spectral and spatial information to allow better discrimination between target and interfering speakers. ... In this work we propose two novel multi-channel low-latency speech extraction systems, which retrieve spatial and spectral information from an AU to force a neural network to focus on the speech signal ...

doi:10.21437/interspeech.2019-2244 dblp:conf/interspeech/Martin-DonasHHG19 fatcat:pglsy45aera5tn6s2ulx26r5xm

We divide the scene into two spatial regions containing, respectively, the target and the interfering sound sources. ... We evaluate the proposed model on a real-world dataset and show that the model matches the performance of an oracle beamformer followed by a state-of-the-art single-channel enhancement network. ... For example, enhancement post-filter that follows a beamformer does not have access to spatial information that's lost after spatial filtering. ...

arXiv:2206.15423v1 fatcat:c3gloch46zemrnfxianoa62laq

direction, for target speaker separation. ... In this paper, integrated with the power spectra and inter-channel spatial features at the input level, we explore to leverage directional features, which imply the speaker source from the desired target ... With the direction of arrival (DOA) information, beamforming techniques [13, 14] can be applied to enhance the speaker from the desired direction. ...

doi:10.21437/interspeech.2019-2266 dblp:conf/interspeech/GuCZZXYSZ019 fatcat:ebrxte7o2fhvzdoybevt57dpvm

The key advantage of using multiple microphones for speech enhancement is that spatial filtering can be used to complement the tempo-spectral processing. ... However, the internal mechanisms that lead to good performance of such data-driven filters for multi-channel speech enhancement are not well understood. ... Berger and Rohde&Schwarz SwissQual AG for their support with POLQA. ...

arXiv:2206.13310v2 fatcat:bneiifs4xnbofokutyyxhhlueu

Dual-path Transformer Based Neural Beamformer for Target Speech Extraction [article]

A Study on Online Source Extraction in the Presence of Changing Speaker Positions [chapter]

Speech Enhancement Based on Beamforming and Post-Filtering by Combining Phase Information

Spatial Attention for Far-field Speech Recognition with Deep Beamforming Neural Networks [article]

Towards Unified All-Neural Beamforming for Time and Frequency Domain Speech Separation [article]

All-Neural Multi-Channel Speech Enhancement

Embedding and Beamforming: All-neural Causal Beamformer for Multichannel Speech Enhancement [article]

Multi-channel end-to-end neural network for speech enhancement, source localization, and voice activity detection [article]

A Neural Beam Filter for Real-time Multi-channel Speech Enhancement [article]

A Pre-Separation and All-Neural Beamformer Framework for Multi-Channel Speech Separation

Multi-Channel Overlapped Speech Recognition with Location Guided Speech Extraction Network

Multi-Channel Block-Online Source Extraction Based on Utterance Adaptation

Implicit Neural Spatial Filtering for Multichannel Source Separation in the Waveform Domain [article]

Neural Spatial Filter: Target Speaker Speech Separation Assisted with Directional Information

Insights into Deep Non-linear Filters for Improved Multi-channel Speech Enhancement [article]
