Method of Estimating Signal-to-Noise Ratio Based on Optimal Design for Sub-band Voice Activity Detection.

Here, we propose a gSNR estimation framework that mainly consists of sub-band processing, voice activity detection (VAD), and threshold optimization. ... The global signal-to-noise ratio (gSNR) is the ratio of concurrent powers between speech and noise in a noisy speech signal. ... We designed a power-level-based VAD to detect speech and noise periods in each sub-band. ...

dblp:journals/jihmsp/MoritaLUA17 fatcat:fbdwdfvf5jgvpplqwm4axtpksu
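
The gSNR definition quoted in the entry above (ratio of speech power to noise power) can be illustrated with a rough full-band sketch. This is not the authors' method: it omits sub-band processing and threshold optimization, and the function names, the frame length, the "quietest 10% of frames are noise" rule, and the threshold ratio are all assumptions made for illustration.

```python
import math
import random


def frame_powers(signal, frame_len):
    """Mean power of each non-overlapping frame."""
    return [
        sum(x * x for x in signal[i:i + frame_len]) / frame_len
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]


def estimate_gsnr_db(signal, frame_len=160, threshold_ratio=3.0):
    """Full-band gSNR estimate from a power-level VAD (illustrative only)."""
    powers = frame_powers(signal, frame_len)
    # Assumption: the quietest 10% of frames contain noise only.
    floor_frames = sorted(powers)[:max(1, len(powers) // 10)]
    noise_floor = sum(floor_frames) / len(floor_frames)
    threshold = threshold_ratio * noise_floor
    speech = [p for p in powers if p > threshold]
    noise = [p for p in powers if p <= threshold]
    if not speech or not noise:
        return None  # the VAD found only one class, so no estimate is possible
    noise_power = sum(noise) / len(noise)
    # Speech frames hold speech plus noise, so subtract the noise power.
    speech_power = max(sum(speech) / len(speech) - noise_power, 1e-12)
    return 10.0 * math.log10(speech_power / noise_power)


def make_demo_signal(seed=0):
    """Hypothetical input: 200 frames of noise with a tone in frames 50-149."""
    rng = random.Random(seed)
    out = []
    for n in range(32000):
        x = rng.gauss(0.0, 0.1)                  # noise power about 0.01
        if 8000 <= n < 24000:
            x += math.sin(2 * math.pi * n / 40)  # tone power 0.5
        out.append(x)
    return out
```

Calling `estimate_gsnr_db(make_demo_signal())` on this synthetic input should give a value near the constructed ratio of 10·log10(0.5/0.01), about 17 dB. Real speech is far less stationary than a tone, which is one reason a per-sub-band decision with tuned thresholds, as the entry describes, may be preferable.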

The proposed algorithm could be applied to speech-based systems. Index Terms: voice activity detection, sub-band temporal envelope, sub-band long-term signal variability, fusion decision ... Voice activity detection (VAD), an important pre-processing step, is widely used in various speech-based systems. This paper proposes a robust voice activity detection algorithm. ... However, developing a VAD for noisy environments with low signal-to-noise ratios (SNR) or for any non-stationary noise is still very challenging. Many methods have been proposed for speech detection. ...

doi:10.1109/iscslp.2014.6936602 dblp:conf/iscslp/LiuTMLWL14 fatcat:mhwzjpalz5cjbobbke6zso3zf4

In this paper, a speech signal recovery algorithm is presented for a personalized voice command automatic recognition system in vehicle and restaurant environments. ... This novel algorithm is able to separate a mixed speech source from multiple speakers, detect presence/absence of speakers by tracking the higher magnitude portion of speech power spectrum and adaptively ... Acknowledgements The authors would like to thank STMicroelectronics Asia Pacific Pte Ltd for providing speech dataset and experiment environment. ...

doi:10.4236/jcc.2014.29015 fatcat:jgnbw3fidvdxjey24wgxj3ahnm

Keywords: Voice activity detection · Perceptual wavelet packet transform · Noise classification · Support vector machine. Introduction: Voice activity detection (VAD) is a process which can detect speech ... A multiclass SVM is also used to classify background noises in order to select the SVM model for VAD. ... Taking into account more noise types in the proposed VAD can improve the performance in real-world applications. Future work should be done on these promising issues. ...

doi:10.1007/s11760-013-0479-5 fatcat:hvikpaxptnfixg6pnkum3q5jta

The proposed ALT-SubEnpy-based VAD method is shown to be effective under variable noise-level conditions. Key words: voice activity detection, long-term spectral analysis, sub-band ... Here we propose a novel concept of adaptive long-term sub-band entropy (ALT-SubEnpy) measure and combine it with a multi-thresholding scheme for voice activity detection. ... for detecting speech with a low signal-to-noise ratio (SNR). ...

doi:10.1587/transinf.e95.d.2732 fatcat:rigvac4knvf5nk2rrjq3eu7sde

To localize the speaker, an algorithm based on Steered Response Power that utilizes harmonic structures of the speech signal is proposed. ... This new scheme has the ability of speaker verification by fundamental frequency variation; therefore it can be utilized in the design of a speech recognition system for verified speakers. ... These sub-bands are summarized in Table 1. Voice activity detection: Speaker localization performance is improved by the detection and removal of non-speech frames from the localization process. ...

doi:10.1109/aiccsa.2008.4493578 dblp:conf/aiccsa/AsaeiTS08 fatcat:y3lckuw6prastcdvjn3iedev5q

In this study, an adaptive spectral subtraction algorithm is implemented using the noise-estimation algorithm for highly non-stationary noisy environments instead of the voice activity detection (VAD) ... Also, signal to residual spectrum ratio (SR) is implemented in order to control the amplification distortion for speech intelligibility improvement. ... Implementation of SS requires the estimation of the spectral magnitude of the received noisy signal and the noise spectrum for regions that are considered as "noise-only" using voice activity detection ...

doi:10.22385/jctecs.v1i0.15 fatcat:gf6xjgfpznhe3k2vrzekm2zmkm

In this paper, a simple method of noise estimation using a voice activity detector is proposed. ... On the contrary, the WCT tends to increase in the lower band if the speech is categorized as unvoiced. ... On the other hand, a method based on higher-order statistics has been proposed to utilize the statistics of speech signals for VAD [32]. ...

doi:10.1587/transinf.e93.d.341 fatcat:6ldodptxbzcrvhhgej6sr4qsqq

Important signal characteristics of elephant vocalisations were identified from spectrograms and a technique, based on the principles of existing voice activity detection algorithms, was developed to exploit ... Manual analysis of recordings (by listening to these and by visual inspection of spectrograms) to locate vocalisations is tedious. The automatic detection of vocalisations in recordings is explored. ... For signal detection in noise, a matched filter is optimal. ...

doi:10.1016/j.biosystemseng.2010.04.001 fatcat:wtpcztrp7rdgrokicdpugkhc4m
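
The matched-filter remark in the entry above can be sketched as a plain cross-correlation against a known template. For a known waveform in additive white Gaussian noise, the correlation peak marks the most likely position. The function names and the toy template are assumptions for illustration, and nothing here is taken from the paper's own detector.

```python
def matched_filter(signal, template):
    """Cross-correlate signal with a known template; one score per lag."""
    n, m = len(signal), len(template)
    return [
        sum(signal[lag + k] * template[k] for k in range(m))
        for lag in range(n - m + 1)
    ]


def detect(signal, template):
    """Return (lag, score) of the strongest template match."""
    scores = matched_filter(signal, template)
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Toy example: the template is embedded at offset 5 in an otherwise silent signal.
template = [1, -1, 1, 1]
signal = [0] * 5 + template + [0] * 5
lag, score = detect(signal, template)  # lag == 5, score == 4
```

In practice a detection threshold on the score, rather than a bare argmax, would be needed to reject recordings that contain no vocalisation at all.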

This paper focuses on voice activity detection, noise estimation, removal techniques and an optimal filter. ... With such a formulation, the core issue of noise reduction becomes how to design an optimal filter that can significantly suppress noise without noticeable speech distortion. ... The simplest approach is to estimate and update the noise spectrum during the silent segments (pauses) of the signal using voice-activity detection (VAD) [4]. ...

doi:10.1007/978-3-642-16327-2_40 fatcat:i47sokh7zbeazjf7tfyuqjprc4
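
The "update the noise spectrum during pauses" idea quoted in the entry above is commonly realised as a recursive average that is frozen while speech is present. The sketch below assumes that form; the function name, the smoothing factor `alpha`, and the list-of-magnitudes representation are illustrative choices, not details from the chapter.

```python
def update_noise_spectrum(noise_est, frame_mag, is_speech, alpha=0.9):
    """Recursively average the noise spectrum, but only during pauses.

    noise_est and frame_mag are per-bin magnitude lists of equal length;
    is_speech is the VAD decision for the current frame (assumed given).
    """
    if is_speech:
        # Speech present: keep the previous estimate unchanged.
        return list(noise_est)
    # Pause: blend the old estimate with the current frame's spectrum.
    return [alpha * n + (1.0 - alpha) * x for n, x in zip(noise_est, frame_mag)]


# With alpha=0.5 the estimate moves halfway toward the new frame.
est = update_noise_spectrum([1.0, 2.0], [3.0, 4.0], is_speech=False, alpha=0.5)
```

A larger `alpha` gives a smoother but slower-tracking estimate, which is the usual trade-off when the noise is non-stationary.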

BACKGROUND: The conventional methods of speech enhancement, noise reduction, and voice activity detection are based on the suppression of noise or non-speech components of the target air-conduction signals ... METHODS: A new algorithm for speech detection and noise reduction is proposed, which makes use of the Shannon entropy principle and cross-correlation with the bone-conduction speech signals to threshold ... The traditional speech enhancement, noise reduction, and voice activity detection algorithms are based on linear processing techniques and general air-conduction speech signals [2]. ...

doi:10.3233/thc-174615 pmid:29710756 pmcid:PMC6004965 fatcat:wht7guphsjhbjd2xf3l4xgoe6u

First, an amplified voice activity detector is designed to improve performance at SNRs lower than 5 dB. Then an adaptive threshold decision module based on a fuzzy inference system is proposed. ... In this fuzzy inference system, overall relations between speech and noise are summarized into seven fuzzy rules and four linguistic variables, which are used to detect the state of signals. ... This amplified voice activity detector employs the full wavelet packet transform (instead of the perceptual wavelet packet transform) to decompose the input speech signal into critical sub-band signals ...

doi:10.1007/978-3-642-12179-1_42 fatcat:s3sqmtr22fganc725eeuu35pvq

An amplified voice activity detector in the proposed hybrid filter is designed to improve performance when the signal-to-noise ratio (SNR) is lower than 5 dB. ... This paper proposes an adaptive fuzzy wavelet filter that is based on a fuzzy inference system for enhancing speech signals and improving the accuracy of speech recognition. ... Amplified Voice Activity Detection: Voice activity detection (VAD) is used to distinguish speech from contaminated speech signals and is required in various speech communication systems [16]. ...

doi:10.4304/jcp.9.11.2501-2513 fatcat:aqltk4itxvbbnkzylqbe6jixr4

The proposed TD-PBEE-based VAD algorithm is evaluated for four types of noise and five signal-to-noise ratio (SNR) levels. ... In order to develop a novel voice sensor to detect human voices, the use of features which are more robust to noise is an important issue. A voice sensor is also called voice activity detection (VAD). ... Conflicts of Interest: The authors declare no conflict of interest. ...

doi:10.3390/s131216533 pmid:24316566 pmcid:PMC3892860 fatcat:ldaxnvwsmze6vifx5pydvyxjgy

., +, TASLP 2020 2349-2363 Correlation methods Frequency-Sliding Generalized Cross-Correlation: A Sub-Band Time Delay Estimation Approach. ... Beit-On, H., +, TASLP 2020 2184-2193 Frequency-Sliding Generalized Cross-Correlation: A Sub-Band Time Delay Estimation Approach. ... T Target tracking Multi-Hypothesis Square-Root Cubature Kalman Particle Filter for Speaker Tracking in Noisy and Reverberant Environments. Zhang, Q., +, TASLP 2020 1183-1197 ...

doi:10.1109/taslp.2021.3055391 fatcat:7vmstynfqvaprgz6qy3ekinkt4

Method of Estimating Signal-to-Noise Ratio Based on Optimal Design for Sub-band Voice Activity Detection

Efficient voice activity detection algorithm based on sub-band temporal envelope and sub-band long-term signal variability

Speech Signal Recovery Based on Source Separation and Noise Suppression

Robust voice activity detection directed by noise classification

A Novel Approach Based on Adaptive Long-Term Sub-Band Entropy and Multi-Thresholding Scheme for Detecting Speech Signal

Far-field continuous speech recognition system based on speaker Localization and sub-band Beamforming

Performance Improvement of Digital Hearing Aid Systems

An Adaptive Wavelet-Based Denoising Algorithm for Enhancing Speech in Non-stationary Noise Environment

Automatic detection of African elephant (Loxodonta africana) infrasonic vocalisations from recordings

Noise Estimation and Noise Removal Techniques for Speech Recognition in Adverse Environment [chapter]

Noise reduction algorithm with the soft thresholding based on the Shannon entropy and bone-conduction speech cross-correlation bands

Adaptive Fuzzy Filter for Speech Enhancement [chapter]

On the Use of Adaptive Fuzzy Wavelet Filter in the Speech Enhancement

A Novel Voice Sensor for the Detection of Speech Signals

2020 Index IEEE/ACM Transactions on Audio, Speech, and Language Processing Vol. 28
