3,719 Hits in 4.6 sec

Fast Gated Recurrent Network for Speech Synthesis

Bima PRIHASTO, Tzu-Chiang TAI, Pao-Chi CHANG, Jia-Ching WANG
2022 IEICE Transactions on Information and Systems
This research proposes a fast gated recurrent neural network, a fast RNN-based architecture for speech synthesis based on the minimal gated unit (MGU).  ...  The recurrent neural network (RNN) has been used in audio and speech processing, such as language translation and speech recognition.  ...  This paper is organized as follows: in Sect. 2, we give the background for speech synthesis and the MGU. In Sect. 3, we propose a fast gated recurrent network.  ...
doi:10.1587/transinf.2021edl8032 fatcat:doq5zmojj5hqtayjl4pdpmyu5e
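
Not the paper's code, but for context: the minimal gated unit it builds on collapses the LSTM's several gates into a single forget gate. A minimal numpy sketch of the standard MGU recurrence, with illustrative shapes:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mgu_step(x_t, h_prev, Wf, Uf, bf, Wh, Uh, bh):
    """One step of a minimal gated unit: a single forget gate
    replaces the separate gates of LSTM/GRU."""
    f = sigmoid(Wf @ x_t + Uf @ h_prev + bf)               # forget gate
    h_tilde = np.tanh(Wh @ x_t + Uh @ (f * h_prev) + bh)   # candidate state
    return (1.0 - f) * h_prev + f * h_tilde                # new hidden state

# Illustrative shapes: 16-dim input, 32-dim hidden state.
rng = np.random.default_rng(0)
d_in, d_h = 16, 32
params = [rng.standard_normal(s) * 0.1 for s in
          [(d_h, d_in), (d_h, d_h), (d_h,), (d_h, d_in), (d_h, d_h), (d_h,)]]
h = np.zeros(d_h)
for x in rng.standard_normal((10, d_in)):   # a 10-frame sequence
    h = mgu_step(x, h, *params)
```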

Utterance-level Sequential Modeling For Deep Gaussian Process Based Speech Synthesis Using Simple Recurrent Unit [article]

Tomoki Koriyama, Hiroshi Saruwatari
2020 arXiv   pre-print
We adopt a simple recurrent unit (SRU) for the proposed model to achieve a recurrent architecture, in which we can execute fast speech parameter generation by using the highly parallel nature of SRU  ...  This paper presents a deep Gaussian process (DGP) model with a recurrent architecture for speech sequence modeling.  ...  On the other hand, NN-based speech synthesis studies have shown the effectiveness of utterance-level modeling using recurrent NNs (RNNs) and attention-based networks [7, 8].  ...
arXiv:2004.10823v1 fatcat:dikiv5fuk5ddxjmobcose2pvk4
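
The speed claim rests on SRU's structure: every matrix multiplication depends only on the inputs, so it can be batched across all timesteps, leaving only cheap elementwise updates sequential. A hedged numpy sketch of a simplified SRU (the full formulation adds highway and peephole terms omitted here):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sru_forward(X, W, Wf, bf, Wr, br):
    """Simplified SRU over a (T, d_in) sequence. The three matmuls
    below touch every timestep at once; only the elementwise cell
    update in the loop is sequential, which is what makes SRU fast."""
    U = X @ W.T                   # candidate inputs, all timesteps at once
    F = sigmoid(X @ Wf.T + bf)    # forget gates, in parallel
    R = sigmoid(X @ Wr.T + br)    # output gates, in parallel
    c = np.zeros(W.shape[0])
    H = np.empty_like(U)
    for t in range(X.shape[0]):              # lightweight sequential part
        c = F[t] * c + (1.0 - F[t]) * U[t]   # cell state
        H[t] = R[t] * np.tanh(c)             # hidden output
    return H

rng = np.random.default_rng(0)
T, d_in, d_h = 8, 16, 16
X = rng.standard_normal((T, d_in))
H = sru_forward(X,
                rng.standard_normal((d_h, d_in)) * 0.1,
                rng.standard_normal((d_h, d_in)) * 0.1, np.zeros(d_h),
                rng.standard_normal((d_h, d_in)) * 0.1, np.zeros(d_h))
```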

Grow and Prune Compact, Fast, and Accurate LSTMs [article]

Xiaoliang Dai, Hongxu Yin, Niraj K. Jha
2018 arXiv   pre-print
This learns both the weights and the compact architecture of H-LSTM control gates. We have GP-trained H-LSTMs for image captioning and speech recognition applications.  ...  Thus, GP-trained H-LSTMs can be seen to be compact, fast, and accurate.  ...  Introduction Recurrent neural networks (RNNs) have been ubiquitously employed for sequential data modeling because of their ability to carry information through recurrent cycles.  ... 
arXiv:1805.11797v2 fatcat:jyd2u6o2kbfwvdtybqied2oe4a
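
The paper's grow-and-prune (GP) training alternates connection growth with magnitude-based pruning; the sketch below shows only the generic pruning half, with an illustrative sparsity target, not the authors' actual procedure:

```python
import numpy as np

def magnitude_prune(W, sparsity):
    """Zero out the smallest-magnitude weights so that roughly a
    `sparsity` fraction of entries are removed, returning the pruned
    matrix and a mask that keeps them at zero during further training."""
    k = int(W.size * sparsity)
    if k == 0:
        return W, np.ones_like(W, dtype=bool)
    threshold = np.partition(np.abs(W).ravel(), k - 1)[k - 1]
    mask = np.abs(W) > threshold
    return W * mask, mask

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
W_pruned, mask = magnitude_prune(W, 0.9)   # keep ~10% of weights
# After each subsequent gradient step one would reapply: W = W * mask
```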

Analysis by Adversarial Synthesis — A Novel Approach for Speech Vocoding

Ahmed Mustafa, Arijit Biswas, Christian Bergler, Julia Schottenhamml, Andreas Maier
2019 Interspeech
In this work, we introduce a new methodology for neural speech vocoding based on generative adversarial networks (GANs).  ...  Classical parametric speech coding techniques provide a compact representation for speech signals.  ...  Generative Adversarial Networks (GANs) provide an alternative approach for very fast generation of realistic data samples [8].  ...
doi:10.21437/interspeech.2019-1195 dblp:conf/interspeech/MustafaBBSM19 fatcat:4yeskn5mwbeijkjnzmkd34m7ji
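
As background to the adversarial setup: a GAN vocoder trains a generator against a discriminator that scores real versus synthesized speech. A minimal sketch of the common least-squares GAN objectives (an assumption for illustration; the paper's exact losses and networks are not reproduced here):

```python
import numpy as np

def lsgan_d_loss(d_real, d_fake):
    """Least-squares discriminator loss: push scores on real speech
    toward 1 and scores on generated speech toward 0."""
    return np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2)

def lsgan_g_loss(d_fake):
    """Generator loss: make the discriminator score fakes as real."""
    return np.mean((d_fake - 1.0) ** 2)

# Illustrative discriminator scores on real/generated speech frames.
rng = np.random.default_rng(0)
d_real = rng.uniform(0.6, 1.0, 128)
d_fake = rng.uniform(0.0, 0.4, 128)
print(lsgan_d_loss(d_real, d_fake), lsgan_g_loss(d_fake))
```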

UFANS: U-shaped Fully-Parallel Acoustic Neural Structure For Statistical Parametric Speech Synthesis With 20X Faster [article]

Dabiao Ma, Zhiba Su, Yuhao Lu, Wenxuan Wang, Zhen Li
2018 arXiv   pre-print
Neural networks with Auto-regressive structures, such as Recurrent Neural Networks (RNNs), have become the most appealing structures for acoustic modeling of parametric text to speech synthesis (TTS) in  ...  In this paper, we propose a U-shaped Fully-parallel Acoustic Neural Structure (UFANS), which is a deconvolutional alternative of RNNs for Statistical Parametric Speech Synthesis (SPSS).  ...  Variants like Long Short-Term Memory (LSTM) [5], Gated Recurrent Unit (GRU) [6] and other RNN structures are now broadly used in text-to-speech [7] with very good records.  ...
arXiv:1811.12208v1 fatcat:vodixdxg7fagtahwitd2c34viu
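
For intuition about why a U-shaped structure is fully parallel: every frame flows through downsampling and upsampling paths with skip connections, with no recurrence across time. A toy numpy sketch where average pooling and nearest-neighbor upsampling stand in for the paper's learned (de)convolutions:

```python
import numpy as np

def unet_1d_sketch(x):
    """Toy U-shaped pass over a (T, d) feature sequence: downsample,
    bottleneck, upsample, and add skip connections. Every frame is
    handled in parallel; there is no sequential recurrence."""
    skip1 = x
    d1 = x.reshape(x.shape[0] // 2, 2, -1).mean(axis=1)    # stride-2 downsample
    skip2 = d1
    d2 = d1.reshape(d1.shape[0] // 2, 2, -1).mean(axis=1)  # bottleneck
    u1 = np.repeat(d2, 2, axis=0) + skip2                  # upsample + skip
    u0 = np.repeat(u1, 2, axis=0) + skip1
    return u0

T, d = 64, 8   # T must be divisible by 4 in this toy sketch
y = unet_1d_sketch(np.random.default_rng(0).standard_normal((T, d)))
assert y.shape == (T, d)
```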

A deep recurrent approach for acoustic-to-articulatory inversion

Peng Liu, Quanjie Yu, Zhiyong Wu, Shiyin Kang, Helen Meng, Lianhong Cai
2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Experimental results indicate that the recurrent model can produce more accurate predictions for acoustic-to-articulatory inversion than a deep neural network with a fixed-length context window.  ...  To solve the acoustic-to-articulatory inversion problem, this paper proposes a deep bidirectional long short-term memory recurrent neural network and a deep recurrent mixture density network.  ...  In speech synthesis, articulatory features can be incorporated into the traditional speech synthesis method to modify the characteristics of the synthesized speech [2].  ...
doi:10.1109/icassp.2015.7178812 dblp:conf/icassp/LiuYWKMC15 fatcat:vwbdhyjeofezjhijwlf2fc4zmi
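
A mixture density network head, as used in the paper's deep recurrent MDN, outputs mixture weights, means, and variances and is trained by negative log-likelihood. A hedged sketch with diagonal Gaussians and illustrative shapes:

```python
import numpy as np

def mdn_nll(pi_logits, mu, log_sigma, y):
    """Negative log-likelihood of a target y under a mixture of K
    diagonal Gaussians. pi_logits: (K,), mu/log_sigma: (K, D), y: (D,)."""
    log_pi = pi_logits - np.log(np.sum(np.exp(pi_logits)))  # log softmax
    var = np.exp(2.0 * log_sigma)
    comp = -0.5 * np.sum((y - mu) ** 2 / var + 2.0 * log_sigma
                         + np.log(2.0 * np.pi), axis=1)     # per-component log N
    return -np.log(np.sum(np.exp(log_pi + comp)))           # -logsumexp

rng = np.random.default_rng(0)
K, D = 4, 12   # 4 mixture components, 12 articulatory dimensions
nll = mdn_nll(rng.standard_normal(K), rng.standard_normal((K, D)),
              np.zeros((K, D)), rng.standard_normal(D))
```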

Noise and acoustic modeling with waveform generator in text-to-speech and neutral speech conversion

Mohammed Salah Al-Radhi, Tamás Gábor Csapó, Géza Németh
2020 Multimedia Tools and Applications
gated recurrent unit, and hybrid model).  ...  This article focuses on developing a system for high-quality synthesized and converted speech by addressing three fundamental principles.  ...  Consequently, the second goal of this paper is to build a deep learning-based acoustic model for speech synthesis using feedforward and recurrent neural network as an alternative to HMMs.  ... 
doi:10.1007/s11042-020-09783-9 fatcat:5we3ryq6arb4xdxiblymuqwqlu
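
As a minimal illustration of the feedforward acoustic-model alternative to HMMs mentioned above: a per-frame mapping from linguistic features to vocoder parameters. The shapes and two-layer structure are assumptions for illustration, not the paper's configuration:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def ff_acoustic_model(ling, W1, b1, W2, b2):
    """Toy feedforward acoustic model: map per-frame linguistic
    features to vocoder parameters frame by frame, with no
    recurrence across frames."""
    return relu(ling @ W1.T + b1) @ W2.T + b2

rng = np.random.default_rng(0)
T, d_ling, d_hid, d_ac = 50, 40, 64, 61   # illustrative shapes
params = (rng.standard_normal((d_hid, d_ling)) * 0.1, np.zeros(d_hid),
          rng.standard_normal((d_ac, d_hid)) * 0.1, np.zeros(d_ac))
acoustic = ff_acoustic_model(rng.standard_normal((T, d_ling)), *params)
```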

Google Duplex - A Big Leap in the Evolution of Artificial Intelligence

Parth Patel, Pratik Kanani
2021 International Journal of Computer Applications  
that is fed into a recurrent neural network.  ...  [3, 7] Gated Activation and Residual Units: In the non-linearity part of the network structure, Oord et al. applied a gated activation unit similar to the activation in LSTM.  ...
doi:10.5120/ijca2021921019 fatcat:f5e4do6kczfjnpsf4ooocyyvri
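
The gated activation unit described in the snippet (from WaveNet) multiplies a tanh filter by a sigmoid gate and adds a residual connection. A hedged numpy sketch with illustrative projection matrices:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_residual_unit(x, Wf, Wg, Wr):
    """WaveNet-style gated activation: z = tanh(Wf x) * sigmoid(Wg x),
    followed by a 1x1 projection and a residual connection."""
    z = np.tanh(x @ Wf.T) * sigmoid(x @ Wg.T)   # filter * gate
    return x + z @ Wr.T                         # residual output

rng = np.random.default_rng(0)
d = 16
x = rng.standard_normal((100, d))               # 100 timesteps
y = gated_residual_unit(x, *(rng.standard_normal((d, d)) * 0.1
                             for _ in range(3)))
```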

Articulatory-WaveNet: Autoregressive Model For Acoustic-to-Articulatory Inversion [article]

Narjes Bozorg, Michael T. Johnson
2020 arXiv   pre-print
The proposed system uses the WaveNet speech synthesis architecture, with dilated causal convolutional layers using previous values of the predicted articulatory trajectories conditioned on acoustic features  ...  This paper presents Articulatory-WaveNet, a new approach for acoustic-to-articulatory inversion.  ...  Fast WaveNet caches previously computed information from the overlapping network states, called recurrent states, to eliminate redundant convolutions.  ...
arXiv:2006.12594v1 fatcat:y3xq5czyhjbkvhr4ilbqfwhztu
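
Dilated causal convolution, the core of the WaveNet architecture referenced above, makes each output depend only on past samples while the dilation widens the receptive field. A minimal numpy sketch (a toy 2-tap filter; real models stack many such layers with doubling dilation):

```python
import numpy as np

def dilated_causal_conv(x, w, dilation):
    """Causal 1-D convolution: output[t] depends only on x[t],
    x[t - dilation], x[t - 2*dilation], ... (left-padded with zeros)."""
    k = len(w)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])
    return np.array([sum(w[i] * xp[t + pad - i * dilation] for i in range(k))
                     for t in range(len(x))])

x = np.arange(16, dtype=float)
y = dilated_causal_conv(x, np.array([0.5, 0.5]), dilation=4)
# y[t] = 0.5*x[t] + 0.5*x[t-4]; e.g. y[8] == 0.5*8 + 0.5*4 == 6.0
```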

Quality Evaluation of Reverberant Speech Based on Deep Learning

Samia Abd El-Moneim, Mahmoud Saied, M. A. Nassar, Moawad I. Dessouky, N. Ismail, Adel Saleeb, Adel S. El-Fishawy, Fathi E. Abd El-Samie
2020 Menoufia Journal of Electronic Engineering Research  
Spectrogram and MFCC features are used for classification with a Long Short-Term Memory Recurrent Neural Network (LSTM RNN). Two models are presented and compared.  ...  This paper presents an efficient approach for classification of speech signals as reverberant or not. Reverberation is a severe effect encountered in closed rooms.  ...  Long Short-Term Memory Recurrent Neural Network: Deep RNN has wide use in speech processing for its ability to label sequences, meaning that each input sequence is assigned to a certain class.  ...
doi:10.21608/mjeer.2020.103754 fatcat:mgei345mgrh53b6pz2qwdnxpka
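
A hedged sketch of the classification pattern the abstract describes: run a recurrent cell over per-frame features and read the decision off the final hidden state. A plain tanh-RNN step stands in for the LSTM cell here; all names and shapes are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def classify_sequence(features, rnn_step, readout_w, readout_b, d_h):
    """Run a recurrent cell over a (T, d) feature sequence (MFCCs or
    spectrogram frames) and classify from the final hidden state:
    p > 0.5 -> reverberant, else clean."""
    h = np.zeros(d_h)
    for f in features:
        h = rnn_step(f, h)
    return sigmoid(readout_w @ h + readout_b)

rng = np.random.default_rng(0)
d_in, d_h = 13, 32                      # e.g. 13 MFCCs per frame
Wx = rng.standard_normal((d_h, d_in)) * 0.1
Wh = rng.standard_normal((d_h, d_h)) * 0.1
step = lambda f, h: np.tanh(Wx @ f + Wh @ h)
p = classify_sequence(rng.standard_normal((80, d_in)), step,
                      rng.standard_normal(d_h) * 0.1, 0.0, d_h)
```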

Leveraging Product as an Activation Function in Deep Networks [article]

Luke B. Godfrey, Michael S. Gashler
2018 arXiv   pre-print
We demonstrate that WPUNNs can also generalize gated units in recurrent neural networks, yielding results comparable to LSTM networks.  ...  We present windowed product unit neural networks (WPUNNs), a simple method of leveraging product as a nonlinearity in a neural network.  ...  LSTM networks, in particular, have proven to be particularly powerful models for speech recognition [23], language modeling [24], text-to-speech synthesis [25], and handwriting recognition and generation  ...
arXiv:1810.08578v1 fatcat:rjk55htxnbbxxdqqrxa6gjzw7q
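
One plausible reading of the windowed product idea, sketched below with illustrative windowing rather than the paper's exact formulation: the product over a small window of inputs serves as the nonlinearity. Since LSTM gating is itself a multiplication of a gate with a signal, product units can emulate it:

```python
import numpy as np

def windowed_product_layer(x, window=3):
    """Toy windowed product unit: slide a window over the input
    vector and emit the product of each window's entries, so
    multiplication itself acts as the nonlinearity."""
    T = len(x) - window + 1
    return np.array([np.prod(x[i:i + window]) for i in range(T)])

x = np.array([1.0, 2.0, 0.5, 3.0, 1.5])
print(windowed_product_layer(x))   # [1.0, 3.0, 2.25]
```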

The artificial intelligence renaissance: deep learning and the road to human-level machine intelligence

Kar-Han Tan, Boon Pang Lim
2018 APSIPA Transactions on Signal and Information Processing  
A number of problems that were considered too challenging just a few years ago can now be solved convincingly by deep neural networks.  ...  to a matter of data collection and labeling, we believe that many insights learned from 'pre-Deep Learning' works still apply and will be more valuable than ever in guiding the design of novel neural network  ...  ACKNOWLEDGEMENTS The first author would like to thank Irwin Sobel for pointers on the pioneering work at MIT, and Xiaonan Zhou for her work on many of the deep neural network results shown.  ... 
doi:10.1017/atsip.2018.6 fatcat:6iftrepekjdmjffcb5ouz42jke

On the quantization of recurrent neural networks [article]

Jian Li, Raziel Alvarez
2021 arXiv   pre-print
In this work, we present an integer-only quantization strategy for Long Short-Term Memory (LSTM) neural network topologies, which themselves are the foundation of many production ML systems.  ...  Integer quantization of neural networks can be defined as the approximation of the high precision computation of the canonical neural network formulation, using reduced integer precision.  ...  Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis.  ... 
arXiv:2101.05453v1 fatcat:zr5vqtgunjdsvgfgyjepjryl7e
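
The basic building block such integer-only strategies rest on is affine quantization of tensors; the sketch below shows symmetric per-tensor int8 quantization only, whereas the paper's full pipeline also covers activations and the LSTM's elementwise ops:

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor quantization to int8: pick a scale so the
    max magnitude maps to 127, then round and clip."""
    scale = np.max(np.abs(x)) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
W = rng.standard_normal((32, 32)).astype(np.float32)
q, s = quantize_int8(W)
err = np.max(np.abs(W - dequantize(q, s)))   # bounded by ~scale/2
```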

Background Noise Suppression in Audio File using LSTM Network

W. Shivani Patnaik
2022 International Journal for Research in Applied Science and Engineering Technology  
As a result of the advent of deep neural networks, several novel audio processing methods based on deep models have been presented.  ...  The goal of the project is to use a stacked Dual-Signal Transformation LSTM Network (DTLN) to combine both analysis and synthesis into one model.  ...  Long Short-Term Memory (LSTM) is the neural network that employs these gates.  ...
doi:10.22214/ijraset.2022.44109 fatcat:snqgeixzzraudnais2z6hz6hki
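
For reference, the gates the snippet alludes to are the standard LSTM input, forget, and output gates. A textbook numpy sketch of one LSTM step (not the DTLN model itself):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step. W maps [x_t; h_prev] to the four stacked
    pre-activations: input, forget, output, and candidate paths."""
    z = W @ np.concatenate([x_t, h_prev]) + b
    i, f, o, g = np.split(z, 4)
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    c = f * c_prev + i * g        # cell state: gated memory update
    h = o * np.tanh(c)            # hidden state: gated read-out
    return h, c

rng = np.random.default_rng(0)
d_in, d_h = 16, 32
W = rng.standard_normal((4 * d_h, d_in + d_h)) * 0.1
h = c = np.zeros(d_h)
for x in rng.standard_normal((10, d_in)):
    h, c = lstm_step(x, h, c, W, np.zeros(4 * d_h))
```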

An Optimal Feature Parameter Set Based on Gated Recurrent Unit Recurrent Neural Networks for Speech Segment Detection

Özlem Batur Dinler, Nizamettin Aydın
2020 Applied Sciences  
Speech segment detection based on gated recurrent unit (GRU) recurrent neural networks for the Kurdish language was investigated in the present study.  ...  Identification of the phoneme boundaries using a GRU recurrent neural network was performed with six different classification algorithms for the C/V/S discrimination.  ...  Gated Recurrent Unit Recurrent Neural Networks The gated recurrent unit (GRU) represents a kind of recurrent neural network.  ... 
doi:10.3390/app10041273 fatcat:rll6wnkklzcxxhgtx2h6337xfy
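
A hedged numpy sketch of the GRU recurrence underlying the entry above, with illustrative shapes (13 coefficients per frame is an assumption, not the paper's feature set):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU step: the update gate z interpolates between the old
    state and a candidate computed from a reset-gated old state."""
    z = sigmoid(Wz @ x_t + Uz @ h_prev)              # update gate
    r = sigmoid(Wr @ x_t + Ur @ h_prev)              # reset gate
    h_tilde = np.tanh(Wh @ x_t + Uh @ (r * h_prev))  # candidate state
    return (1.0 - z) * h_prev + z * h_tilde

rng = np.random.default_rng(0)
d_in, d_h = 13, 24   # e.g. 13 feature coefficients per frame
Ws = [rng.standard_normal(s) * 0.1
      for s in [(d_h, d_in), (d_h, d_h)] * 3]
h = np.zeros(d_h)
for x in rng.standard_normal((20, d_in)):   # 20 speech frames
    h = gru_step(x, h, *Ws)
```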
Showing results 1 — 15 out of 3,719 results