Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information.

We modify the input and output layer structures of the network to improve the performance. ... EMPHASIS is designed to be a multi-lingual model and can synthesize Mandarin-English speech for now. ... contextual information. ...

arXiv:1806.09276v2 fatcat:cpn4jihzbre2rc5kwmrjrqtp6m

We modify the input and output layer structures of the network to improve the performance. ... EMPHASIS is designed to be a multi-lingual model and can synthesize Mandarin-English speech for now. ... contextual information. ...

doi:10.21437/interspeech.2018-1511 dblp:conf/interspeech/LiKW18 fatcat:4mblxebp3bafll643uoe2zk2we

improve prosodic structure pedagogy in the context of L2 English pronunciation. ... Pedagogically, how should these findings inform classroom practices? ... TABLE 1 | 1 Proposed Prosodic Structure Pathway. ...

doi:10.3389/fcomm.2021.721053 fatcat:6qt4u4oslzefvmis5orpp26gxu

DOAJ Szczepanski

Then, the predicted punctuation and its punctuation confidence are combined with contextual linguistic features to predict the break type of the word boundary by an MLP (multi-layer perceptrons). ... A novel statistical linguistic feature, called punctuation confidence, is proposed in this paper for assisting in prosodic break prediction in Mandarin text-to-speech. ... In future works, it is worthwhile to incorporate the punctuation confidence with the durational information of prosodic units (i.e., PW, PPh, BG/PG) for further improvement on prosodic break prediction ...

doi:10.1109/icassp.2012.6288942 dblp:conf/icassp/ChiangWC12 fatcat:6uahfdkb3fal5f7c2lfrhtqpq4

text, and prosodic tags representing the prosodic structure of speech. ... Experimental results on a large Mandarin read speech corpus showed that the parameters of the SR-HPM together with these feature normalization functions interpreted the effects of speaking rate on Mandarin ... duration information to the proposed SR-controlled Mandarin TTS system. ...

doi:10.1109/taslp.2014.2321482 fatcat:pu43xyrqajddpeaqzlihu6nfyq

However, generating human-parity long-form speech with high dynamic prosodic variations is still challenging. ... Additionally, we perform hierarchical supervision from acoustic prosody on each node of the graph to capture the prosodic variations with a high dynamic range. ... contextual information. ...

arXiv:2309.13907v2 fatcat:yc6hohppabax5mz7pnyxzndy3u

detection with multi-task learning (MTL) to reinforce the performance of each other. ... In this paper, we define Intention Prominence (IP) as the semantic combination of focus by text and emphasis by speech, and propose a multi-task deep learning framework to predict IP. ... The prosodic units are from five levels: syllable, prosodic word, prosodic phrase, intonational phrase and sentence. ...

doi:10.1609/aaai.v31i1.10493 fatcat:wardjwxgwfdp3cdhfawpfb2u3i

Multiple linguistic features of syllables at different levels like positional and contextual features are used from text. ... A small speech database is considered as a preliminary work to predict syllable duration with proposed RNN algorithm. Experiments are conducted with different sets of features. ... A multi-level prosodic model based on the estimation of prosodic features is considered [25] . ...

doi:10.15680/ijircce.2015.0302017 fatcat:4eaf7gawwzalbgzhhbyqhowkpi

In this paper, we discuss Mandarin stress detection and compare it with English pitch accent detection. ... with the baseline system. ... Acknowledgements The authors are thankful to the anonymous reviewers for their valuable comments and corrections in an earlier version of our manuscript, which contributed to the significant improvement ...

doi:10.1016/j.csl.2011.09.002 fatcat:7ucw4xvo7jayhpu6w64dp5d5hu

In this paper, we review methods employed in pronunciation assessment for both phonemic and prosodic. ... With the rapid growth in language processing and deep learning over the past few years, there is a need for an updated review. ... Improving non-native word-level pronunciation scoring with phone-level mixup data augmentation and multi-source information. arXiv preprint arXiv:2203.01826. David Abercrombie. 1949. ...

arXiv:2310.13974v1 fatcat:xn5kts2msjg6do2c36vcrndt5q

[154] modeled prosody information at all levels of the text in the way of multi-task learning, and proposed a Mandarin prosodic boundary prediction model based on BLSTM-CRF, which improved the prediction ... Taking Mandarin as an example, the prosodic structure of Mandarin is a three-level hierarchical structure composed of three basic units: prosodic words (PW), prosodic phrases (PPH) and intonation phrases ...

arXiv:2104.09995v1 fatcat:q5lx74ycx5hobjox4ktl3amfta

So the proposed prosodic modeling method is promising for Mandarin speech recognition. ... In this paper, a recurrent neural network (RNN) based prosodic modeling method for Mandarin speech-to-text conversion is proposed. ... One is that it adopts word boundary information as the output targets to be modeled instead of the conventional multi-level prosodic marks, such as the TOBI system (Grice et al., 1996; Silverman et al ...

doi:10.1016/s0167-6393(01)00006-1 fatcat:ig7ehvd5xfhs5oecglcn25qq2q

to label the prosodic information of text and corresponding speech, thereby improving the naturalness and intelligibility of low-resource Mongolian language. ... Improvements are made using pre-trained VITS model and transfer learning methods. b) In view of the problem of less labeled information, this paper proposes to use an automatic prosodic annotation method ... Word-level prosodic information and contextual information are mapped to character-level through Length Regulator. Finally, all the information at the character level is input into the VITS model. ...

arXiv:2211.09365v2 fatcat:knbxiozuarerrblyrskdkvtabm

When compared with the baseline system, the performance of our proposed mixed speech recognition system improves the correct rate of tonal syllable significantly. ... In this paper, we also utilize tone model to improve the correct rate of tonal syllable through revising the tone of the tonal syllable at certain significant level. ... Second, prosody information can improve the performance of speech recognition. ...

doi:10.1109/ijcnn.2011.6033221 dblp:conf/ijcnn/NiLX11 fatcat:dk4ep7yr4zc2tlacddlt3xuvtu

The cross-sentence contextual information, such as break and prosodic variations between consecutive sentences, can be better predicted and rendered than the sentence-based model. ... The information in a paragraph is captured by encoders and the inter-sentence information in a paragraph is learned with multi-head attention mechanisms. ... Evidence has shown that paragraph-level or discourse-level prosody improves the naturalness and expressiveness of SPSS [31, 32] , but annotations of discourse structure or prosodic properties are required ...

arXiv:2209.06484v1 fatcat:aczotcxs5fa6pnp2vopv3ifteq

EMPHASIS: An Emotional Phoneme-based Acoustic Model for Speech Synthesis System [article]

EMPHASIS: An Emotional Phoneme-based Acoustic Model for Speech Synthesis System

Commentary: "Vowel Quality and Direction of Stress Shift in a Predictive Model Explaining the Varying Impact of Misplaced Word Stress: Evidence From English" and "Exploring the Complexity of the L2 Intonation System: An Acoustic and Eye-Tracking Study"

Punctuation generation inspired linguistic features for mandarin prosodic boundary prediction

Modeling of Speaking Rate Influences on Mandarin Speech Prosody and Its Application to Speaking Rate-controlled TTS

HiGNN-TTS: Hierarchical Prosody Modeling with Graph Neural Networks for Expressive Long-form TTS [article]

Multi-Task Deep Learning for User Intention Understanding in Speech Interaction Systems

Duration Modeling For Telugu Language with Recurrent Neural Network

From English pitch accent detection to Mandarin stress detection, where is the difference?

Automatic Pronunciation Assessment – A Review [article]

Review of end-to-end speech synthesis technology based on deep learning [article]

RNN-based prosodic modeling for mandarin speech and its application to speech-to-text conversion

Low-Resource Mongolian Speech Synthesis Based on Automatic Prosody Annotation [article]

Prosody dependent Mandarin speech recognition

ParaTTS: Learning Linguistic and Prosodic Cross-sentence Information in Paragraph-based TTS [article]
