
535 Hits in 8.3 sec

EMPHASIS: An Emotional Phoneme-based Acoustic Model for Speech Synthesis System [article]

Hao Li, Yongguo Kang, Zhenyu Wang
2018 arXiv   pre-print
We modify the input and output layer structures of the network to improve the performance.  ...  EMPHASIS is designed to be a multi-lingual model and can synthesize Mandarin-English speech for now.  ...  contextual information.  ... 
arXiv:1806.09276v2 fatcat:cpn4jihzbre2rc5kwmrjrqtp6m

EMPHASIS: An Emotional Phoneme-based Acoustic Model for Speech Synthesis System

Hao Li, Yongguo Kang, Zhenyu Wang
2018 Interspeech 2018  
We modify the input and output layer structures of the network to improve the performance.  ...  EMPHASIS is designed to be a multi-lingual model and can synthesize Mandarin-English speech for now.  ...  contextual information.  ... 
doi:10.21437/interspeech.2018-1511 dblp:conf/interspeech/LiKW18 fatcat:4mblxebp3bafll643uoe2zk2we

Commentary: "Vowel Quality and Direction of Stress Shift in a Predictive Model Explaining the Varying Impact of Misplaced Word Stress: Evidence From English" and "Exploring the Complexity of the L2 Intonation System: An Acoustic and Eye-Tracking Study"

Alison McGregor
2021 Frontiers in Communication  
improve prosodic structure pedagogy in the context of L2 English pronunciation.  ...  Pedagogically, how should these findings inform classroom practices?  ...  TABLE 1 | Proposed Prosodic Structure Pathway.  ... 
doi:10.3389/fcomm.2021.721053 fatcat:6qt4u4oslzefvmis5orpp26gxu

Punctuation generation inspired linguistic features for Mandarin prosodic boundary prediction

Chen-Yu Chiang, Yih-Ru Wang, Sin-Horng Chen
2012 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)  
Then, the predicted punctuation and its punctuation confidence are combined with contextual linguistic features to predict the break type of the word boundary by an MLP (multi-layer perceptrons).  ...  A novel statistical linguistic feature, called punctuation confidence, is proposed in this paper for assisting in prosodic break prediction in Mandarin text-to-speech.  ...  In future works, it is worthwhile to incorporate the punctuation confidence with the durational information of prosodic units (i.e., PW, PPh, BG/PG) for further improvement on prosodic break prediction  ... 
doi:10.1109/icassp.2012.6288942 dblp:conf/icassp/ChiangWC12 fatcat:6uahfdkb3fal5f7c2lfrhtqpq4
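The snippet above describes combining a predicted punctuation mark and its confidence with contextual linguistic features, then classifying the word-boundary break type with an MLP. The sketch below illustrates only that input/output shape. The punctuation classes, the B0 to B4 break labels, the feature values, and the layer sizes are all hypothetical, and the weights are random and untrained, so the predicted label carries no meaning.

```python
import math
import random

# Hypothetical label sets for illustration only; the paper's actual
# punctuation classes and break-type inventory may differ.
PUNCTUATION = ["none", "comma", "period"]
BREAK_TYPES = ["B0", "B1", "B2", "B3", "B4"]


def build_features(pred_punct, punct_conf, context_feats):
    """Concatenate one-hot predicted punctuation, its confidence, and contextual features."""
    one_hot = [1.0 if p == pred_punct else 0.0 for p in PUNCTUATION]
    return one_hot + [punct_conf] + list(context_feats)


def mlp_forward(x, w1, b1, w2, b2):
    """One tanh hidden layer followed by a softmax over break types."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]
    logits = [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(w2, b2)]
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in, n_out, rng):
    """Random (untrained) weights: n_out rows of n_in inputs, zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


rng = random.Random(0)
# Hypothetical boundary: predicted comma with confidence 0.87, plus 3 contextual features.
x = build_features("comma", 0.87, context_feats=[0.2, 1.0, 0.0])  # 3 + 1 + 3 = 7 inputs
w1, b1 = init_layer(len(x), 8, rng)
w2, b2 = init_layer(8, len(BREAK_TYPES), rng)
probs = mlp_forward(x, w1, b1, w2, b2)
predicted = BREAK_TYPES[probs.index(max(probs))]
```

In a real system the MLP would be trained on labeled break types; the point here is only that the punctuation confidence enters as one more input feature alongside the contextual ones.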

Modeling of Speaking Rate Influences on Mandarin Speech Prosody and Its Application to Speaking Rate-controlled TTS

Sin-Horng Chen, Chiao-Hua Hsieh, Chen-Yu Chiang, Hsi-Chun Hsiao, Yih-Ru Wang, Yuan-Fu Liao, Hsiu-Min Yu
2014 IEEE/ACM Transactions on Audio Speech and Language Processing  
text, and prosodic tags representing the prosodic structure of speech.  ...  Experimental results on a large Mandarin read speech corpus showed that the parameters of the SR-HPM together with these feature normalization functions interpreted the effects of speaking rate on Mandarin  ...  duration information to the proposed SR-controlled Mandarin TTS system.  ... 
doi:10.1109/taslp.2014.2321482 fatcat:pu43xyrqajddpeaqzlihu6nfyq

HiGNN-TTS: Hierarchical Prosody Modeling with Graph Neural Networks for Expressive Long-form TTS [article]

Dake Guo, Xinfa Zhu, Liumeng Xue, Tao Li, Yuanjun Lv, Yuepeng Jiang, Lei Xie
2023 arXiv   pre-print
However, generating human-parity long-form speech with high dynamic prosodic variations is still challenging.  ...  Additionally, we perform hierarchical supervision from acoustic prosody on each node of the graph to capture the prosodic variations with a high dynamic range.  ...  contextual information.  ... 
arXiv:2309.13907v2 fatcat:yc6hohppabax5mz7pnyxzndy3u

Multi-Task Deep Learning for User Intention Understanding in Speech Interaction Systems

Yishuang Ning, Jia Jia, Zhiyong Wu, Runnan Li, Yongsheng An, Yanfeng Wang, Helen Meng
2017 Proceedings of the AAAI Conference on Artificial Intelligence  
detection with multi-task learning (MTL) to reinforce the performance of each other.  ...  In this paper, we define Intention Prominence (IP) as the semantic combination of focus by text and emphasis by speech, and propose a multi-task deep learning framework to predict IP.  ...  The prosodic units are from five levels: syllable, prosodic word, prosodic phrase, intonational phrase and sentence.  ... 
doi:10.1609/aaai.v31i1.10493 fatcat:wardjwxgwfdp3cdhfawpfb2u3i

Duration Modeling For Telugu Language with Recurrent Neural Network

V. S. Ramesh Bonda, P. N. Girija
2015 International Journal of Innovative Research in Computer and Communication Engineering  
Multiple linguistic features of syllables at different levels like positional and contextual features are used from text.  ...  A small speech database is considered as a preliminary work to predict syllable duration with proposed RNN algorithm. Experiments are conducted with different sets of features.  ...  A multi-level prosodic model based on the estimation of prosodic features is considered [25].  ... 
doi:10.15680/ijircce.2015.0302017 fatcat:4eaf7gawwzalbgzhhbyqhowkpi

From English pitch accent detection to Mandarin stress detection, where is the difference?

Chongjia Ni, Wenju Liu, Bo Xu
2012 Computer Speech and Language  
In this paper, we discuss Mandarin stress detection and compare it with English pitch accent detection.  ...  with the baseline system.  ...  Acknowledgements The authors are thankful to the anonymous reviewers for their valuable comments and corrections in an earlier version of our manuscript, which contributed to the significant improvement  ... 
doi:10.1016/j.csl.2011.09.002 fatcat:7ucw4xvo7jayhpu6w64dp5d5hu

Automatic Pronunciation Assessment – A Review [article]

Yassine El Kheir, Ahmed Ali, Shammur Absar Chowdhury
2023 arXiv   pre-print
In this paper, we review methods employed in pronunciation assessment for both phonemic and prosodic.  ...  With the rapid growth in language processing and deep learning over the past few years, there is a need for an updated review.  ...  Improving non-native word-level pronunciation scoring with phone-level mixup data augmentation and multi-source information. arXiv preprint arXiv:2203.01826. David Abercrombie. 1949.  ... 
arXiv:2310.13974v1 fatcat:xn5kts2msjg6do2c36vcrndt5q

Review of end-to-end speech synthesis technology based on deep learning [article]

Zhaoxi Mu, Xinyu Yang, Yizhuo Dong
2021 arXiv   pre-print
[154] modeled prosody information at all levels of the text in the way of multi-task learning, and proposed a Mandarin prosodic boundary prediction model based on BLSTM-CRF, which improved the prediction  ...  Taking Mandarin as an example, the prosodic structure of Mandarin is a three-level hierarchical structure composed of three basic units: prosodic words (PW), prosodic phrases (PPH) and intonation phrases  ... 
arXiv:2104.09995v1 fatcat:q5lx74ycx5hobjox4ktl3amfta
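The snippet above describes Mandarin prosodic structure as a three-level hierarchy of prosodic words (PW), prosodic phrases (PPH) and intonation phrases. The sketch below shows one way to turn per-syllable boundary labels into that nested structure. The integer codes (0 = no boundary, 1 = PW, 2 = PPH, 3 = intonation phrase) are an assumption loosely modeled on the "#1/#2/#3" style used in some Mandarin TTS corpora, and the example sentence and its labels are hypothetical.

```python
from typing import List

# Assumed boundary codes: 0 = none, 1 = PW, 2 = PPH, 3 = intonation phrase (IPH).
# A higher-level boundary also closes every lower level.
def build_hierarchy(syllables: List[str], breaks: List[int]) -> List[List[List[str]]]:
    """Group syllables into IPH -> PPH -> PW nested lists."""
    if len(syllables) != len(breaks):
        raise ValueError("need one break code per syllable")
    iphs, pphs, pws, word = [], [], [], []
    for syl, b in zip(syllables, breaks):
        word.append(syl)
        if b >= 1:  # close the current prosodic word
            pws.append(word)
            word = []
        if b >= 2:  # close the current prosodic phrase
            pphs.append(pws)
            pws = []
        if b >= 3:  # close the current intonation phrase
            iphs.append(pphs)
            pphs = []
    # Flush any units left open at the end of the utterance.
    if word:
        pws.append(word)
    if pws:
        pphs.append(pws)
    if pphs:
        iphs.append(pphs)
    return iphs


# Hypothetical labeling of "jin1 tian1 tian1 qi4 hen3 hao3" (today the weather is good).
tree = build_hierarchy(
    ["jin1", "tian1", "tian1", "qi4", "hen3", "hao3"],
    [0, 1, 0, 2, 0, 3],
)
```

Here `tree` holds one intonation phrase containing two prosodic phrases, the first made of the prosodic words "jin1 tian1" and "tian1 qi4", the second of "hen3 hao3".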

RNN-based prosodic modeling for Mandarin speech and its application to speech-to-text conversion

Wern-Jun Wang, Yuan-Fu Liao, Sin-Horng Chen
2002 Speech Communication  
So the proposed prosodic modeling method is promising for Mandarin speech recognition.  ...  In this paper, a recurrent neural network (RNN) based prosodic modeling method for Mandarin speech-to-text conversion is proposed.  ...  One is that it adopts word boundary information as the output targets to be modeled instead of the conventional multi-level prosodic marks, such as the ToBI system (Grice et al., 1996; Silverman et al  ... 
doi:10.1016/s0167-6393(01)00006-1 fatcat:ig7ehvd5xfhs5oecglcn25qq2q

Low-Resource Mongolian Speech Synthesis Based on Automatic Prosody Annotation [article]

Xin Yuan, Robin Feng, Mingming Ye
2023 arXiv   pre-print
to label the prosodic information of text and corresponding speech, thereby improving the naturalness and intelligibility of low-resource Mongolian language.  ...  Improvements are made using pre-trained VITS model and transfer learning methods. b) In view of the problem of less labeled information, this paper proposes to use an automatic prosodic annotation method  ...  Word-level prosodic information and contextual information are mapped to character-level through Length Regulator. Finally, all the information at the character level is input into the VITS model.  ... 
arXiv:2211.09365v2 fatcat:knbxiozuarerrblyrskdkvtabm
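The snippet above states that word-level prosodic and contextual information is mapped to character level through a Length Regulator before entering VITS. The sketch below shows the basic idea of such a regulator as popularized by FastSpeech: each word-level vector is repeated once per character of its word. The transliterated Mongolian words, their character counts, and the 2-dimensional feature vectors are hypothetical, and the paper's actual module may differ.

```python
from typing import List, Sequence


def length_regulate(word_feats: Sequence[Sequence[float]],
                    char_counts: Sequence[int]) -> List[List[float]]:
    """Expand word-level vectors to character level by repeating each one per character."""
    if len(word_feats) != len(char_counts):
        raise ValueError("one character count per word is required")
    out: List[List[float]] = []
    for feat, n in zip(word_feats, char_counts):
        if n < 0:
            raise ValueError("character counts must be non-negative")
        # Copy the vector so later edits to one frame do not alias the others.
        out.extend(list(feat) for _ in range(n))
    return out


# Hypothetical input: "sain baina uu" with made-up 2-d word-level prosody features.
words = ["sain", "baina", "uu"]
feats = [[0.1, 1.0], [0.5, 0.0], [0.9, 2.0]]
char_level = length_regulate(feats, [len(w) for w in words])
```

With character counts 4, 5 and 2, `char_level` holds 11 vectors, which can then be concatenated with character-level encoder outputs of the same length.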

Prosody dependent Mandarin speech recognition

Chong-Jia Ni, Wen-Ju Liu, Bo Xu
2011 The 2011 International Joint Conference on Neural Networks  
When compared with the baseline system, the performance of our proposed mixed speech recognition system improves the correct rate of tonal syllable significantly.  ...  In this paper, we also utilize tone model to improve the correct rate of tonal syllable through revising the tone of the tonal syllable at certain significant level.  ...  Second, prosody information can improve the performance of speech recognition.  ... 
doi:10.1109/ijcnn.2011.6033221 dblp:conf/ijcnn/NiLX11 fatcat:dk4ep7yr4zc2tlacddlt3xuvtu

ParaTTS: Learning Linguistic and Prosodic Cross-sentence Information in Paragraph-based TTS [article]

Liumeng Xue, Frank K. Soong, Shaofei Zhang, Lei Xie
2022 arXiv   pre-print
The cross-sentence contextual information, such as break and prosodic variations between consecutive sentences, can be better predicted and rendered than the sentence-based model.  ...  The information in a paragraph is captured by encoders and the inter-sentence information in a paragraph is learned with multi-head attention mechanisms.  ...  Evidence has shown that paragraph-level or discourse-level prosody improves the naturalness and expressiveness of SPSS [31, 32], but annotations of discourse structure or prosodic properties are required  ... 
arXiv:2209.06484v1 fatcat:aczotcxs5fa6pnp2vopv3ifteq