EMPHASIS: An Emotional Phoneme-based Acoustic Model for Speech Synthesis System
[article]
2018
arXiv
pre-print
We modify the input and output layer structures of the network to improve the performance. ...
EMPHASIS is designed to be a multi-lingual model and can synthesize Mandarin-English speech for now. ...
contextual information. ...
arXiv:1806.09276v2
fatcat:cpn4jihzbre2rc5kwmrjrqtp6m
EMPHASIS: An Emotional Phoneme-based Acoustic Model for Speech Synthesis System
2018
Interspeech 2018
We modify the input and output layer structures of the network to improve the performance. ...
EMPHASIS is designed to be a multi-lingual model and can synthesize Mandarin-English speech for now. ...
contextual information. ...
doi:10.21437/interspeech.2018-1511
dblp:conf/interspeech/LiKW18
fatcat:4mblxebp3bafll643uoe2zk2we
Commentary: "Vowel Quality and Direction of Stress Shift in a Predictive Model Explaining the Varying Impact of Misplaced Word Stress: Evidence From English" and "Exploring the Complexity of the L2 Intonation System: An Acoustic and Eye-Tracking Study"
2021
Frontiers in Communication
improve prosodic structure pedagogy in the context of L2 English pronunciation. ...
Pedagogically, how should these findings inform classroom practices? ...
TABLE 1 | 1 Proposed Prosodic Structure Pathway. ...
doi:10.3389/fcomm.2021.721053
fatcat:6qt4u4oslzefvmis5orpp26gxu
Punctuation generation inspired linguistic features for mandarin prosodic boundary prediction
2012
2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Then, the predicted punctuation and its punctuation confidence are combined with contextual linguistic features to predict the break type of the word boundary by an MLP (multi-layer perceptron). ...
A novel statistical linguistic feature, called punctuation confidence, is proposed in this paper for assisting in prosodic break prediction in Mandarin text-to-speech. ...
In future works, it is worthwhile to incorporate the punctuation confidence with the durational information of prosodic units (i.e., PW, PPh, BG/PG) for further improvement on prosodic break prediction ...
doi:10.1109/icassp.2012.6288942
dblp:conf/icassp/ChiangWC12
fatcat:6uahfdkb3fal5f7c2lfrhtqpq4
Modeling of Speaking Rate Influences on Mandarin Speech Prosody and Its Application to Speaking Rate-controlled TTS
2014
IEEE/ACM Transactions on Audio Speech and Language Processing
text, and prosodic tags representing the prosodic structure of speech. ...
Experimental results on a large Mandarin read speech corpus showed that the parameters of the SR-HPM together with these feature normalization functions interpreted the effects of speaking rate on Mandarin ...
duration information to the proposed SR-controlled Mandarin TTS system. ...
doi:10.1109/taslp.2014.2321482
fatcat:pu43xyrqajddpeaqzlihu6nfyq
HiGNN-TTS: Hierarchical Prosody Modeling with Graph Neural Networks for Expressive Long-form TTS
[article]
2023
arXiv
pre-print
However, generating human-parity long-form speech with high dynamic prosodic variations is still challenging. ...
Additionally, we perform hierarchical supervision from acoustic prosody on each node of the graph to capture the prosodic variations with a high dynamic range. ...
contextual information. ...
arXiv:2309.13907v2
fatcat:yc6hohppabax5mz7pnyxzndy3u
Multi-Task Deep Learning for User Intention Understanding in Speech Interaction Systems
2017
PROCEEDINGS OF THE THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE AND THE TWENTY-EIGHTH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE
detection with multi-task learning (MTL) to reinforce the performance of each other. ...
In this paper, we define Intention Prominence (IP) as the semantic combination of focus by text and emphasis by speech, and propose a multi-task deep learning framework to predict IP. ...
The prosodic units are from five levels: syllable, prosodic word, prosodic phrase, intonational phrase and sentence. ...
doi:10.1609/aaai.v31i1.10493
fatcat:wardjwxgwfdp3cdhfawpfb2u3i
Duration Modeling For Telugu Language with Recurrent Neural Network
2015
International Journal of Innovative Research in Computer and Communication Engineering
Multiple linguistic features of syllables at different levels like positional and contextual features are used from text. ...
A small speech database is considered as a preliminary work to predict syllable duration with proposed RNN algorithm. Experiments are conducted with different sets of features. ...
A multi-level prosodic model based on the estimation of prosodic features is considered [25]. ...
doi:10.15680/ijircce.2015.0302017
fatcat:4eaf7gawwzalbgzhhbyqhowkpi
From English pitch accent detection to Mandarin stress detection, where is the difference?
2012
Computer Speech and Language
In this paper, we discuss Mandarin stress detection and compare it with English pitch accent detection. ...
with the baseline system. ...
Acknowledgements The authors are thankful to the anonymous reviewers for their valuable comments and corrections in an earlier version of our manuscript, which contributed to the significant improvement ...
doi:10.1016/j.csl.2011.09.002
fatcat:7ucw4xvo7jayhpu6w64dp5d5hu
Automatic Pronunciation Assessment – A Review
[article]
2023
arXiv
pre-print
In this paper, we review methods employed in pronunciation assessment for both phonemic and prosodic aspects. ...
With the rapid growth in language processing and deep learning over the past few years, there is a need for an updated review. ...
Improving non-native word-level pronunciation scoring with phone-level mixup data augmentation and multi-source information. arXiv preprint arXiv:2203.01826. David Abercrombie. 1949. ...
arXiv:2310.13974v1
fatcat:xn5kts2msjg6do2c36vcrndt5q
Review of end-to-end speech synthesis technology based on deep learning
[article]
2021
arXiv
pre-print
[154] modeled prosody information at all levels of the text in the way of multi-task learning, and proposed a Mandarin prosodic boundary prediction model based on BLSTM-CRF, which improved the prediction ...
Taking Mandarin as an example, the prosodic structure of Mandarin is a three-level hierarchical structure composed of three basic units: prosodic words (PW), prosodic phrases (PPH) and intonation phrases ...
arXiv:2104.09995v1
fatcat:q5lx74ycx5hobjox4ktl3amfta
RNN-based prosodic modeling for mandarin speech and its application to speech-to-text conversion
2002
Speech Communication
So the proposed prosodic modeling method is promising for Mandarin speech recognition. ...
In this paper, a recurrent neural network (RNN) based prosodic modeling method for Mandarin speech-to-text conversion is proposed. ...
One is that it adopts word boundary information as the output targets to be modeled instead of the conventional multi-level prosodic marks, such as the TOBI system (Grice et al., 1996; Silverman et al ...
doi:10.1016/s0167-6393(01)00006-1
fatcat:ig7ehvd5xfhs5oecglcn25qq2q
Low-Resource Mongolian Speech Synthesis Based on Automatic Prosody Annotation
[article]
2023
arXiv
pre-print
to label the prosodic information of text and corresponding speech, thereby improving the naturalness and intelligibility of low-resource Mongolian language. ...
Improvements are made using pre-trained VITS model and transfer learning methods. b) In view of the problem of less labeled information, this paper proposes to use an automatic prosodic annotation method ...
Word-level prosodic information and contextual information are mapped to character-level through Length Regulator. Finally, all the information at the character level is input into the VITS model. ...
arXiv:2211.09365v2
fatcat:knbxiozuarerrblyrskdkvtabm
Prosody dependent Mandarin speech recognition
2011
The 2011 International Joint Conference on Neural Networks
When compared with the baseline system, our proposed mixed speech recognition system significantly improves the correct rate of tonal syllables. ...
In this paper, we also utilize tone model to improve the correct rate of tonal syllable through revising the tone of the tonal syllable at certain significant level. ...
Second, prosody information can improve the performance of speech recognition. ...
doi:10.1109/ijcnn.2011.6033221
dblp:conf/ijcnn/NiLX11
fatcat:dk4ep7yr4zc2tlacddlt3xuvtu
ParaTTS: Learning Linguistic and Prosodic Cross-sentence Information in Paragraph-based TTS
[article]
2022
arXiv
pre-print
The cross-sentence contextual information, such as break and prosodic variations between consecutive sentences, can be better predicted and rendered than the sentence-based model. ...
The information in a paragraph is captured by encoders and the inter-sentence information in a paragraph is learned with multi-head attention mechanisms. ...
Evidence has shown that paragraph-level or discourse-level prosody improves the naturalness and expressiveness of SPSS [31, 32], but annotations of discourse structure or prosodic properties are required ...
arXiv:2209.06484v1
fatcat:aczotcxs5fa6pnp2vopv3ifteq