91 Hits in 3.5 sec

Tone Recognition Using Lifters and CTC

Loren Lugosch, Vikrant Singh Tomar
2018 Interspeech 2018  
The performance of the proposed method is evaluated on a freely available Mandarin Chinese speech corpus, AISHELL-1, and is shown to outperform the existing techniques in the literature in terms of tone  ...  The method works by converting the speech signal to a cepstrogram, extracting a sequence of cepstral features using a convolutional neural network, and predicting the underlying sequence of tones using  ...  A simpler approach is to use an end-to-end method with a sequence-level training criterion. In the next section, we describe a new method for tone recognition that makes use of these two solutions.  ... 
doi:10.21437/interspeech.2018-2293 dblp:conf/interspeech/LugoschT18 fatcat:qiynaigaynhotam67sfrm7ylly
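The pipeline this abstract outlines (cepstrogram in, convolutional feature extractor, CTC over the tone sequence) can be sketched as below. This is a minimal sketch under assumed layer sizes and frame parameters, using NumPy and PyTorch; `cepstrogram` and `ToneCTC` are hypothetical names, not the authors' implementation.

```python
import numpy as np
import torch
import torch.nn as nn

def cepstrogram(signal, frame_len=400, hop=160):
    """Real cepstrum per frame: inverse FFT of the log magnitude spectrum."""
    frames = np.stack([signal[i:i + frame_len] * np.hanning(frame_len)
                       for i in range(0, len(signal) - frame_len, hop)])
    spectra = np.abs(np.fft.rfft(frames, axis=1)) + 1e-8
    return np.fft.irfft(np.log(spectra), axis=1)      # (time, quefrency)

class ToneCTC(nn.Module):
    def __init__(self, n_quefrency=400, n_tones=5):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_quefrency, 128, 5, padding=2), nn.ReLU(),
            nn.Conv1d(128, 128, 5, padding=2), nn.ReLU())
        self.out = nn.Linear(128, n_tones + 1)        # +1 for the CTC blank

    def forward(self, x):                  # x: (batch, time, quefrency)
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)
        return self.out(h).log_softmax(-1)

# Sequence-level training criterion: CTC over tone labels 1..5, blank = 0
ctc_loss = nn.CTCLoss(blank=0)
```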

Combining prosodic and spectral features for Mandarin intonation recognition

Wei Bao, Ya Li, Mingliang Gu, Jianhua Tao, Linlin Chao, Shanfeng Liu
2014 The 9th International Symposium on Chinese Spoken Language Processing  
In this paper, a feature set for Mandarin Chinese intonation is addressed.  ...  Our study is performed on a Mandarin question corpus, which contains a large number and various types of interrogative sentences.  ...  As a result, MFCC has been widely used for speech recognition. As in typical speech recognition, we extracted a 13-dimensional MFCC feature vector, which consists of 12 MFCCs and a normalized energy term.  ... 
doi:10.1109/iscslp.2014.6936692 dblp:conf/iscslp/BaoLGTCL14 fatcat:ueuva6ys7jazxcb4kzvgubh2fa
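The 13-dimensional front end described in the snippet (12 MFCCs plus a normalized energy term) is straightforward to reproduce. A minimal sketch using librosa, which is an assumption, not the toolkit the authors used:

```python
import numpy as np
import librosa

def mfcc12_plus_energy(y, sr=16000):
    """12 MFCCs plus a normalized log-energy term -> 13-dim frame vectors."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)[1:]    # drop C0, keep 12
    log_e = np.log(librosa.feature.rms(y=y) + 1e-8)
    log_e = (log_e - log_e.mean()) / (log_e.std() + 1e-8)     # normalize energy
    return np.vstack([mfcc, log_e])                           # (13, n_frames)
```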

A multi-view approach for Mandarin non-native mispronunciation verification [article]

Zhenyu Wang, John H.L. Hansen, Yanlu Xie
2020 arXiv   pre-print
In this study, a multi-view approach is proposed to incorporate discriminative feature representations which require less annotation for non-native mispronunciation verification of Mandarin.  ...  The approach shows improvement over the GOP-based approach by +11.23% and the single-view approach by +1.47% in diagnostic accuracy for a mispronunciation verification task.  ...  We used 13-dimensional Mel-frequency cepstral coefficients (MFCC) as acoustic features. The TDNN network consists of six hidden layers, each of which contains 850 units.  ... 
arXiv:2009.02573v2 fatcat:tqw5j5thwvcqnbusstyvqbybtu
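The acoustic model the snippet describes, a TDNN with six 850-unit hidden layers over 13-dimensional MFCCs, maps naturally onto dilated 1-D convolutions. A sketch in PyTorch; the per-layer temporal contexts and the output size are assumptions:

```python
import torch.nn as nn

class TDNN(nn.Module):
    """Kaldi-style TDNN: 1-D convolutions over time with growing context;
    six 850-unit hidden layers as in the snippet."""
    def __init__(self, feat_dim=13, hidden=850, n_targets=200):
        super().__init__()
        layers, in_dim = [], feat_dim
        # (kernel, dilation) pairs widen each layer's temporal context
        for k, d in [(5, 1), (3, 2), (3, 3), (1, 1), (1, 1), (1, 1)]:
            layers += [nn.Conv1d(in_dim, hidden, k, dilation=d),
                       nn.ReLU(), nn.BatchNorm1d(hidden)]
            in_dim = hidden
        self.tdnn = nn.Sequential(*layers)
        self.out = nn.Conv1d(hidden, n_targets, 1)

    def forward(self, x):          # x: (batch, 13, time)
        return self.out(self.tdnn(x))
```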

Intelligent Call Manager Based On The Integration Of Computer Telephony, Internet And Speech Processing

Chung-Hsien Wu, Yeou-Jiunn Chen, Gwo-Lang Yan
1998 International Conference on Consumer Electronics  
Conventionally, there are 408 Mandarin base syllables, regardless of tones, composed of 21 INITIALs and 38 FINALs [9]. In this paper, a two-stage keyword recognition system is proposed.  ...  Subsyllable boundaries are then used to extract the FINAL parts of Mandarin syllables, which contain the prosodic information.  ...  Acknowledgment The authors would like to thank the National Science Council, the Republic of China, for financial support of this work under contract No. NSC86-2622-E-006-003.  ... 
doi:10.1109/icce.1998.678264 fatcat:ta5pvcuq4jdgvibqeogw6nhqte
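The INITIAL/FINAL decomposition the snippet relies on (21 INITIALs, 38 FINALs, 408 base syllables) amounts to a longest-prefix match over the initial inventory. A small illustrative sketch; `split_syllable` is a hypothetical helper, not code from the paper:

```python
# The 21 Mandarin INITIALs, longest first so "zh" matches before "z"
INITIALS = sorted(["b", "p", "m", "f", "d", "t", "n", "l", "g", "k", "h",
                   "j", "q", "x", "zh", "ch", "sh", "r", "z", "c", "s"],
                  key=len, reverse=True)

def split_syllable(pinyin):
    """Split a toneless pinyin syllable into (INITIAL, FINAL); zero-initial
    syllables such as "an" return an empty INITIAL."""
    for ini in INITIALS:
        if pinyin.startswith(ini):
            return ini, pinyin[len(ini):]
    return "", pinyin

print(split_syllable("zhang"))   # ('zh', 'ang')
print(split_syllable("an"))      # ('', 'an')
```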

A Comprehensive Review of the Speech Dependent Features and Classification Models used in Identification of Languages

Chandrakanta Mohapatra, Sujata Dash, Umakanta Majhi
2016 International Journal of Computer Applications  
In this paper a comprehensive review of the approaches used in identifying spoken languages and of the methods used for extracting speech-dependent features is presented.  ...  information services, such as checking into a hotel, arranging a meeting, or making travel arrangements, which are difficult actions for non-native speakers.  ...  Yan Deng and Jia Liu [3] used two approaches, i.e., Support Vector Machine (SVM) and phonetic N-gram, experimenting with two different ways of using SVM in the token-based system, parallel phoneme recognition  ... 
doi:10.5120/ijca2016911052 fatcat:y64thevii5de3etykx7zi36qau
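The token-based SVM system mentioned in the snippet follows a common recipe: decode speech into phone tokens, count phone n-grams, and classify the counts with a linear SVM. A toy sketch with scikit-learn; the phone strings and labels below are made up for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy phone-token strings such as a phone recognizer might emit
docs = ["n i h aw m a", "h eh l ow w er l d", "n i n h aw", "g uh d b ay"]
labels = ["zh", "en", "zh", "en"]

# Bag of phone bigrams/trigrams -> linear SVM (token-based LID)
lid = make_pipeline(
    CountVectorizer(token_pattern=r"\S+", ngram_range=(2, 3)),
    LinearSVC())
lid.fit(docs, labels)
print(lid.predict(["n i h aw"]))   # -> ['zh']
```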

Multi-distribution deep belief network for speech synthesis

Shiyin Kang, Xiaojun Qian, Helen Meng
2013 2013 IEEE International Conference on Acoustics, Speech and Signal Processing  
Previous work on DBN in the speech community mainly focuses on using the generatively pre-trained DBN to initialize a discriminative model for better acoustic modeling in speech recognition (SR).  ...  Subjective results also confirm the advantage of the spectrum from DBN, and the overall quality is comparable to that of context-independent HMM.  ...  Each syllable HMM has a left-to-right topology with 10 states. Initially, 416 mono-syllable HMMs are estimated as the seed for 1,364 tonal syllable HMMs.  ... 
doi:10.1109/icassp.2013.6639225 dblp:conf/icassp/KangQM13 fatcat:ddzugdgcnjd2fppkszjb3dc674
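The 10-state left-to-right topology mentioned in the snippet constrains each state to either self-loop or advance to its successor. A minimal sketch of the corresponding transition matrix; the 0.6 self-loop probability is an assumed seed value:

```python
import numpy as np

def left_to_right_transitions(n_states=10, p_stay=0.6):
    """Left-to-right HMM transitions: each state may only self-loop
    (p_stay) or move to the next state (1 - p_stay)."""
    A = np.zeros((n_states, n_states))
    for i in range(n_states - 1):
        A[i, i], A[i, i + 1] = p_stay, 1.0 - p_stay
    A[-1, -1] = 1.0            # final state absorbs
    return A

print(left_to_right_transitions(4))
```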

Speaking nature: strategies for generating credible utterances of nature elements or phenomena

Andrea Lorena Aldana Blanco
2012 Zenodo  
These sound scenarios can be a recreation of a real sound space or can be the representation of the idea of how a sound space or an element in it could sound.  ...  Sound designers, DSP engineers, musicians, etc., design and make use of several tools that allow them to create a given sound scenario not only for film, but also for TV, video games, art installations  ...  [Figure captions: interdisciplinary diagram for the wah-wah effect [19]; Figure 4: Activity-Valence two-dimensional space [24]; Figure 5: two-dimensional emotion plane.]  ... 
doi:10.5281/zenodo.3702732 fatcat:rypsrnstmjgyrdmadzjtmhieg4

Automatic Hypernasality Detection in Cleft Palate Speech Using CNN

Xiyue Wang, Ming Tang, Sen Yang, Heng Yin, Hua Huang, Ling He
2019 Circuits, systems, and signal processing  
The CNN learns efficient features via a two-dimensional filtering operation, while the feature extraction performance of shallow classifiers is limited.  ...  The average F1-scores for the hypernasality detection task are 0.9485 and 0.9746 using datasets spoken by children and by adults, respectively.  ...  Acknowledgements This work is supported by the National Natural Science Foundation of China 61503264.  ... 
doi:10.1007/s00034-019-01141-x fatcat:pqbhatdecjbrzdxcgqty7emznu
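The two-dimensional filtering the snippet credits for the CNN's feature learning amounts to stacked 2-D convolutions over spectrogram patches. A minimal binary-classification sketch in PyTorch; all layer sizes are assumptions, not the paper's architecture:

```python
import torch.nn as nn

class HypernasalityCNN(nn.Module):
    """Small 2-D CNN over spectrogram patches for a binary
    hypernasal / normal decision."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 2))

    def forward(self, x):      # x: (batch, 1, freq, time)
        return self.classifier(self.features(x))
```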

Improving ASR for Continuous Thai Words Using ANN/HMM

Maleerat Sodanil, Supot Nitsuwat, Choochart Haruechaiyasak
2010 International Conference on Innovations for Community Services  
However, for a tonal language like Thai, tone information is one of the important features which can be used to improve the accuracy of recognition.  ...  The baseline system of an automatic speech recognizer normally uses Mel-Frequency Cepstral Coefficients (MFCC) as feature vectors.  ...  The results showed that the ANN approach with MFCC plus tone features yielded a higher accuracy, i.e., a lower word error rate (WER), for speech recognition compared to the GMM approach.  ... 
dblp:conf/iics/SodanilNH10 fatcat:bvvec2pzdfcovairnekjpesaay
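The word error rate (WER) the snippet reports on is the standard edit-distance metric; for reference, a self-contained computation:

```python
def wer(ref, hyp):
    """Word error rate: (substitutions + insertions + deletions) / len(ref),
    via edit distance over word sequences."""
    r, h = ref.split(), hyp.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            d[i][j] = min(d[i - 1][j] + 1,                           # deletion
                          d[i][j - 1] + 1,                           # insertion
                          d[i - 1][j - 1] + (r[i - 1] != h[j - 1]))  # substitution
    return d[len(r)][len(h)] / len(r)

print(wer("tone helps thai asr", "tone help thai asr"))   # 0.25
```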

End-to-end Speech Recognition with Adaptive Computation Steps [article]

Mohan Li, Min Liu, Masanori Hattori
2018 arXiv   pre-print
We verify the ACS algorithm on the Mandarin speech corpus AIShell-1, where it achieves a 31.2% CER in the online setting, compared to the 32.4% CER of the attention-based model.  ...  Besides, a small change is made to the decoding stage of the encoder-decoder framework, which allows the prediction to exploit bidirectional contexts.  ...  The authors would like to thank Jiaxin Wen for her expertise in linguistics that helped to consolidate the idea proposed in this paper.  ... 
arXiv:1808.10088v2 fatcat:w3ueajpgbnbgvon67kpljpvklq
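Adaptive computation of this kind builds on the ACT-style halting rule: accumulate per-frame halting probabilities and close a segment, emitting one output, once the sum crosses a threshold. A sketch of that segmentation rule only, under an assumed threshold; not the authors' full model:

```python
import torch

def adaptive_steps(halting_probs, threshold=0.99):
    """Group encoder frames into segments: accumulate halting probabilities
    and mark a boundary once the running sum passes the threshold."""
    boundaries, acc = [], 0.0
    for t, p in enumerate(halting_probs.tolist()):
        acc += p
        if acc >= threshold:
            boundaries.append(t)    # one output symbol per segment
            acc = 0.0
    return boundaries

probs = torch.tensor([0.2, 0.5, 0.4, 0.1, 0.3, 0.7])
print(adaptive_steps(probs))        # [2, 5]
```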

Transfer Learning for Less-Resourced Semitic Languages Speech Recognition: the Case of Amharic

Yonas Woldemariam
2020 Workshop on Spoken Language Technologies for Under-resourced Languages  
Building automatic speech recognition (ASR) systems requires a large amount of speech and text data, and the problem gets worse for less-resourced languages.  ...  Adapting the Mandarin model improves the baseline Amharic model with a WER reduction of 10.25% (absolute).  ...  There are at least two alternative solutions: either adjusting the dimensionality of the adaptation data or reducing the dimensionality of the features on which the network was trained.  ... 
dblp:conf/sltu/Woldemariam20 fatcat:3e4de44a2bhfboxabpihq2qy4u
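Of the two options the snippet names, adjusting the feature dimensionality is the easier to illustrate: insert a learned linear projection so the new language's features fit the pretrained network's input layer. A sketch in PyTorch; the 40/43 dimensions are assumptions:

```python
import torch.nn as nn

class AdaptedASR(nn.Module):
    """Wrap a pretrained (e.g., Mandarin-trained) encoder with a linear
    projection so adaptation features of a different dimensionality
    match its input layer."""
    def __init__(self, pretrained_encoder, pretrained_dim=40, new_dim=43):
        super().__init__()
        self.project = nn.Linear(new_dim, pretrained_dim)
        self.encoder = pretrained_encoder      # transferred weights

    def forward(self, feats):                  # feats: (batch, time, new_dim)
        return self.encoder(self.project(feats))
```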

Significance of Joint Features Derived from the Modified Group Delay Function in Speech Processing

Rajesh M. Hegde, Hema A. Murthy, V. R. R. Gadde
2007 EURASIP Journal on Audio, Speech, and Music Processing  
Combining the MODGDF and the spectral magnitude-based features gives a significant increase in recognition performance of 11% at best, while combining any two features derived from the spectral magnitude  ...  These features are then used for three speech processing tasks, namely, syllable, speaker, and language recognition.  ...  Syllable-based speech recognition In this section, we discuss the baseline system and experimental results for recognition of syllables on the DBIL Tamil and Telugu databases [27] .  ... 
doi:10.1155/2007/79032 fatcat:ooqz3hhrtbckhj4cl4yxnbsakq
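The MODGDF is conventionally computed from two DFTs, X of the frame x[n] and Y of n*x[n], with a cepstrally smoothed spectrum S in the denominator and compression exponents alpha and gamma. A per-frame sketch following that standard definition; the parameter values are assumptions:

```python
import numpy as np

def modgdf(frame, alpha=0.4, gamma=0.9, lifter=8):
    """Modified group delay function of one frame:
    tau = (X_r*Y_r + X_i*Y_i) / |S|^(2*gamma), then sign(tau)*|tau|^alpha,
    where X = DFT(x[n]), Y = DFT(n*x[n]), S = cepstrally smoothed |X|."""
    n = np.arange(len(frame))
    X = np.fft.rfft(frame)
    Y = np.fft.rfft(n * frame)
    # Cepstral smoothing of |X| tames spikes from zeros near the unit circle
    cep = np.fft.irfft(np.log(np.abs(X) + 1e-8))
    cep[lifter:-lifter] = 0.0
    S = np.exp(np.fft.rfft(cep).real)
    tau = (X.real * Y.real + X.imag * Y.imag) / (np.abs(S) ** (2 * gamma) + 1e-8)
    return np.sign(tau) * np.abs(tau) ** alpha
```

Features are then typically obtained by taking the DCT of this curve, analogous to the MFCC pipeline.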

Rhythm measures and dimensions of durational variation in speech

Anastassia Loukina, Greg Kochanski, Burton Rosner, Elinor Keane, Chilin Shih
2011 Journal of the Acoustical Society of America  
A combination of three rhythm measures was necessary for separation of all five languages at once.  ...  (a) Preliminary results obtained on part of the corpus were presented in "Rhythm measures with language-independent segmentation", Proceedings  ...  Abstract: Patterns of durational variation were examined by  ...  For SA2a, speech was represented as a 26-dimensional standard mel-frequency cepstral coefficient (MFCC) vector. For SA2b, speech was represented as a 41-dimensional Acoustic Description Vector.  ... 
doi:10.1121/1.3559709 pmid:21568427 fatcat:vvcwjocicfczleel3npvnrqnya
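Rhythm measures of this kind are simple statistics over segment durations. Which measures the study combined is not stated in the snippet, but the normalized pairwise variability index (nPVI), a standard one in this literature, illustrates the idea:

```python
def npvi(durations):
    """Normalized pairwise variability index over successive
    segment durations (a standard speech-rhythm measure)."""
    terms = [abs(a - b) / ((a + b) / 2)
             for a, b in zip(durations, durations[1:])]
    return 100 * sum(terms) / len(terms)

# Toy vowel durations in seconds
print(round(npvi([0.08, 0.12, 0.05, 0.20]), 1))   # 80.8
```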

Current trends in multilingual speech processing

HERVÉ BOURLARD, JOHN DINES, MATHEW MAGIMAI-DOSS, PHILIP N GARNER, DAVID IMSENG, PETR MOTLICEK, HUI LIANG, LAKSHMI SAHEER, FABIO VALENTE
2011 Sadhana (Bangalore)  
Multilingual speech processing has been a topic of ongoing interest to the research community for many years and the field is now receiving renewed interest owing to two strong driving forces.  ...  This convergence enables the investigation of new approaches to unified modelling for automatic speech recognition and text-to-speech synthesis (TTS) as well as cross-lingual speaker adaptation for TTS  ...  In the unified approach, acoustic models and features also link the two and adaptation of TTS is carried out implicitly during the adaptation of the ASR models without the need for a TTS front-end.  ... 
doi:10.1007/s12046-011-0050-4 fatcat:queqnrod5rdszcotam2kg7ie2m

Make Patient Consultation Warmer: A Clinical Application for Speech Emotion Recognition

Huan-Chung Li, Telung Pan, Man-Hua Lee, Hung-Wen Chiu
2021 Applied Sciences  
...results for a comprehensive analysis to understand the interaction between the doctor and the patient.  ...  The sentiment recognition system developed by the hospital is used for comparison with the sentiment recognition results of the artificial neural network classification, which then uses the foregoing  ...  Acknowledgments: We want to thank the project of the Ministry of Science and Technology, Taiwan (MOST 108-2634-F-038-002), which provided part of the data for this research.  ... 
doi:10.3390/app11114782 fatcat:pwct5mubwfblhb5sfrfwuozdwu
Showing results 1 — 15 out of 91 results