Tone Recognition Using Lifters and CTC
2018
Interspeech 2018
The performance of the proposed method is evaluated on a freely available Mandarin Chinese speech corpus, AISHELL-1, and is shown to outperform the existing techniques in the literature in terms of tone ...
The method works by converting the speech signal to a cepstrogram, extracting a sequence of cepstral features using a convolutional neural network, and predicting the underlying sequence of tones using ...
A simpler approach is to use an end-to-end method with a sequence-level training criterion. In the next section, we describe a new method for tone recognition that makes use of these two solutions. ...
doi:10.21437/interspeech.2018-2293
dblp:conf/interspeech/LugoschT18
fatcat:qiynaigaynhotam67sfrm7ylly
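The snippet above mentions cepstral features and lifters. As a rough, generic illustration (not the paper's actual pipeline; `real_cepstrum` and `lifter` are textbook operations, and the quefrency band chosen here is an arbitrary assumption), one frame of a cepstrogram can be computed and liftered like this:

```python
import numpy as np

def real_cepstrum(frame):
    # real cepstrum: inverse FFT of the log magnitude spectrum
    spectrum = np.abs(np.fft.rfft(frame)) + 1e-10  # avoid log(0)
    return np.fft.irfft(np.log(spectrum))

def lifter(cepstrum, low=2, high=60):
    # a simple band lifter: keep quefrency bins [low, high), zero the rest
    out = np.zeros_like(cepstrum)
    out[low:high] = cepstrum[low:high]
    return out

# one windowed frame of a synthetic signal
frame = np.hanning(512) * np.cos(2 * np.pi * 100 * np.arange(512) / 512)
c = lifter(real_cepstrum(frame))
```

Stacking such liftered frames over time yields a cepstrogram-like representation; the lifter discards the lowest quefrencies (overall spectral tilt) and the highest ones, keeping a mid band.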
Combining prosodic and spectral features for Mandarin intonation recognition
2014
The 9th International Symposium on Chinese Spoken Language Processing
In this paper, a feature set for Chinese Mandarin intonation is addressed. ...
Our study is performed on a Mandarin question corpus, which contains a large amount and various types of interrogative sentences. ...
As a result, MFCC has been widely used for speech recognition. As in typical speech recognition, we extracted a 13-dimensional MFCC feature vector, which consists of 12 MFCCs and a normalized energy. ...
doi:10.1109/iscslp.2014.6936692
dblp:conf/iscslp/BaoLGTCL14
fatcat:ueuva6ys7jazxcb4kzvgubh2fa
A multi-view approach for Mandarin non-native mispronunciation verification
[article]
2020
arXiv
pre-print
In this study, a multi-view approach is proposed to incorporate discriminative feature representations which require less annotation for non-native mispronunciation verification of Mandarin. ...
The approach shows improvement over GOP-based approach by +11.23% and single-view approach by +1.47% in diagnostic accuracy for a mispronunciation verification task. ...
We used 13-dimensional Mel-Frequency Cepstral Coefficients (MFCC) as acoustic features. The TDNN network consists of six hidden layers, each of which contains 850 units. ...
arXiv:2009.02573v2
fatcat:tqw5j5thwvcqnbusstyvqbybtu
Intelligent Call Manager Based On The Integration Of Computer Telephony, Internet And Speech Processing
1998
International Conference on Consumer Electronics
Conventionally, there are 408 Mandarin base syllables, regardless of tones, composed of 21 INITIALs and 38 FINALs [9]. In this paper, a two-stage keyword recognition system is proposed. ...
Subsyllable boundaries are then used to extract the FINAL parts of Mandarin syllables, which contain the prosodic information. ...
Acknowledgment The authors would like to thank the National Science Council, the Republic of China, for financial support of this work under contract No. NSC86-2622-E-006-003. ...
doi:10.1109/icce.1998.678264
fatcat:ta5pvcuq4jdgvibqeogw6nhqte
A Comprehensive Review of the Speech Dependent Features and Classification Models used in Identification of Languages
2016
International Journal of Computer Applications
In this paper, a comprehensive review of the approaches used in identifying spoken languages and the methods used for extracting speech-dependent features are presented. ...
information services, such as checking into a hotel, arranging a meeting, or making travel arrangements, which are difficult actions for non native speakers. ...
Yan Deng and Jia Liu [3] used two approaches, i.e., Support Vector Machine (SVM) and Phonetic N-gram, experimenting with two different ways of using SVM in the token-based system, parallel phoneme recognition ...
doi:10.5120/ijca2016911052
fatcat:y64thevii5de3etykx7zi36qau
Multi-distribution deep belief network for speech synthesis
2013
2013 IEEE International Conference on Acoustics, Speech and Signal Processing
Previous work on DBN in the speech community mainly focuses on using the generatively pre-trained DBN to initialize a discriminative model for better acoustic modeling in speech recognition (SR). ...
Subjective results also confirm the advantage of the spectrum from DBN, and the overall quality is comparable to that of context-independent HMM. ...
Each syllable HMM has a left-to-right topology with 10 states. Initially, 416 mono-syllable HMMs are estimated as the seed for 1,364 tonal syllable HMMs. ...
doi:10.1109/icassp.2013.6639225
dblp:conf/icassp/KangQM13
fatcat:ddzugdgcnjd2fppkszjb3dc674
Speaking nature: strategies for generating credible utterances of nature elements or phenomena
2012
Zenodo
These sound scenarios can be a recreation of a real sound space or can be the representation of the idea of how a sound space or an element in it could sound. ...
Sound designers, DSP engineers, musicians, etc., design and make use of several tools that allow them to create a given sound scenario not only for film, but also for TV, video games, art installations ...
Interdisciplinary diagram for the wah-wah effect. [19]
Figure 4. Activity-Valence two-dimensional space. [24]
Figure 5. Two-dimensional emotion plane. ...
doi:10.5281/zenodo.3702732
fatcat:rypsrnstmjgyrdmadzjtmhieg4
Automatic Hypernasality Detection in Cleft Palate Speech Using CNN
2019
Circuits, systems, and signal processing
The CNN learns efficient features via a two-dimensional filtering operation, while the feature extraction performance of shallow classifiers is limited. ...
The average F1-scores for the hypernasality detection task are 0.9485 and 0.9746 using a dataset that is spoken by children and a dataset that is spoken by adults, respectively. ...
Acknowledgements This work is supported by the National Natural Science Foundation of China 61503264. ...
doi:10.1007/s00034-019-01141-x
fatcat:pqbhatdecjbrzdxcgqty7emznu
Improving ASR for Continuous Thai Words Using ANN/HMM
2010
International Conference on Innovations for Community Services
However, for a tonal language like Thai, tone information is one of the important features which can be used to improve the accuracy of recognition. ...
The baseline system of an automatic speech recognition normally uses Mel-Frequency Cepstral Coefficients (MFCC) as feature vectors. ...
The results showed that the ANN approach with MFCC with tone features yielded a higher accuracy, i.e., lower word error rate (WER), for speech recognition compared to the GMM approach. ...
dblp:conf/iics/SodanilNH10
fatcat:bvvec2pzdfcovairnekjpesaay
End-to-end Speech Recognition with Adaptive Computation Steps
[article]
2018
arXiv
pre-print
We verify the ACS algorithm on a Mandarin speech corpus AIShell-1, and it achieves a 31.2% CER in the online setting, compared to the 32.4% CER of the attention-based model. ...
Besides, a small change is made to the decoding stage of the encoder-decoder framework, which allows the prediction to exploit bidirectional contexts. ...
The authors would like to thank Jiaxin Wen for her expertise in linguistics that helped to consolidate the idea proposed in this paper. ...
arXiv:1808.10088v2
fatcat:w3ueajpgbnbgvon67kpljpvklq
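The CER figures quoted in this entry (31.2% vs. 32.4%) are character error rates: the character-level Levenshtein distance between hypothesis and reference, divided by the reference length. A minimal sketch of the metric (the function names here are illustrative, not from the paper):

```python
def edit_distance(ref, hyp):
    # classic dynamic-programming Levenshtein distance over characters
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (r != h)))  # substitution
        prev = cur
    return prev[-1]

def cer(ref, hyp):
    # character error rate: edits needed to turn hyp into ref, per reference character
    return edit_distance(ref, hyp) / len(ref)
```

Note that CER can exceed 100% when the hypothesis contains many insertions relative to a short reference.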
Transfer Learning for Less-Resourced Semitic Languages Speech Recognition: the Case of Amharic
2020
Workshop on Spoken Language Technologies for Under-resourced Languages
While building automatic speech recognition (ASR) requires a large amount of speech and text data, the problem gets worse for less-resourced languages. ...
Adapting the Mandarin model improves the baseline Amharic model with a WER reduction of 10.25% (absolute). ...
There are at least two alternative solutions: either adjusting the dimensionality of the adaptation data or reducing the dimensionality of the features on which the network was trained. ...
dblp:conf/sltu/Woldemariam20
fatcat:3e4de44a2bhfboxabpihq2qy4u
Significance of Joint Features Derived from the Modified Group Delay Function in Speech Processing
2007
EURASIP Journal on Audio, Speech, and Music Processing
Combining the MODGDF and the spectral magnitude-based features gives a significant increase in recognition performance of 11% at best, while combining any two features derived from the spectral magnitude ...
These features are then used for three speech processing tasks, namely, syllable, speaker, and language recognition. ...
Syllable-based speech recognition In this section, we discuss the baseline system and experimental results for recognition of syllables on the DBIL Tamil and Telugu databases [27] . ...
doi:10.1155/2007/79032
fatcat:ooqz3hhrtbckhj4cl4yxnbsakq
Rhythm measures and dimensions of durational variation in speech
2011
Journal of the Acoustical Society of America
A combination of three rhythm measures was necessary for separation of all five languages at once. ...
a) Preliminary results obtained on part of the corpus were presented in "Rhythm measures with language-independent segmentation", Proceedings ... Patterns of durational variation were examined by ...
For SA2a, speech was represented as a 26-dimensional standard mel-frequency cepstral coefficient (MFCC) vector. For SA2b, speech was represented as a 41-dimensional Acoustic Description Vector. ...
doi:10.1121/1.3559709
pmid:21568427
fatcat:vvcwjocicfczleel3npvnrqnya
Current trends in multilingual speech processing
2011
Sadhana (Bangalore)
Multilingual speech processing has been a topic of ongoing interest to the research community for many years and the field is now receiving renewed interest owing to two strong driving forces. ...
This convergence enables the investigation of new approaches to unified modelling for automatic speech recognition and text-to-speech synthesis (TTS) as well as cross-lingual speaker adaptation for TTS ...
In the unified approach, acoustic models and features also link the two and adaptation of TTS is carried out implicitly during the adaptation of the ASR models without the need for a TTS front-end. ...
doi:10.1007/s12046-011-0050-4
fatcat:queqnrod5rdszcotam2kg7ie2m
Make Patient Consultation Warmer: A Clinical Application for Speech Emotion Recognition
2021
Applied Sciences
results for a comprehensive analysis to understand the interaction between the doctor and the patient. ...
The sentiment recognition system developed by the hospital is used for comparison with the sentiment recognition results of the artificial neural network classification, which then uses the foregoing ...
Acknowledgments: We want to thank that the project of Ministry of Science and Technology, Taiwan (MOST 108-2634-F-038-002) provided part of data for this research. ...
doi:10.3390/app11114782
fatcat:pwct5mubwfblhb5sfrfwuozdwu