91 Hits in 3.5 sec

Tone Recognition Using Lifters and CTC

Loren Lugosch, Vikrant Singh Tomar
2018 Interspeech 2018  
The performance of the proposed method is evaluated on a freely available Mandarin Chinese speech corpus, AISHELL-1, and is shown to outperform the existing techniques in the literature in terms of tone  ...  The method works by converting the speech signal to a cepstrogram, extracting a sequence of cepstral features using a convolutional neural network, and predicting the underlying sequence of tones using  ...  A simpler approach is to use an end-to-end method with a sequence-level training criterion. In the next section, we describe a new method for tone recognition that makes use of these two solutions.  ... 
doi:10.21437/interspeech.2018-2293 dblp:conf/interspeech/LugoschT18 fatcat:qiynaigaynhotam67sfrm7ylly
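The pipeline this abstract outlines (cepstrogram in, convolutional feature extractor, CTC over the tone sequence) can be sketched as below. This is a minimal sketch under assumed layer sizes and frame parameters, using NumPy and PyTorch; `cepstrogram` and `ToneCTC` are hypothetical names, not the authors' implementation.

```python
import numpy as np
import torch
import torch.nn as nn

def cepstrogram(signal, frame_len=400, hop=160):
    """Real cepstrum per frame: inverse FFT of the log magnitude spectrum."""
    frames = np.stack([signal[i:i + frame_len] * np.hanning(frame_len)
                       for i in range(0, len(signal) - frame_len, hop)])
    spectra = np.abs(np.fft.rfft(frames, axis=1)) + 1e-8
    return np.fft.irfft(np.log(spectra), axis=1)      # (time, quefrency)

class ToneCTC(nn.Module):
    def __init__(self, n_quefrency=400, n_tones=5):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_quefrency, 128, 5, padding=2), nn.ReLU(),
            nn.Conv1d(128, 128, 5, padding=2), nn.ReLU())
        self.out = nn.Linear(128, n_tones + 1)        # +1 for the CTC blank

    def forward(self, x):                  # x: (batch, time, quefrency)
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)
        return self.out(h).log_softmax(-1)

# Sequence-level training criterion: CTC over tone labels 1..5, blank = 0
ctc_loss = nn.CTCLoss(blank=0)
```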

Combining prosodic and spectral features for Mandarin intonation recognition

Wei Bao, Ya Li, Mingliang Gu, Jianhua Tao, Linlin Chao, Shanfeng Liu
2014 The 9th International Symposium on Chinese Spoken Language Processing  
In this paper, a feature set for Mandarin Chinese intonation is addressed.  ...  Our study is performed on a Mandarin question corpus, which contains a large number and various types of interrogative sentences.  ...  As a result, MFCC has been widely used for speech recognition. As in typical speech recognition, we extracted a 13-dimensional MFCC feature vector, which consists of 12 MFCCs and a normalized energy term.  ... 
doi:10.1109/iscslp.2014.6936692 dblp:conf/iscslp/BaoLGTCL14 fatcat:ueuva6ys7jazxcb4kzvgubh2fa
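The 13-dimensional front end described in the snippet (12 MFCCs plus a normalized energy term) is straightforward to reproduce. A minimal sketch using librosa, which is an assumption, not the toolkit the authors used:

```python
import numpy as np
import librosa

def mfcc12_plus_energy(y, sr=16000):
    """12 MFCCs plus a normalized log-energy term -> 13-dim frame vectors."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)[1:]    # drop C0, keep 12
    log_e = np.log(librosa.feature.rms(y=y) + 1e-8)
    log_e = (log_e - log_e.mean()) / (log_e.std() + 1e-8)     # normalize energy
    return np.vstack([mfcc, log_e])                           # (13, n_frames)
```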

A multi-view approach for Mandarin non-native mispronunciation verification [article]

Zhenyu Wang, John H.L. Hansen, Yanlu Xie
2020 arXiv   pre-print
In this study, a multi-view approach is proposed to incorporate discriminative feature representations which require less annotation for non-native mispronunciation verification of Mandarin.  ...  The approach shows improvement over the GOP-based approach by +11.23% and the single-view approach by +1.47% in diagnostic accuracy for a mispronunciation verification task.  ...  We used 13-dimensional Mel-frequency cepstral coefficients (MFCC) as acoustic features. The TDNN network consists of six hidden layers, each of which contains 850 units.  ... 
arXiv:2009.02573v2 fatcat:tqw5j5thwvcqnbusstyvqbybtu
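The acoustic model the snippet describes, a TDNN with six 850-unit hidden layers over 13-dimensional MFCCs, maps naturally onto dilated 1-D convolutions. A sketch in PyTorch; the per-layer temporal contexts and the output size are assumptions:

```python
import torch.nn as nn

class TDNN(nn.Module):
    """Kaldi-style TDNN: 1-D convolutions over time with growing context;
    six 850-unit hidden layers as in the snippet."""
    def __init__(self, feat_dim=13, hidden=850, n_targets=200):
        super().__init__()
        layers, in_dim = [], feat_dim
        # (kernel, dilation) pairs widen each layer's temporal context
        for k, d in [(5, 1), (3, 2), (3, 3), (1, 1), (1, 1), (1, 1)]:
            layers += [nn.Conv1d(in_dim, hidden, k, dilation=d),
                       nn.ReLU(), nn.BatchNorm1d(hidden)]
            in_dim = hidden
        self.tdnn = nn.Sequential(*layers)
        self.out = nn.Conv1d(hidden, n_targets, 1)

    def forward(self, x):          # x: (batch, 13, time)
        return self.out(self.tdnn(x))
```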

Intelligent Call Manager Based On The Integration Of Computer Telephony, Internet And Speech Processing

Chung-Hsien Wu, Yeou-Jiunn Chen, Gwo-Lang Yan
1998 International Conference on Consumer Electronics  
Conventionally, there are 408 Mandarin base syllables, regardless of tones, composed of 21 INITIALs and 38 FINALs [9]. In this paper, a two-stage keyword recognition system is proposed.  ...  Subsyllable boundaries are then used to extract the FINAL parts of Mandarin syllables, which contain the prosodic information.  ...  Acknowledgment The authors would like to thank the National Science Council, the Republic of China, for financial support of this work under contract No. NSC86-2622-E-006-003.  ... 
doi:10.1109/icce.1998.678264 fatcat:ta5pvcuq4jdgvibqeogw6nhqte
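The INITIAL/FINAL decomposition the snippet relies on (21 INITIALs, 38 FINALs, 408 base syllables) amounts to a longest-prefix match over the initial inventory. A small illustrative sketch; `split_syllable` is a hypothetical helper, not code from the paper:

```python
# The 21 Mandarin INITIALs, longest first so "zh" matches before "z"
INITIALS = sorted(["b", "p", "m", "f", "d", "t", "n", "l", "g", "k", "h",
                   "j", "q", "x", "zh", "ch", "sh", "r", "z", "c", "s"],
                  key=len, reverse=True)

def split_syllable(pinyin):
    """Split a toneless pinyin syllable into (INITIAL, FINAL); zero-initial
    syllables such as "an" return an empty INITIAL."""
    for ini in INITIALS:
        if pinyin.startswith(ini):
            return ini, pinyin[len(ini):]
    return "", pinyin

print(split_syllable("zhang"))   # ('zh', 'ang')
print(split_syllable("an"))      # ('', 'an')
```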

A Comprehensive Review of the Speech Dependent Features and Classification Models used in Identification of Languages

Chandrakanta Mohapatra, Sujata Dash, Umakanta Majhi
2016 International Journal of Computer Applications  
In this paper a comprehensive review of the approaches used in identifying spoken languages and of the methods used for extracting speech-dependent features is presented.  ...  information services, such as checking into a hotel, arranging a meeting, or making travel arrangements, which are difficult actions for non-native speakers.  ...  Yan Deng and Jia Liu [3] used two approaches, i.e., Support Vector Machine (SVM) and phonetic N-gram, experimenting with two different ways of using SVM in the token-based system, parallel phoneme recognition  ... 
doi:10.5120/ijca2016911052 fatcat:y64thevii5de3etykx7zi36qau
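The token-based SVM system mentioned in the snippet follows a common recipe: decode speech into phone tokens, count phone n-grams, and classify the counts with a linear SVM. A toy sketch with scikit-learn; the phone strings and labels below are made up for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy phone-token strings such as a phone recognizer might emit
docs = ["n i h aw m a", "h eh l ow w er l d", "n i n h aw", "g uh d b ay"]
labels = ["zh", "en", "zh", "en"]

# Bag of phone bigrams/trigrams -> linear SVM (token-based LID)
lid = make_pipeline(
    CountVectorizer(token_pattern=r"\S+", ngram_range=(2, 3)),
    LinearSVC())
lid.fit(docs, labels)
print(lid.predict(["n i h aw"]))   # -> ['zh']
```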

Multi-distribution deep belief network for speech synthesis

Shiyin Kang, Xiaojun Qian, Helen Meng
2013 2013 IEEE International Conference on Acoustics, Speech and Signal Processing  
Previous work on DBN in the speech community mainly focuses on using the generatively pre-trained DBN to initialize a discriminative model for better acoustic modeling in speech recognition (SR).  ...  Subjective results also confirm the advantage of the spectrum from DBN, and the overall quality is comparable to that of context-independent HMM.  ...  Each syllable HMM has a left-to-right topology with 10 states. Initially, 416 mono-syllable HMMs are estimated as the seed for 1,364 tonal syllable HMMs.  ... 
doi:10.1109/icassp.2013.6639225 dblp:conf/icassp/KangQM13 fatcat:ddzugdgcnjd2fppkszjb3dc674
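The 10-state left-to-right topology mentioned in the snippet constrains each state to either self-loop or advance to its successor. A minimal sketch of the corresponding transition matrix; the 0.6 self-loop probability is an assumed seed value:

```python
import numpy as np

def left_to_right_transitions(n_states=10, p_stay=0.6):
    """Left-to-right HMM transitions: each state may only self-loop
    (p_stay) or move to the next state (1 - p_stay)."""
    A = np.zeros((n_states, n_states))
    for i in range(n_states - 1):
        A[i, i], A[i, i + 1] = p_stay, 1.0 - p_stay
    A[-1, -1] = 1.0            # final state absorbs
    return A

print(left_to_right_transitions(4))
```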

Speaking nature: strategies for generating credible utterances of nature elements or phenomena

Andrea Lorena Aldana Blanco
2012 Zenodo  
These sound scenarios can be a recreation of a real sound space or can be the representation of the idea of how a sound space or an element in it could sound.  ...  Sound designers, DSP engineers, musicians, etc., design and make use of several tools that allow them to create a given sound scenario not only for film, but also for TV, video games, art installations  ...  [Figure captions: interdisciplinary diagram for the wah-wah effect [19]; Figure 4: Activity-Valence two-dimensional space [24]; Figure 5: two-dimensional emotion plane.]  ... 
doi:10.5281/zenodo.3702732 fatcat:rypsrnstmjgyrdmadzjtmhieg4

Automatic Hypernasality Detection in Cleft Palate Speech Using CNN

Xiyue Wang, Ming Tang, Sen Yang, Heng Yin, Hua Huang, Ling He
2019 Circuits, systems, and signal processing  
The CNN learns efficient features via a two-dimensional filtering operation, while the feature extraction performance of shallow classifiers is limited.  ...  The average F1-scores for the hypernasality detection task are 0.9485 and 0.9746 using datasets spoken by children and by adults, respectively.  ...  Acknowledgements This work is supported by the National Natural Science Foundation of China 61503264.  ... 
doi:10.1007/s00034-019-01141-x fatcat:pqbhatdecjbrzdxcgqty7emznu
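The two-dimensional filtering the snippet credits for the CNN's feature learning amounts to stacked 2-D convolutions over spectrogram patches. A minimal binary-classification sketch in PyTorch; all layer sizes are assumptions, not the paper's architecture:

```python
import torch.nn as nn

class HypernasalityCNN(nn.Module):
    """Small 2-D CNN over spectrogram patches for a binary
    hypernasal / normal decision."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 2))

    def forward(self, x):      # x: (batch, 1, freq, time)
        return self.classifier(self.features(x))
```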

Improving ASR for Continuous Thai Words Using ANN/HMM

Maleerat Sodanil, Supot Nitsuwat, Choochart Haruechaiyasak
2010 International Conference on Innovations for Community Services  
However, for a tonal language like Thai, tone information is one of the important features which can be used to improve the accuracy of recognition.  ...  The baseline system of an automatic speech recognizer normally uses Mel-Frequency Cepstral Coefficients (MFCC) as feature vectors.  ...  The results showed that the ANN approach with MFCC plus tone features yielded a higher accuracy, i.e., a lower word error rate (WER), for speech recognition compared to the GMM approach.  ... 
dblp:conf/iics/SodanilNH10 fatcat:bvvec2pzdfcovairnekjpesaay
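The word error rate (WER) the snippet reports on is the standard edit-distance metric; for reference, a self-contained computation:

```python
def wer(ref, hyp):
    """Word error rate: (substitutions + insertions + deletions) / len(ref),
    via edit distance over word sequences."""
    r, h = ref.split(), hyp.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            d[i][j] = min(d[i - 1][j] + 1,                           # deletion
                          d[i][j - 1] + 1,                           # insertion
                          d[i - 1][j - 1] + (r[i - 1] != h[j - 1]))  # substitution
    return d[len(r)][len(h)] / len(r)

print(wer("tone helps thai asr", "tone help thai asr"))   # 0.25
```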

End-to-end Speech Recognition with Adaptive Computation Steps [article]

Mohan Li, Min Liu, Masanori Hattori
2018 arXiv   pre-print
We verify the ACS algorithm on the Mandarin speech corpus AIShell-1, where it achieves a 31.2% CER in the online setting, compared to the 32.4% CER of the attention-based model.  ...  Besides, a small change is made to the decoding stage of the encoder-decoder framework, which allows the prediction to exploit bidirectional contexts.  ...  The authors would like to thank Jiaxin Wen for her expertise in linguistics that helped to consolidate the idea proposed in this paper.  ... 
arXiv:1808.10088v2 fatcat:w3ueajpgbnbgvon67kpljpvklq
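Adaptive computation of this kind builds on the ACT-style halting rule: accumulate per-frame halting probabilities and close a segment, emitting one output, once the sum crosses a threshold. A sketch of that segmentation rule only, under an assumed threshold; not the authors' full model:

```python
import torch

def adaptive_steps(halting_probs, threshold=0.99):
    """Group encoder frames into segments: accumulate halting probabilities
    and mark a boundary once the running sum passes the threshold."""
    boundaries, acc = [], 0.0
    for t, p in enumerate(halting_probs.tolist()):
        acc += p
        if acc >= threshold:
            boundaries.append(t)    # one output symbol per segment
            acc = 0.0
    return boundaries

probs = torch.tensor([0.2, 0.5, 0.4, 0.1, 0.3, 0.7])
print(adaptive_steps(probs))        # [2, 5]
```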

Transfer Learning for Less-Resourced Semitic Languages Speech Recognition: the Case of Amharic

Yonas Woldemariam
2020 Workshop on Spoken Language Technologies for Under-resourced Languages  
Building automatic speech recognition (ASR) systems requires a large amount of speech and text data, and the problem gets worse for less-resourced languages.  ...  Adapting the Mandarin model improves the baseline Amharic model with a WER reduction of 10.25% (absolute).  ...  There are at least two alternative solutions: either adjusting the dimensionality of the adaptation data or reducing the dimensionality of the features on which the network was trained.  ... 
dblp:conf/sltu/Woldemariam20 fatcat:3e4de44a2bhfboxabpihq2qy4u
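Of the two options the snippet names, adjusting the feature dimensionality is the easier to illustrate: insert a learned linear projection so the new language's features fit the pretrained network's input layer. A sketch in PyTorch; the 40/43 dimensions are assumptions:

```python
import torch.nn as nn

class AdaptedASR(nn.Module):
    """Wrap a pretrained (e.g., Mandarin-trained) encoder with a linear
    projection so adaptation features of a different dimensionality
    match its input layer."""
    def __init__(self, pretrained_encoder, pretrained_dim=40, new_dim=43):
        super().__init__()
        self.project = nn.Linear(new_dim, pretrained_dim)
        self.encoder = pretrained_encoder      # transferred weights

    def forward(self, feats):                  # feats: (batch, time, new_dim)
        return self.encoder(self.project(feats))
```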

Significance of Joint Features Derived from the Modified Group Delay Function in Speech Processing

Rajesh M. Hegde, Hema A. Murthy, V. R. R. Gadde
2007 EURASIP Journal on Audio, Speech, and Music Processing  
Combining the MODGDF and the spectral magnitude-based features gives a significant increase in recognition performance of 11% at best, while combining any two features derived from the spectral magnitude  ...  These features are then used for three speech processing tasks, namely, syllable, speaker, and language recognition.  ...  Syllable-based speech recognition In this section, we discuss the baseline system and experimental results for recognition of syllables on the DBIL Tamil and Telugu databases [27] .  ... 
doi:10.1155/2007/79032 fatcat:ooqz3hhrtbckhj4cl4yxnbsakq
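The MODGDF is conventionally computed from two DFTs, X of the frame x[n] and Y of n*x[n], with a cepstrally smoothed spectrum S in the denominator and compression exponents alpha and gamma. A per-frame sketch following that standard definition; the parameter values are assumptions:

```python
import numpy as np

def modgdf(frame, alpha=0.4, gamma=0.9, lifter=8):
    """Modified group delay function of one frame:
    tau = (X_r*Y_r + X_i*Y_i) / |S|^(2*gamma), then sign(tau)*|tau|^alpha,
    where X = DFT(x[n]), Y = DFT(n*x[n]), S = cepstrally smoothed |X|."""
    n = np.arange(len(frame))
    X = np.fft.rfft(frame)
    Y = np.fft.rfft(n * frame)
    # Cepstral smoothing of |X| tames spikes from zeros near the unit circle
    cep = np.fft.irfft(np.log(np.abs(X) + 1e-8))
    cep[lifter:-lifter] = 0.0
    S = np.exp(np.fft.rfft(cep).real)
    tau = (X.real * Y.real + X.imag * Y.imag) / (np.abs(S) ** (2 * gamma) + 1e-8)
    return np.sign(tau) * np.abs(tau) ** alpha
```

Features are then typically obtained by taking the DCT of this curve, analogous to the MFCC pipeline.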

Rhythm measures and dimensions of durational variation in speech

Anastassia Loukina, Greg Kochanski, Burton Rosner, Elinor Keane, Chilin Shih
2011 Journal of the Acoustical Society of America  
A combination of three rhythm measures was necessary for separation of all five languages at once.  ...  (a) Preliminary results obtained on part of the corpus were presented in "Rhythm measures with language-independent segmentation", Proceedings  ...  Abstract: Patterns of durational variation were examined by  ...  For SA2a, speech was represented as a 26-dimensional standard mel-frequency cepstral coefficient (MFCC) vector. For SA2b, speech was represented as a 41-dimensional Acoustic Description Vector.  ... 
doi:10.1121/1.3559709 pmid:21568427 fatcat:vvcwjocicfczleel3npvnrqnya
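Rhythm measures of this kind are simple statistics over segment durations. Which measures the study combined is not stated in the snippet, but the normalized pairwise variability index (nPVI), a standard one in this literature, illustrates the idea:

```python
def npvi(durations):
    """Normalized pairwise variability index over successive
    segment durations (a standard speech-rhythm measure)."""
    terms = [abs(a - b) / ((a + b) / 2)
             for a, b in zip(durations, durations[1:])]
    return 100 * sum(terms) / len(terms)

# Toy vowel durations in seconds
print(round(npvi([0.08, 0.12, 0.05, 0.20]), 1))   # 80.8
```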

Current trends in multilingual speech processing

HERVÉ BOURLARD, JOHN DINES, MATHEW MAGIMAI-DOSS, PHILIP N GARNER, DAVID IMSENG, PETR MOTLICEK, HUI LIANG, LAKSHMI SAHEER, FABIO VALENTE
2011 Sadhana (Bangalore)  
Multilingual speech processing has been a topic of ongoing interest to the research community for many years and the field is now receiving renewed interest owing to two strong driving forces.  ...  This convergence enables the investigation of new approaches to unified modelling for automatic speech recognition and text-to-speech synthesis (TTS) as well as cross-lingual speaker adaptation for TTS  ...  In the unified approach, acoustic models and features also link the two and adaptation of TTS is carried out implicitly during the adaptation of the ASR models without the need for a TTS front-end.  ... 
doi:10.1007/s12046-011-0050-4 fatcat:queqnrod5rdszcotam2kg7ie2m

Make Patient Consultation Warmer: A Clinical Application for Speech Emotion Recognition

Huan-Chung Li, Telung Pan, Man-Hua Lee, Hung-Wen Chiu
2021 Applied Sciences  
...results for a comprehensive analysis to understand the interaction between the doctor and the patient.  ...  The sentiment recognition system developed by the hospital is used for comparison with the sentiment recognition results of the artificial neural network classification, which then uses the foregoing  ...  Acknowledgments: We want to thank the project of the Ministry of Science and Technology, Taiwan (MOST 108-2634-F-038-002), which provided part of the data for this research.  ... 
doi:10.3390/app11114782 fatcat:pwct5mubwfblhb5sfrfwuozdwu
Showing results 1 — 15 out of 91 results