Speaker-basis Accent Clustering Using Invariant Structure Analysis and the Speech Accent Archive.

the accent distance between any pair of the speakers by using their speech samples only. ... Creating the map, i.e., speaker-basis accent clustering, mathematically requires a distance matrix in terms of accents among all the speakers considered, and technically requires a method of predicting ... Use of speech structure to cluster simulated learners In [27], we applied the pronunciation structure analysis to cluster simulated Japanese learners of English. ...

doi:10.21437/odyssey.2014-25 fatcat:6r4c3lljnfaoboudktqo62hl2q

In experiments, the Speech Accent Archive (SAA), which contains speech data of worldwide accented English, is used as training and testing samples. ... This paper investigates invariant pronunciation structure analysis and Support Vector Regression (SVR) to predict the inter-speaker pronunciation distances. ... The invariant structure analysis was proposed in [8, 9] inspired by Jakobson's structural phonology [10] and it can extract invariant and robust features. ...

doi:10.1109/asru.2013.6707733 dblp:conf/asru/ShenMMWPW13 fatcat:pl5l34ttrndblbhlbda5r2ibka

Accent clustering requires a technique to quantify the accent gap between any speaker pair, and visualization requires a technique of stress-free plotting of the speakers. ... We have developed two techniques of individual-based clustering of the diversity [1, 2] and educationally effective visualization of the diversity [3]. ... Training of SVR is done by using all the archive speakers and testing is done by predicting the accent gap between that new speaker and each of the archive speakers. ...

doi:10.1109/icsda.2015.7357855 dblp:conf/ococosda/SatoKMSH15 fatcat:pt4ritnbhngalbcklw6sldgv2q

Citation

Yuichi Sato, Yosuke Kashiwagi, Nobuaki Minematsu, Daisuke Saito, Keikichi Hirose. "Noise-robust and stress-free visualization of pronunciation diversity of World Englishes using a learner's self-centered viewpoint." 2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE) (2015) 1-6

Saito, and K. Hirose, "Speaker-basis accent clustering using invariant structure analysis and the speech accent archive," Proc. ... This research trial was conducted by using the Speech Accent Archive (SAA) [2], where readings of a common paragraph were collected from more than t8K international speakers including many non-native speakers ...

doi:10.24467/onseikenkyu.18.3_62_2 fatcat:npcn6bwvubbwflcj2adh7if5cy

lang:ja

Citation

Tianze Shi, Shun Kasahara, Nobuaki Minematsu, Daisuke Saito, Keikichi Hirose. "P7. Experimental investigation of the definition of reference accent distance between speakers toward automatic accent clustering of speakers of World Englishes (Summaries of Presentations at the 28th General Meeting)." Journal of the Phonetic Society of Japan 18.3 (2014) 62-63

For automatic clustering of speakers in terms of accents, it is necessary to measure the accent distance between an arbitrary pair of speakers. For that, in [1], we trained a machine so that ... [Phonetic Society of Japan, 28th General Meeting (2014), Summaries of Presentations] only speech samples but also their IPA narrow transcripts. ... Saito, and K. Hirose, "Speaker-basis accent clustering using invariant structure analysis and the speech accent archive," Proc. ...

doi:10.24467/onseikenkyu.18.3_63_2 fatcat:ku3sejtqmrchzmb6zbbhr7j6ki

lang:ja

We audit some of the most popular English language ASR services using a large and global data set of speech from The Speech Accent Archive, which includes over 2,700 speakers of English born in 171 different ... Past research has identified discriminatory automatic speech recognition (ASR) performance as a function of the racial group and nationality of the speaker. ... ACKNOWLEDGMENTS The authors thank Abigail Lewis for her helpful insights on quantitative analysis, the stargazer R package [19], and The Speech Accent Archive [48]. ...

arXiv:2208.01157v2 fatcat:zl2fbtb6ircvfaovpoojtnobou

We use these representations to compute word-based pronunciation differences between non-native and native speakers of English, and between Norwegian dialect speakers. ... Transformers) lead to a better match with human perception than two earlier approaches on the basis of phonetic transcriptions and MFCC-based acoustic features. ... Acknowledgments The authors thank Hedwig Sekeres for creating the transcriptions of the Dutch speakers dataset, and Anna Pot for creating the visualization of the acoustic distance measure. ...

arXiv:2011.12649v3 fatcat:mifjjs23tbgmfc2bf67hr7mzhu

Trask (2003: 106) writes that, nowadays, "for most English and Welsh speakers, the in hair and head is just as dead as those in light and loud". The loss of /h/ (H-dropping, or aitch dropping) remains stigmatized ... and contrasts with a tendency to hypercorrection, i.e. the insertion of an illicit [h]. This article is a synthesis of the literature on English /h/, with special attention to its diachronic and synchronic ... She stresses that these phonemes have all undergone positional and structural weakening in the history of English. ...

doi:10.4000/ranam.728 fatcat:kepzy6peqnerrfwgevqlxo3peu

Sample analyses of archival X-ray utterance data from a speaker of American English are presented. ... It is shown that this explicit gestural model of phonetic structure can be used to investigate the contextual variation of phonetic units such as schwa in natural speech. ...

We describe our work on developing a speech recognition system for multi-genre media archives. ... The high diversity of the data makes this a challenging recognition task, which may benefit from systems trained on a combination of in-domain and out-of-domain data. ... Each show was first segmented and clustered by speaker using the CU RT-04 diarisation system [20]. ...

doi:10.1109/slt.2012.6424244 dblp:conf/slt/BellGLLLRSW12 fatcat:aim2jg6trfeitnt6l2amqkbje4

The work that forms the basis for this plenary lecture is published as Szmrecsanyi et al. (2019). ... Benedikt Szmrecsanyi's plenary lecture, "Variation squared", aimed to bridge the gap between the intra-speaker approach to variation from comparative sociolinguistics and the inter-speaker focus from quantitative ... Acknowledgements We gratefully acknowledge the use of the Speech Accent Archive under the Creative Commons License. ...

doi:10.1075/silv.25.05ful fatcat:ymnwueq5zjdrxop2ltrkuty53q

Once the feature set is defined, it is used for unsupervised clustering of an audiobook, where from each cluster a voice is trained. ... Results show that a combination of traditional and i-vector based features performs better in unsupervised clustering of expressive speech than traditional features and even better than large state-of-the-art ... Acknowledgements First of all I would like to thank Antonio Bonafonte for his help, lead and patience, and for the opportunity to work and to develop this work in his group. ...

doi:10.21437/iberspeech.2018-38 dblp:conf/iberspeech/Jauk18 fatcat:6zogjdy3gjgslfbbgrqirjzsx4

Dependencies between sound structure on the one hand and word, phrase, clause, sentence, and discourse structure, or also lexical structure, on the other were something 4 ... The variables that are allegedly interrelated pertain to segment inventories, the shapes of syllables, morphemes, and words, phonological or morphonological rules, tones and accents, and rhythmic or prosodic ... distinctions lost in consonant shifts), and even the structure of verse and music (with poets and singers drawing on what they know and do as speakers). ...

doi:10.1515/lity.1998.2.2.195 fatcat:2qifcdnqfzcydckeae3cjsyeyi

The variables that are allegedly interrelated pertain to segment inventories, the shapes of syllables, morphemes, and words, phonological or morphonological rules, tones and accents, and rhythmic or prosodic ... patterns on the one hand and to analytic or (poly-)synthetic grammar, separatist or cumulative morphological exponence, the complexity of grammatical units, and their linear order on the other. ... distinctions lost in consonant shifts), and even the structure of verse and music (with poets and singers drawing on what they know and do as speakers). ...

doi:10.1515/lingty-2017-1007 fatcat:nhq4zcwtzvgk3pwvdsbntbhlha

This paper presents a comprehensive survey of speech recognition techniques for non-Indian and Indian languages and compiles some of the computational models used for processing speech acoustics. ... The combination of MFCC and a DNN–HMM classifier is the most commonly used system for developing ASR for minority languages, whereas for some of the majority languages, researchers are using much more advanced algorithms of ... archives. ...

doi:10.1007/s40747-022-00665-1 fatcat:6pu2xccbq5as7bn2y2tav2fdwa

Speaker-basis Accent Clustering Using Invariant Structure Analysis and the Speech Accent Archive

Automatic pronunciation clustering using a World English archive and pronunciation structure analysis

Noise-robust and stress-free visualization of pronunciation diversity of World Englishes using a learner's self-centered viewpoint

P7. Experimental investigation of the definition of reference accent distance between speakers toward automatic accent clustering of speakers of World Englishes (Summaries of Presentations at the 28th General Meeting)

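Workshop: "Phonetic Issues of Voiced Geminates: Focusing on Regional Variation and Speech Style" (Summaries of Presentations at the 28th General Meeting of the Phonetic Society of Japan, 2014)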

Global Performance Disparities Between English-Language Accents in Automatic Speech Recognition [article]

Neural Representations for Modeling Variation in Speech [article]

Diachronic and Synchronic Variability of the English Phoneme /h/

Page 1963 of Linguistics and Language Behavior Abstracts: LLBA Vol. 26, Issue 4 [page]

Transcription of multi-genre media archives using out-of-domain data

Chapter 5. "Organically German"? [chapter]

Unsupervised Learning for Expressive Speech Synthesis

The co-variation of phonology with morphology and syntax: A hopeful history

The co-variation of phonology with morphology and syntax: A hopeful history

Computational intelligence in processing of speech acoustics: a survey
