159 Hits in 5.6 sec

Emotion Recognition During Speech Using Dynamics of Multiple Regions of the Face

Yelin Kim, Emily Mower Provost
2015 ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP)  
We found that it is critical to employ proper temporal segmentation and to leverage knowledge of spoken content to improve classification performance.  ...  For example, an individual's mouth movement may be similar when he smiles and when he pronounces the phoneme /IY/, as in "cheese".  ...  The method automatically segments time-series data based on rapid changes, and clusters the segments using density-based clustering.  ... 
doi:10.1145/2808204 fatcat:k6wk4d67czfxxk5hvjobdvqs4y

Speaker segmentation and clustering

Margarita Kotti, Vassiliki Moschou, Constantine Kotropoulos
2008 Signal Processing  
Speaker segmentation aims at finding speaker change points in an audio stream, whereas speaker clustering aims at grouping speech segments based on speaker characteristics.  ...  Model-based, metric-based, and hybrid speaker segmentation algorithms are reviewed. Concerning speaker clustering, deterministic and probabilistic algorithms are examined.  ...  This work has been supported by the "PYTHAGORAS II" Programme, funded in part by the European Union (75%) and in part by the Hellenic Ministry of Education and Religious Affairs (25%). M.  ... 
doi:10.1016/j.sigpro.2007.11.017 fatcat:xjh52tmotfa45j6d5b25aqzgjq

Robust Target Speaker Tracking in Broadcast TV Streams

Junmei Bai, Hongchen Jiang, Shilei Zhang, Shuwu Zhang, Bo Xu
2006 International Journal of Computational Linguistics and Chinese Language Processing  
This paper addresses the problem of audio change detection and speaker tracking in broadcast TV streams.  ...  A two-pass audio change detection algorithm, which includes detection of the potential change boundaries and refinement, is proposed.  ...  Acknowledgement Support provided by the National Natural Science Foundation of China (NSFC) under grant no. 60475014 and the National Hi-tech Research Plan under grant no. 2005AA114130 is gratefully acknowledged  ... 
dblp:journals/ijclclp/BaiJZZX06 fatcat:cl6fvfilrzhutgrfqkzpr3ihjy

Joint Segmentation and Classification of Time Series Using Class-Specific Features

Z.J. Wang, P. Willett
2004 IEEE Transactions on Systems Man and Cybernetics Part B (Cybernetics)  
We present an approach for the joint segmentation and classification of a time series.  ...  There is similarly no need for an a-priori specification of the number of sections, as the approach uses an appropriate penalization of an over-zealous segmentation. The scheme has two stages.  ...  on CS for penalization of model complexity and an extra MDL-like term for the number of segments.  ... 
doi:10.1109/tsmcb.2003.819486 pmid:15376851 fatcat:jnbfx66gt5ck3egbwezazpgzlu

Agglomerative information bottleneck for speaker diarization of meetings data

Deepu Vijayasenan, Fabio Valente, Herve Bourlard
2007 2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)  
In contrast to state-of-the-art diarization systems that model individual speakers with Gaussian Mixture Models, the proposed algorithm is completely nonparametric.  ...  Both clustering and model selection issues of nonparametric models are addressed in this work. The proposed algorithm is evaluated on meeting data on the RT06 evaluation data set.  ...  Xavier Anguera for their help with baseline system and beam-forming toolkit. Authors also would like to thank Dr. John Dines for his help with the speech/non-speech segmentation  ... 
doi:10.1109/asru.2007.4430119 dblp:conf/asru/VijayasenanVB07 fatcat:tparlf5o55bevluhltinxc6kau

Audio segmentation-by-classification approach based on factor analysis in broadcast news domain

Diego Castán, Alfonso Ortega, Antonio Miguel, Eduardo Lleida
2014 EURASIP Journal on Audio, Speech, and Music Processing  
This paper studies a novel audio segmentation-by-classification approach based on factor analysis.  ...  The proposed method is applied to segment and classify audios coming from TV shows into five different acoustic classes: speech, music, speech with music, speech with noise, and others.  ...  Acknowledgements This work has been funded by the Spanish Government and the European Union (FEDER) under the project TIN2011-28169-C05-02.  ... 
doi:10.1186/s13636-014-0034-5 fatcat:zxkjge4xxnhufoaquhjfmd2gpi

Audio segmentation-by-classification approach based on factor analysis in broadcast news domain

Diego Castán, Alfonso Ortega, Antonio Miguel, Eduardo Lleida
2014 EURASIP Journal on Audio, Speech, and Music Processing  
This paper studies a novel audio segmentation-by-classification approach based on factor analysis.  ...  The proposed method is applied to segment and classify audios coming from TV shows into five different acoustic classes: speech, music, speech with music, speech with noise, and others.  ...  Acknowledgements This work has been funded by the Spanish Government and the European Union (FEDER) under the project TIN2011-28169-C05-02.  ... 
doi:10.1186/preaccept-1330210582123399 fatcat:i7orhxn4rjcv7iyee2r4yfuyou

Voice Activity Detection Using Generalized Gamma Distribution [chapter]

George Almpanidis, Constantine Kotropoulos
2006 Lecture Notes in Computer Science  
In this work, we model speech samples with a two-sided generalized Gamma distribution and evaluate its efficiency for voice activity detection.  ...  Using a computationally inexpensive maximum likelihood approach, we employ the Bayesian Information Criterion for identifying the phoneme boundaries in noisy speech.  ...  In many cases voice activity detection (VAD), endpoint detection, speaker segmentation, and audio classification can be seen as similar problems and they share a common methodology.  ... 
doi:10.1007/11752912_3 fatcat:c2uah6z5ezcivitktpgfrhbf2e

An Information Theoretic Approach to Speaker Diarization of Meeting Data

D. Vijayasenan, F. Valente, H. Bourlard
2009 IEEE Transactions on Audio, Speech, and Language Processing  
A speaker diarization system based on an information theoretic framework is described. The problem is formulated according to the Information Bottleneck (IB) principle.  ...  We discuss issues related to speaker diarization using this information theoretic framework such as the criteria for inferring the number of speakers, the tradeoff between quality and compression achieved  ...  Wooters and Dr. X. Anguera for their help with baseline system and beam-forming toolkit. They would also like to thank Dr. J. Dines for his help with the speech/non-speech segmentation and Dr. P.  ... 
doi:10.1109/tasl.2009.2015698 fatcat:vjjbh27fwnep3aoyikob5h5p7q

Data Discovery Using Lossless Compression-Based Sparse Representation [article]

Elyas Sabeti, Peter X.K. Song, Alfred O. Hero III
2021 arXiv   pre-print
Sparse representation has been widely used in data compression, signal and image denoising, dimensionality reduction and computer vision.  ...  In this paper, we propose a data-driven sparse representation using orthonormal bases under the lossless compression constraint.  ...  MDL has also been previously used for sparse lossless audio compression [17] and dictionary learning [18] as well. In the next section, we summarize the MDL results that are used in this paper.  ... 
arXiv:2103.08765v2 fatcat:3rbwrsczlze3hhia4ie5blp4ue

A comprehensive study of visual event computing

WeiQi Yan, Declan F. Kieran, Setareh Rafatirad, Ramesh Jain
2010 Multimedia tools and applications  
Later, we review an extensive set of papers taken from well-known conferences and journals in multiple disciplines. We analyze events, and summarize the procedure of visual event actions.  ...  We start by presenting events and their classifications, and continue with discussing the problem of capturing events in terms of photographs, videos, etc, as well as the methodologies for event storing  ...  Paul Miller, and Dr. Xiwu Gu etc. This work was partially supported by QUB research project: Unusual event detection in audio-visual surveillance for public transport (NO.D6223EEC).  ... 
doi:10.1007/s11042-010-0560-9 fatcat:ak6u3eefefgjhmbpr7asru3n7u

On Interpretable Approaches to Cluster, Classify and Represent Multi-Subspace Data via Minimum Lossy Coding Length based on Rate-Distortion Theory [article]

Kai-Liang Lu, Avraham Chapman
2023 arXiv   pre-print
., segmentation (clustering) via the Minimum Lossy Coding Length criterion, classification via the Minimum Incremental Coding Length criterion and representation via the Maximal Coding Rate Reduction criterion  ...  These are derived based on the lossy data coding and compression framework from the principle of rate distortion in information theory.  ...  And we would also like to thank the anonymous reviewers for their comments and suggestions.  ... 
arXiv:2302.10383v1 fatcat:unkuzwi42zalpgf73b7qp3v54i

Using audio and video features to classify the most dominant person in a group meeting

Hayley Hung, Dinesh Jayagopi, Chuohao Yeo, Gerald Friedland, Sileye Ba, Jean-Marc Odobez, Kannan Ramchandran, Nikki Mirghafori, Daniel Gatica-Perez
2007 Proceedings of the 15th international conference on Multimedia - MULTIMEDIA '07  
In this paper, we provide a framework for detecting dominance in group meetings using different audio and video cues.  ...  We show that by using a simple model for dominance estimation we can obtain promising results.  ...  AMIDA-30), the Swiss NCCR IM2, and the German Academic Exchange Service (DAAD). We thank Bastien Crettol (IDIAP) for his support with data annotation.  ... 
doi:10.1145/1291233.1291423 dblp:conf/mm/HungJYFBORMG07 fatcat:5xeoo2f6l5hqppghts4pf6crsm

Exploring probabilistic localized video representation for human action recognition

Yan Song, Sheng Tang, Yan-Tao Zheng, Tat-Seng Chua, Yongdong Zhang, Shouxun Lin
2011 Multimedia tools and applications  
Specifically, the proposed representation encodes the visual and motion information of an ensemble of local ST features of a video into a distribution estimated by a generative probabilistic model.  ...  The codebook needs to be re-built when video corpus is changed. To tackle these issues, this paper explores a localized, continuous and probabilistic video representation.  ...  (2) the number of mixture components in GMM plays an important role and our proposed MDL criterion based model selection gives good performance.  ... 
doi:10.1007/s11042-011-0748-7 fatcat:wn3h7glsb5eercbp27qwro4nqe

Phonemic segmentation using the generalised Gamma distribution and small sample Bayesian information criterion

George Almpanidis, Constantine Kotropoulos
2008 Speech Communication  
In this work, we present a text-independent automatic phone segmentation algorithm based on the Bayesian Information Criterion.  ...  In order to alleviate this problem and detect the phone boundaries accurately, we employ an information criterion corrected for small samples while modelling speech samples with the generalised Gamma distribution  ...  For an accurate segmentation, the frame shift should be as small as possible. Such an acoustic change detection system based on BIC has been proposed by Chen and Gopalakrishnan (1998).  ... 
doi:10.1016/j.specom.2007.06.005 fatcat:i4dxkivpwzgybiuvrxsoymsemu
Showing results 1 to 15 out of 159 results