Emotion Recognition During Speech Using Dynamics of Multiple Regions of the Face
2015
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP)
We found that it is critical to employ proper temporal segmentation and to leverage knowledge of spoken content to improve classification performance. ...
For example, an individual's mouth movement may be similar when he smiles and when he pronounces the phoneme /IY/, as in "cheese". ...
The method automatically segments time-series data based on rapid changes, and clusters the segments using density-based clustering. ...
doi:10.1145/2808204
fatcat:k6wk4d67czfxxk5hvjobdvqs4y
Speaker segmentation and clustering
2008
Signal Processing
Speaker segmentation aims at finding speaker change points in an audio stream, whereas speaker clustering aims at grouping speech segments based on speaker characteristics. ...
Model-based, metric-based, and hybrid speaker segmentation algorithms are reviewed. Concerning speaker clustering, deterministic and probabilistic algorithms are examined. ...
This work has been supported by the "PYTHAGORAS II" Programme, funded in part by the European Union (75%) and in part by the Hellenic Ministry of Education and Religious Affairs (25%). M. ...
doi:10.1016/j.sigpro.2007.11.017
fatcat:xjh52tmotfa45j6d5b25aqzgjq
Robust Target Speaker Tracking in Broadcast TV Streams
2006
International Journal of Computational Linguistics and Chinese Language Processing
This paper addresses the problem of audio change detection and speaker tracking in broadcast TV streams. ...
A two-pass audio change detection algorithm, which includes detection of the potential change boundaries and refinement, is proposed. ...
Acknowledgement Support provided by the National Natural Science Foundation of China (NSFC) under grant no. 60475014 and the National Hi-tech Research Plan under grant no. 2005AA114130 is gratefully acknowledged ...
dblp:journals/ijclclp/BaiJZZX06
fatcat:cl6fvfilrzhutgrfqkzpr3ihjy
Joint Segmentation and Classification of Time Series Using Class-Specific Features
2004
IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics)
We present an approach for the joint segmentation and classification of a time series. ...
There is similarly no need for an a-priori specification of the number of sections, as the approach uses an appropriate penalization of an over-zealous segmentation. The scheme has two stages. ...
on CS for penalization of model complexity and an extra MDL-like term for the number of segments. ...
doi:10.1109/tsmcb.2003.819486
pmid:15376851
fatcat:jnbfx66gt5ck3egbwezazpgzlu
Agglomerative information bottleneck for speaker diarization of meetings data
2007
2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)
In contrast to state-of-the-art diarization systems that model individual speakers with Gaussian Mixture Models, the proposed algorithm is completely nonparametric. ...
Both clustering and model selection issues of nonparametric models are addressed in this work. The proposed algorithm is evaluated on meeting data on the RT06 evaluation data set. ...
Xavier Anguera for their help with baseline system and beam-forming toolkit. Authors also would like to thank Dr. John Dines for his help with the speech/non-speech segmentation ...
doi:10.1109/asru.2007.4430119
dblp:conf/asru/VijayasenanVB07
fatcat:tparlf5o55bevluhltinxc6kau
Audio segmentation-by-classification approach based on factor analysis in broadcast news domain
2014
EURASIP Journal on Audio, Speech, and Music Processing
This paper studies a novel audio segmentation-by-classification approach based on factor analysis. ...
The proposed method is applied to segment and classify audio from TV shows into five different acoustic classes: speech, music, speech with music, speech with noise, and others. ...
Acknowledgements This work has been funded by the Spanish Government and the European Union (FEDER) under the project TIN2011-28169-C05-02. ...
doi:10.1186/s13636-014-0034-5
fatcat:zxkjge4xxnhufoaquhjfmd2gpi
Audio segmentation-by-classification approach based on factor analysis in broadcast news domain
2014
EURASIP Journal on Audio, Speech, and Music Processing
This paper studies a novel audio segmentation-by-classification approach based on factor analysis. ...
The proposed method is applied to segment and classify audio from TV shows into five different acoustic classes: speech, music, speech with music, speech with noise, and others. ...
Acknowledgements This work has been funded by the Spanish Government and the European Union (FEDER) under the project TIN2011-28169-C05-02. ...
doi:10.1186/preaccept-1330210582123399
fatcat:i7orhxn4rjcv7iyee2r4yfuyou
Voice Activity Detection Using Generalized Gamma Distribution
[chapter]
2006
Lecture Notes in Computer Science
In this work, we model speech samples with a two-sided generalized Gamma distribution and evaluate its efficiency for voice activity detection. ...
Using a computationally inexpensive maximum likelihood approach, we employ the Bayesian Information Criterion for identifying the phoneme boundaries in noisy speech. ...
In many cases voice activity detection (VAD), endpoint detection, speaker segmentation, and audio classification can be seen as similar problems and they share a common methodology. ...
doi:10.1007/11752912_3
fatcat:c2uah6z5ezcivitktpgfrhbf2e
An Information Theoretic Approach to Speaker Diarization of Meeting Data
2009
IEEE Transactions on Audio, Speech, and Language Processing
A speaker diarization system based on an information theoretic framework is described. The problem is formulated according to the Information Bottleneck (IB) principle. ...
We discuss issues related to speaker diarization using this information theoretic framework such as the criteria for inferring the number of speakers, the tradeoff between quality and compression achieved ...
Wooters and Dr. X. Anguera for their help with baseline system and beam-forming toolkit. They would also like to thank Dr. J. Dines for his help with the speech/non-speech segmentation and Dr. P. ...
doi:10.1109/tasl.2009.2015698
fatcat:vjjbh27fwnep3aoyikob5h5p7q
Data Discovery Using Lossless Compression-Based Sparse Representation
[article]
2021
arXiv
pre-print
Sparse representation has been widely used in data compression, signal and image denoising, dimensionality reduction and computer vision. ...
In this paper, we propose a data-driven sparse representation using orthonormal bases under the lossless compression constraint. ...
MDL has also been previously used for sparse lossless audio compression [17] and dictionary learning [18] as well. In the next section, we summarize the MDL results that are used in this paper. ...
arXiv:2103.08765v2
fatcat:3rbwrsczlze3hhia4ie5blp4ue
A comprehensive study of visual event computing
2010
Multimedia Tools and Applications
Later, we review an extensive set of papers taken from well-known conferences and journals in multiple disciplines. We analyze events, and summarize the procedure of visual event actions. ...
We start by presenting events and their classifications, and continue by discussing the problem of capturing events in terms of photographs, videos, etc., as well as the methodologies for event storing ...
Paul Miller, and Dr. Xiwu Gu etc. This work was partially supported by QUB research project: Unusual event detection in audio-visual surveillance for public transport (NO.D6223EEC). ...
doi:10.1007/s11042-010-0560-9
fatcat:ak6u3eefefgjhmbpr7asru3n7u
On Interpretable Approaches to Cluster, Classify and Represent Multi-Subspace Data via Minimum Lossy Coding Length based on Rate-Distortion Theory
[article]
2023
arXiv
pre-print
... segmentation (clustering) via the Minimum Lossy Coding Length criterion, classification via the Minimum Incremental Coding Length criterion and representation via the Maximal Coding Rate Reduction criterion ...
These are derived based on the lossy data coding and compression framework from the principle of rate distortion in information theory. ...
And we would also like to thank the anonymous reviewers for their comments and suggestions. ...
arXiv:2302.10383v1
fatcat:unkuzwi42zalpgf73b7qp3v54i
Using audio and video features to classify the most dominant person in a group meeting
2007
Proceedings of the 15th international conference on Multimedia - MULTIMEDIA '07
In this paper, we provide a framework for detecting dominance in group meetings using different audio and video cues. ...
We show that by using a simple model for dominance estimation we can obtain promising results. ...
AMIDA-30), the Swiss NCCR IM2, and the German Academic Exchange Service (DAAD). We thank Bastien Crettol (IDIAP) for his support with data annotation. ...
doi:10.1145/1291233.1291423
dblp:conf/mm/HungJYFBORMG07
fatcat:5xeoo2f6l5hqppghts4pf6crsm
Exploring probabilistic localized video representation for human action recognition
2011
Multimedia Tools and Applications
Specifically, the proposed representation encodes the visual and motion information of an ensemble of local ST features of a video into a distribution estimated by a generative probabilistic model. ...
The codebook needs to be re-built when video corpus is changed. To tackle these issues, this paper explores a localized, continuous and probabilistic video representation. ...
(2) the number of mixture components in GMM plays an important role and our proposed MDL criterion based model selection gives good performance. ...
doi:10.1007/s11042-011-0748-7
fatcat:wn3h7glsb5eercbp27qwro4nqe
Phonemic segmentation using the generalised Gamma distribution and small sample Bayesian information criterion
2008
Speech Communication
In this work, we present a text-independent automatic phone segmentation algorithm based on the Bayesian Information Criterion. ...
In order to alleviate this problem and detect the phone boundaries accurately, we employ an information criterion corrected for small samples while modelling speech samples with the generalised Gamma distribution ...
For an accurate segmentation, the frame shift should be as small as possible. Such an acoustic change detection system based on BIC has been proposed by Chen and Gopalakrishnan (1998). ...
doi:10.1016/j.specom.2007.06.005
fatcat:i4dxkivpwzgybiuvrxsoymsemu