Multiple change-point audio segmentation and classification using an MDL-based Gaussian model.

We found that it is critical to employ proper temporal segmentation and to leverage knowledge of spoken content to improve classification performance. ... For example, an individual's mouth movement may be similar when he smiles and when he pronounces the phoneme /IY/, as in "cheese". ... The method automatically segments time-series data based on rapid changes, and clusters the segments using density-based clustering. ...

doi:10.1145/2808204 fatcat:k6wk4d67czfxxk5hvjobdvqs4y

Speaker segmentation aims at finding speaker change points in an audio stream, whereas speaker clustering aims at grouping speech segments based on speaker characteristics. ... Model-based, metric-based, and hybrid speaker segmentation algorithms are reviewed. Concerning speaker clustering, deterministic and probabilistic algorithms are examined. ... This work has been supported by the "PYTHAGORAS II" Programme, funded in part by the European Union (75%) and in part by the Hellenic Ministry of Education and Religious Affairs (25%). M. ...

doi:10.1016/j.sigpro.2007.11.017 fatcat:xjh52tmotfa45j6d5b25aqzgjq

This paper addresses the problem of audio change detection and speaker tracking in broadcast TV streams. ... A two-pass audio change detection algorithm, which includes detection of the potential change boundaries and refinement, is proposed. ... Acknowledgement Support provided by the National Natural Science Foundation of China (NSFC) under grant no. 60475014 and the National Hi-tech Research Plan under grant no. 2005AA114130 is gratefully acknowledged ...

dblp:journals/ijclclp/BaiJZZX06 fatcat:cl6fvfilrzhutgrfqkzpr3ihjy

We present an approach for the joint segmentation and classification of a time series. ... There is similarly no need for an a-priori specification of the number of sections, as the approach uses an appropriate penalization of an over-zealous segmentation. The scheme has two stages. ... on CS for penalization of model complexity and an extra MDL-like term for the number of segments. ...

doi:10.1109/tsmcb.2003.819486 pmid:15376851 fatcat:jnbfx66gt5ck3egbwezazpgzlu
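
The idea in the snippet above, of penalizing over-segmentation instead of fixing the number of sections in advance, can be illustrated with a generic penalized dynamic program. This is a minimal sketch and not the paper's two-stage scheme: it assumes a 1-D signal, models each segment as a single Gaussian, and charges a fixed per-segment `penalty` as a stand-in for an MDL-like term. The names `segment_cost`, `penalized_segmentation`, `penalty`, and `min_len` are hypothetical.

```python
import numpy as np

def segment_cost(prefix, prefix_sq, start, end):
    # Negative maximized Gaussian log-likelihood of x[start:end], up to constants:
    # (n / 2) * log(ML variance). Prefix sums make each query O(1).
    n = end - start
    s = prefix[end] - prefix[start]
    sq = prefix_sq[end] - prefix_sq[start]
    var = max(sq / n - (s / n) ** 2, 1e-8)  # floor avoids log(0)
    return 0.5 * n * np.log(var)

def penalized_segmentation(x, penalty, min_len=10):
    # Exact O(n^2) search over all segmentations with segments >= min_len.
    # Each segment pays `penalty` (assumed, MDL-like), so the number of
    # segments is chosen by the optimization rather than given a priori.
    x = np.asarray(x, dtype=float)
    n = len(x)
    prefix = np.concatenate(([0.0], np.cumsum(x)))
    prefix_sq = np.concatenate(([0.0], np.cumsum(x * x)))
    best = np.full(n + 1, np.inf)   # best[e]: minimal cost of x[:e]
    best[0] = 0.0
    prev = np.zeros(n + 1, dtype=int)
    for end in range(min_len, n + 1):
        for start in range(0, end - min_len + 1):
            if not np.isfinite(best[start]):
                continue
            total = best[start] + segment_cost(prefix, prefix_sq, start, end) + penalty
            if total < best[end]:
                best[end] = total
                prev[end] = start
    # Backtrack segment end points, then drop the final one (the signal end).
    bounds = []
    end = n
    while end > 0:
        bounds.append(end)
        end = prev[end]
    return sorted(bounds)[:-1]

# Synthetic example: the mean jumps at samples 100 and 200.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 100), rng.normal(4, 1, 100), rng.normal(0, 1, 100)])
change_points = penalized_segmentation(x, penalty=3.0 * np.log(len(x)))
print(change_points)
```

The penalty of `3 * log(n)` is an assumed heuristic chosen for the synthetic data, not a value taken from the paper; a larger penalty yields fewer segments.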

In contrast to state-of-the-art diarization systems that model individual speakers with Gaussian Mixture Models, the proposed algorithm is completely nonparametric. ... Both clustering and model selection issues of nonparametric models are addressed in this work. The proposed algorithm is evaluated on meeting data from the RT06 evaluation data set. ... Xavier Anguera for their help with baseline system and beam-forming toolkit. Authors also would like to thank Dr. John Dines for his help with the speech/non-speech segmentation ...

doi:10.1109/asru.2007.4430119 dblp:conf/asru/VijayasenanVB07 fatcat:tparlf5o55bevluhltinxc6kau

This paper studies a novel audio segmentation-by-classification approach based on factor analysis. ... The proposed method is applied to segment and classify audio from TV shows into five different acoustic classes: speech, music, speech with music, speech with noise, and others. ... Acknowledgements This work has been funded by the Spanish Government and the European Union (FEDER) under the project TIN2011-28169-C05-02. ...

doi:10.1186/s13636-014-0034-5 fatcat:zxkjge4xxnhufoaquhjfmd2gpi

This paper studies a novel audio segmentation-by-classification approach based on factor analysis. ... The proposed method is applied to segment and classify audio from TV shows into five different acoustic classes: speech, music, speech with music, speech with noise, and others. ... Acknowledgements This work has been funded by the Spanish Government and the European Union (FEDER) under the project TIN2011-28169-C05-02. ...

doi:10.1186/preaccept-1330210582123399 fatcat:i7orhxn4rjcv7iyee2r4yfuyou

In this work, we model speech samples with a two-sided generalized Gamma distribution and evaluate its efficiency for voice activity detection. ... Using a computationally inexpensive maximum likelihood approach, we employ the Bayesian Information Criterion for identifying the phoneme boundaries in noisy speech. ... In many cases voice activity detection (VAD), endpoint detection, speaker segmentation, and audio classification can be seen as similar problems and they share a common methodology. ...

doi:10.1007/11752912_3 fatcat:c2uah6z5ezcivitktpgfrhbf2e

A speaker diarization system based on an information theoretic framework is described. The problem is formulated according to the Information Bottleneck (IB) principle. ... We discuss issues related to speaker diarization using this information theoretic framework such as the criteria for inferring the number of speakers, the tradeoff between quality and compression achieved ... Wooters and Dr. X. Anguera for their help with baseline system and beam-forming toolkit. They would also like to thank Dr. J. Dines for his help with the speech/non-speech segmentation and Dr. P. ...

doi:10.1109/tasl.2009.2015698 fatcat:vjjbh27fwnep3aoyikob5h5p7q

Sparse representation has been widely used in data compression, signal and image denoising, dimensionality reduction and computer vision. ... In this paper, we propose a data-driven sparse representation using orthonormal bases under the lossless compression constraint. ... MDL has also been previously used for sparse lossless audio compression [17] and dictionary learning [18] as well. In the next section, we summarize the MDL results that are used in this paper. ...

arXiv:2103.08765v2 fatcat:3rbwrsczlze3hhia4ie5blp4ue

Later, we review an extensive set of papers taken from well-known conferences and journals in multiple disciplines. We analyze events, and summarize the procedure of visual event actions. ... We start by presenting events and their classifications, and continue with discussing the problem of capturing events in terms of photographs, videos, etc, as well as the methodologies for event storing ... Paul Miller, and Dr. Xiwu Gu etc. This work was partially supported by QUB research project: Unusual event detection in audio-visual surveillance for public transport (NO.D6223EEC). ...

doi:10.1007/s11042-010-0560-9 fatcat:ak6u3eefefgjhmbpr7asru3n7u

., segmentation (clustering) via the Minimum Lossy Coding Length criterion, classification via the Minimum Incremental Coding Length criterion and representation via the Maximal Coding Rate Reduction criterion ... These are derived based on the lossy data coding and compression framework from the principle of rate distortion in information theory. ... And we would also like to thank the anonymous reviewers for their comments and suggestions. ...

arXiv:2302.10383v1 fatcat:unkuzwi42zalpgf73b7qp3v54i

In this paper, we provide a framework for detecting dominance in group meetings using different audio and video cues. ... We show that by using a simple model for dominance estimation we can obtain promising results. ... AMIDA-30), the Swiss NCCR IM2, and the German Academic Exchange Service (DAAD). We thank Bastien Crettol (IDIAP) for his support with data annotation. ...

doi:10.1145/1291233.1291423 dblp:conf/mm/HungJYFBORMG07 fatcat:5xeoo2f6l5hqppghts4pf6crsm

Specifically, the proposed representation encodes the visual and motion information of an ensemble of local ST features of a video into a distribution estimated by a generative probabilistic model. ... The codebook needs to be re-built when video corpus is changed. To tackle these issues, this paper explores a localized, continuous and probabilistic video representation. ... (2) the number of mixture components in GMM plays an important role and our proposed MDL criterion based model selection gives good performance. ...

doi:10.1007/s11042-011-0748-7 fatcat:wn3h7glsb5eercbp27qwro4nqe

In this work, we present a text-independent automatic phone segmentation algorithm based on the Bayesian Information Criterion. ... In order to alleviate this problem and detect the phone boundaries accurately, we employ an information criterion corrected for small samples while modelling speech samples with the generalised Gamma distribution ... For an accurate segmentation, the frame shift should be as small as possible. Such an acoustic change detection system based on BIC has been proposed by Chen and Gopalakrishnan (1998). ...

doi:10.1016/j.specom.2007.06.005 fatcat:i4dxkivpwzgybiuvrxsoymsemu
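
The BIC-based acoustic change detection mentioned in the snippet above can be sketched as follows. This is a hedged illustration of the standard ΔBIC test with full-covariance Gaussians; it does not use the generalised Gamma model or the small-sample correction of the paper. It assumes `frames` is an array of feature vectors with shape (N, d). The function names, `margin`, and the weight `lam` are illustrative choices.

```python
import numpy as np

def _logdet_cov(x):
    # Log-determinant of the maximum-likelihood (biased) covariance of the rows of x.
    cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True))
    _, logdet = np.linalg.slogdet(cov)
    return logdet

def delta_bic(frames, i, lam=1.0):
    # Compare "one Gaussian for all N frames" with "one Gaussian for
    # frames[:i] and another for frames[i:]". A positive value favours
    # a change point at frame i.
    n, d = frames.shape
    gain = 0.5 * (n * _logdet_cov(frames)
                  - i * _logdet_cov(frames[:i])
                  - (n - i) * _logdet_cov(frames[i:]))
    # The two-model hypothesis has d + d(d+1)/2 extra free parameters.
    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)
    return gain - lam * penalty

def best_change_point(frames, margin=10, lam=1.0):
    # Scan candidates, keeping `margin` frames on each side so that both
    # covariance estimates stay well conditioned (margin should exceed d).
    n = frames.shape[0]
    candidates = list(range(margin, n - margin))
    scores = [delta_bic(frames, i, lam) for i in candidates]
    k = int(np.argmax(scores))
    if scores[k] > 0:
        return candidates[k], scores[k]
    return None, scores[k]

# Synthetic 2-D features whose mean shifts at frame 200.
rng = np.random.default_rng(0)
frames = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(5, 1, (200, 2))])
cp, score = best_change_point(frames)
print(cp, score)
```

In practice the scan is usually run inside a sliding or growing window, and `lam` is tuned on held-out data; both are left out of this sketch.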

Emotion Recognition During Speech Using Dynamics of Multiple Regions of the Face

Speaker segmentation and clustering

Robust Target Speaker Tracking in Broadcast TV Streams

Joint Segmentation and Classification of Time Series Using Class-Specific Features

Agglomerative information bottleneck for speaker diarization of meetings data

Audio segmentation-by-classification approach based on factor analysis in broadcast news domain

Audio segmentation-by-classification approach based on factor analysis in broadcast news domain

Voice Activity Detection Using Generalized Gamma Distribution [chapter]

An Information Theoretic Approach to Speaker Diarization of Meeting Data

Data Discovery Using Lossless Compression-Based Sparse Representation [article]

A comprehensive study of visual event computing

On Interpretable Approaches to Cluster, Classify and Represent Multi-Subspace Data via Minimum Lossy Coding Length based on Rate-Distortion Theory [article]

Using audio and video features to classify the most dominant person in a group meeting

Exploring probabilistic localized video representation for human action recognition

Phonemic segmentation using the generalised Gamma distribution and small sample Bayesian information criterion
