Detecting speaking persons in video
[article]
2021
arXiv
pre-print
We present a novel method for detecting speaking persons in video by extracting facial landmarks with a neural network and analysing these landmarks statistically over time ...
We therefore present a novel robust method for detecting speaking persons in video which relies only on the visual information. ...
DETECTOR ALGORITHM For each person occurring in the video, we now perform a statistical analysis of their facial landmark trajectory over time in order to infer whether that person is speaking or not and to detect ...
arXiv:2110.13806v1
fatcat:7a3kckl76ncxhmkr25kgee5hom
Detecting speaking persons in video
2021
Zenodo
We present a novel method for detecting speaking persons in video by extracting facial landmarks with a neural network and analysing these landmarks statistically over time. ...
We therefore present a novel robust method for detecting speaking persons in video which relies only on the visual information. ...
DETECTOR ALGORITHM For each person occurring in the video, we now perform a statistical analysis of their facial landmark trajectory over time in order to infer whether that person is speaking or not and to detect ...
doi:10.5281/zenodo.5596861
fatcat:lhaeknyxjng3bnkbym7j4oirqu
Study of detecting behavioral signatures within DeepFake videos
[article]
2022
arXiv
pre-print
from a different person speaking the same utterance. ...
We conduct a study by comparing synthetic imagery that: 1) originates from a different person speaking a different utterance, 2) originates from the same person speaking a different utterance, and 3) originates ...
This provides evidence that the distinct speaking style of a person and the correspondence between speaking behavior and utterance can serve as important clues for DeepFake detection even for synthetic ...
arXiv:2208.03561v1
fatcat:ej2fu3bk6vfudgrw7vf3ru66kq
Cross-modal Supervision for Learning Active Speaker Detection in Video
[article]
2016
arXiv
pre-print
In this paper, we show how to use audio to supervise the learning of active speaker detection in video. ...
We further improve a generic model for active speaker detection by learning person specific models. ...
person is speaking, among the people in the frame, and at the same time, learn the video-based classifier for active speaker detection. ...
arXiv:1603.08907v1
fatcat:fhq443dmvzbc3ckgfxlfdwz5ga
Detecting Utterance Scenes of a Specific Person
2018
Proceedings of the 23rd International Conference on Intelligent User Interfaces Companion - IUI'18
We propose a system that detects the scenes where a specific speaker is speaking in the video and displays them as a heat map in the video's timeline. ...
This system enables users to skip to the point in the timeline they want to hear by detecting scenes in a drama, talk show, or discussion TV program where a specific speaker is speaking. ...
CONCLUSION We propose a system that detects scenes where a specific person speaks in the video and displays them in the timeline. ...
doi:10.1145/3180308.3180323
dblp:conf/iui/SatoR18
fatcat:j3esrwvgz5hotaky53rmtezgy4
Visual Speech Detection Using Mouth Region Intensities
2006
Zenodo
Publication in the conference proceedings of EUSIPCO, Florence, Italy, 2006 ...
The proposed algorithm exploits the attributes that a video sequence of a speaking person exhibits. ...
The rectangle encompasses the frames where the person is speaking ... global threshold for all videos, but a video-specific threshold, computed prior to the analysis of each video sequence. ...
doi:10.5281/zenodo.40178
fatcat:laqfxmdlnnfc3ccqcdj3io7dai
No-Audio Multimodal Speech Detection Task at MediaEval 2020
2020
MediaEval Benchmarking Initiative for Multimedia Evaluation
In contrast to conventional speech detection approaches, no audio is used for this task. ...
Similar to the previous two editions, the participants of this task are encouraged to estimate the speaking status (i.e. person speaking or not) of individuals interacting freely during a crowded mingle ...
For the video modality, the input will be a video of a person interacting freely in a social gathering (see Figure 1), and an estimation of that person's speaking status (speaking/non-speaking) should ...
dblp:conf/mediaeval/QuirosQH20
fatcat:wtxqakoqpfg5nggvmjt3eq2dwa
Active speaker detection with audio-visual co-training
2016
Proceedings of the 18th ACM International Conference on Multimodal Interaction - ICMI 2016
First, audio Voice Activity Detection (VAD) is used to train a personalized video-based active speaker classifier in a weakly supervised fashion. ...
The video classifier is in turn used to train a voice model for each person. The individual voice models are then used to detect active speakers. ...
In this paper, we use the above video-based person-specific active speaker detection models to train personalized audio voice models. ...
doi:10.1145/2993148.2993172
dblp:conf/icmi/ChakravartyZTh16
fatcat:tq6eopjv65anzikcnw5jmbusvy
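The co-training loop this entry outlines (audio VAD weakly supervises a video classifier, whose predictions in turn supervise a per-person voice model) can be sketched as below. This is a toy illustration under strong assumptions: 1-D per-frame features per modality and a trivial threshold classifier standing in for the real models; the paper's actual classifiers and person tracking are not reproduced.

```python
import numpy as np

def train_threshold_classifier(features, labels):
    """Toy stand-in classifier: picks the midpoint between class means as a
    threshold. A real system would use a much stronger model."""
    thresh = (features[labels == 1].mean() + features[labels == 0].mean()) / 2
    return lambda x: (x > thresh).astype(int)

def cotrain(video_feat, audio_feat, audio_vad):
    """Sketch of the two-step co-training the abstract describes.

    Step 1: audio VAD output provides weak labels to train a video-based
    active speaker classifier.
    Step 2: the video classifier's predictions then supervise a voice model.
    """
    video_clf = train_threshold_classifier(video_feat, audio_vad)
    video_labels = video_clf(video_feat)  # video classifier relabels the data
    voice_model = train_threshold_classifier(audio_feat, video_labels)
    return video_clf, voice_model
```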
Rethinking Audio-visual Synchronization for Active Speaker Detection
[article]
2022
arXiv
pre-print
videos as active speaking. ...
Experimental results suggest that our model can successfully detect unsynchronized speaking as not speaking, addressing the limitation of current models. ...
INTRODUCTION Active speaker detection (ASD) is the task of determining which person, if any, is speaking in a video at each time instant. ...
arXiv:2206.10421v2
fatcat:hitj3gfmqbhclp24rc55h3e3wy
No-Audio Multimodal Speech Detection in Crowded Social Settings Task at MediaEval 2018
2018
MediaEval Benchmarking Initiative for Multimedia Evaluation
The goal of this task is to automatically estimate if a person is speaking or not using these two alternative modalities. ...
In its first edition, the HBA task focuses on analyzing one of the most basic elements of social behavior: the estimation of speaking status. ...
For the video modality, the algorithm will have a video of a person interacting freely in a social gathering (see Figure 1) as input and should provide an estimation of that person's speaking status ( ...
dblp:conf/mediaeval/QuirosGH18
fatcat:zmdvzorupbbc3eqkmavnvxqxdq
Visual Lip Activity Detection and Speaker Detection Using Mouth Region Intensities
2009
IEEE transactions on circuits and systems for video technology (Print)
Furthermore, we employ the lip activity detection method in order to determine the active speaker(s) in a multi-person environment. ...
We argue that the increased average value and standard deviation of the number of pixels with low intensities that the mouth region of a speaking person demonstrates can be used as visual cues for detecting ...
In the vast majority of the speaker detection works, the authors consider only video data where strictly one person is speaking at a time; the algorithms have not been tested on simultaneous speech cases ...
doi:10.1109/tcsvt.2008.2009262
fatcat:htnt7zgcxrdvdnubk3solisfyu
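The visual cue this abstract argues for (elevated mean and standard deviation of the count of low-intensity pixels in a speaking person's mouth region, since the open oral cavity appears dark) can be sketched as follows. The fixed intensity threshold here is illustrative; the paper computes a video-specific threshold rather than a global one.

```python
import numpy as np

def low_intensity_counts(mouth_rois, threshold=60):
    """Count dark pixels (open-mouth cavity) per frame.

    `mouth_rois`: list of 2-D grayscale arrays cropped around the mouth.
    `threshold` (0-255) is an illustrative global value; the paper uses a
    video-specific threshold computed before analysing each sequence.
    """
    return np.array([(roi < threshold).sum() for roi in mouth_rois])

def speaking_cues(mouth_rois):
    """Return the two cues claimed to be elevated for a speaking person:
    mean and standard deviation of the dark-pixel count over time."""
    counts = low_intensity_counts(mouth_rois)
    return counts.mean(), counts.std()
```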
Talking faces indexing in TV-content
2010
2010 International Workshop on Content Based Multimedia Indexing (CBMI)
In TV-content, because of multi-face shots and non-speaking face shots, it is difficult to determine which face is speaking. ...
In this work, a method is proposed which clusters people independently by the audio and by the visual information and combines these clusterings of people (audio and visual) in order to detect sequences ...
In particular, we are interested in locating sequences in popular TV-programs in which a certain person is speaking and visible. ...
doi:10.1109/cbmi.2010.5529907
dblp:conf/cbmi/BendrisCC10
fatcat:mklebuyipnfd7f67de77uhymhy
Using audio and video features to classify the most dominant person in a group meeting
2007
Proceedings of the 15th international conference on Multimedia - MULTIMEDIA '07
In this paper, we provide a framework for detecting dominance in group meetings using different audio and video cues. ...
A novel area of multi-modal data labelling, which has received relatively little attention, is the automatic estimation of the most dominant person in a group meeting. ...
Table 1: Most dominant person detection results. ...
doi:10.1145/1291233.1291423
dblp:conf/mm/HungJYFBORMG07
fatcat:5xeoo2f6l5hqppghts4pf6crsm
Real-Time Feedback System for Monitoring and Facilitating Discussions
[chapter]
2013
Natural Interaction with Robots, Knowbots and Smartphones
Various speech statistics, such as speaking length, speaker turns, and speaking turn duration, are computed and displayed in real-time. ...
In contrast, our system analyses the speakers and provides feedback to the speakers in real-time during the discussion, which is a novel approach with plenty of potential applications. ...
ACKNOWLEDGMENTS This research project is supported in part by the Institute for Media Innovation (Seed Grant M4080824) and the Nanyang Business School, both at Nanyang Technological University (NTU), Singapore ...
doi:10.1007/978-1-4614-8280-2_34
fatcat:gn65gw6j7jcfjdpfphwfta3apu
Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection
[article]
2021
arXiv
pre-print
Active speaker detection (ASD) seeks to detect who is speaking in a visual scene of one or more speakers. ...
TalkNet consists of audio and visual temporal encoders for feature representation, audio-visual cross-attention mechanism for inter-modality interaction, and a self-attention mechanism to capture long-term speaking ...
INTRODUCTION Active speaker detection (ASD) seeks to detect who is speaking in a visual scene of one or more speakers [34]. ...
arXiv:2107.06592v1
fatcat:buxzp5pyabgabo6wxslbjkrwba
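The audio-visual cross-attention this abstract mentions (one modality's features attending to the other's for inter-modality interaction) can be illustrated with plain scaled dot-product attention. This is a minimal sketch of the mechanism, not TalkNet's actual architecture; the feature dimensions and sequence lengths are arbitrary.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention: one modality's features (queries)
    attend to the other modality's features (keys/values).

    Shapes: queries (Tq, d), keys and values (Tk, d).
    """
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)  # (Tq, Tk) cross-modal affinities
    weights = softmax(scores, axis=-1)      # each query step sums to 1 over key steps
    return weights @ values                 # (Tq, d) attended features

# Audio frames attend to video frames (the full model also does the reverse).
audio = np.random.randn(100, 128)  # 100 audio steps, 128-d embeddings
video = np.random.randn(25, 128)   # 25 video frames, 128-d embeddings
fused = cross_attention(audio, video, video)
```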
Showing results 1 — 15 out of 93,991 results