Detecting speaking persons in video
[article]
2021
arXiv
pre-print
We present a novel method for detecting speaking persons in video by extracting facial landmarks with a neural network and analysing these landmarks statistically over time ...
We therefore present a novel robust method for detecting speaking persons in video which relies only on the visual information. ...
DETECTOR ALGORITHM For each person occurring in the video, we now perform a statistical analysis of their facial landmark trajectory over time in order to infer whether that person is speaking or not and to detect ...
arXiv:2110.13806v1
fatcat:7a3kckl76ncxhmkr25kgee5hom
Detecting speaking persons in video
2021
Zenodo
We present a novel method for detecting speaking persons in video by extracting facial landmarks with a neural network and analysing these landmarks statistically over time. ...
We therefore present a novel robust method for detecting speaking persons in video which relies only on the visual information. ...
DETECTOR ALGORITHM For each person occurring in the video, we now perform a statistical analysis of their facial landmark trajectory over time in order to infer whether that person is speaking or not and to detect ...
doi:10.5281/zenodo.5596861
fatcat:lhaeknyxjng3bnkbym7j4oirqu
Study of detecting behavioral signatures within DeepFake videos
[article]
2022
arXiv
pre-print
from a different person speaking the same utterance. ...
We conduct a study by comparing synthetic imagery that: 1) originates from a different person speaking a different utterance, 2) originates from the same person speaking a different utterance, and 3) originates ...
This provides evidence that the distinct speaking style of a person and the correspondence between speaking behavior and utterance can serve as important clues for DeepFake detection even for synthetic ...
arXiv:2208.03561v1
fatcat:ej2fu3bk6vfudgrw7vf3ru66kq
Cross-modal Supervision for Learning Active Speaker Detection in Video
[article]
2016
arXiv
pre-print
In this paper, we show how to use audio to supervise the learning of active speaker detection in video. ...
We further improve a generic model for active speaker detection by learning person specific models. ...
person is speaking, among the people in the frame, and at the same time, learn the video-based classifier for active speaker detection. ...
arXiv:1603.08907v1
fatcat:fhq443dmvzbc3ckgfxlfdwz5ga
Detecting Utterance Scenes of a Specific Person
2018
Proceedings of the 23rd International Conference on Intelligent User Interfaces Companion - IUI'18
We propose a system that detects the scenes where a specific speaker is speaking in the video and displays them as a heat map in the video's timeline. ...
This system enables users to skip to the point in the timeline they want to hear by detecting scenes in a drama, talk show, or discussion TV program where a specific speaker is speaking. ...
CONCLUSION We propose a system that detects scenes where a specific person speaks in the video and displays them in the timeline. ...
doi:10.1145/3180308.3180323
dblp:conf/iui/SatoR18
fatcat:j3esrwvgz5hotaky53rmtezgy4
Visual Speech Detection Using Mouth Region Intensities
2006
Zenodo
Publication in the conference proceedings of EUSIPCO, Florence, Italy, 2006 ...
The proposed algorithm exploits the attributes that a video sequence of a speaking person exhibits. ...
The rectangle encompasses the frames where the person is speaking ... global threshold for all videos, but a video-specific threshold, computed prior to the analysis of each video sequence. ...
doi:10.5281/zenodo.40178
fatcat:laqfxmdlnnfc3ccqcdj3io7dai
No-Audio Multimodal Speech Detection Task at MediaEval 2020
2020
MediaEval Benchmarking Initiative for Multimedia Evaluation
In contrast to conventional speech detection approaches, no audio is used for this task. ...
Similar to the previous two editions, the participants of this task are encouraged to estimate the speaking status (i.e. person speaking or not) of individuals interacting freely during a crowded mingle ...
For the video modality, the input will be a video of a person interacting freely in a social gathering (see Figure 1), and an estimation of that person's speaking status (speaking/non-speaking) should ...
dblp:conf/mediaeval/QuirosQH20
fatcat:wtxqakoqpfg5nggvmjt3eq2dwa
Active speaker detection with audio-visual co-training
2016
Proceedings of the 18th ACM International Conference on Multimodal Interaction - ICMI 2016
First, audio Voice Activity Detection (VAD) is used to train a personalized video-based active speaker classifier in a weakly supervised fashion. ...
The video classifier is in turn used to train a voice model for each person. The individual voice models are then used to detect active speakers. ...
In this paper, we use the above video-based person-specific active speaker detection models to train personalized audio voice models. ...
doi:10.1145/2993148.2993172
dblp:conf/icmi/ChakravartyZTh16
fatcat:tq6eopjv65anzikcnw5jmbusvy
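The co-training loop this entry outlines (audio VAD weakly supervises a video classifier, whose predictions in turn supervise a per-person voice model) can be sketched as below. This is a toy illustration under strong assumptions: 1-D per-frame features per modality and a trivial threshold classifier standing in for the real models; the paper's actual classifiers and person tracking are not reproduced.

```python
import numpy as np

def train_threshold_classifier(features, labels):
    """Toy stand-in classifier: picks the midpoint between class means as a
    threshold. A real system would use a much stronger model."""
    thresh = (features[labels == 1].mean() + features[labels == 0].mean()) / 2
    return lambda x: (x > thresh).astype(int)

def cotrain(video_feat, audio_feat, audio_vad):
    """Sketch of the two-step co-training the abstract describes.

    Step 1: audio VAD output provides weak labels to train a video-based
    active speaker classifier.
    Step 2: the video classifier's predictions then supervise a voice model.
    """
    video_clf = train_threshold_classifier(video_feat, audio_vad)
    video_labels = video_clf(video_feat)  # video classifier relabels the data
    voice_model = train_threshold_classifier(audio_feat, video_labels)
    return video_clf, voice_model
```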
Rethinking Audio-visual Synchronization for Active Speaker Detection
[article]
2022
arXiv
pre-print
videos as active speaking. ...
Experimental results suggest that our model can successfully detect unsynchronized speaking as not speaking, addressing the limitation of current models. ...
INTRODUCTION Active speaker detection (ASD) is the task of determining which person, if any, is speaking in a video at each time instant. ...
arXiv:2206.10421v2
fatcat:hitj3gfmqbhclp24rc55h3e3wy
No-Audio Multimodal Speech Detection in Crowded Social Settings Task at MediaEval 2018
2018
MediaEval Benchmarking Initiative for Multimedia Evaluation
The goal of this task is to automatically estimate if a person is speaking or not using these two alternative modalities. ...
In its first edition, the HBA task focuses on analyzing one of the most basic elements of social behavior: the estimation of speaking status. ...
For the video modality, the algorithm will have a video of a person interacting freely in a social gathering (see Figure 1) as input and should provide an estimation of that person's speaking status ( ...
dblp:conf/mediaeval/QuirosGH18
fatcat:zmdvzorupbbc3eqkmavnvxqxdq
Visual Lip Activity Detection and Speaker Detection Using Mouth Region Intensities
2009
IEEE transactions on circuits and systems for video technology (Print)
Furthermore, we employ the lip activity detection method in order to determine the active speaker(s) in a multi-person environment. ...
We argue that the increased average value and standard deviation of the number of pixels with low intensities that the mouth region of a speaking person demonstrates can be used as visual cues for detecting ...
In the vast majority of the speaker detection works, the authors consider only video data where strictly one person is speaking at a time; the algorithms have not been tested on simultaneous speech cases ...
doi:10.1109/tcsvt.2008.2009262
fatcat:htnt7zgcxrdvdnubk3solisfyu
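The visual cue this abstract argues for (elevated mean and standard deviation of the count of low-intensity pixels in a speaking person's mouth region, since the open oral cavity appears dark) can be sketched as follows. The fixed intensity threshold here is illustrative; the paper computes a video-specific threshold rather than a global one.

```python
import numpy as np

def low_intensity_counts(mouth_rois, threshold=60):
    """Count dark pixels (open-mouth cavity) per frame.

    `mouth_rois`: list of 2-D grayscale arrays cropped around the mouth.
    `threshold` (0-255) is an illustrative global value; the paper uses a
    video-specific threshold computed before analysing each sequence.
    """
    return np.array([(roi < threshold).sum() for roi in mouth_rois])

def speaking_cues(mouth_rois):
    """Return the two cues claimed to be elevated for a speaking person:
    mean and standard deviation of the dark-pixel count over time."""
    counts = low_intensity_counts(mouth_rois)
    return counts.mean(), counts.std()
```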
Talking faces indexing in TV-content
2010
2010 International Workshop on Content Based Multimedia Indexing (CBMI)
In TV-content, because of multi-face shots and non-speaking face shots, it is difficult to determine which face is speaking. ...
In this work, a method is proposed which clusters people independently by the audio and by the visual information and combines these clusterings of people (audio and visual) in order to detect sequences ...
In particular, we are interested in locating sequences in popular TV-programs in which a certain person is speaking and visible. ...
doi:10.1109/cbmi.2010.5529907
dblp:conf/cbmi/BendrisCC10
fatcat:mklebuyipnfd7f67de77uhymhy
Using audio and video features to classify the most dominant person in a group meeting
2007
Proceedings of the 15th international conference on Multimedia - MULTIMEDIA '07
In this paper, we provide a framework for detecting dominance in group meetings using different audio and video cues. ...
A novel area of multi-modal data labelling, which has received relatively little attention, is the automatic estimation of the most dominant person in a group meeting. ...
Table 1: Most dominant person detection results. ...
doi:10.1145/1291233.1291423
dblp:conf/mm/HungJYFBORMG07
fatcat:5xeoo2f6l5hqppghts4pf6crsm
Real-Time Feedback System for Monitoring and Facilitating Discussions
[chapter]
2013
Natural Interaction with Robots, Knowbots and Smartphones
Various speech statistics, such as speaking length, speaker turns, and speaking turn duration, are computed and displayed in real-time. ...
In contrast, our system analyses the speakers and provides feedback to the speakers in real-time during the discussion, which is a novel approach with plenty of potential applications. ...
ACKNOWLEDGMENTS This research project is supported in part by the Institute for Media Innovation (Seed Grant M4080824) and the Nanyang Business School, both at Nanyang Technological University (NTU), Singapore ...
doi:10.1007/978-1-4614-8280-2_34
fatcat:gn65gw6j7jcfjdpfphwfta3apu
Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection
[article]
2021
arXiv
pre-print
Active speaker detection (ASD) seeks to detect who is speaking in a visual scene of one or more speakers. ...
TalkNet consists of audio and visual temporal encoders for feature representation, audio-visual cross-attention mechanism for inter-modality interaction, and a self-attention mechanism to capture long-term speaking ...
INTRODUCTION Active speaker detection (ASD) seeks to detect who is speaking in a visual scene of one or more speakers [34]. ...
arXiv:2107.06592v1
fatcat:buxzp5pyabgabo6wxslbjkrwba
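The audio-visual cross-attention this abstract mentions (one modality's features attending to the other's for inter-modality interaction) can be illustrated with plain scaled dot-product attention. This is a minimal sketch of the mechanism, not TalkNet's actual architecture; the feature dimensions and sequence lengths are arbitrary.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention: one modality's features (queries)
    attend to the other modality's features (keys/values).

    Shapes: queries (Tq, d), keys and values (Tk, d).
    """
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)  # (Tq, Tk) cross-modal affinities
    weights = softmax(scores, axis=-1)      # each query step sums to 1 over key steps
    return weights @ values                 # (Tq, d) attended features

# Audio frames attend to video frames (the full model also does the reverse).
audio = np.random.randn(100, 128)  # 100 audio steps, 128-d embeddings
video = np.random.randn(25, 128)   # 25 video frames, 128-d embeddings
fused = cross_attention(audio, video, video)
```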
Showing results 1 — 15 out of 93,991 results