93,991 Hits in 2.3 sec

Detecting speaking persons in video [article]

Hannes Fassold
2021 arXiv   pre-print
We present a novel method for detecting speaking persons in video by extracting facial landmarks with a neural network and analysing these landmarks statistically over time  ...  We therefore present a novel, robust method for detecting speaking persons in video which relies only on the visual information.  ...  DETECTOR ALGORITHM For each person occurring in the video, we now perform a statistical analysis of their facial landmark trajectory over time in order to infer whether that person is speaking or not and to detect  ... 
arXiv:2110.13806v1 fatcat:7a3kckl76ncxhmkr25kgee5hom
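
A rough illustration of the statistical step described above: threshold the sliding-window variability of a mouth-opening signal derived from per-frame facial landmarks. The 68-point landmark layout, the window length and the threshold are assumptions made for this sketch, not details taken from the paper.

```python
import numpy as np

def mouth_opening(landmarks: np.ndarray) -> np.ndarray:
    """Per-frame mouth opening from a (T, 68, 2) landmark trajectory.

    Assumes the common 68-point layout with points 62/66 as the inner
    upper/lower lip centres (an assumption; the paper does not specify
    its landmark layout)."""
    upper, lower = landmarks[:, 62, :], landmarks[:, 66, :]
    return np.linalg.norm(upper - lower, axis=1)  # shape (T,)

def speaking_score(landmarks: np.ndarray, window: int = 25) -> np.ndarray:
    """Sliding-window standard deviation of the mouth opening; a talking
    mouth opens and closes repeatedly, so local variability is high."""
    opening = mouth_opening(landmarks)
    scores = np.empty_like(opening)
    for t in range(len(opening)):
        lo, hi = max(0, t - window // 2), min(len(opening), t + window // 2 + 1)
        scores[t] = opening[lo:hi].std()
    return scores

def is_speaking(landmarks: np.ndarray, threshold: float = 1.5) -> np.ndarray:
    """Boolean per-frame decision; the threshold (in pixels) is illustrative."""
    return speaking_score(landmarks) > threshold
```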

Detecting speaking persons in video

Hannes Fassold
2021 Zenodo  
We present a novel method for detecting speaking persons in video by extracting facial landmarks with a neural network and analysing these landmarks statistically over time.  ...  We therefore present a novel, robust method for detecting speaking persons in video which relies only on the visual information.  ...  DETECTOR ALGORITHM For each person occurring in the video, we now perform a statistical analysis of their facial landmark trajectory over time in order to infer whether that person is speaking or not and to detect  ... 
doi:10.5281/zenodo.5596861 fatcat:lhaeknyxjng3bnkbym7j4oirqu

Study of detecting behavioral signatures within DeepFake videos [article]

Qiaomu Miao, Sinhwa Kang, Stacy Marsella, Steve DiPaola, Chao Wang, Ari Shapiro
2022 arXiv   pre-print
from a different person speaking the same utterance.  ...  We conduct a study by comparing synthetic imagery that: 1) originates from a different person speaking a different utterance, 2) originates from the same person speaking a different utterance, and 3) originates  ...  This provides evidence that the distinct speaking style of a person and the correspondence between speaking behavior and utterance can serve as important clues for DeepFake detection even for synthetic  ... 
arXiv:2208.03561v1 fatcat:ej2fu3bk6vfudgrw7vf3ru66kq

Cross-modal Supervision for Learning Active Speaker Detection in Video [article]

Punarjay Chakravarty, Tinne Tuytelaars
2016 arXiv   pre-print
In this paper, we show how to use audio to supervise the learning of active speaker detection in video.  ...  We further improve a generic model for active speaker detection by learning person-specific models.  ...  person is speaking, among the people in the frame, and at the same time, learn the video-based classifier for active speaker detection.  ... 
arXiv:1603.08907v1 fatcat:fhq443dmvzbc3ckgfxlfdwz5ga

Detecting Utterance Scenes of a Specific Person

Kunihiko Sato, Jun Rekimoto
2018 Proceedings of the 23rd International Conference on Intelligent User Interfaces Companion - IUI'18  
We propose a system that detects the scenes where a specific speaker is speaking in a video and displays their locations as a heat map on the video's timeline.  ...  This system enables users to skip to the parts they want to hear by detecting scenes in a drama, talk show, or discussion TV program where a specific speaker is speaking.  ...  CONCLUSION We propose a system that detects scenes where a specific person speaks in the video and displays them on the timeline.  ... 
doi:10.1145/3180308.3180323 dblp:conf/iui/SatoR18 fatcat:j3esrwvgz5hotaky53rmtezgy4
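
A minimal sketch of the timeline visualisation described above, assuming per-second probabilities that the target speaker is talking have already been computed by some detector; the plotting layout is illustrative and not taken from the paper.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_speaker_timeline(probs: np.ndarray, fps: float = 1.0) -> None:
    """Render per-second probabilities that the target speaker is talking
    as a heat-map strip aligned with the video timeline."""
    fig, ax = plt.subplots(figsize=(10, 1.2))
    ax.imshow(probs[np.newaxis, :], aspect="auto", cmap="hot",
              extent=[0, len(probs) / fps, 0, 1])
    ax.set_yticks([])
    ax.set_xlabel("time (s)")
    plt.show()

# Example: 120 seconds of synthetic detection scores.
plot_speaker_timeline(np.random.rand(120))
```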

Visual Speech Detection Using Mouth Region Intensities

N. Nikolaidis, I. Pitas, Spyridon Siatras
2006 Zenodo  
Publication in the conference proceedings of EUSIPCO, Florence, Italy, 2006  ...  The proposed algorithm exploits the attributes that a video sequence of a speaking person exhibits.  ...  The rectangle encompasses the frames where the person is speaking  ...  not a global threshold for all videos, but a video-specific threshold, computed prior to the analysis of each video sequence.  ... 
doi:10.5281/zenodo.40178 fatcat:laqfxmdlnnfc3ccqcdj3io7dai

No-Audio Multimodal Speech Detection Task at MediaEval 2020

Laura Cabrera Quiros, Jose Vargas Quiros, Hayley Hung
2020 MediaEval Benchmarking Initiative for Multimedia Evaluation  
In contrast to conventional speech detection approaches, no audio is used for this task.  ...  Similar to the previous two editions, the participants of this task are encouraged to estimate the speaking status (i.e. person speaking or not) of individuals interacting freely during a crowded mingle  ...  For the video modality, the input will be a video of a person interacting freely in a social gathering (see Figure 1), and an estimation of that person's speaking status (speaking/non-speaking) should  ... 
dblp:conf/mediaeval/QuirosQH20 fatcat:wtxqakoqpfg5nggvmjt3eq2dwa

Active speaker detection with audio-visual co-training

Punarjay Chakravarty, Jeroen Zegers, Tinne Tuytelaars, Hugo Van hamme
2016 Proceedings of the 18th ACM International Conference on Multimodal Interaction - ICMI 2016  
First, audio Voice Activity Detection (VAD) is used to train a personalized video-based active speaker classifier in a weakly supervised fashion.  ...  The video classifier is in turn used to train a voice model for each person. The individual voice models are then used to detect active speakers.  ...  In this paper, we use the above video-based person-specific active speaker detection models to train personalized audio voice models.  ... 
doi:10.1145/2993148.2993172 dblp:conf/icmi/ChakravartyZTh16 fatcat:tq6eopjv65anzikcnw5jmbusvy
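
The co-training pipeline above can be sketched roughly as follows, assuming frame-level audio VAD labels and per-person visual and audio features are already available; the classifier choice, feature names and single-round training are heavy simplifications for illustration, not the authors' implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def cotrain(vad, video_feats, audio_feats):
    """Toy co-training round.

    vad         : (T,) binary audio VAD output (1 = someone is speaking)
    video_feats : dict {person_id: (T, D_v) visual features}
    audio_feats : (T, D_a) frame-level audio features (e.g. MFCCs)
    """
    # Step 1: weak supervision from audio -- use the VAD output directly as
    # (noisy) labels for every person's video classifier. This is a heavy
    # simplification of the paper's weakly supervised scheme.
    video_clf = {p: LogisticRegression(max_iter=1000).fit(X, vad)
                 for p, X in video_feats.items()}

    # Step 2: the video classifiers label who speaks at each frame; those
    # labels in turn supervise a per-person audio voice model.
    voice_clf = {}
    for p, X in video_feats.items():
        speaking = video_clf[p].predict(X)  # 1 = person p speaks
        voice_clf[p] = LogisticRegression(max_iter=1000).fit(audio_feats, speaking)
    return video_clf, voice_clf
```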

Rethinking Audio-visual Synchronization for Active Speaker Detection [article]

Abudukelimu Wuerkaixi, You Zhang, Zhiyao Duan, Changshui Zhang
2022 arXiv   pre-print
videos as active speaking.  ...  Experimental results suggest that our model can successfully detect unsynchronized speaking as not speaking, addressing the limitation of current models.  ...  INTRODUCTION Active speaker detection (ASD) aims to determine which person, if any, is speaking in a video at each time instant.  ... 
arXiv:2206.10421v2 fatcat:hitj3gfmqbhclp24rc55h3e3wy
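
One toy way to make the synchronization notion concrete is to cross-correlate frame-level audio energy with a mouth-motion signal and check that the peak sits near lag zero; this is only an illustrative check under assumed inputs, not the model proposed in the paper above.

```python
import numpy as np

def sync_score(audio_energy: np.ndarray, mouth_motion: np.ndarray,
               max_lag: int = 5) -> tuple:
    """Cross-correlate frame-level audio energy with a mouth-motion signal
    over a small range of lags; a synchronized speaker should peak near
    lag 0. Returns (best_lag, correlation_at_best_lag)."""
    a = (audio_energy - audio_energy.mean()) / (audio_energy.std() + 1e-8)
    m = (mouth_motion - mouth_motion.mean()) / (mouth_motion.std() + 1e-8)
    lags = list(range(-max_lag, max_lag + 1))
    corrs = []
    for lag in lags:
        if lag >= 0:
            c = float(np.mean(a[lag:] * m[:len(m) - lag]))
        else:
            c = float(np.mean(a[:lag] * m[-lag:]))
        corrs.append(c)
    best = int(np.argmax(corrs))
    return lags[best], corrs[best]
```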

No-Audio Multimodal Speech Detection in Crowded Social Settings Task at MediaEval 2018

Laura Cabrera Quiros, Ekin Gedik, Hayley Hung
2018 MediaEval Benchmarking Initiative for Multimedia Evaluation  
The goal of this task is to automatically estimate if a person is speaking or not using these two alternative modalities.  ...  In its first edition, the HBA task focuses on analyzing one of the most basic elements of social behavior: the estimation of speaking status.  ...  For the video modality, the algorithm will have a video of a person interacting freely in a social gathering (see Figure 1 ) as input and should provide a estimation of that persons' speaking status (  ... 
dblp:conf/mediaeval/QuirosGH18 fatcat:zmdvzorupbbc3eqkmavnvxqxdq

Visual Lip Activity Detection and Speaker Detection Using Mouth Region Intensities

S. Siatras, N. Nikolaidis, M. Krinidis, I. Pitas
2009 IEEE transactions on circuits and systems for video technology (Print)  
Furthermore, we employ the lip activity detection method in order to determine the active speaker(s) in a multi-person environment.  ...  We argue that the increased average value and standard deviation of the number of pixels with low intensities that the mouth region of a speaking person demonstrates can be used as visual cues for detecting  ...  In the vast majority of the speaker detection works, the authors consider only video data where strictly one person is speaking at a time; the algorithms have not been tested on simultaneous speech cases  ... 
doi:10.1109/tcsvt.2008.2009262 fatcat:htnt7zgcxrdvdnubk3solisfyu
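
The cue described above (mean and standard deviation of the number of low-intensity mouth pixels) can be sketched as follows; the grayscale mouth crops and the intensity threshold are assumptions for illustration, not the paper's exact values.

```python
import numpy as np

def low_intensity_counts(mouth_rois: np.ndarray, intensity_thr: int = 60) -> np.ndarray:
    """Number of dark pixels in the mouth region for each frame.

    mouth_rois is a (T, H, W) stack of grayscale mouth crops; the intensity
    threshold is illustrative, not the paper's value."""
    return (mouth_rois < intensity_thr).sum(axis=(1, 2))

def speaking_cues(mouth_rois: np.ndarray) -> tuple:
    """Mean and standard deviation of the dark-pixel count over the clip;
    both tend to be higher when the mouth is opening and closing."""
    counts = low_intensity_counts(mouth_rois)
    return float(counts.mean()), float(counts.std())
```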

Talking faces indexing in TV-content

Meriem Bendris, Delphine Charlet, Gerard Chollet
2010 2010 International Workshop on Content Based Multimedia Indexing (CBMI)  
In TV-content, because of multi-face shots and non-speaking face shots, it is difficult to determine which face is speaking.  ...  In this work, a method is proposed which clusters people independently by the audio and by the visual information and combines these clusterings of people (audio and visual) in order to detect sequences  ...  In particular, we are interested in locating sequences in popular TV-programs in which a certain person is speaking and visible.  ... 
doi:10.1109/cbmi.2010.5529907 dblp:conf/cbmi/BendrisCC10 fatcat:mklebuyipnfd7f67de77uhymhy
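
One simple way to combine independent audio and face clusterings, as described above, is to associate each audio (speaker) cluster with the face cluster it overlaps most in time; the interval-based representation below is an assumption for illustration, not the paper's actual fusion method.

```python
def associate_clusters(audio_segments, face_segments):
    """Associate each audio (speaker) cluster with the face cluster it
    overlaps most in time.

    Both arguments map a cluster id to a list of (start, end) intervals
    in seconds. Returns {audio_id: face_id or None}."""
    def overlap(a, b):
        return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

    mapping = {}
    for aid, a_ivals in audio_segments.items():
        best_fid, best_total = None, 0.0
        for fid, f_ivals in face_segments.items():
            total = sum(overlap(a, f) for a in a_ivals for f in f_ivals)
            if total > best_total:
                best_fid, best_total = fid, total
        mapping[aid] = best_fid
    return mapping
```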

Using audio and video features to classify the most dominant person in a group meeting

Hayley Hung, Dinesh Jayagopi, Chuohao Yeo, Gerald Friedland, Sileye Ba, Jean-Marc Odobez, Kannan Ramchandran, Nikki Mirghafori, Daniel Gatica-Perez
2007 Proceedings of the 15th international conference on Multimedia - MULTIMEDIA '07  
In this paper, we provide a framework for detecting dominance in group meetings using different audio and video cues.  ...  A novel area of multi-modal data labelling, which has received relatively little attention, is the automatic estimation of the most dominant person in a group meeting.  ...  Table 1: Most dominant person detection results.  ... 
doi:10.1145/1291233.1291423 dblp:conf/mm/HungJYFBORMG07 fatcat:5xeoo2f6l5hqppghts4pf6crsm
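
Among the cues compared in the paper above, total speaking length is one of the simplest; the sketch below picks the participant with the largest speaking time from assumed binary per-frame speaking-status arrays and ignores the other audio and video features the paper evaluates.

```python
import numpy as np

def most_dominant(speaking_status: dict, frame_rate: float = 100.0) -> str:
    """Return the participant with the largest total speaking time.

    speaking_status maps a participant id to a binary per-frame array
    (1 = speaking). Total speaking length is only one of the audio and
    video dominance cues compared in the paper above."""
    lengths = {pid: status.sum() / frame_rate
               for pid, status in speaking_status.items()}
    return max(lengths, key=lengths.get)
```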

Real-Time Feedback System for Monitoring and Facilitating Discussions [chapter]

Sanat Sarda, Martin Constable, Justin Dauwels, Shoko Dauwels, Mohamed Elgendi, Zhou Mengyu, Umer Rasheed, Yasir Tahir, Daniel Thalmann, Nadia Magnenat-Thalmann
2013 Natural Interaction with Robots, Knowbots and Smartphones  
Various speech statistics, such as speaking length, speaker turns, and speaking turn duration, are computed and displayed in real time.  ...  In contrast, our system analyses the speakers and provides feedback to them in real time during the discussion, which is a novel approach with plenty of potential applications.  ...  ACKNOWLEDGMENTS This research project is supported in part by the Institute for Media Innovation (Seed Grant M4080824) and the Nanyang Business School, both at Nanyang Technological University (NTU), Singapore  ... 
doi:10.1007/978-1-4614-8280-2_34 fatcat:gn65gw6j7jcfjdpfphwfta3apu
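
The speech statistics listed above (speaking length, speaker turns, turn duration) can be computed from a binary per-frame speaking signal as sketched below; the real system computes them incrementally in real time, which this offline sketch does not attempt.

```python
import numpy as np

def turn_statistics(speaking: np.ndarray, frame_rate: float = 100.0) -> dict:
    """Speaking length, number of turns and mean turn duration for one
    speaker, computed from a binary per-frame speaking signal."""
    speaking = speaking.astype(int)
    # A turn starts where the signal rises 0 -> 1 and ends where it falls 1 -> 0.
    starts = np.flatnonzero(np.diff(np.concatenate(([0], speaking))) == 1)
    ends = np.flatnonzero(np.diff(np.concatenate((speaking, [0]))) == -1)
    durations = (ends - starts + 1) / frame_rate
    return {
        "speaking_length_s": speaking.sum() / frame_rate,
        "num_turns": int(len(starts)),
        "mean_turn_duration_s": float(durations.mean()) if len(durations) else 0.0,
    }
```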

Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection [article]

Ruijie Tao, Zexu Pan, Rohan Kumar Das, Xinyuan Qian, Mike Zheng Shou, Haizhou Li
2021 arXiv   pre-print
Active speaker detection (ASD) seeks to detect who is speaking in a visual scene of one or more speakers.  ...  TalkNet consists of audio and visual temporal encoders for feature representation, an audio-visual cross-attention mechanism for inter-modality interaction, and a self-attention mechanism to capture long-term speaking  ...  INTRODUCTION Active speaker detection (ASD) seeks to detect who is speaking in a visual scene of one or more speakers [34].  ... 
arXiv:2107.06592v1 fatcat:buxzp5pyabgabo6wxslbjkrwba
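
A generic sketch of audio-visual cross-attention in PyTorch, in the spirit of the description above; the dimensions, layer choices and classifier head are illustrative assumptions, not the authors' TalkNet implementation.

```python
import torch
import torch.nn as nn

class AVCrossAttention(nn.Module):
    """Let audio features attend to visual features and vice versa, then
    fuse both streams into per-frame speaking logits."""

    def __init__(self, dim: int = 128, heads: int = 4):
        super().__init__()
        self.a_from_v = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.v_from_a = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.classifier = nn.Linear(2 * dim, 1)  # speaking / not speaking

    def forward(self, audio: torch.Tensor, visual: torch.Tensor) -> torch.Tensor:
        # audio:  (B, T, dim) output of an audio temporal encoder
        # visual: (B, T, dim) output of a visual temporal encoder
        a_att, _ = self.a_from_v(query=audio, key=visual, value=visual)
        v_att, _ = self.v_from_a(query=visual, key=audio, value=audio)
        fused = torch.cat([a_att, v_att], dim=-1)  # (B, T, 2*dim)
        return self.classifier(fused).squeeze(-1)  # (B, T) per-frame logits

# Usage with random features: 2 clips, 50 frames, 128-dim features per stream.
logits = AVCrossAttention()(torch.randn(2, 50, 128), torch.randn(2, 50, 128))
```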
Showing results 1 — 15 out of 93,991 results