Visually Supervised Speaker Detection and Localization via Microphone Array.

Our solution extends the audio front-end using a microphone array. ... Monaural audio may successfully detect the presence of speech activity but fails in localizing the speaker due to the lack of spatial cues. ... ACKNOWLEDGMENT Thanks to Marco Volino, Mohd Azri Mohd Izhar, Hansung Kim, Charles Malleson and actors for audio-visual recordings. ...

arXiv:2203.03291v1 fatcat:szqx5vlthrbe7lkslyyxwsphee

., +, TASLP 2020 1755-1766 Focusing and Frequency Smoothing for Arbitrary Arrays With Application to Speaker Localization. ... Jo, B., +, TASLP 2020 1692-1705 Focusing and Frequency Smoothing for Arbitrary Arrays With Application to Speaker Localization. ... T Target tracking Multi-Hypothesis Square-Root Cubature Kalman Particle Filter for Speaker Tracking in Noisy and Reverberant Environments. Zhang, Q., +, TASLP 2020 1183-1197 ...

doi:10.1109/taslp.2021.3055391 fatcat:7vmstynfqvaprgz6qy3ekinkt4

array and a video camera. ... We first perform audiovisual calibration via camera resectioning, audio-visual temporal alignment and geometrical alignment to jointly use the features in the audio and video streams, which are independently ... As our camera has its own built-in microphone, we only need to detect the time offset between the audio sequences from the array microphone and the camera microphone. ...

doi:10.1145/3123266.3123412 dblp:conf/mm/Sanchez-Matilla17 fatcat:gd4iptvcizb7dnjdi47cpfv5qu
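
The snippet above mentions detecting the time offset between the array-microphone audio and the camera-microphone audio, but does not say how. One common way to do this is to locate the peak of the cross-correlation between the two signals. The sketch below illustrates that idea only; the function name, the use of NumPy, and the assumption that both signals share one sample rate are illustrative choices, not details taken from the paper.

```python
import numpy as np

def estimate_offset_samples(ref, other):
    """Estimate how many samples `other` lags behind `ref`.

    Hypothetical helper: assumes both signals were sampled at the same
    rate and overlap in time. A positive result means `other` is delayed.
    """
    ref = np.asarray(ref, dtype=float)
    other = np.asarray(other, dtype=float)
    # Remove the DC component so a constant bias does not dominate the peak.
    ref = ref - ref.mean()
    other = other - other.mean()
    # Full cross-correlation: entry k corresponds to sum_n other[n + k] * ref[n].
    corr = np.correlate(other, ref, mode="full")
    lags = np.arange(-(len(ref) - 1), len(other))
    return int(lags[np.argmax(corr)])

# Synthetic check: delay a noise signal by a known number of samples.
rng = np.random.default_rng(0)
array_mic = rng.standard_normal(1000)
true_delay = 37
camera_mic = np.concatenate([np.zeros(true_delay), array_mic])[:1000]
estimated_delay = estimate_offset_samples(array_mic, camera_mic)
```

Dividing the estimated lag by the sample rate gives the offset in seconds. For long recordings an FFT-based correlation would be faster, and real recordings would likely need resampling to a common rate first.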

Hasegawa-Johnson, and S. Thomas Multiple Acoustic Source Localization in Microphone Array Networks. J. Yang, X. Zhong, W. Chen, and W. ... Liu, and H. Meng Towards Duration Robust Weakly Supervised Sound Event Detection. H. Dinkel, M. Wu, and K. ...

doi:10.1109/taslp.2021.3137064 fatcat:rpka3f2bhjh37c7pkhiowyndhm

than facing the cameras and the microphones. ... The latter is solved within a novel audio-visual fusion method on the following grounds: binaural spectral features are first extracted from a microphone pair, then a supervised audio-visual alignment ... They combine voice activity detection with sound-source localization using a linear microphone array which provides horizontal (azimuth) speech directions. ...

doi:10.1109/tpami.2017.2648793 pmid:28103192 fatcat:cn6tcdf5n5dp7leyrrrevtxln4

Our system for localizing sound source objects in the image is composed of audio and visual DNNs. The visual DNN is trained to localize sound source candidates within an input image. ... We also demonstrate that the visual DNN detected objects including talking visitors and specific exhibits from real data recorded in a science museum. ... Yusuke Date and Dr. Yu Hoshina for their support in the experiment in Miraikan. This study was partially supported by JSPS KAKENHI No. 18H06490 for funding. ...

arXiv:2007.13976v1 fatcat:k4sho4ggnbafbfc3wyhngmg76a

The primary entry includes the coauthors' names, the title of the paper or other item, and its location, specified by the publication abbreviation, year, and inclusive pagination. ... -that appeared in this periodical during 2021, and items from previous years that were commented upon or corrected in 2021. ... ., +, TASLP 2021 1864-1880 Multiple Acoustic Source Localization in Microphone Array Networks. ...

doi:10.1109/taslp.2022.3147096 fatcat:7nl52k7sjfalbhpxtum3y5nmje

Supervised learning based methods for source localization, being data driven, can be adapted to different acoustic conditions via training and have been shown to be robust to adverse acoustic environments ... Through additional empirical investigation, it is also shown that with an array of M microphones our proposed framework yields the best localization performance with M-1 convolution layers. ... For the eight microphone array, 6 CNNs are trained, whereas for the six microphones and the four microphone array, 4 and 2 CNNs are trained, respectively. ...

arXiv:1807.11722v1 fatcat:xv5uuacz6zdlvdmitxvtzxdzam

Microphone Arrays. ... ., +, TASLP April 2019 679-691 On Mainlobe Orientation of the First- and Second-Order Differential Microphone Arrays. ...

doi:10.1109/taslp.2020.2971902 fatcat:j66uwjyqlfbmtgda6zhzlswpva

In this paper, we investigate design, analysis, and testing of acoustic arrays for localizing acorn woodpeckers using their vocalizations. Each acoustic array consists of four microphones arranged ... Woodpecker localization experiments using robust array element spacing in different types of environments are conducted and compared. ... We appreciate the assistance of Kathy Griffith, Joe Wise, Chih-Kai Chen, and Hyunggon Park in conducting various experiments. ...

doi:10.1117/12.617983 fatcat:j43b62vfhfbg3kh5umxspk4r24

We here cast the diarization problem into a tracking formulation whereby the active speaker is detected and tracked over time. ... A probabilistic tracker exploits the on-image (spatial) coincidence of visual and auditory observations and infers a single latent variable which represents the identity of the active speaker. ... They combine voice activity detection with sound-source localization using a linear microphone array. The latter can only provide the azimuth (horizontal) sound direction. ...

doi:10.1109/iccvw.2015.96 dblp:conf/iccvw/GebruBEH15 fatcat:lruasrz6sfgn7imwdwdd7gne2y

in Microphone Array Networks. J. ... Chin Towards Duration Robust Weakly Supervised Sound Event Detection. H. Dinkel, M. Wu, and K. ... Speech Enhancement and Separation ...

doi:10.1109/taslp.2021.3137066 fatcat:ocit27xwlbagtjdyc652yws4xa

Data-Driven Localization and Tracking Learning-based approaches have been proposed for both microphone array and binaural localization. ... We present two localization algorithms that were designed for a single microphone array of two microphones. ...

doi:10.1561/2000000098 fatcat:a7et5bmprvcvxajwsx73j3lywy

In this paper we present an integrated robotic system capable of participating in and performing a wide range of educational and entertainment tasks, in collaboration with one or more children. ... The system, called ChildBot, features multimodal perception modules and multiple robotic agents that monitor the interaction environment, and can robustly coordinate complex Child-Robot Interaction use-cases ... Asimenia Papoulidi for their help in designing the use-cases, supervising and evaluating the experiments with the children, and their useful remarks. ...

arXiv:2008.12818v1 fatcat:au33jpbqpnfr5foaiivmd76n3y

Microphones and cameras have been extensively used to observe and detect human activity and to facilitate natural modes of interaction between humans and intelligent systems. ... Intelligent systems with audio-visual sensors should be capable of achieving similar goals. The audio-visual information fusion strategy is a key component in designing such systems. ... ACKNOWLEDGMENT We would like to thank our main sponsors, CALIT2 at UC San Diego, NSF's RESCUE project and the UC Discovery program. ...

doi:10.1109/jproc.2010.2057231 fatcat:lfzgfmn2hjdq7h6o5txva3oapq

Visually Supervised Speaker Detection and Localization via Microphone Array [article]

2020 Index IEEE/ACM Transactions on Audio, Speech, and Language Processing Vol. 28

Multi-Modal Localization and Enhancement of Multiple Sound Sources from a Micro Aerial Vehicle

Table of Contents

Audio-Visual Speaker Diarization Based on Spatiotemporal Bayesian Fusion

Self-supervised Neural Audio-Visual Sound Source Localization via Probabilistic Spatial Modeling [article]

2021 Index IEEE/ACM Transactions on Audio, Speech, and Language Processing Vol. 29

Multi-Speaker DOA Estimation Using Deep Convolutional Networks Trained with Noise Signals [article]

2019 Index IEEE/ACM Transactions on Audio, Speech, and Language Processing Vol. 27

Acoustic sensor networks for woodpecker localization

Tracking the Active Speaker Based on a Joint Audio-Visual Observation Model

Table of Contents

Data-Driven Multi-Microphone Speaker Localization on Manifolds

ChildBot: Multi-Robot Perception and Interaction with Children [article]

Audiovisual Information Fusion in Human–Computer Interfaces and Intelligent Environments: A Survey
