Vision-Guided Robot Hearing
by Xavier Alameda-Pineda, Radu Horaud (2013)
Abstract
Natural human-robot interaction in complex and unpredictable environments is
one of the main research lines in robotics. In typical real-world scenarios,
humans are at some distance from the robot and the acquired signals are
strongly impaired by noise, reverberation and other interfering sources. In
this context, the detection and localization of speakers plays a key role,
since several tasks (e.g., speech recognition and speaker tracking) rely on
it. We address the problem of how to detect and localize people
that are both seen and heard by a humanoid robot. We introduce a hybrid
deterministic/probabilistic model. The deterministic component maps the visual
information into the auditory space, while the probabilistic component lets
the visual features guide the grouping of the auditory features into
audio-visual (AV) objects. The proposed model and the associated
algorithm are implemented in real time (17 FPS) using a stereoscopic camera
pair and two microphones embedded into the head of the humanoid robot NAO. We
performed experiments on (i) synthetic data, (ii) a publicly available data set
and (iii) data acquired using the robot. The results we obtained validate the
approach and encourage us to further investigate how vision can help robot
hearing.
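
To make the two components concrete, here is a minimal sketch in Python with
NumPy, not the authors' actual implementation: a deterministic mapping from a
3D position given by stereo vision to the interaural time difference (ITD) it
would produce at the two microphones, assuming free-field propagation and a
hypothetical 12 cm microphone baseline, followed by a probabilistic soft
assignment of observed ITDs to the visually predicted ones, with a uniform
outlier class absorbing noise and reverberation. The function names,
microphone geometry and numeric constants are all illustrative assumptions.

import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, approximate value at room temperature

# Hypothetical microphone positions on the robot head (metres); the real
# NAO geometry differs, this baseline is an illustrative assumption.
MIC_LEFT = np.array([-0.06, 0.0, 0.0])
MIC_RIGHT = np.array([0.06, 0.0, 0.0])

def visual_to_itd(point_3d):
    # Deterministic component: map a 3D position obtained from stereo
    # vision to the ITD (in seconds) it would produce, assuming
    # direct-path (free-field) propagation to the two microphones.
    d_left = np.linalg.norm(point_3d - MIC_LEFT)
    d_right = np.linalg.norm(point_3d - MIC_RIGHT)
    return (d_left - d_right) / SPEED_OF_SOUND

def group_auditory_features(observed_itds, visual_points,
                            sigma=2e-5, outlier_density=1e3):
    # Probabilistic component: soft-assign each observed ITD either to one
    # of the visually detected candidates (Gaussian components centred on
    # the ITDs predicted from vision) or to a uniform "outlier" class that
    # absorbs noise and reverberation artefacts.
    mu = np.array([visual_to_itd(p) for p in visual_points])       # (K,)
    diff = observed_itds[:, None] - mu[None, :]                    # (N, K)
    gauss = np.exp(-0.5 * (diff / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)
    lik = np.hstack([gauss, np.full((len(observed_itds), 1), outlier_density)])
    return lik / lik.sum(axis=1, keepdims=True)   # responsibilities, (N, K+1)

# Toy usage: two visually detected persons and three auditory observations,
# the third of which is an outlier and should land in the last column.
persons = [np.array([0.5, 0.0, 1.5]), np.array([-0.6, 0.1, 2.0])]
itds = np.array([visual_to_itd(persons[0]) + 5e-6,
                 visual_to_itd(persons[1]) - 5e-6,
                 3.0e-4])
print(group_auditory_features(itds, persons).round(3))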
Archived Files and Locations
application/pdf, 3.6 MB (arXiv preprint 1311.2460v2)
arxiv.org (repository), web.archive.org (webarchive)