Self-Supervised Joint Learning Framework of Depth Estimation via Implicit Cues
by
Jianrong Wang, Ge Zhang, Zhenyu Wu, Xuewei Li, and Li Liu
2020
Abstract
In self-supervised monocular depth estimation, depth discontinuities and
artifacts around moving objects remain challenging problems. Existing
self-supervised methods usually rely on a single view to train the depth
estimation network. Compared with static views, the abundant dynamic properties
between video frames are beneficial to refined depth estimation, especially for
dynamic objects. In this work, we propose a novel self-supervised joint
learning framework for depth estimation using consecutive frames from monocular
and stereo videos. The main idea is to use an implicit depth cue extractor that
leverages dynamic and static cues to generate useful depth proposals. These
cues can predict distinguishable motion contours and geometric scene
structures. Furthermore, a new high-dimensional attention module is introduced
to extract a clear global transformation, which effectively suppresses the
uncertainty of local descriptors in high-dimensional space, resulting in more
reliable optimization of the learning framework. Experiments demonstrate that
the proposed framework outperforms the state-of-the-art (SOTA) on the KITTI and
Make3D datasets.
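The abstract does not spell out the training objective, but self-supervised depth estimation frameworks of this kind are typically trained with a photometric reprojection loss between a target frame and source frames warped into its view; taking the per-pixel minimum over sources is a common way to suppress occlusion and moving-object artifacts. A minimal NumPy sketch of that idea (illustrative only; function names are hypothetical and this is not the authors' implementation):

```python
import numpy as np

def l1_photometric(target, warped):
    # Mean absolute photometric error between the target frame and a
    # source frame warped into the target view (both H x W x 3 arrays).
    return np.mean(np.abs(target - warped))

def min_reprojection_loss(target, warped_sources):
    # Per-pixel minimum of the photometric error over several warped
    # source frames; pixels occluded (or moving) in one source can
    # still be explained by another, reducing artifacts.
    errors = np.stack(
        [np.abs(target - w).mean(axis=-1) for w in warped_sources]
    )  # shape: (num_sources, H, W)
    return errors.min(axis=0).mean()

# Toy example: a black target frame, one all-white and one all-black warp.
target = np.zeros((4, 4, 3))
warped_a = np.ones((4, 4, 3))
warped_b = np.zeros((4, 4, 3))
print(l1_photometric(target, warped_a))                 # 1.0
print(min_reprojection_loss(target, [warped_a, warped_b]))  # 0.0
```

In practice the warp itself is differentiable (bilinear sampling driven by the predicted depth and camera pose), so minimizing this loss trains the depth network without ground-truth depth labels.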
Archived Files and Locations
application/pdf, 1.1 MB
arxiv.org (repository); web.archive.org (webarchive)
arXiv: 2006.09876v1