Multi-Frame Self-Supervised Depth with Transformers
release_bfmkbp2es5fx3mmjlkf2ske3ga
by
Vitor Guizilini, Rares Ambrus, Dian Chen, Sergey Zakharov, Adrien Gaidon
2022
Abstract
Multi-frame depth estimation improves over single-frame approaches by also
leveraging geometric relationships between images via feature matching, in
addition to learning appearance-based features. In this paper we revisit
feature matching for self-supervised monocular depth estimation, and propose a
novel transformer architecture for cost volume generation. We use
depth-discretized epipolar sampling to select matching candidates, and refine
predictions through a series of self- and cross-attention layers. These layers
sharpen the matching probability between pixel features, improving over
standard similarity metrics prone to ambiguities and local minima. The refined
cost volume is decoded into depth estimates, and the whole pipeline is trained
end-to-end from videos using only a photometric objective. Experiments on the
KITTI and DDAD datasets show that our DepthFormer architecture establishes a
new state of the art in self-supervised monocular depth estimation, and is even
competitive with highly specialized supervised single-frame architectures. We
also show that our learned cross-attention network yields representations
transferable across datasets, increasing the effectiveness of pre-training
strategies. Project page: https://sites.google.com/tri.global/depthformer
Archived Files and Locations
application/pdf 13.5 MB
file_xwzktkwzdfgptchhlhxgnmmdfi
arxiv.org (repository) | web.archive.org (webarchive)
2204.07616v2
access all versions, variants, and formats of this work (e.g. pre-prints)