Self-Supervised Depth Estimation Via Implicit Cues from Videos.

In this work, we propose a novel self-supervised joint learning framework for depth estimation using consecutive frames from monocular and stereo videos. ... Existing self-supervised methods usually utilize a single view to train the depth estimation network. ... The proposed framework utilizes implicit cues extractor to extract static and dynamic depth cues from unit stream in shallow space, and uses implicit cues to guide the depth estimation of a single image ...

arXiv:2006.09876v3 fatcat:g4z3yoabbrbsvg3dsbdiesi5we

We achieve this by formulating a simple yet effective self-supervision task: our model is required to reconstruct a random frame of a video given a frame from another timepoint and a rendered image of ... We present preliminary results for a method to estimate 3D pose from 2D video containing a single person and a static background without the need for any manual landmark annotations. ... In this paper we focus on self-supervised 3D pose estimation from monocular video, a key element of a wide range of applications including motion capture, visual surveillance or gait analysis. ...

arXiv:2210.04514v1 fatcat:fpeaxmw4obghlm3dfbtciecpgu

A key contributor to recent progress in 3D detection from single images is monocular depth estimation. ... Existing methods focus on how to leverage depth explicitly, by generating pseudo-pointclouds or providing attention cues for image features. ... ignore close" indicate a small trick to ignore closest depth estimation in self-supervised training. All methods start from a single initial model pretrained by large-scale depth supervision available from ...

arXiv:2210.02493v1 fatcat:7c3gw6nakfbyzl3wve2kwab3ee

Using Implicit Cues from Image Tags I Ichimura, Naoyuki Workshop: GPU Computing with Orientation Maps for Extracting Local Invariant Features Igual, Laura Workshop: Aligning Endoluminal Scene ... Cues from Image Tags Far-Sighted Active Learning on a Budget for Image and Video Recognition Collect-Cut: Segmentation with Top-Down Cues Discovered in Multi-Object Images Learning a Hierarchy of ...

doi:10.1109/cvpr.2010.5539913 fatcat:y6m5knstrzfyfin6jzusc42p54

This paper is investigating learning methods using depth as a cue measurement that can be used for bridge inspection. ... We go over the state-of-the-art deep learning methods, including supervised and unsupervised methods. ... Acknowledgements This project was supported in part by collaborative research funding from the National Research Council of Canada's Artificial Intelligence for Logistics Program. ...

doi:10.5772/intechopen.1002466 fatcat:n64au53jjzbnzmvhyp3ak5rktm

well as human mesh recovery, encompassing methods based on explicit models and implicit representations. ... 3D human pose estimation and mesh recovery have attracted widespread research interest in many areas, such as computer vision, autonomous driving, and robotics. ... and self-supervised 3D human pose and shape estimation. ...

arXiv:2402.18844v1 fatcat:hqfywkjouzbe3ifj36sdiidi2e

RGBD saliency detection model focuses on extracting the salient regions from RGBD images by combining the depth information. ... The goal of video saliency detection model is to locate the motion-related salient object in video sequences, which considers the motion cue and spatiotemporal constraint jointly. ... Depth Measure Based RGBD Saliency Detection In order to capture the comprehensive and implicit attributes from the depth map and enhance the identification of salient object, some depth measures, such ...

arXiv:1803.03391v2 fatcat:htcmhlo32jhczehvvq6nmgzwam

Our method, which significantly simplifies previous attempts at using motion for self-supervision, achieves state-of-the-art results in self-supervision using motion cues, competitive results for self-supervision ... We use motion cues in the form of optical flow, to supervise representations of static images. ... with self-supervision methods that use other cues, making motion a sensible choice for self-supervision by itself or in combination with other cues [1]. ...

arXiv:1807.05636v1 fatcat:pz2fykwokzandbcgin5vczolc4

To address this issue we propose a way to learn primary capsule encoders that detect atomic parts from a single image. ... During training we exploit motion as a powerful perceptual cue for part definition, with an expressive decoder for part generation within a layered image model with occlusion. ... Training is done in a self-supervised manner from consecutive frames in video. ...

arXiv:2011.13920v2 fatcat:lqdrlo3iozfqli6ddj2hztutf4

To learn a visual representation from these videos, we present a new self-supervised learning method to use the local transformation that warps the predicted local geometry of the person from an image ... We further provide a theoretical bound of self-supervised learning via an uncertainty analysis that characterizes the performance of the self-supervised learning without training. ... Equations (3) and (4) allow us to utilize a large amount of real videos without the 3D ground truth via self-supervision, i.e., the estimated depth in one pose can be used to supervise the depth in the ...

arXiv:2103.03319v3 fatcat:hwxapqd5tfcl3lslyqakl6tp64

Our Depth Field Networks (DeFiNe) achieve state-of-the-art results in stereo and video depth estimation without explicit geometric constraints, and improve on zero-shot domain generalization by a wide ... We also show that introducing view synthesis as an auxiliary task further improves depth estimation. ... Self-supervised methods provide an alternative to those that rely on groundtruth depth maps at training time, and are able to take advantage of the new availability of large-scale video datasets. ...

arXiv:2207.14287v1 fatcat:o77recscebhadf4txcqhxmkese

We optimize the motion estimation model via self-training from large amounts of unconstrained and heterogeneous dash camera videos available on YouTube. Our key contribution is twofold. ... Second, we demonstrate multi-modal supervision, including segmentation, flow, depth, and audio auxiliary prediction tasks, to facilitate generalized representations for the VO task. ... As depth provides a general cue regarding scene geometry, we find it to benefit pose estimation performance on KITTI (translation error with the student model drops from 17.04 to 13.09), but to a lesser ...

arXiv:2309.16772v3 fatcat:ew6nckp435ajjdkiranwvklgem

In NDR, we adopt the neural implicit function for surface representation and rendering such that the captured color and depth can be fully utilized to jointly optimize the surface and deformations. ... We propose Neural-DynamicReconstruction (NDR), a template-free method to recover high-fidelity geometry and motions of a dynamic scene from a monocular RGB-D camera. ... Depth cues. We evaluate the reconstruction results with only RGB supervision, i.e. removing depth images and only supervised with loss terms L_mask, L_color, L_reg. ...

arXiv:2206.15258v2 fatcat:7wsqdqyr25hhnfsfcey3bcfhye

The refined cost volume is decoded into depth estimates, and the whole pipeline is trained end-to-end from videos using only a photometric objective. ... In this paper we revisit feature matching for self-supervised monocular depth estimation, and propose a novel transformer architecture for cost volume generation. ... Related Work Self-Supervised Depth Estimation The work of Godard et al. ...

arXiv:2204.07616v2 fatcat:bfmkbp2es5fx3mmjlkf2ske3ga

A widely-adopted solution to tackle this bottleneck is to extract the pose and shape information from 2D image to 3D correspondence. ... Recent advances in 3D vision have led to a number of impressive works on the 3D shape and pose estimation, each with different pros and cons. ... The ACMR also can exploit online adaptation to generalize the learned model to input videos and self-supervised setting. ...

arXiv:2207.04512v2 fatcat:4amonw6u7ree7adqj4iexsvxhm

Self-Supervised Joint Learning Framework of Depth Estimation via Implicit Cues [article]

Self-Supervised 3D Human Pose Estimation in Static Video Via Neural Rendering [article]

Depth Is All You Need for Monocular 3D Detection [article]

Author Index

Depth Learning Methods For Bridges Inspection Using UAV [chapter]

Deep Learning for 3D Human Pose Estimation and Mesh Recovery: A Survey [article]

Review of Visual Saliency Detection with Comprehensive Information [article]

Cross Pixel Optical Flow Similarity for Self-Supervised Learning [article]

Unsupervised part representation by Flow Capsules [article]

Self-supervised 3D Representation Learning of Dressed Humans from Social Media Videos [article]

Depth Field Networks for Generalizable Multi-view Scene Representation [article]

XVO: Generalized Visual Odometry via Cross-Modal Self-Training [article]

Neural Surface Reconstruction of Dynamic Scenes with Monocular RGB-D Camera [article]

Multi-Frame Self-Supervised Depth with Transformers [article]

Learning-based Monocular 3D Reconstruction of Birds: A Contemporary Survey [article]
