Self-Supervised Joint Learning Framework of Depth Estimation via Implicit Cues
[article]
2020
arXiv
pre-print
In this work, we propose a novel self-supervised joint learning framework for depth estimation using consecutive frames from monocular and stereo videos. ...
Existing self-supervised methods usually utilize a single view to train the depth estimation network. ...
The proposed framework utilizes an implicit cues extractor to extract static and dynamic depth cues from the unit stream in shallow space, and uses the implicit cues to guide the depth estimation of a single image ...
arXiv:2006.09876v3
fatcat:g4z3yoabbrbsvg3dsbdiesi5we
Self-Supervised 3D Human Pose Estimation in Static Video Via Neural Rendering
[article]
2022
arXiv
pre-print
We achieve this by formulating a simple yet effective self-supervision task: our model is required to reconstruct a random frame of a video given a frame from another timepoint and a rendered image of ...
We present preliminary results for a method to estimate 3D pose from 2D video containing a single person and a static background without the need for any manual landmark annotations. ...
In this paper we focus on self-supervised 3D pose estimation from monocular video, a key element of a wide range of applications including motion capture, visual surveillance or gait analysis. ...
arXiv:2210.04514v1
fatcat:fpeaxmw4obghlm3dfbtciecpgu
Depth Is All You Need for Monocular 3D Detection
[article]
2022
arXiv
pre-print
A key contributor to recent progress in 3D detection from single images is monocular depth estimation. ...
Existing methods focus on how to leverage depth explicitly, by generating pseudo-pointclouds or providing attention cues for image features. ...
"Ignore close" indicates a small trick to ignore the closest depth estimation in self-supervised training. All methods start from a single initial model pretrained by large-scale depth supervision available from ...
arXiv:2210.02493v1
fatcat:7c3gw6nakfbyzl3wve2kwab3ee
Author Index
2010
2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Using Implicit Cues from Image Tags
Ichimura, Naoyuki
Workshop: GPU Computing with Orientation Maps for Extracting Local Invariant Features
Igual, Laura
Workshop: Aligning Endoluminal Scene ...
Cues from Image Tags
Far-Sighted Active Learning on a Budget for Image and Video Recognition
Collect-Cut: Segmentation with Top-Down Cues Discovered in Multi-Object Images
Learning a Hierarchy of ...
doi:10.1109/cvpr.2010.5539913
fatcat:y6m5knstrzfyfin6jzusc42p54
Depth Learning Methods For Bridges Inspection Using UAV
[chapter]
2023
Drones - Various Applications
This paper investigates learning methods that use depth as a cue measurement for bridge inspection. ...
We go over the state-of-the-art deep learning methods, including supervised and unsupervised methods. ...
Acknowledgements This project was supported in part by collaborative research funding from the National Research Council of Canada's Artificial Intelligence for Logistics Program. ...
doi:10.5772/intechopen.1002466
fatcat:n64au53jjzbnzmvhyp3ak5rktm
Deep Learning for 3D Human Pose Estimation and Mesh Recovery: A Survey
[article]
2024
arXiv
pre-print
well as human mesh recovery, encompassing methods based on explicit models and implicit representations. ...
3D human pose estimation and mesh recovery have attracted widespread research interest in many areas, such as computer vision, autonomous driving, and robotics. ...
and self-supervised 3D human pose and shape estimation. ...
arXiv:2402.18844v1
fatcat:hqfywkjouzbe3ifj36sdiidi2e
Review of Visual Saliency Detection with Comprehensive Information
[article]
2018
arXiv
pre-print
RGBD saliency detection model focuses on extracting the salient regions from RGBD images by combining the depth information. ...
The goal of video saliency detection model is to locate the motion-related salient object in video sequences, which considers the motion cue and spatiotemporal constraint jointly. ...
Depth Measure Based RGBD Saliency Detection In order to capture the comprehensive and implicit attributes from the depth map and enhance the identification of salient object, some depth measures, such ...
arXiv:1803.03391v2
fatcat:htcmhlo32jhczehvvq6nmgzwam
Cross Pixel Optical Flow Similarity for Self-Supervised Learning
[article]
2018
arXiv
pre-print
Our method, which significantly simplifies previous attempts at using motion for self-supervision, achieves state-of-the-art results in self-supervision using motion cues, competitive results for self-supervision ...
We use motion cues in the form of optical flow, to supervise representations of static images. ...
with self-supervision methods that use other cues, making motion a sensible choice for self-supervision by itself or in combination with other cues [1] . ...
arXiv:1807.05636v1
fatcat:pz2fykwokzandbcgin5vczolc4
Unsupervised part representation by Flow Capsules
[article]
2021
arXiv
pre-print
To address this issue we propose a way to learn primary capsule encoders that detect atomic parts from a single image. ...
During training we exploit motion as a powerful perceptual cue for part definition, with an expressive decoder for part generation within a layered image model with occlusion. ...
Training is done in a self-supervised manner from consecutive frames in video. ...
arXiv:2011.13920v2
fatcat:lqdrlo3iozfqli6ddj2hztutf4
Self-supervised 3D Representation Learning of Dressed Humans from Social Media Videos
[article]
2022
arXiv
pre-print
To learn a visual representation from these videos, we present a new self-supervised learning method to use the local transformation that warps the predicted local geometry of the person from an image ...
We further provide a theoretical bound of self-supervised learning via an uncertainty analysis that characterizes the performance of the self-supervised learning without training. ...
Equations (3) and (4) allow us to utilize a large amount of real videos without 3D ground truth via self-supervision, i.e., the estimated depth in one pose can be used to supervise the depth in the ...
arXiv:2103.03319v3
fatcat:hwxapqd5tfcl3lslyqakl6tp64
Depth Field Networks for Generalizable Multi-view Scene Representation
[article]
2022
arXiv
pre-print
Our Depth Field Networks (DeFiNe) achieve state-of-the-art results in stereo and video depth estimation without explicit geometric constraints, and improve on zero-shot domain generalization by a wide ...
We also show that introducing view synthesis as an auxiliary task further improves depth estimation. ...
Self-supervised methods provide an alternative to those that rely on groundtruth depth maps at training time, and are able to take advantage of the new availability of large-scale video datasets. ...
arXiv:2207.14287v1
fatcat:o77recscebhadf4txcqhxmkese
XVO: Generalized Visual Odometry via Cross-Modal Self-Training
[article]
2023
arXiv
pre-print
We optimize the motion estimation model via self-training from large amounts of unconstrained and heterogeneous dash camera videos available on YouTube. Our key contribution is twofold. ...
Second, we demonstrate multi-modal supervision, including segmentation, flow, depth, and audio auxiliary prediction tasks, to facilitate generalized representations for the VO task. ...
As depth provides a general cue regarding scene geometry, we find it to benefit pose estimation performance on KITTI (translation error with the student model drops from 17.04 to 13.09), but to a lesser ...
arXiv:2309.16772v3
fatcat:ew6nckp435ajjdkiranwvklgem
Neural Surface Reconstruction of Dynamic Scenes with Monocular RGB-D Camera
[article]
2022
arXiv
pre-print
In NDR, we adopt the neural implicit function for surface representation and rendering such that the captured color and depth can be fully utilized to jointly optimize the surface and deformations. ...
We propose Neural-DynamicReconstruction (NDR), a template-free method to recover high-fidelity geometry and motions of a dynamic scene from a monocular RGB-D camera. ...
Depth cues. We evaluate the reconstruction results with only RGB supervision, i.e., removing depth images and supervising only with the loss terms L_mask, L_color, L_reg. ...
arXiv:2206.15258v2
fatcat:7wsqdqyr25hhnfsfcey3bcfhye
Multi-Frame Self-Supervised Depth with Transformers
[article]
2022
arXiv
pre-print
The refined cost volume is decoded into depth estimates, and the whole pipeline is trained end-to-end from videos using only a photometric objective. ...
In this paper we revisit feature matching for self-supervised monocular depth estimation, and propose a novel transformer architecture for cost volume generation. ...
Related Work
Self-Supervised Depth Estimation The work of Godard et al. ...
arXiv:2204.07616v2
fatcat:bfmkbp2es5fx3mmjlkf2ske3ga
Learning-based Monocular 3D Reconstruction of Birds: A Contemporary Survey
[article]
2022
arXiv
pre-print
A widely-adopted solution to tackle this bottleneck is to extract the pose and shape information from 2D image to 3D correspondence. ...
Recent advances in 3D vision have led to a number of impressive works on the 3D shape and pose estimation, each with different pros and cons. ...
ACMR can also exploit online adaptation to generalize the learned model to input videos in a self-supervised setting. ...
arXiv:2207.04512v2
fatcat:4amonw6u7ree7adqj4iexsvxhm
Showing results 1 — 15 out of 4,012 results