4,012 Hits in 5.3 sec

Self-Supervised Joint Learning Framework of Depth Estimation via Implicit Cues [article]

Jianrong Wang and Ge Zhang and Zhenyu Wu and XueWei Li and Li Liu
2020 arXiv   pre-print
In this work, we propose a novel self-supervised joint learning framework for depth estimation using consecutive frames from monocular and stereo videos.  ...  Existing self-supervised methods usually utilize a single view to train the depth estimation network.  ...  The proposed framework utilizes an implicit cue extractor to extract static and dynamic depth cues from unit stream in shallow space, and uses implicit cues to guide the depth estimation of a single image  ... 
arXiv:2006.09876v3 fatcat:g4z3yoabbrbsvg3dsbdiesi5we

Self-Supervised 3D Human Pose Estimation in Static Video Via Neural Rendering [article]

Luca Schmidtke, Benjamin Hou, Athanasios Vlontzos, Bernhard Kainz
2022 arXiv   pre-print
We achieve this by formulating a simple yet effective self-supervision task: our model is required to reconstruct a random frame of a video given a frame from another timepoint and a rendered image of  ...  We present preliminary results for a method to estimate 3D pose from 2D video containing a single person and a static background without the need for any manual landmark annotations.  ...  In this paper we focus on self-supervised 3D pose estimation from monocular video, a key element of a wide range of applications including motion capture, visual surveillance or gait analysis.  ... 
arXiv:2210.04514v1 fatcat:fpeaxmw4obghlm3dfbtciecpgu

Depth Is All You Need for Monocular 3D Detection [article]

Dennis Park, Jie Li, Dian Chen, Vitor Guizilini, Adrien Gaidon
2022 arXiv   pre-print
A key contributor to recent progress in 3D detection from single images is monocular depth estimation.  ...  Existing methods focus on how to leverage depth explicitly, by generating pseudo-pointclouds or providing attention cues for image features.  ...  "ignore close" indicate a small trick to ignore closest depth estimation in self-supervised training. All methods start from a single initial model pretrained by large-scale depth supervision available from  ... 
arXiv:2210.02493v1 fatcat:7c3gw6nakfbyzl3wve2kwab3ee

Author Index

2010 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition  
Using Implicit Cues from Image Tags I Ichimura, Naoyuki Workshop: GPU Computing with Orientation Maps for Extracting Local Invariant Features Igual, Laura Workshop: Aligning Endoluminal Scene  ...  Cues from Image Tags Far-Sighted Active Learning on a Budget for Image and Video Recognition Collect-Cut: Segmentation with Top-Down Cues Discovered in Multi-Object Images Learning a Hierarchy of  ... 
doi:10.1109/cvpr.2010.5539913 fatcat:y6m5knstrzfyfin6jzusc42p54

Depth Learning Methods For Bridges Inspection Using UAV [chapter]

Hicham Sekkati, Jean-Francois Lapointe
2023 Drones - Various Applications  
This paper investigates learning methods that use depth as a cue measurement for bridge inspection.  ...  We go over the state-of-the-art deep learning methods, including supervised and unsupervised methods.  ...  Acknowledgements. This project was supported in part by collaborative research funding from the National Research Council of Canada's Artificial Intelligence for Logistics Program.  ... 
doi:10.5772/intechopen.1002466 fatcat:n64au53jjzbnzmvhyp3ak5rktm

Deep Learning for 3D Human Pose Estimation and Mesh Recovery: A Survey [article]

Yang Liu, Changzhen Qiu, Zhiyong Zhang
2024 arXiv   pre-print
well as human mesh recovery, encompassing methods based on explicit models and implicit representations.  ...  3D human pose estimation and mesh recovery have attracted widespread research interest in many areas, such as computer vision, autonomous driving, and robotics.  ...  and self-supervised 3D human pose and shape estimation.  ... 
arXiv:2402.18844v1 fatcat:hqfywkjouzbe3ifj36sdiidi2e

Review of Visual Saliency Detection with Comprehensive Information [article]

Runmin Cong, Jianjun Lei, Huazhu Fu, Ming-Ming Cheng, Weisi Lin, and Qingming Huang
2018 arXiv   pre-print
The RGBD saliency detection model focuses on extracting salient regions from RGBD images by combining the depth information.  ...  The goal of a video saliency detection model is to locate the motion-related salient object in video sequences, jointly considering the motion cue and spatiotemporal constraints.  ...  Depth Measure Based RGBD Saliency Detection. In order to capture the comprehensive and implicit attributes from the depth map and enhance the identification of salient objects, some depth measures, such  ... 
arXiv:1803.03391v2 fatcat:htcmhlo32jhczehvvq6nmgzwam

Cross Pixel Optical Flow Similarity for Self-Supervised Learning [article]

Aravindh Mahendran, James Thewlis, Andrea Vedaldi
2018 arXiv   pre-print
Our method, which significantly simplifies previous attempts at using motion for self-supervision, achieves state-of-the-art results in self-supervision using motion cues, competitive results for self-supervision  ...  We use motion cues in the form of optical flow, to supervise representations of static images.  ...  with self-supervision methods that use other cues, making motion a sensible choice for self-supervision by itself or in combination with other cues [1].  ... 
arXiv:1807.05636v1 fatcat:pz2fykwokzandbcgin5vczolc4

Unsupervised part representation by Flow Capsules [article]

Sara Sabour, Andrea Tagliasacchi, Soroosh Yazdani, Geoffrey E. Hinton, David J. Fleet
2021 arXiv   pre-print
To address this issue we propose a way to learn primary capsule encoders that detect atomic parts from a single image.  ...  During training we exploit motion as a powerful perceptual cue for part definition, with an expressive decoder for part generation within a layered image model with occlusion.  ...  Training is done in a self-supervised manner from consecutive frames in video.  ... 
arXiv:2011.13920v2 fatcat:lqdrlo3iozfqli6ddj2hztutf4

Self-supervised 3D Representation Learning of Dressed Humans from Social Media Videos [article]

Yasamin Jafarian, Hyun Soo Park
2022 arXiv   pre-print
To learn a visual representation from these videos, we present a new self-supervised learning method to use the local transformation that warps the predicted local geometry of the person from an image  ...  We further provide a theoretical bound of self-supervised learning via an uncertainty analysis that characterizes the performance of the self-supervised learning without training.  ...  Equations (3) and (4) allow us to utilize a large amount of real videos without the 3D ground truth via self-supervision, i.e., the estimated depth in one pose can be used to supervise the depth in the  ... 
arXiv:2103.03319v3 fatcat:hwxapqd5tfcl3lslyqakl6tp64

Depth Field Networks for Generalizable Multi-view Scene Representation [article]

Vitor Guizilini, Igor Vasiljevic, Jiading Fang, Rares Ambrus, Greg Shakhnarovich, Matthew Walter, Adrien Gaidon
2022 arXiv   pre-print
Our Depth Field Networks (DeFiNe) achieve state-of-the-art results in stereo and video depth estimation without explicit geometric constraints, and improve on zero-shot domain generalization by a wide  ...  We also show that introducing view synthesis as an auxiliary task further improves depth estimation.  ...  Self-supervised methods provide an alternative to those that rely on groundtruth depth maps at training time, and are able to take advantage of the new availability of large-scale video datasets.  ... 
arXiv:2207.14287v1 fatcat:o77recscebhadf4txcqhxmkese

XVO: Generalized Visual Odometry via Cross-Modal Self-Training [article]

Lei Lai and Zhongkai Shangguan and Jimuyang Zhang and Eshed Ohn-Bar
2023 arXiv   pre-print
We optimize the motion estimation model via self-training from large amounts of unconstrained and heterogeneous dash camera videos available on YouTube. Our key contribution is twofold.  ...  Second, we demonstrate multi-modal supervision, including segmentation, flow, depth, and audio auxiliary prediction tasks, to facilitate generalized representations for the VO task.  ...  As depth provides a general cue regarding scene geometry, we find it to benefit pose estimation performance on KITTI (translation error with the student model drops from 17.04 to 13.09), but to a lesser  ... 
arXiv:2309.16772v3 fatcat:ew6nckp435ajjdkiranwvklgem

Neural Surface Reconstruction of Dynamic Scenes with Monocular RGB-D Camera [article]

Hongrui Cai, Wanquan Feng, Xuetao Feng, Yan Wang, Juyong Zhang
2022 arXiv   pre-print
In NDR, we adopt the neural implicit function for surface representation and rendering such that the captured color and depth can be fully utilized to jointly optimize the surface and deformations.  ...  We propose Neural-DynamicReconstruction (NDR), a template-free method to recover high-fidelity geometry and motions of a dynamic scene from a monocular RGB-D camera.  ...  Depth cues. We evaluate the reconstruction results with only RGB supervision, i.e., removing the depth images and supervising only with the loss terms L_mask, L_color, and L_reg.  ... 
arXiv:2206.15258v2 fatcat:7wsqdqyr25hhnfsfcey3bcfhye

Multi-Frame Self-Supervised Depth with Transformers [article]

Vitor Guizilini, Rares Ambrus, Dian Chen, Sergey Zakharov, Adrien Gaidon
2022 arXiv   pre-print
The refined cost volume is decoded into depth estimates, and the whole pipeline is trained end-to-end from videos using only a photometric objective.  ...  In this paper we revisit feature matching for self-supervised monocular depth estimation, and propose a novel transformer architecture for cost volume generation.  ...  Related Work. Self-Supervised Depth Estimation. The work of Godard et al.  ... 
arXiv:2204.07616v2 fatcat:bfmkbp2es5fx3mmjlkf2ske3ga

Learning-based Monocular 3D Reconstruction of Birds: A Contemporary Survey [article]

Seyed Mojtaba Marvasti-Zadeh, Mohammad N.S. Jahromi, Javad Khaghani, Devin Goodsman, Nilanjan Ray, Nadir Erbilgin
2022 arXiv   pre-print
A widely-adopted solution to tackle this bottleneck is to extract the pose and shape information from 2D image to 3D correspondence.  ...  Recent advances in 3D vision have led to a number of impressive works on the 3D shape and pose estimation, each with different pros and cons.  ...  The ACMR also can exploit online adaptation to generalize the learned model to input videos and self-supervised setting.  ... 
arXiv:2207.04512v2 fatcat:4amonw6u7ree7adqj4iexsvxhm
Showing results 1–15 out of 4,012 results