Scaling up crowd-sourcing to very large datasets
2014
Proceedings of the VLDB Endowment
By using active learning as our optimization strategy for labeling tasks in crowd-sourced databases, we can minimize the number of questions asked to the crowd, allowing crowd-sourced applications to scale ...
Designing active learning algorithms for a crowd-sourced database poses many practical challenges: such algorithms need to be generic, scalable, and easy to use, even for practitioners who are not machine ...
To scale up to large datasets, we use machine learning to avoid obtaining crowd labels for a significant portion of the data. Active Learning. AL has a rich literature in machine learning (see [ ]). ...
doi:10.14778/2735471.2735474
fatcat:pttr3kfjgbdzlmqw7xefktn6ou
A dataset for Movie Description
2015
2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
In this work we propose a novel dataset which contains transcribed ADs, which are temporally aligned to full length HD movies. ...
We characterize the dataset by benchmarking different approaches for generating video descriptions. ...
Marcus Rohrbach was supported by a fellowship within the FITweltweit-Program of the German Academic Exchange Service (DAAD). ...
doi:10.1109/cvpr.2015.7298940
dblp:conf/cvpr/RohrbachRTS15
fatcat:fslwtmy4vnhlbh5ch6vli3s6qi
Large Raw Emotional Dataset with Aggregation Mechanism
[article]
2022
arXiv
pre-print
Therefore it is the biggest open bi-modal data collection for SER task nowadays. It is annotated using a crowd-sourcing platform and includes two subsets: acted and real-life. ...
We present a new data set for speech emotion recognition (SER) tasks called Dusha. ...
We consider the problem of large-scale data sets for SER tasks. ...
arXiv:2212.12266v1
fatcat:7sq5qgv7xfftzhwy22wuy3wi2y
Building Floorspace in China: A Dataset and Learning Pipeline
[article]
2023
arXiv
pre-print
We then use Sentinel-1 and -2 satellite images as our main data source. The benefits of these data are their large cross-sectional and longitudinal scope plus their unrestricted accessibility. ...
We provide a detailed description of the algorithms used to generate the data and the results. ...
Special thanks go to our friends David Dao at GainForest/DS3Lab and Dr. Josh Veitch Michaelis at Restor/DS3Lab, for their inspirational discussions and brainstorming sessions. ...
arXiv:2303.02230v1
fatcat:s4olx2xubbbipjagobqd3rq5om
The KIT Motion-Language Dataset
2016
Big Data
To obtain motion annotations in natural language, we apply a crowd-sourcing approach and a web-based tool that was specifically built for this purpose, the Motion Annotation Tool. ...
Linking human motion and natural language is of great interest for the generation of semantic representations of human activities as well as for the generation of robot activities based on natural language ...
ACKNOWLEDGMENTS We would like to thank the numerous volunteers that helped make this dataset possible by providing the annotations in natural language. ...
doi:10.1089/big.2016.0028
pmid:27992262
fatcat:qqob2fenyrco3hxk5cswvu3sua
Synbols: Probing Learning Algorithms with Synthetic Datasets
[article]
2020
arXiv
pre-print
To showcase the versatility of Synbols, we use it to dissect the limitations and flaws in standard learning algorithms in various learning setups including supervised learning, active learning, out of ...
Enabling the design of datasets to test specific properties and failure modes of learning algorithms is thus a problem of high interest, as it has a direct impact on innovation in the field. ...
Laurent Charlin is supported through a CIFAR AI Chair and grants from NSERC, CIFAR, IVADO, Samsung, and Google. ...
arXiv:2009.06415v2
fatcat:tbqqsdrfwfcv5nnhl3h4re6cxa
Scaling Egocentric Vision: The Dataset
[chapter]
2018
Lecture Notes in Computer Science
However, progress in this challenging domain has been relatively slow due to the lack of sufficiently large datasets. ...
In this paper, we introduce EPIC-KITCHENS, a large-scale egocentric video benchmark recorded by 32 participants in their native kitchen environments. ...
Since crowd-sourcing annotations for such long videos is very challenging, we had our original participants do a coarse first annotation. ...
doi:10.1007/978-3-030-01225-0_44
fatcat:f24v7vnuhzhjbm3aof3j4ycna4
Impact of Annotator Demographics on Sentiment Dataset Labeling
2022
Proceedings of the ACM on Human-Computer Interaction
This paper explores and attempts to quantify the uncertainties and biases due to annotator demographics when creating sentiment analysis datasets. ...
As machine learning methods become more powerful and capture more nuances of human behavior, biases in the dataset can shape what the model learns and is evaluated on. ...
ACKNOWLEDGEMENTS We would like to thank Lina Kim for her support via the Research Mentorship Program at UC Santa Barbara. ...
doi:10.1145/3555632
fatcat:7gjgh4p52zg7pnfxyxiim2upa4
HRPlanes: High Resolution Airplane Dataset for Deep Learning
[article]
2022
arXiv
pre-print
Our preliminary results show that the proposed dataset can be a valuable data source and benchmark data set for future applications. ...
Moreover, proposed architectures and results of this study could be used for transfer learning of different datasets and models for airplane detection. ...
We are also grateful to Google Earth for providing high resolution satellite imagery. ...
arXiv:2204.10959v1
fatcat:plqvug36nzbvxm7dpsffzgdb44
Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics
[article]
2020
arXiv
pre-print
We leverage a largely ignored source of information: the behavior of the model on individual instances during training (training dynamics) for building data maps. ...
We introduce Data Maps---a model-based tool to characterize and diagnose datasets. ...
We thank the anonymous reviewers, and our colleagues from AI2 and UWNLP, especially Ana Marasović, and Suchin Gururangan, for their helpful feedback. ...
arXiv:2009.10795v2
fatcat:xmdge47t6nfwjo3aqjbeqjoet4
Deep neural learning on weighted datasets utilizing label disagreement from crowdsourcing
2021
Computer Networks
Experts and crowds can work together to generate high-quality datasets, but such collaboration is limited to a large-scale pool of data. ...
In other words, training on a large-scale dataset depends more on crowdsourced datasets with aggregated labels than expert intensively checked labels. ...
Recently, mostly large-scale datasets are assessed on crowd-sourcing platforms where a labeling task can be set up with each instance being assessed by, e.g., three assessors. ...
doi:10.1016/j.comnet.2021.108227
fatcat:sbc2b26fergenn7wvi6duwk3iy
Toward Human-Like Social Robot Navigation: A Large-Scale, Multi-Modal, Social Human Navigation Dataset
[article]
2023
arXiv
pre-print
To be specific, we design an open-source egocentric data collection sensor suite wearable by walking humans to provide multi-modal robot perception data; we collect a large-scale (~100 km, 20 hours, 300 ...
In this work, we propose to utilize the body of rich, widely available, social human navigation data in many natural human-inhabited public spaces for robots to learn similar, human-like, socially compliant ...
of people 24 Queue Stairs Walking up and/or down stairs 17 Vehicle Interaction Navigating around a vehicle 13 Navigating Through Large Crowds Navigating among large unstructured crowds 19 Elevator Ride ...
arXiv:2303.14880v2
fatcat:qy2wskgulbbwtezqunu4d327eq
The WILDTRACK Multi-Camera Person Dataset
[article]
2017
arXiv
pre-print
We provide a large-scale HD dataset named WILDTRACK which finally makes advanced deep learning methods applicable to this problem. ...
As multi-camera set-ups become more frequently encountered, joint exploitation of the across views information would allow for improved detection performances. ...
We would also like to thank Florent Monay and Salim Kayal for their advice and help regarding the calibration of the cameras. ...
arXiv:1707.09299v1
fatcat:ktmq5gz4azbnll4g2vkrcf4oau
Densely Annotated Photorealistic Virtual Dataset Generation for Abnormal Event Detection
[chapter]
2021
Lecture Notes in Computer Science
A large scale synthetic image dataset of images of street scenes with dense semantic segmentation maps, generated by the Unity game engine, is SYNTHIA. ...
doi:10.1007/978-3-030-68799-1_1
fatcat:2an77jnqarealgsgvg5abd4gwe
JHU-CROWD++: Large-Scale Crowd Counting Dataset and A Benchmark Method
[article]
2020
arXiv
pre-print
Considering this, we introduce a new large scale unconstrained crowd counting dataset (JHU-CROWD++) that contains 4,372 images with 1.51 million annotations. ...
Specifically, the dataset includes several images with weather-based degradations and illumination variations, making it a very challenging dataset. ...
We would like to specially thank Kumar Siddhanth, Poojan Oza, A. N. Sindagi, Jayadev S, Supriya S, Shruthi S and S. Sreevali for providing assistance in annotation and verification efforts. ...
arXiv:2004.03597v2
fatcat:jztuu4m76vdznpjimneqk3v4sm