Scaling Up Crowd-Sourcing to Very Large Datasets: A Case for Active Learning.

By using active learning as our optimization strategy for labeling tasks in crowd-sourced databases, we can minimize the number of questions asked to the crowd, allowing crowd-sourced applications to scale ... Designing active learning algorithms for a crowd-sourced database poses many practical challenges: such algorithms need to be generic, scalable, and easy to use, even for practitioners who are not machine ... To scale up to large datasets, we use machine learning to avoid obtaining crowd labels for a significant portion of the data. Active Learning. AL has a rich literature in machine learning (see [ ]). ...

doi:10.14778/2735471.2735474 fatcat:pttr3kfjgbdzlmqw7xefktn6ou

In this work we propose a novel dataset which contains transcribed ADs, which are temporally aligned to full length HD movies. ... We characterize the dataset by benchmarking different approaches for generating video descriptions. ... Marcus Rohrbach was supported by a fellowship within the FITweltweit-Program of the German Academic Exchange Service (DAAD). ...

doi:10.1109/cvpr.2015.7298940 dblp:conf/cvpr/RohrbachRTS15 fatcat:fslwtmy4vnhlbh5ch6vli3s6qi

It is therefore currently the largest open bi-modal data collection for the SER task. It is annotated using a crowd-sourcing platform and includes two subsets: acted and real-life. ... We present a new data set for speech emotion recognition (SER) tasks called Dusha. ... We consider the problem of large-scale data sets for SER tasks. ...

arXiv:2212.12266v1 fatcat:7sq5qgv7xfftzhwy22wuy3wi2y

We then use Sentinel-1 and -2 satellite images as our main data source. The benefits of these data are their large cross-sectional and longitudinal scope plus their unrestricted accessibility. ... We provide a detailed description of the algorithms used to generate the data and the results. ... Special thanks go to our friends David Dao at GainForest/DS3Lab and Dr. Josh Veitch Michaelis at Restor/DS3Lab, for their inspirational discussions and brainstorming sessions. ...

arXiv:2303.02230v1 fatcat:s4olx2xubbbipjagobqd3rq5om

To obtain motion annotations in natural language, we apply a crowd-sourcing approach and a web-based tool that was specifically built for this purpose, the Motion Annotation Tool. ... Linking human motion and natural language is of great interest for the generation of semantic representations of human activities as well as for the generation of robot activities based on natural language ... ACKNOWLEDGMENTS We would like to thank the numerous volunteers that helped make this dataset possible by providing the annotations in natural language. ...

doi:10.1089/big.2016.0028 pmid:27992262 fatcat:qqob2fenyrco3hxk5cswvu3sua

To showcase the versatility of Synbols, we use it to dissect the limitations and flaws in standard learning algorithms in various learning setups including supervised learning, active learning, out of ... Enabling the design of datasets to test specific properties and failure modes of learning algorithms is thus a problem of high interest, as it has a direct impact on innovation in the field. ... Laurent Charlin is supported through a CIFAR AI Chair and grants from NSERC, CIFAR, IVADO, Samsung, and Google. ...

arXiv:2009.06415v2 fatcat:tbqqsdrfwfcv5nnhl3h4re6cxa

However, progress in this challenging domain has been relatively slow due to the lack of sufficiently large datasets. ... In this paper, we introduce EPIC-KITCHENS, a large-scale egocentric video benchmark recorded by 32 participants in their native kitchen environments. ... Since crowd-sourcing annotations for such long videos is very challenging, we had our original participants do a coarse first annotation. ...

doi:10.1007/978-3-030-01225-0_44 fatcat:f24v7vnuhzhjbm3aof3j4ycna4

This paper explores and attempts to quantify the uncertainties and biases due to annotator demographics when creating sentiment analysis datasets. ... As machine learning methods become more powerful and capture more nuances of human behavior, biases in the dataset can shape what the model learns and is evaluated on. ... ACKNOWLEDGEMENTS We would like to thank Lina Kim for her support via the Research Mentorship Program at UC Santa Barbara. ...

doi:10.1145/3555632 fatcat:7gjgh4p52zg7pnfxyxiim2upa4

Our preliminary results show that the proposed dataset can be a valuable data source and benchmark data set for future applications. ... Moreover, proposed architectures and results of this study could be used for transfer learning of different datasets and models for airplane detection. ... We are also grateful to Google Earth for providing high resolution satellite imagery. ...

arXiv:2204.10959v1 fatcat:plqvug36nzbvxm7dpsffzgdb44

We leverage a largely ignored source of information: the behavior of the model on individual instances during training (training dynamics) for building data maps. ... We introduce Data Maps---a model-based tool to characterize and diagnose datasets. ... We thank the anonymous reviewers, and our colleagues from AI2 and UWNLP, especially Ana Marasović, and Suchin Gururangan, for their helpful feedback. ...

arXiv:2009.10795v2 fatcat:xmdge47t6nfwjo3aqjbeqjoet4

Experts and crowds can work together to generate high-quality datasets, but such collaboration is limited to a large-scale pool of data. ... In other words, training on a large-scale dataset depends more on crowdsourced datasets with aggregated labels than expert intensively checked labels. ... Recently, mostly large-scale datasets are assessed on crowd-sourcing platforms where a labeling task can be set up with each instance being assessed by, e.g., three assessors. ...

doi:10.1016/j.comnet.2021.108227 fatcat:sbc2b26fergenn7wvi6duwk3iy

To be specific, we design an open-source egocentric data collection sensor suite wearable by walking humans to provide multi-modal robot perception data; we collect a large-scale (~100 km, 20 hours, 300 ... In this work, we propose to utilize the body of rich, widely available, social human navigation data in many natural human-inhabited public spaces for robots to learn similar, human-like, socially compliant ... of people 24 Queue Stairs Walking up and/or down stairs 17 Vehicle Interaction Navigating around a vehicle 13 Navigating Through Large Crowds Navigating among large unstructured crowds 19 Elevator Ride ...

arXiv:2303.14880v2 fatcat:qy2wskgulbbwtezqunu4d327eq

We provide a large-scale HD dataset named WILDTRACK which finally makes advanced deep learning methods applicable to this problem. ... As multi-camera set-ups become more frequently encountered, joint exploitation of the across views information would allow for improved detection performances. ... We would also like to thank Florent Monay and Salim Kayal for their advice and help regarding the calibration of the cameras. ...

arXiv:1707.09299v1 fatcat:ktmq5gz4azbnll4g2vkrcf4oau

A large scale synthetic image dataset of images of street scenes with dense semantic segmentation maps, generated by the Unity game engine, is SYNTHIA. ...

doi:10.1007/978-3-030-68799-1_1 fatcat:2an77jnqarealgsgvg5abd4gwe

Considering this, we introduce a new large scale unconstrained crowd counting dataset (JHU-CROWD++) that contains 4,372 images with 1.51 million annotations. ... Specifically, the dataset includes several images with weather-based degradations and illumination variations, making it a very challenging dataset. ... We would like to specially thank Kumar Siddhanth, Poojan Oza, A. N. Sindagi, Jayadev S, Supriya S, Shruthi S and S. Sreevali for providing assistance in annotation and verification efforts. ...

arXiv:2004.03597v2 fatcat:jztuu4m76vdznpjimneqk3v4sm

Scaling up crowd-sourcing to very large datasets

A dataset for Movie Description

Large Raw Emotional Dataset with Aggregation Mechanism [article]

Building Floorspace in China: A Dataset and Learning Pipeline [article]

The KIT Motion-Language Dataset

Synbols: Probing Learning Algorithms with Synthetic Datasets [article]

Scaling Egocentric Vision: The Dataset [chapter]

Impact of Annotator Demographics on Sentiment Dataset Labeling

HRPlanes: High Resolution Airplane Dataset for Deep Learning [article]

Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics [article]

Deep neural learning on weighted datasets utilizing label disagreement from crowdsourcing

Toward Human-Like Social Robot Navigation: A Large-Scale, Multi-Modal, Social Human Navigation Dataset [article]

The WILDTRACK Multi-Camera Person Dataset [article]

Densely Annotated Photorealistic Virtual Dataset Generation for Abnormal Event Detection [chapter]

JHU-CROWD++: Large-Scale Crowd Counting Dataset and A Benchmark Method [article]
