
25,341 Hits in 8.8 sec

Scaling up crowd-sourcing to very large datasets

Barzan Mozafari, Purna Sarkar, Michael Franklin, Michael Jordan, Samuel Madden
2014 Proceedings of the VLDB Endowment  
By using active learning as our optimization strategy for labeling tasks in crowd-sourced databases, we can minimize the number of questions asked to the crowd, allowing crowd-sourced applications to scale  ...  Designing active learning algorithms for a crowd-sourced database poses many practical challenges: such algorithms need to be generic, scalable, and easy to use, even for practitioners who are not machine  ...  To scale up to large datasets, we use machine learning to avoid obtaining crowd labels for a significant portion of the data. Active Learning. AL has a rich literature in machine learning (see [ ]).  ... 
doi:10.14778/2735471.2735474 fatcat:pttr3kfjgbdzlmqw7xefktn6ou

A dataset for Movie Description

Anna Rohrbach, Marcus Rohrbach, Niket Tandon, Bernt Schiele
2015 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)  
In this work we propose a novel dataset which contains transcribed ADs, which are temporally aligned to full length HD movies.  ...  We characterize the dataset by benchmarking different approaches for generating video descriptions.  ...  Marcus Rohrbach was supported by a fellowship within the FITweltweit-Program of the German Academic Exchange Service (DAAD).  ... 
doi:10.1109/cvpr.2015.7298940 dblp:conf/cvpr/RohrbachRTS15 fatcat:fslwtmy4vnhlbh5ch6vli3s6qi

Large Raw Emotional Dataset with Aggregation Mechanism [article]

Vladimir Kondratenko, Artem Sokolov, Nikolay Karpov, Oleg Kutuzov, Nikita Savushkin, Fyodor Minkin
2022 arXiv   pre-print
Therefore it is the biggest open bi-modal data collection for SER task nowadays. It is annotated using a crowd-sourcing platform and includes two subsets: acted and real-life.  ...  We present a new data set for speech emotion recognition (SER) tasks called Dusha.  ...  We consider the problem of large-scale data sets for SER tasks.  ... 
arXiv:2212.12266v1 fatcat:7sq5qgv7xfftzhwy22wuy3wi2y

Building Floorspace in China: A Dataset and Learning Pipeline [article]

Peter Egger, Susie Xi Rao, Sebastiano Papini
2023 arXiv   pre-print
We then use Sentinel-1 and -2 satellite images as our main data source. The benefits of these data are their large cross-sectional and longitudinal scope plus their unrestricted accessibility.  ...  We provide a detailed description of the algorithms used to generate the data and the results.  ...  Special thanks go to our friends David Dao at GainForest/DS3Lab and Dr. Josh Veitch Michaelis at Restor/DS3Lab, for their inspirational discussions and brainstorming sessions.  ... 
arXiv:2303.02230v1 fatcat:s4olx2xubbbipjagobqd3rq5om

The KIT Motion-Language Dataset

Matthias Plappert, Christian Mandery, Tamim Asfour
2016 Big Data  
To obtain motion annotations in natural language, we apply a crowd-sourcing approach and a web-based tool that was specifically built for this purpose, the Motion Annotation Tool.  ...  Linking human motion and natural language is of great interest for the generation of semantic representations of human activities as well as for the generation of robot activities based on natural language  ...  ACKNOWLEDGMENTS We would like to thank the numerous volunteers that helped make this dataset possible by providing the annotations in natural language.  ... 
doi:10.1089/big.2016.0028 pmid:27992262 fatcat:qqob2fenyrco3hxk5cswvu3sua

Synbols: Probing Learning Algorithms with Synthetic Datasets [article]

Alexandre Lacoste, Pau Rodríguez, Frédéric Branchaud-Charron, Parmida Atighehchian, Massimo Caccia, Issam Laradji, Alexandre Drouin, Matt Craddock, Laurent Charlin, David Vázquez
2020 arXiv   pre-print
To showcase the versatility of Synbols, we use it to dissect the limitations and flaws in standard learning algorithms in various learning setups including supervised learning, active learning, out of  ...  Enabling the design of datasets to test specific properties and failure modes of learning algorithms is thus a problem of high interest, as it has a direct impact on innovation in the field.  ...  Laurent Charlin is supported through a CIFAR AI Chair and grants from NSERC, CIFAR, IVADO, Samsung, and Google.  ... 
arXiv:2009.06415v2 fatcat:tbqqsdrfwfcv5nnhl3h4re6cxa

Scaling Egocentric Vision: The Dataset [chapter]

Dima Damen, Hazel Doughty, Giovanni Maria Farinella, Sanja Fidler, Antonino Furnari, Evangelos Kazakos, Davide Moltisanti, Jonathan Munro, Toby Perrett, Will Price, Michael Wray
2018 Lecture Notes in Computer Science  
However, progress in this challenging domain has been relatively slow due to the lack of sufficiently large datasets.  ...  In this paper, we introduce EPIC-KITCHENS, a large-scale egocentric video benchmark recorded by 32 participants in their native kitchen environments.  ...  Since crowd-sourcing annotations for such long videos is very challenging, we had our original participants do a coarse first annotation.  ... 
doi:10.1007/978-3-030-01225-0_44 fatcat:f24v7vnuhzhjbm3aof3j4ycna4

Impact of Annotator Demographics on Sentiment Dataset Labeling

Yi Ding, Jacob You, Tonja-Katrin Machulla, Jennifer Jacobs, Pradeep Sen, Tobias Höllerer
2022 Proceedings of the ACM on Human-Computer Interaction  
This paper explores and attempts to quantify the uncertainties and biases due to annotator demographics when creating sentiment analysis datasets.  ...  As machine learning methods become more powerful and capture more nuances of human behavior, biases in the dataset can shape what the model learns and is evaluated on.  ...  ACKNOWLEDGEMENTS We would like to thank Lina Kim for her support via the Research Mentorship Program at UC Santa Barbara.  ... 
doi:10.1145/3555632 fatcat:7gjgh4p52zg7pnfxyxiim2upa4

HRPlanes: High Resolution Airplane Dataset for Deep Learning [article]

Tolga Bakirman, Elif Sertel
2022 arXiv   pre-print
Our preliminary results show that the proposed dataset can be a valuable data source and benchmark data set for future applications.  ...  Moreover, proposed architectures and results of this study could be used for transfer learning of different datasets and models for airplane detection.  ...  We are also grateful to Google Earth for providing high resolution satellite imagery.  ... 
arXiv:2204.10959v1 fatcat:plqvug36nzbvxm7dpsffzgdb44

Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics [article]

Swabha Swayamdipta, Roy Schwartz, Nicholas Lourie, Yizhong Wang, Hannaneh Hajishirzi, Noah A. Smith, Yejin Choi
2020 arXiv   pre-print
We leverage a largely ignored source of information: the behavior of the model on individual instances during training (training dynamics) for building data maps.  ...  We introduce Data Maps---a model-based tool to characterize and diagnose datasets.  ...  We thank the anonymous reviewers, and our colleagues from AI2 and UWNLP, especially Ana Marasović, and Suchin Gururangan, for their helpful feedback.  ... 
arXiv:2009.10795v2 fatcat:xmdge47t6nfwjo3aqjbeqjoet4

Deep neural learning on weighted datasets utilizing label disagreement from crowdsourcing

Dongsheng Wang, Prayag Tiwari, Mohammad Shorfuzzaman, Ingo Schmitt
2021 Computer Networks  
Experts and crowds can work together to generate high-quality datasets, but such collaboration is limited to a large-scale pool of data.  ...  In other words, training on a large-scale dataset depends more on crowdsourced datasets with aggregated labels than expert intensively checked labels.  ...  Recently, mostly large-scale datasets are assessed on crowd-sourcing platforms where a labeling task can be set up with each instance being assessed by, e.g., three assessors.  ... 
doi:10.1016/j.comnet.2021.108227 fatcat:sbc2b26fergenn7wvi6duwk3iy

Toward Human-Like Social Robot Navigation: A Large-Scale, Multi-Modal, Social Human Navigation Dataset [article]

Duc M. Nguyen, Mohammad Nazeri, Amirreza Payandeh, Aniket Datar, Xuesu Xiao
2023 arXiv   pre-print
To be specific, we design an open-source egocentric data collection sensor suite wearable by walking humans to provide multi-modal robot perception data; we collect a large-scale (~100 km, 20 hours, 300  ...  In this work, we propose to utilize the body of rich, widely available, social human navigation data in many natural human-inhabited public spaces for robots to learn similar, human-like, socially compliant  ...  of people 24 Queue Stairs Walking up and/or down stairs 17 Vehicle Interaction Navigating around a vehicle 13 Navigating Through Large Crowds Navigating among large unstructured crowds 19 Elevator Ride  ... 
arXiv:2303.14880v2 fatcat:qy2wskgulbbwtezqunu4d327eq

The WILDTRACK Multi-Camera Person Dataset [article]

Tatjana Chavdarova, Pierre Baqué, Stéphane Bouquet, Andrii Maksai, Cijo Jose, Louis Lettry, Pascal Fua, Luc Van Gool, François Fleuret
2017 arXiv   pre-print
We provide a large-scale HD dataset named WILDTRACK which finally makes advanced deep learning methods applicable to this problem.  ...  As multi-camera set-ups become more frequently encountered, joint exploitation of the across views information would allow for improved detection performances.  ...  We would also like to thank Florent Monay and Salim Kayal for their advices and help regarding the calibration of the cameras.  ... 
arXiv:1707.09299v1 fatcat:ktmq5gz4azbnll4g2vkrcf4oau

Densely Annotated Photorealistic Virtual Dataset Generation for Abnormal Event Detection [chapter]

Rico Montulet, Alexia Briassouli
2021 Lecture Notes in Computer Science  
A large scale synthetic image dataset of images of street scenes with dense semantic segmentation maps, generated by the Unity game engine, is SYNTHIA.  ... 
doi:10.1007/978-3-030-68799-1_1 fatcat:2an77jnqarealgsgvg5abd4gwe

JHU-CROWD++: Large-Scale Crowd Counting Dataset and A Benchmark Method [article]

Vishwanath A. Sindagi, Rajeev Yasarla, Vishal M. Patel
2020 arXiv   pre-print
Considering this, we introduce a new large scale unconstrained crowd counting dataset (JHU-CROWD++) that contains "4,372" images with "1.51 million" annotations.  ...  Specifically, the dataset includes several images with weather-based degradations and illumination variations, making it a very challenging dataset.  ...  We would like to specially thank Kumar Siddhanth, Poojan Oza, A. N. Sindagi, Jayadev S, Supriya S, Shruthi S and S. Sreevali for providing assistance in annotation and verification efforts.  ... 
arXiv:2004.03597v2 fatcat:jztuu4m76vdznpjimneqk3v4sm