242 Hits in 4.0 sec

Modeling Human Decision Making in Generalized Gaussian Multiarmed Bandits

Paul B. Reverdy, Vaibhav Srivastava, Naomi Ehrich Leonard
2014 Proceedings of the IEEE  
In this paper, we present a formal model of human decision making in explore-exploit tasks using the context of multiarmed bandit problems, where the decision maker must choose among multiple options  ...  We focus on the case of Gaussian rewards in a setting where the decision maker uses Bayesian inference to estimate the reward values.  ...  Cohen for their input, which helped make possible the strong connection of this work to the psychology literature.  ... 
doi:10.1109/jproc.2014.2307024 fatcat:6xwlrab5ynbu5ag7qnjj544ihq
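The abstract above describes a decision maker who uses Bayesian inference to estimate Gaussian reward values. As a minimal sketch of that belief-update step (not the authors' code; the prior mean mu0, prior spread sigma0, and reward noise sigma_r are illustrative assumptions), the conjugate Normal-Normal update looks like this in Python:

    class GaussianBelief:
        """Conjugate Normal belief about one arm's mean reward (known noise).
        mu0, sigma0, sigma_r are illustrative choices, not values from the paper."""
        def __init__(self, mu0=0.0, sigma0=10.0, sigma_r=1.0):
            self.mu, self.var = mu0, sigma0 ** 2    # prior mean and variance
            self.var_r = sigma_r ** 2               # known reward-noise variance

        def update(self, reward):
            # Standard Normal-Normal conjugate update after one observed reward
            precision = 1.0 / self.var + 1.0 / self.var_r
            self.mu = (self.mu / self.var + reward / self.var_r) / precision
            self.var = 1.0 / precision

After each pull, update() tightens the belief; the posterior mean and standard deviation then feed whatever choice index (e.g., the UCL heuristic below) drives the next decision.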

Modeling Human Decision-making in Generalized Gaussian Multi-armed Bandits [article]

Paul Reverdy, Vaibhav Srivastava, Naomi E. Leonard
2019 arXiv   pre-print
We present a formal model of human decision-making in explore-exploit tasks using the context of multi-armed bandit problems, where the decision-maker must choose among multiple options with uncertain  ...  We focus on the case of Gaussian rewards in a setting where the decision-maker uses Bayesian inference to estimate the reward values.  ...  Cohen for their input, which helped make possible the strong connection of this work to the psychology literature.  ... 
arXiv:1307.6134v5 fatcat:gpucau2sxzb4dj2p3bh3jksloi

Addictive Games: Case Study on Multi-Armed Bandit Game

Xiaohan Kang, Hong Ri, Mohd Nor Akmal Khalid, Hiroyuki Iida
2021 Information  
This article focuses on extending the motion-in-mind model to the setting of multi-armed bandit games, quantifying players' psychological inclinations with data from simulation experiments  ...  The multi-armed bandit game is also a typical test of Skinner-box design and is popular in gambling houses, which makes it a good example to analyze.  ...  The bandit in this simulation is Gaussian: every arm's reward follows a Gaussian distribution.  ... 
doi:10.3390/info12120521 fatcat:hth47aalinhdjoqs5rpf3hf5yy

Algorithmic models of human decision making in Gaussian multi-armed bandit problems

Paul Reverdy, Vaibhav Srivastava, Naomi E. Leonard
2014 2014 European Control Conference (ECC)  
We consider a heuristic Bayesian algorithm as a model of human decision making in multi-armed bandit problems with Gaussian rewards.  ...  The stochastic algorithm encodes many of the observed features of human decision making.  ...  Application to human decision making: human decision making in multi-armed bandit problems is well modeled by a heuristic similar to that of UCL (11), and humans are sensitive to the parameters of the  ... 
doi:10.1109/ecc.2014.6862580 dblp:conf/eucc/ReverdySL14 fatcat:meuumb5h7zeenjo5gmegz5iwzi
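The snippet refers to a stochastic version of the UCL heuristic, where the optimistic index is passed through a softmax rather than maximized deterministically. A hedged sketch (the constant K = sqrt(2*pi*e) is a common choice in the UCL analysis; the temperature is a free model parameter):

    import numpy as np
    from scipy.stats import norm

    def stochastic_ucl(mu, sigma, t, K=np.sqrt(2 * np.pi * np.e),
                       temperature=1.0, rng=None):
        """Softmax choice over UCL-style indices
        Q_i = mu_i + sigma_i * Phi^{-1}(1 - 1/(K*t)), for time step t >= 1.
        mu, sigma: per-arm posterior means and standard deviations."""
        rng = rng or np.random.default_rng()
        bonus = norm.ppf(1.0 - 1.0 / (K * t))       # inverse-CDF exploration bonus
        q = np.asarray(mu) + bonus * np.asarray(sigma)
        p = np.exp((q - q.max()) / temperature)     # numerically stable softmax
        p /= p.sum()
        return int(rng.choice(len(q), p=p))

The softmax randomness is what lets the model capture trial-to-trial variability in human choices while keeping the same optimistic value estimates.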

On optimal foraging and multi-armed bandits

Vaibhav Srivastava, Paul Reverdy, Naomi E. Leonard
2013 2013 51st Annual Allerton Conference on Communication, Control, and Computing (Allerton)  
We observe that the multi-armed bandit problem with transition costs and the associated block allocation algorithm capture the key features of popular animal foraging models in literature.  ...  We consider two variants of the standard multiarmed bandit problem, namely, the multi-armed bandit problem with transition costs and the multi-armed bandit problem on graphs.  ...  [6] established the optimality of a Bayesian UCB algorithm for Gaussian rewards and drew several connections between these algorithms and human decision-making.  ... 
doi:10.1109/allerton.2013.6736565 dblp:conf/allerton/SrivastavaRL13 fatcat:k7wo7zzmjrfpfdclg53zzpqadi

Satisficing in multi-armed bandit problems [article]

Paul Reverdy and Vaibhav Srivastava and Naomi Ehrich Leonard
2016 arXiv   pre-print
Satisficing is a relaxation of maximizing and allows for less risky decision making in the face of uncertainty.  ...  We propose two sets of satisficing objectives for the multi-armed bandit problem, where the objective is to achieve reward-based decision-making performance above a given threshold.  ...  Finally, it is well understood that satisficing is an important feature of human decision making [29] and that the UCL algorithm can model many features of human decision making in bandit tasks  ... 
arXiv:1512.07638v2 fatcat:ourj3y6dpjgm7lepgvkvhk4epi
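To make the threshold objective concrete, here is an illustrative satisficing rule, not the paper's exact sat-UCL index: exploit any arm whose posterior mean already clears the threshold, otherwise explore optimistically. The bonus weight is an assumed free parameter.

    import numpy as np

    def satisficing_choice(mu, sigma, threshold, bonus=1.0, rng=None):
        """Illustrative satisficing rule (sketch, not the paper's algorithm)."""
        rng = rng or np.random.default_rng()
        mu, sigma = np.asarray(mu, float), np.asarray(sigma, float)
        good = np.flatnonzero(mu >= threshold)
        if good.size:
            return int(rng.choice(good))            # any satisfying arm is acceptable
        return int(np.argmax(mu + bonus * sigma))   # else explore optimistically

The key difference from maximizing is the early exit: once an arm is believed good enough, exploration stops, which is what makes the policy less risky under uncertainty.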

Satisficing in Multi-Armed Bandit Problems

Paul Reverdy, Vaibhav Srivastava, Naomi Ehrich Leonard
2017 IEEE Transactions on Automatic Control  
Satisficing is a relaxation of maximizing and allows for less risky decision making in the face of uncertainty.  ...  We propose two sets of satisficing objectives for the multi-armed bandit problem, where the objective is to achieve reward-based decision-making performance above a given threshold.  ...  Finally, it is well understood that satisficing is an important feature of human decision making [29] and that the UCL algorithm can model many features of human decision making in bandit tasks  ... 
doi:10.1109/tac.2016.2644380 fatcat:d4xp4cajp5d2fpjtngw6tjvn4u

Robot fast adaptation to changes in human engagement during simulated dynamic social interaction with active exploration in parameterized reinforcement learning

Mehdi Khamassi, George Velentzas, Theodore Tsitsimis, Costas Tzafestas
2018 IEEE Transactions on Cognitive and Developmental Systems  
action in human-robot interaction scenarios, mainly at the lower level of a multiarmed bandit framework.  ...  on a table), hence in essence similar to the nonstationary multiarmed bandit paradigm.  ... 
doi:10.1109/tcds.2018.2843122 fatcat:5c64vbuft5dklhotrt2kkg5w7q

The Knowledge Gradient Algorithm for a General Class of Online Learning Problems

Ilya O. Ryzhov, Warren B. Powell, Peter I. Frazier
2012 Operations Research  
Our approach is able to handle the case where our prior beliefs about the rewards are correlated, which is not handled by traditional multiarmed bandit methods.  ...  Experiments show that our KG policy performs competitively against the best-known approximation to the optimal policy in the classic bandit problem, and it outperforms many learning policies in the correlated  ...  This research was supported in part by AFOSR contract FA9550-08-1-0195 and ONR contract N00014-07-1-0150 through the Center for Dynamic Data Analysis.  ... 
doi:10.1287/opre.1110.0999 fatcat:c54svouocbhchhrsuml4xlrmeq
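The knowledge-gradient (KG) policy scores each arm by its expected one-step improvement in the believed best mean. A sketch for the *independent* Gaussian-beliefs case follows; the paper's main contribution is the correlated-beliefs extension, which requires the full covariance matrix instead.

    import numpy as np
    from scipy.stats import norm

    def kg_factors(mu, sigma, var_r):
        """KG factors for independent Gaussian beliefs with known noise var_r."""
        mu = np.asarray(mu, float)
        sigma = np.asarray(sigma, float)
        # Std of the one-step change in the posterior mean after one more sample
        sigma_tilde = sigma ** 2 / np.sqrt(sigma ** 2 + var_r)
        best_other = np.array([np.delete(mu, i).max() for i in range(mu.size)])
        z = -np.abs(mu - best_other) / sigma_tilde
        return sigma_tilde * (z * norm.cdf(z) + norm.pdf(z))

    def online_kg_choice(mu, sigma, var_r, remaining):
        # Online KG: immediate reward plus value of information over the horizon
        return int(np.argmax(np.asarray(mu) + remaining * kg_factors(mu, sigma, var_r)))

The "remaining" horizon factor is what adapts KG from offline ranking-and-selection to the online bandit setting studied here.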

Monte Carlo Search Algorithm Discovery for Single-Player Games

Francis Maes, David Lupien St-Pierre, Damien Ernst
2013 IEEE Transactions on Computational Intelligence and AI in Games  
We rely on multiarmed bandits to approximately solve this optimization problem.  ...  We also show that the discovered algorithms are generally quite robust with respect to changes in the distribution over the training problems.  ...  He is currently an Associate Professor at the University of Liège, where he is affiliated with the Systems and Modeling Research Unit. He is also the holder of the EDF-Luminus Chair on Smart Grids.  ... 
doi:10.1109/tciaig.2013.2239295 fatcat:hucv2zgzfneyhjfh72ggsl545e

Putting bandits into context: How function learning supports decision making

Eric Schulz, Emmanouil Konstantinidis, Maarten Speekenbrink
2018 Journal of Experimental Psychology. Learning, Memory and Cognition  
We introduce the contextual multi-armed bandit task as a framework to investigate learning and decision making in uncertain environments.  ...  Participants are mostly able to learn about the context-reward functions and their behaviour is best described by a Gaussian process learning strategy which generalizes previous experience to similar instances  ...  Incorporating context into models of reinforcement learning and decision making generally provides a fruitful avenue for future research.  ... 
doi:10.1037/xlm0000463 pmid:29130693 fatcat:euxutvfh7rbfpkfull3hdns7vq
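The Gaussian-process learning strategy described above can be sketched as follows: fit one GP per arm on observed (context, reward) pairs, then choose by an upper confidence bound at the current context. The RBF kernel and the beta weight are illustrative assumptions; the paper compares several kernels and choice rules.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    def gp_ucb_choice(histories, context, beta=2.0):
        """histories: one (X, y) pair per arm, X holding past contexts and y the
        rewards observed for that arm; context: the current context vector."""
        scores = []
        for X, y in histories:
            gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-2)
            gp.fit(X, y)                              # learn the context-reward function
            mean, std = gp.predict(np.atleast_2d(context), return_std=True)
            scores.append(mean[0] + beta * std[0])    # optimism in the face of uncertainty
        return int(np.argmax(scores))

Because the GP generalizes across contexts, a single observation informs predictions at similar contexts, which is the "function learning" mechanism the paper argues supports human decision making.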

Putting bandits into context: How function learning supports decision making [article]

Eric Schulz, Emmanouil Konstantinidis, Maarten Speekenbrink
2016 bioRxiv   pre-print
We introduce the contextual multi-armed bandit task as a framework to investigate learning and decision making in uncertain environments.  ...  We model participants' behaviour by context-blind (mean-tracking, Kalman filter) and contextual (Gaussian process regression parametrized with different kernels) learning approaches combined with different  ...  Incorporating context into models of reinforcement learning and decision making generally provides a fruitful avenue for future research.  ... 
doi:10.1101/081091 fatcat:3jpqpzybhrhlthoeqvsdmseine

Aversion to Option Loss in a Restless Bandit Task

Danielle J. Navarro, Peter Tran, Nicole Baz
2018 Computational Brain & Behavior  
A Kalman filter model using Thompson sampling provides an excellent account of human learning in a standard restless bandit task, but there are systematic departures in the vanishing bandit task.  ...  Inspired by work in the judgment and decision-making literature, we present two experiments using multi-armed bandit tasks in both static and dynamic environments, in situations where options can become  ...  For simpler versions of the multiarmed bandit problem, there are closed-form solutions for optimal decisions (Whittle 1980), but in general this is not the case (see Burtini et al. 2015).  ... 
doi:10.1007/s42113-018-0010-8 fatcat:qzvqyeid7zgh7pqzj5fsinwze4
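A minimal sketch of the Kalman-filter-plus-Thompson-sampling model named in the abstract (the observation and innovation variances are illustrative parameters, not the paper's fits): beliefs about all arms diffuse each trial, only the chosen arm's belief is corrected, and the next choice is a posterior draw.

    import numpy as np

    def kalman_thompson_step(mu, var, choice, reward,
                             var_obs=1.0, var_innov=0.1, rng=None):
        """One trial of the model; mu, var are per-arm arrays updated in place."""
        rng = rng or np.random.default_rng()
        var += var_innov                              # restless arms: all beliefs diffuse
        k = var[choice] / (var[choice] + var_obs)     # Kalman gain for the observed arm
        mu[choice] += k * (reward - mu[choice])
        var[choice] *= 1.0 - k
        # Thompson sampling: one posterior draw per arm; play the argmax next trial
        return int(np.argmax(rng.normal(mu, np.sqrt(var))))

The diffusion step keeps unchosen arms uncertain, so the model naturally revisits neglected options, which is what makes it a good fit in standard restless tasks.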

Correlated Multiarmed Bandit Problem: Bayesian Algorithms and Regret Analysis [article]

Vaibhav Srivastava, Paul Reverdy, Naomi Ehrich Leonard
2015 arXiv   pre-print
We consider the correlated multiarmed bandit (MAB) problem in which the rewards associated with each arm are modeled by a multivariate Gaussian random variable, and we investigate the influence of the  ...  We rigorously characterize the influence of accuracy, confidence, and correlation scale in the prior on the decision-making performance of the algorithms.  ...  It is also shown that a variation of the UCL algorithm models human decision-making in an MAB task.  ... 
arXiv:1507.01160v2 fatcat:xaepyacedngjtbhf45s7zbjutm
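In the correlated setting described above, one observed reward shifts beliefs about every arm through the covariance of the multivariate Gaussian prior. A sketch of that posterior update (var_r is an assumed known reward-noise variance):

    import numpy as np

    def correlated_update(mu, Sigma, arm, reward, var_r=1.0):
        """Multivariate-Gaussian belief update after pulling one arm."""
        s = Sigma[:, arm]                             # covariance with the pulled arm
        gain = s / (Sigma[arm, arm] + var_r)          # vector Kalman gain
        mu = mu + gain * (reward - mu[arm])           # all means shift, not just the pulled one
        Sigma = Sigma - np.outer(gain, s)             # all variances shrink accordingly
        return mu, Sigma

This cross-arm information flow is why correlation scale in the prior matters for decision-making performance, as the abstract notes.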

On Distributed Cooperative Decision-Making in Multiarmed Bandits [article]

Peter Landgren, Vaibhav Srivastava, Naomi Ehrich Leonard
2019 arXiv   pre-print
We study the explore-exploit tradeoff in distributed cooperative decision-making using the context of the multiarmed bandit (MAB) problem.  ...  We rigorously analyze the performance of the cooperative UCB algorithm and characterize the influence of communication graph structure on the decision-making performance of the group.  ...  Running consensus and related models have been used to study learning [12] and decision-making [23] in social networks.  ... 
arXiv:1512.06888v3 fatcat:baf6suveurf4zanewhjkxh7fuy
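As a rough sketch of the running-consensus idea behind cooperative UCB (not the paper's exact algorithm, whose index carries graph-dependent corrections): agents repeatedly average their per-arm statistics with neighbors, then apply a UCB-style rule to the shared estimates.

    import numpy as np

    def consensus_step(S, N, W):
        """One running-consensus round. S: per-arm reward sums, N: pull counts
        (one row per agent); W: row-stochastic weights on the communication graph."""
        return W @ S, W @ N

    def ucb_choice(s, n, t):
        # UCB1-style index on one agent's consensus-shared statistics
        n = np.maximum(n, 1e-9)                       # guard unpulled arms
        return int(np.argmax(s / n + np.sqrt(2.0 * np.log(t) / n)))

The better connected the communication graph, the faster each agent's estimates approach the group's pooled statistics, which is the mechanism behind the graph-structure results in the abstract.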
Showing results 1–15 of 242.