242 Hits in 4.0 sec

Modeling Human Decision Making in Generalized Gaussian Multiarmed Bandits

Paul B. Reverdy, Vaibhav Srivastava, Naomi Ehrich Leonard
2014 Proceedings of the IEEE  
In this paper, we present a formal model of human decision making in explore-exploit tasks using the context of multiarmed bandit problems, where the decision maker must choose among multiple options  ...  We focus on the case of Gaussian rewards in a setting where the decision maker uses Bayesian inference to estimate the reward values.  ...  Cohen for their input, which helped make possible the strong connection of this work to the psychology literature.  ... 
doi:10.1109/jproc.2014.2307024 fatcat:6xwlrab5ynbu5ag7qnjj544ihq
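The abstract above describes a decision maker who uses Bayesian inference to estimate Gaussian reward values. As a minimal sketch of that belief-update step (not the authors' code; the prior mean mu0, prior spread sigma0, and reward noise sigma_r are illustrative assumptions), the conjugate Normal-Normal update looks like this in Python:

    class GaussianBelief:
        """Conjugate Normal belief about one arm's mean reward (known noise).
        mu0, sigma0, sigma_r are illustrative choices, not values from the paper."""
        def __init__(self, mu0=0.0, sigma0=10.0, sigma_r=1.0):
            self.mu, self.var = mu0, sigma0 ** 2    # prior mean and variance
            self.var_r = sigma_r ** 2               # known reward-noise variance

        def update(self, reward):
            # Standard Normal-Normal conjugate update after one observed reward
            precision = 1.0 / self.var + 1.0 / self.var_r
            self.mu = (self.mu / self.var + reward / self.var_r) / precision
            self.var = 1.0 / precision

After each pull, update() tightens the belief; the posterior mean and standard deviation then feed whatever choice index (e.g., the UCL heuristic below) drives the next decision.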

Modeling Human Decision-making in Generalized Gaussian Multi-armed Bandits [article]

Paul Reverdy, Vaibhav Srivastava, Naomi E. Leonard
2019 arXiv   pre-print
We present a formal model of human decision-making in explore-exploit tasks using the context of multi-armed bandit problems, where the decision-maker must choose among multiple options with uncertain  ...  We focus on the case of Gaussian rewards in a setting where the decision-maker uses Bayesian inference to estimate the reward values.  ...  Cohen for their input, which helped make possible the strong connection of this work to the psychology literature.  ... 
arXiv:1307.6134v5 fatcat:gpucau2sxzb4dj2p3bh3jksloi

Addictive Games: Case Study on Multi-Armed Bandit Game

Xiaohan Kang, Hong Ri, Mohd Nor Akmal Khalid, Hiroyuki Iida
2021 Information  
This article focuses on extending the motion-in-mind model to the setting of multi-armed bandit games, quantifying players' psychological inclinations with data from simulation experiments  ...  The multi-armed bandit game is also a typical test of Skinner-box design and is popular in gambling houses, which makes it a good example to analyze.  ...  The bandit in this simulation is Gaussian: every arm's reward follows a Gaussian distribution.  ... 
doi:10.3390/info12120521 fatcat:hth47aalinhdjoqs5rpf3hf5yy

Algorithmic models of human decision making in Gaussian multi-armed bandit problems

Paul Reverdy, Vaibhav Srivastava, Naomi E. Leonard
2014 2014 European Control Conference (ECC)  
We consider a heuristic Bayesian algorithm as a model of human decision making in multi-armed bandit problems with Gaussian rewards.  ...  The stochastic algorithm encodes many of the observed features of human decision making.  ...  Application to human decision making: human decision making in multi-armed bandit problems is well modeled by a heuristic similar to that of UCL (11), and humans are sensitive to the parameters of the  ... 
doi:10.1109/ecc.2014.6862580 dblp:conf/eucc/ReverdySL14 fatcat:meuumb5h7zeenjo5gmegz5iwzi
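The snippet refers to a stochastic version of the UCL heuristic, where the optimistic index is passed through a softmax rather than maximized deterministically. A hedged sketch (the constant K = sqrt(2*pi*e) is a common choice in the UCL analysis; the temperature is a free model parameter):

    import numpy as np
    from scipy.stats import norm

    def stochastic_ucl(mu, sigma, t, K=np.sqrt(2 * np.pi * np.e),
                       temperature=1.0, rng=None):
        """Softmax choice over UCL-style indices
        Q_i = mu_i + sigma_i * Phi^{-1}(1 - 1/(K*t)), for time step t >= 1.
        mu, sigma: per-arm posterior means and standard deviations."""
        rng = rng or np.random.default_rng()
        bonus = norm.ppf(1.0 - 1.0 / (K * t))       # inverse-CDF exploration bonus
        q = np.asarray(mu) + bonus * np.asarray(sigma)
        p = np.exp((q - q.max()) / temperature)     # numerically stable softmax
        p /= p.sum()
        return int(rng.choice(len(q), p=p))

The softmax randomness is what lets the model capture trial-to-trial variability in human choices while keeping the same optimistic value estimates.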

On optimal foraging and multi-armed bandits

Vaibhav Srivastava, Paul Reverdy, Naomi E. Leonard
2013 2013 51st Annual Allerton Conference on Communication, Control, and Computing (Allerton)  
We observe that the multi-armed bandit problem with transition costs and the associated block allocation algorithm capture the key features of popular animal foraging models in literature.  ...  We consider two variants of the standard multiarmed bandit problem, namely, the multi-armed bandit problem with transition costs and the multi-armed bandit problem on graphs.  ...  [6] established the optimality of a Bayesian UCB algorithm for Gaussian rewards and drew several connections between these algorithms and human decision-making.  ... 
doi:10.1109/allerton.2013.6736565 dblp:conf/allerton/SrivastavaRL13 fatcat:k7wo7zzmjrfpfdclg53zzpqadi

Satisficing in multi-armed bandit problems [article]

Paul Reverdy and Vaibhav Srivastava and Naomi Ehrich Leonard
2016 arXiv   pre-print
Satisficing is a relaxation of maximizing and allows for less risky decision making in the face of uncertainty.  ...  We propose two sets of satisficing objectives for the multi-armed bandit problem, where the objective is to achieve reward-based decision-making performance above a given threshold.  ...  Finally, it is well understood that satisficing is an important feature of human decision making [29] and that the UCL algorithm can model many features of human decision making in bandit tasks  ... 
arXiv:1512.07638v2 fatcat:ourj3y6dpjgm7lepgvkvhk4epi
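To make the threshold objective concrete, here is an illustrative satisficing rule, not the paper's exact sat-UCL index: exploit any arm whose posterior mean already clears the threshold, otherwise explore optimistically. The bonus weight is an assumed free parameter.

    import numpy as np

    def satisficing_choice(mu, sigma, threshold, bonus=1.0, rng=None):
        """Illustrative satisficing rule (sketch, not the paper's algorithm)."""
        rng = rng or np.random.default_rng()
        mu, sigma = np.asarray(mu, float), np.asarray(sigma, float)
        good = np.flatnonzero(mu >= threshold)
        if good.size:
            return int(rng.choice(good))            # any satisfying arm is acceptable
        return int(np.argmax(mu + bonus * sigma))   # else explore optimistically

The key difference from maximizing is the early exit: once an arm is believed good enough, exploration stops, which is what makes the policy less risky under uncertainty.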

Satisficing in Multi-Armed Bandit Problems

Paul Reverdy, Vaibhav Srivastava, Naomi Ehrich Leonard
2017 IEEE Transactions on Automatic Control  
Satisficing is a relaxation of maximizing and allows for less risky decision making in the face of uncertainty.  ...  We propose two sets of satisficing objectives for the multi-armed bandit problem, where the objective is to achieve reward-based decision-making performance above a given threshold.  ...  Finally, it is well understood that satisficing is an important feature of human decision making [29] and that the UCL algorithm can model many features of human decision making in bandit tasks  ... 
doi:10.1109/tac.2016.2644380 fatcat:d4xp4cajp5d2fpjtngw6tjvn4u

Robot fast adaptation to changes in human engagement during simulated dynamic social interaction with active exploration in parameterized reinforcement learning

Mehdi Khamassi, George Velentzas, Theodore Tsitsimis, Costas Tzafestas
2018 IEEE Transactions on Cognitive and Developmental Systems  
action in human-robot interaction scenarios, mainly at the lower level of a multiarmed bandit framework.  ...  on a table), hence in essence similar to the nonstationary multiarmed bandit paradigm.  ... 
doi:10.1109/tcds.2018.2843122 fatcat:5c64vbuft5dklhotrt2kkg5w7q

The Knowledge Gradient Algorithm for a General Class of Online Learning Problems

Ilya O. Ryzhov, Warren B. Powell, Peter I. Frazier
2012 Operations Research  
Our approach is able to handle the case where our prior beliefs about the rewards are correlated, which is not handled by traditional multiarmed bandit methods.  ...  Experiments show that our KG policy performs competitively against the best-known approximation to the optimal policy in the classic bandit problem, and it outperforms many learning policies in the correlated  ...  This research was supported in part by AFOSR contract FA9550-08-1-0195 and ONR contract N00014-07-1-0150 through the Center for Dynamic Data Analysis.  ... 
doi:10.1287/opre.1110.0999 fatcat:c54svouocbhchhrsuml4xlrmeq
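The knowledge-gradient (KG) policy scores each arm by its expected one-step improvement in the believed best mean. A sketch for the *independent* Gaussian-beliefs case follows; the paper's main contribution is the correlated-beliefs extension, which requires the full covariance matrix instead.

    import numpy as np
    from scipy.stats import norm

    def kg_factors(mu, sigma, var_r):
        """KG factors for independent Gaussian beliefs with known noise var_r."""
        mu = np.asarray(mu, float)
        sigma = np.asarray(sigma, float)
        # Std of the one-step change in the posterior mean after one more sample
        sigma_tilde = sigma ** 2 / np.sqrt(sigma ** 2 + var_r)
        best_other = np.array([np.delete(mu, i).max() for i in range(mu.size)])
        z = -np.abs(mu - best_other) / sigma_tilde
        return sigma_tilde * (z * norm.cdf(z) + norm.pdf(z))

    def online_kg_choice(mu, sigma, var_r, remaining):
        # Online KG: immediate reward plus value of information over the horizon
        return int(np.argmax(np.asarray(mu) + remaining * kg_factors(mu, sigma, var_r)))

The "remaining" horizon factor is what adapts KG from offline ranking-and-selection to the online bandit setting studied here.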

Monte Carlo Search Algorithm Discovery for Single-Player Games

Francis Maes, David Lupien St-Pierre, Damien Ernst
2013 IEEE Transactions on Computational Intelligence and AI in Games  
We rely on multiarmed bandits to approximately solve this optimization problem.  ...  We also show that the discovered algorithms are generally quite robust with respect to changes in the distribution over the training problems.  ...  He is currently an Associate Professor at the University of Liège, where he is affiliated with the Systems and Modeling Research Unit. He is also the holder of the EDF-Luminus Chair on Smart Grids.  ... 
doi:10.1109/tciaig.2013.2239295 fatcat:hucv2zgzfneyhjfh72ggsl545e

Putting bandits into context: How function learning supports decision making

Eric Schulz, Emmanouil Konstantinidis, Maarten Speekenbrink
2018 Journal of Experimental Psychology. Learning, Memory and Cognition  
We introduce the contextual multi-armed bandit task as a framework to investigate learning and decision making in uncertain environments.  ...  Participants are mostly able to learn about the context-reward functions and their behaviour is best described by a Gaussian process learning strategy which generalizes previous experience to similar instances  ...  Incorporating context into models of reinforcement learning and decision making generally provides a fruitful avenue for future research.  ... 
doi:10.1037/xlm0000463 pmid:29130693 fatcat:euxutvfh7rbfpkfull3hdns7vq
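The Gaussian-process learning strategy described above can be sketched as follows: fit one GP per arm on observed (context, reward) pairs, then choose by an upper confidence bound at the current context. The RBF kernel and the beta weight are illustrative assumptions; the paper compares several kernels and choice rules.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    def gp_ucb_choice(histories, context, beta=2.0):
        """histories: one (X, y) pair per arm, X holding past contexts and y the
        rewards observed for that arm; context: the current context vector."""
        scores = []
        for X, y in histories:
            gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-2)
            gp.fit(X, y)                              # learn the context-reward function
            mean, std = gp.predict(np.atleast_2d(context), return_std=True)
            scores.append(mean[0] + beta * std[0])    # optimism in the face of uncertainty
        return int(np.argmax(scores))

Because the GP generalizes across contexts, a single observation informs predictions at similar contexts, which is the "function learning" mechanism the paper argues supports human decision making.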

Putting bandits into context: How function learning supports decision making [article]

Eric Schulz, Emmanouil Konstantinidis, Maarten Speekenbrink
2016 bioRxiv   pre-print
We introduce the contextual multi-armed bandit task as a framework to investigate learning and decision making in uncertain environments.  ...  We model participants' behaviour by context-blind (mean-tracking, Kalman filter) and contextual (Gaussian process regression parametrized with different kernels) learning approaches combined with different  ...  Incorporating context into models of reinforcement learning and decision making generally provides a fruitful avenue for future research.  ... 
doi:10.1101/081091 fatcat:3jpqpzybhrhlthoeqvsdmseine

Aversion to Option Loss in a Restless Bandit Task

Danielle J. Navarro, Peter Tran, Nicole Baz
2018 Computational Brain & Behavior  
A Kalman filter model using Thompson sampling provides an excellent account of human learning in a standard restless bandit task, but there are systematic departures in the vanishing bandit task.  ...  Inspired by work in the judgment and decision-making literature, we present two experiments using multi-armed bandit tasks in both static and dynamic environments, in situations where options can become  ...  For simpler versions of the multiarmed bandit problem, there are closed-form solutions for optimal decisions (Whittle 1980), but in general this is not the case (see Burtini et al. 2015).  ... 
doi:10.1007/s42113-018-0010-8 fatcat:qzvqyeid7zgh7pqzj5fsinwze4
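A minimal sketch of the Kalman-filter-plus-Thompson-sampling model named in the abstract (the observation and innovation variances are illustrative parameters, not the paper's fits): beliefs about all arms diffuse each trial, only the chosen arm's belief is corrected, and the next choice is a posterior draw.

    import numpy as np

    def kalman_thompson_step(mu, var, choice, reward,
                             var_obs=1.0, var_innov=0.1, rng=None):
        """One trial of the model; mu, var are per-arm arrays updated in place."""
        rng = rng or np.random.default_rng()
        var += var_innov                              # restless arms: all beliefs diffuse
        k = var[choice] / (var[choice] + var_obs)     # Kalman gain for the observed arm
        mu[choice] += k * (reward - mu[choice])
        var[choice] *= 1.0 - k
        # Thompson sampling: one posterior draw per arm; play the argmax next trial
        return int(np.argmax(rng.normal(mu, np.sqrt(var))))

The diffusion step keeps unchosen arms uncertain, so the model naturally revisits neglected options, which is what makes it a good fit in standard restless tasks.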

Correlated Multiarmed Bandit Problem: Bayesian Algorithms and Regret Analysis [article]

Vaibhav Srivastava, Paul Reverdy, Naomi Ehrich Leonard
2015 arXiv   pre-print
We consider the correlated multiarmed bandit (MAB) problem in which the rewards associated with each arm are modeled by a multivariate Gaussian random variable, and we investigate the influence of the  ...  We rigorously characterize the influence of accuracy, confidence, and correlation scale in the prior on the decision-making performance of the algorithms.  ...  It is also shown that a variation of the UCL algorithm models human decision-making in an MAB task.  ... 
arXiv:1507.01160v2 fatcat:xaepyacedngjtbhf45s7zbjutm
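In the correlated setting described above, one observed reward shifts beliefs about every arm through the covariance of the multivariate Gaussian prior. A sketch of that posterior update (var_r is an assumed known reward-noise variance):

    import numpy as np

    def correlated_update(mu, Sigma, arm, reward, var_r=1.0):
        """Multivariate-Gaussian belief update after pulling one arm."""
        s = Sigma[:, arm]                             # covariance with the pulled arm
        gain = s / (Sigma[arm, arm] + var_r)          # vector Kalman gain
        mu = mu + gain * (reward - mu[arm])           # all means shift, not just the pulled one
        Sigma = Sigma - np.outer(gain, s)             # all variances shrink accordingly
        return mu, Sigma

This cross-arm information flow is why correlation scale in the prior matters for decision-making performance, as the abstract notes.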

On Distributed Cooperative Decision-Making in Multiarmed Bandits [article]

Peter Landgren, Vaibhav Srivastava, Naomi Ehrich Leonard
2019 arXiv   pre-print
We study the explore-exploit tradeoff in distributed cooperative decision-making using the context of the multiarmed bandit (MAB) problem.  ...  We rigorously analyze the performance of the cooperative UCB algorithm and characterize the influence of communication graph structure on the decision-making performance of the group.  ...  Running consensus and related models have been used to study learning [12] and decision-making [23] in social networks.  ... 
arXiv:1512.06888v3 fatcat:baf6suveurf4zanewhjkxh7fuy
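As a rough sketch of the running-consensus idea behind cooperative UCB (not the paper's exact algorithm, whose index carries graph-dependent corrections): agents repeatedly average their per-arm statistics with neighbors, then apply a UCB-style rule to the shared estimates.

    import numpy as np

    def consensus_step(S, N, W):
        """One running-consensus round. S: per-arm reward sums, N: pull counts
        (one row per agent); W: row-stochastic weights on the communication graph."""
        return W @ S, W @ N

    def ucb_choice(s, n, t):
        # UCB1-style index on one agent's consensus-shared statistics
        n = np.maximum(n, 1e-9)                       # guard unpulled arms
        return int(np.argmax(s / n + np.sqrt(2.0 * np.log(t) / n)))

The better connected the communication graph, the faster each agent's estimates approach the group's pooled statistics, which is the mechanism behind the graph-structure results in the abstract.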
Showing results 1–15 of 242.