Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems
1994
Neural Information Processing Systems
We propose and analyze a new learning algorithm to solve a certain class of non-Markov decision problems. ...
Increasing attention has been paid to reinforcement learning algorithms in recent years, partly due to successes in the theoretical analysis of their behavior in Markov environments. ...
Acknowledgments The authors thank Rich Sutton for pointing out errors at early stages of this work. ...
dblp:conf/nips/JaakkolaSJ94
fatcat:nvsm7pcrdfcuxevwx76uka2fjq
Partially Observed Markov Decision Processes: From Filtering to Controlled Sensing [Bookshelf]
2019
IEEE Control Systems
Partially observed Markov decision processes (POMDPs) are a significant paradigm in real-world sequential decision making. ...
Partially Observed Markov Decision Processes: From Filtering To Controlled Sensing is a valuable contribution to the literature on stochastic decision-making/control and stochastic optimization, and I ...
doi:10.1109/mcs.2019.2913493
fatcat:2eozixvhbjhzbby27rxrgqh4oa
Targets in Reinforcement Learning to solve Stackelberg Security Games
[article]
2022
arXiv
pre-print
Reinforcement Learning (RL) algorithms have been successfully applied to real world situations like illegal smuggling, poaching, deforestation, climate change, airport security, etc. ...
This review investigates modeling of SSGs in RL with a focus on possible improvements of target representations in RL algorithms. ...
These algorithms will need to be extended to multiagent RL for understanding optimal attacker policies as well. arXiv:2211.17132v1 [cs.LG] 30 Nov 2022 Partially Observable Markov Decision Process Partially ...
arXiv:2211.17132v1
fatcat:27spdjye5zainlmvb3pwy6riqy
The National Science Foundation Workshop on Reinforcement Learning
1996
The AI Magazine
, partially observable decision problems (Russell and Parr 1995). ...
Markov Decision Processes and Dynamic Programming A key assumption underlying much research in reinforcement learning is that the agent-environment interaction can be viewed as a Markov decision process ...
doi:10.1609/aimag.v17i4.1244
dblp:journals/aim/MahadevanK96
fatcat:vrz3h6o2cnb6njmtnohflsfksa
Efficient Identification of State in Reinforcement Learning
2009
Künstliche Intelligenz
A very general framework for modeling uncertainty in learning environments is given by partially observable Markov Decision Processes (POMDPs). ...
In this article, we will present a reinforcement learning algorithm for solving deterministic POMDPs based on short-term memory. ...
Deterministic Partially Observable Markov Decision Process (POMDP) A deterministic partially observable Markov Decision Process M := (T, S, A, O, f_S, f_O, In general, the transition function f_S and ...
dblp:journals/ki/TimmerR09
fatcat:mtledogt5jcqjnlpczhya4x6di
Inducing Partially Observable Markov Decision Processes
2012
Journal of machine learning research
Two different kinds of environment models dominate the literature: Markov Decision Processes (Puterman, 1994; Littman et al., 1995), or MDPs, and POMDPs, their Partially Observable counterpart (White ...
The learning problem is not as well studied, but algorithms for learning to approximately optimize an MDP with a polynomial amount of experience have been created (Kearns and Singh, 2002; Strehl et al ...
dblp:journals/jmlr/Littman12
fatcat:6atocal7xfbunh6sd5xyahicpi
Hierarchically Structured Scheduling and Execution of Tasks in a Multi-Agent Environment
[article]
2022
arXiv
pre-print
Reinforcement learning, however, is suited to deal with issues requiring making sequential decisions towards a long-term, often remote, goal. ...
We propose to use deep reinforcement learning to solve both the high-level scheduling problem and the low-level multi-agent problem of schedule execution. ...
Markov Decision Problems Reinforcement Learning algorithms propose to solve sequential decision-making problems formally described as Markov Decision Problems (Puterman, 2014). ...
arXiv:2203.03021v1
fatcat:undob22rivgbncl2bxddshbhja
Page 5650 of Mathematical Reviews Vol. , Issue 2003g
[page]
2003
Mathematical Reviews
In this paper we introduce GPOMDP, a simulation-based algorithm for generating a biased estimate of the gradient of the average reward in partially observable Markov decision processes (POMDPs) controlled ...
Summary: “In this paper, we present algorithms that perform gradient ascent of the average reward in a partially observable Markov decision process (POMDP). ...
Shaping multi-agent systems with gradient reinforcement learning
2007
Autonomous Agents and Multi-Agent Systems
An original Reinforcement Learning (RL) methodology is proposed for the design of multi-agent systems. ...
But to cope with the difficulties inherent to RL used in that framework, we have developed an incremental learning algorithm where agents face a sequence of progressively more complex tasks. ...
Reinforcement Learning
Markov Decision Processes We first consider the ideal theoretic framework for Reinforcement Learning, that is to say Markov Decision Processes ( , ). ...
doi:10.1007/s10458-006-9010-5
fatcat:7pdttndjyzhtrelvedbrh5y4o4
An Extension of Profit Sharing to Partially Observable Markov Decision Processes: Proposition of PS-r* and its Evaluation
2003
Transactions of the Japanese society for artificial intelligence
Observable Markov Decision Processes (POMDPs). ...
We know the rationality theorem of Profit Sharing (PS) [Miyazaki 94, Miyazaki 99b] and the Rational Policy Making algorithm (RPM) [Miyazaki 99a] to guarantee the rationality in a typical class of Partially ...
Jordan: Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems, Advances in Neural InformationVol. 11, No. 5, pp. 761-768 (1996) [Konda 00] V. R. Konda and J. N. ...
doi:10.1527/tjsai.18.286
fatcat:zjwvgcgjbfbbzmaha26u32ry7q
Model-Free Recurrent Reinforcement Learning for AUV Horizontal Control
2018
IOP Conference Series: Materials Science and Engineering
These control problems are considered as a POMDP (Partially Observable Markov Decision Process). ...
In this paper, aiming at the problems of 2-DOF horizontal motion control with high precision for autonomous underwater vehicle (AUV) trajectory tracking tasks, deep reinforcement learning controllers are ...
Systems with these problems are described as Partially-Observable Markov Decision Processes (POMDP). ...
doi:10.1088/1757-899x/428/1/012063
fatcat:v7vacmagabagncx5v7bre5gzmq
Learning agents for uncertain environments (extended abstract)
1998
Proceedings of the eleventh annual conference on Computational learning theory - COLT' 98
This talk proposes a very simple "baseline architecture" for a learning agent that can handle stochastic, partially observable environments. ...
This seems to be a very interesting problem for the COLT, UAI, and ML communities, and has been addressed in econometrics under the heading of structural estimation of Markov decision processes. ...
Reinforcement learning (RL) methods are essentially online algorithms for solving Markov decision processes (MDPs). ...
doi:10.1145/279943.279964
dblp:conf/colt/Russell98
fatcat:bnv35r7dzfdzfdhch3qea6amdy
A Deep Hierarchical Reinforcement Learning Algorithm in Partially Observable Markov Decision Processes
2018
IEEE Access
The problems of RL in such settings can be formulated as a partially observable Markov decision process (POMDP). ...
We propose a hierarchical deep reinforcement learning approach for learning in hierarchical POMDP. The deep hierarchical RL algorithm is proposed to apply to both MDP and POMDP learning. ...
Markov decision process for RL, and deep reinforcement learning. ...
doi:10.1109/access.2018.2854283
fatcat:rffxlckxcjg2fkucnqy53b64x4
Learning Factored Representations for Partially Observable Markov Decision Processes
1999
Neural Information Processing Systems
The problem of reinforcement learning in a non-Markov environment is explored using a dynamic Bayesian network, where conditional independence assumptions between random variables are compactly represented ...
The parameters are learned on-line, and approximations are used to perform inference and to compute the optimal value function. ...
Acknowledgments We thank Geoffrey Hinton, Zoubin Ghahramani and Andy Brown for helpful discussions, the anonymous referees for valuable comments and criticism, and particularly Peter Dayan for helpful ...
dblp:conf/nips/Sallans99
fatcat:r4y7utdaqzfavinbo4xokiji4a
Reinforcement Learning with Hierarchies of Machines
1997
Neural Information Processing Systems
We present provably convergent algorithms for problem-solving and learning with hierarchical machines and demonstrate their effectiveness on a problem with several thousand states. ...
We present a new approach to reinforcement learning in which the policies considered by the learning process are constrained by hierarchies of partially specified machines. ...
We expect that successful pursuit of these lines of research will provide a formal basis for understanding and unifying several seemingly disparate approaches to control, including behavior-based methods ...
dblp:conf/nips/ParrR97
fatcat:i5btazdpgjctdi4zg3prylaeou