14,006 Hits in 3.0 sec

MDPs with Non-Deterministic Policies

Mahdi Milani Fard, Joelle Pineau
2009 Advances in Neural Information Processing Systems  
In this paper we introduce the new concept of non-deterministic MDP policies, and address the question of finding near-optimal non-deterministic policies.  ...  Markov Decision Processes (MDPs) have been extensively studied and used in the context of planning and decision-making, and many methods exist to find the optimal policy for problems modelled as MDPs.  ...  A non-augmentable ε-optimal non-deterministic policy Π on an MDP M is a policy that is not augmentable according to the constraint in Eqn 6.  ... 
pmid:21625292 pmcid:PMC3103230 fatcat:loud5sq3zrh4hkqgucsrjvvkze
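
This entry (and its 2011 JAIR version below) treats a non-deterministic policy as a set of allowed actions per state, judged against the optimal value function. A minimal sketch of that idea, assuming as an illustration (not the paper's exact Eqn 6) that ε-optimality requires the worst-case value over the allowed actions to stay above (1 - ε) times the optimal value:

```python
import numpy as np

def value_iteration(P, R, gamma, iters=1000):
    """Optimal values of a finite MDP. P[a][s, s2] = transition prob., R[s, a] = reward."""
    n_states, n_actions = R.shape
    V = np.zeros(n_states)
    for _ in range(iters):
        Q = np.array([R[:, a] + gamma * P[a] @ V for a in range(n_actions)]).T
        V = Q.max(axis=1)
    return V

def worst_case_value(P, R, gamma, allowed, iters=1000):
    """Value of a non-deterministic policy when an adversary picks, in each
    state, the worst action from the allowed set allowed[s]."""
    n_states, n_actions = R.shape
    V = np.zeros(n_states)
    for _ in range(iters):
        Q = np.array([R[:, a] + gamma * P[a] @ V for a in range(n_actions)]).T
        V = np.array([Q[s, list(allowed[s])].min() for s in range(n_states)])
    return V

def is_eps_optimal(P, R, gamma, allowed, eps):
    """Illustrative check: every state keeps at least (1 - eps) of V*."""
    return np.all(worst_case_value(P, R, gamma, allowed)
                  >= (1.0 - eps) * value_iteration(P, R, gamma))
```

Under this reading, non-augmentability means that adding any further action to any allowed[s] makes the check fail, matching the non-augmentable ε-optimal notion quoted above.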

Stationary Deterministic Policies for Constrained MDPs with Multiple Rewards, Costs, and Discount Factors

Dmitri A. Dolgov, Edmund H. Durfee
2005 International Joint Conference on Artificial Intelligence  
We consider the problem of policy optimization for a resource-limited agent with multiple time-dependent objectives, represented as an MDP with multiple discount factors in the objective function and constraints  ...  We show that limiting search to stationary deterministic policies, coupled with a novel problem reduction to mixed integer programming, yields an algorithm for finding such policies that is computationally  ...  deterministic policies provides the first practical approach to dealing with constrained MDPs with multiple discount factors.  ... 
dblp:conf/ijcai/DolgovD05 fatcat:gxtw5mmmg5e4bixvazvn2hffqm
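
The abstract above mentions a reduction to mixed integer programming for finding stationary deterministic policies in constrained MDPs. A minimal sketch of the standard occupancy-measure formulation with binary action-choice variables (a single discount factor and a single cost constraint for brevity, so this is not the paper's multi-discount construction), using PuLP as an illustrative solver interface:

```python
import pulp

def constrained_mdp_milp(P, R, C, budget, gamma, alpha):
    """Maximize discounted reward subject to a discounted cost budget,
    restricted to stationary deterministic policies.
    P[a][s][s2]: transitions, R[s][a]: rewards, C[s][a]: costs,
    alpha[s]: initial-state distribution."""
    S, A = range(len(alpha)), range(len(P))
    big_m = 1.0 / (1.0 - gamma)  # total occupancy bound when alpha sums to 1
    prob = pulp.LpProblem("constrained_mdp", pulp.LpMaximize)
    x = {(s, a): pulp.LpVariable(f"x_{s}_{a}", lowBound=0) for s in S for a in A}
    d = {(s, a): pulp.LpVariable(f"d_{s}_{a}", cat="Binary") for s in S for a in A}

    # Objective: expected discounted reward under the occupancy measure x.
    prob += pulp.lpSum(R[s][a] * x[s, a] for s in S for a in A)

    # Bellman-flow (occupancy) constraints.
    for s2 in S:
        prob += (pulp.lpSum(x[s2, a] for a in A)
                 - gamma * pulp.lpSum(P[a][s][s2] * x[s, a] for s in S for a in A)
                 == alpha[s2])

    # Discounted cost budget.
    prob += pulp.lpSum(C[s][a] * x[s, a] for s in S for a in A) <= budget

    # Determinism: only one action may carry occupancy in each state.
    for s in S:
        prob += pulp.lpSum(d[s, a] for a in A) == 1
        for a in A:
            prob += x[s, a] <= big_m * d[s, a]

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return {s: max(A, key=lambda a: x[s, a].value()) for s in S}
```

The linking constraints x[s, a] <= big_m * d[s, a] with sum_a d[s, a] = 1 force all occupancy in a state onto a single action, which is what makes the extracted policy deterministic.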

Non-Deterministic Policies in Markovian Decision Processes

M. Milani Fard, J. Pineau
2011 The Journal of Artificial Intelligence Research  
In an experiment with human subjects, we show that humans assisted by hints based on non-deterministic policies outperform both human-only and computer-only agents in a web navigation task.  ...  We provide two algorithms to compute non-deterministic policies in discrete domains. We study the output and running time of these methods on a set of synthetic and real-world problems.  ...  A non-augmentable ε-optimal non-deterministic policy Π on an MDP M is a policy that is non-augmentable according to the constraint in Eqn 23.  ... 
doi:10.1613/jair.3175 fatcat:arqm4qzs5vcnvbpjgsf53bzrnu

Markov Decision Petri Nets with Uncertainty [chapter]

Marco Beccuti, Elvio G. Amparore, Susanna Donatelli, Dimitri Scheftelowitsch, Peter Buchholz, Giuliana Franceschinis
2015 Lecture Notes in Computer Science  
Markov Decision Processes (MDPs) are a well known mathematical formalism that combines probabilities with decisions and allows one to compute optimal sequences of decisions, denoted as policies, for fairly  ...  However, the practical application of MDPs is often faced with two problems: the specification of large models in an efficient and understandable way, which has to be combined with algorithms to generate  ...  Every path starting with a non-deterministic state followed by a "composite action" σ and by a (maximal) sequence of probabilistic states ending with a non-deterministic state, is substituted in the BMDP  ... 
doi:10.1007/978-3-319-23267-6_12 fatcat:vwjyvq2fkzgmzcinv3rljnmkim

Saturated Path-Constrained MDP: Planning under Uncertainty and Deterministic Model-Checking Constraints

Jonathan Sprauel, Andrey Kolobov, Florent Teichteil-Königsbuch
2014 Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence  
This paper presents Saturated Path-Constrained Markov Decision Processes (SPC MDPs), a new MDP type for planning under uncertainty with deterministic model-checking constraints, e.g., "state s must be  ...  We present a mathematical analysis of SPC MDPs, showing that although SPC MDPs generally have no optimal policies, every instance of this class has an epsilon-optimal randomized policy for any ε > 0.  ...  Although PC MDPs are more general than SPC MDPs (they allow probabilistic, or non-saturated, constraints of the kind "the solution policy must visit state s before s' at least with probability p"), their  ... 
doi:10.1609/aaai.v28i1.9041 fatcat:74vbkkvg55c3tel5s7ku4wbvjm
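
The constraints in this entry are of the form "visit state s before state s'". One common way to handle such a requirement (a generic construction, not necessarily the paper's) is to fold the constraint's memory into the state as a flag and send violating transitions to a dead end; a minimal sketch:

```python
def product_with_precedence(trans, s_first, s_second):
    """Augment an MDP so that reaching s_second before s_first is a dead end.
    trans[(s, a)] = list of (prob, next_state). Product states are (s, seen),
    where seen records whether s_first has already been visited."""
    prod_trans = {}
    for (s, a), outcomes in trans.items():
        for seen in (False, True):
            new_outcomes = []
            for p, s2 in outcomes:
                seen2 = seen or (s2 == s_first)
                if s2 == s_second and not seen2:
                    s2, seen2 = "violation", seen   # absorbing dead-end state
                new_outcomes.append((p, (s2, seen2)))
            prod_trans[((s, seen), a)] = new_outcomes
    return prod_trans
```

Planning on the augmented model then amounts to avoiding the hypothetical "violation" dead end, with the initial product state starting from (s0, s0 == s_first).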

The 10,000 Facets of MDP Model Checking [chapter]

Christel Baier, Holger Hermanns, Joost-Pieter Katoen
2019 Lecture Notes in Computer Science  
We focus on Markov decision processes (MDPs, for short).  ...  We survey the basic ingredients of MDP model checking and discuss its enormous developments since the seminal works by Courcoubetis and Yannakakis in the early 1990s.  ...  Randomized positional policies select μ ∈ D(s_i) with a certain probability. Deterministic policies select a fixed distribution from D(s_i).  ... 
doi:10.1007/978-3-319-91908-9_21 fatcat:yjsuwb5ibjff3cq3niatu6sbxq
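
The last two snippet sentences contrast deterministic positional policies (always pick one distribution from D(s_i)) with randomized positional ones (mix over D(s_i)). A tiny illustration of the two object kinds, with hypothetical names:

```python
import random

# D[s] is the list of probability distributions enabled in state s
# (each distribution maps successor states to probabilities).

def deterministic_positional(D, choice):
    """choice[s] is an index into D[s]; the scheduler always picks it."""
    return lambda s: D[s][choice[s]]

def randomized_positional(D, weights):
    """weights[s] is a probability vector over the indices of D[s]."""
    return lambda s: random.choices(D[s], weights=weights[s], k=1)[0]
```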

LCRL: Certified Policy Synthesis via Logically-Constrained Reinforcement Learning [article]

Hosein Hasanbeig and Daniel Kroening and Alessandro Abate
2022 arXiv   pre-print
with maximal probability.  ...  LCRL is a software tool that implements model-free Reinforcement Learning (RL) algorithms over unknown Markov Decision Processes (MDPs), synthesising policies that satisfy a given linear temporal specification  ...  Namely, when there exists a non-deterministic transition in an LDBA state, the MDP action space is augmented with the non-deterministic transition predicate of the LDBA.  ... 
arXiv:2209.10341v1 fatcat:e4vhttrwsvcqdpbfexdlo2w2im
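
The final snippet sentence describes augmenting the MDP action space to resolve non-deterministic LDBA transitions. A minimal sketch of such a product construction, assuming as an illustration of the general idea (not LCRL's actual data structures) that LDBA ε-moves are exposed as extra actions that change only the automaton component:

```python
def product_mdp_step(mdp_step, ldba_delta, ldba_eps_moves, label):
    """Return a step function over product states (s, q).

    mdp_step(s, a)      -> list of (prob, s2)
    ldba_delta(q, L)    -> next automaton state on reading label set L
    ldba_eps_moves(q)   -> automaton states reachable from q by epsilon moves
    label(s)            -> label set of MDP state s
    Actions are either ('mdp', a) or ('eps', q2)."""
    def step(state, action):
        s, q = state
        kind, payload = action
        if kind == "eps":                       # automaton-only move
            return [(1.0, (s, payload))] if payload in ldba_eps_moves(q) else []
        outcomes = mdp_step(s, payload)         # ordinary MDP action
        return [(p, (s2, ldba_delta(q, label(s2)))) for p, s2 in outcomes]
    return step
```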

Sample Efficient Model-free Reinforcement Learning from LTL Specifications with Optimality Guarantees [article]

Daqian Shao, Marta Kwiatkowska
2023 arXiv   pre-print
policy that maximizes the probability of satisfying a given LTL specification with optimality guarantees.  ...  Linear Temporal Logic (LTL) is widely used to specify high-level objectives for system policies, and it is highly desirable for autonomous systems to learn the optimal policy with respect to such specifications  ...  The common choices of automata include deterministic Rabin automata and non-deterministic Büchi automata.  ... 
arXiv:2305.01381v2 fatcat:navp6kp3pfdkpc7mssqlrybn2a

Learning Without State-Estimation in Partially Observable Markovian Decision Processes [chapter]

Satinder P. Singh, Tommi Jaakkola, Michael I. Jordan
1994 Machine Learning Proceedings 1994  
In this paper we consider only partially observable MDPs (POMDPs), a useful class of non-Markovian decision processes.  ...  ., 1983, for an exception) of RL is limited to Markovian decision processes (MDPs).  ...  Therefore, in MDPs the agent can restrict its search to the finite set of stationary deterministic policies.  ... 
doi:10.1016/b978-1-55860-335-6.50042-8 dblp:conf/icml/SinghJJ94 fatcat:22vp5lktdzbbxbe7vslfeb35qi

Extending Classical Planning Heuristics to Probabilistic Planning with Dead-Ends

Florent Teichteil-Königsbuch, Vincent Vidal, Guillaume Infantes
2011 Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence  
the winners of previous competitions (FF-Replan, FPG, RFF), and show that our discounted heuristics solve more problems than non-discounted ones, with better criteria values.  ...  Previous attempts like mGPT applied classical planning heuristics to an all-outcome determinization of MDPs without a discount factor; yet, discounted optimization is required to solve problems with potential  ...  A problem is considered solved by a planner if its policy reaches a goal state with non-zero probability. Candidate MDP heuristic algorithms.  ... 
doi:10.1609/aaai.v25i1.8016 fatcat:arvs5sclfzfkvb6lcmuzo3nv6m
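
The snippet refers to an all-outcome determinization of an MDP: every probabilistic outcome of an action is promoted to its own deterministic action so that classical planning heuristics can be evaluated on the result. A minimal sketch:

```python
def all_outcome_determinization(trans):
    """trans[(s, a)] = list of (prob, s2). Each outcome becomes its own
    deterministic action (a, i) -> s2, dropping the probabilities."""
    det = {}
    for (s, a), outcomes in trans.items():
        for i, (_prob, s2) in enumerate(outcomes):
            det[(s, (a, i))] = s2
    return det
```

The entry's point is that running such heuristics without a discount factor is problematic on domains with dead ends, hence the discounted variants it proposes.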

Reinforcement Learning for Temporal Logic Control Synthesis with Probabilistic Satisfaction Guarantees

M. Hasanbeig, Y. Kantaros, A. Abate, D. Kroening, G. J. Pappas, I. Lee
2019 2019 IEEE 58th Conference on Decision and Control (CDC)  
We first translate the LTL specification into a Limit Deterministic Büchi Automaton (LDBA), which is then used in an on-the-fly product with the PL-MDP.  ...  Uncertainty is considered in the workspace properties, the structure of the workspace, and the agent actions, giving rise to a Probabilistically-Labeled Markov Decision Process (PL-MDP) with unknown graph  ...  The following theorem shows that the optimal policy produced by Algorithm 1 satisfies the given LTL property with non-zero probability.  ... 
doi:10.1109/cdc40024.2019.9028919 dblp:conf/cdc/HasanbeigKAKPL19 fatcat:w4yfiw5vkrdnnhmiqp6aopc2ya

Strong Polynomiality of the Value Iteration Algorithm for Computing Nearly Optimal Policies for Discounted Dynamic Programming [article]

Eugene A. Feinberg, Gaojin He
2020 arXiv   pre-print
This note provides upper bounds on the number of operations required to compute, by value iterations, a nearly optimal policy for an infinite-horizon discounted Markov decision process with a finite number  ...  reward function, and desired closeness to optimality, these upper bounds are strongly polynomial in the number of state-action pairs, and one of the provided upper bounds has the property that it is a non-decreasing  ...  As discussed above, γ = 1 for this MDP with deterministic transitions.  ... 
arXiv:2001.10174v1 fatcat:gss3vsncgvd3tcx2nu4xj2hjqa
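
The note above counts value-iteration operations. A classical guarantee (weaker than the note's strongly polynomial bounds) is that stopping when successive iterates differ by less than ε(1 - γ)/(2γ) in sup norm makes the greedy policy ε-optimal; a minimal sketch of value iteration with that stopping rule:

```python
import numpy as np

def vi_eps_optimal_policy(P, R, gamma, eps):
    """Value iteration with the classical stopping rule
    ||V_{k+1} - V_k|| < eps * (1 - gamma) / (2 * gamma),
    after which the greedy policy is eps-optimal.
    P[a][s, s2]: transitions, R[s, a]: rewards, 0 < gamma < 1."""
    n_states, n_actions = R.shape
    V = np.zeros(n_states)
    threshold = eps * (1 - gamma) / (2 * gamma)
    while True:
        Q = np.array([R[:, a] + gamma * P[a] @ V for a in range(n_actions)]).T
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < threshold:
            return Q.argmax(axis=1), V_new   # greedy policy and its value bound
        V = V_new
```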

Heuristics, Answer Set Programming and Markov Decision Process for Solving a Set of Spatial Puzzles [article]

Thiago Freitas dos Santos, Paulo E. Santos, Leonardo A. Ferreira, Reinaldo A. C. Bianchi, Pedro Cabalar
2019 arXiv   pre-print
Experiments were performed on deterministic, non-deterministic and non-stationary versions of the puzzles.  ...  ASP is applied to represent the domain as an MDP, while a Reinforcement Learning algorithm (Q-Learning) is used to find the optimal policies.  ...  (c) t-test comparing oASP(MDP) with Heuristic against the traditional oASP(MDP). Fig. 6: Number of Steps and Return results for the Non-Deterministic Fisherman's Folly puzzle.  ... 
arXiv:1903.03411v1 fatcat:h7uudj43zzavzocg2kshefcuma
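
The snippet pairs an ASP encoding of the domain with Q-Learning. The tabular Q-learning update it relies on is standard; a minimal sketch, with a hypothetical environment interface env.reset() / env.step(a) standing in for the puzzle simulators:

```python
import random
from collections import defaultdict

def q_learning(env, actions, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Q-learning: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    Q = defaultdict(float)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            if random.random() < epsilon:                 # epsilon-greedy exploration
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda a_: Q[(s, a_)])
            s2, r, done = env.step(a)                     # hypothetical interface
            target = r + (0.0 if done else gamma * max(Q[(s2, a_)] for a_ in actions))
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q
```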

Synthesis of Discounted-Reward Optimal Policies for Markov Decision Processes Under Linear Temporal Logic Specifications [article]

Krishna C. Kalagarla, Rahul Jain, Pierluigi Nuzzo
2021 arXiv   pre-print
We present a method to find an optimal policy with respect to a reward function for a discounted Markov decision process under general linear temporal logic (LTL) specifications.  ...  These occupancy measures are then connected to a single policy via a novel reduction resulting in a mixed integer linear program whose solution provides an optimal policy.  ...  s ∈ S^× visited with non-zero probability, policy π^× is deterministic and, further, y_{sa} > 0 if and only if Δ_{sa} = 1.  ... 
arXiv:2011.00632v2 fatcat:d6nzjaccrvgu3fkmgttaxlefo4

A Theoretical Connection Between Statistical Physics and Reinforcement Learning [article]

Jad Rahme, Ryan P. Adams
2021 arXiv   pre-print
Moreover, when the MDP dynamics are deterministic, the Bellman equation for 𝒵 is linear, allowing direct solutions that are unavailable for the nonlinear equations associated with traditional value functions  ...  The policies learned via these 𝒵-based Bellman updates are tightly linked to Boltzmann-like policy parameterizations.  ...  By working with partition functions we transformed a nonlinear problem into a linear one. This remarkable result is reminiscent of linearly solvable MDPs (Todorov, 2007).  ... 
arXiv:1906.10228v2 fatcat:fkt4i2kjhrarxipjm4up3vriea
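
The first snippet sentence says the Bellman equation for the partition function 𝒵 becomes linear under deterministic dynamics. One way to see this, under the simplifying assumptions of undiscounted maximum-entropy (soft) values and deterministic transitions: writing Z(s) = exp(V(s)/T), the soft recursion V(s) = T log Σ_a exp((r(s, a) + V(next(s, a)))/T) exponentiates to the linear equation Z(s) = Σ_a exp(r(s, a)/T) Z(next(s, a)), which can be solved as a linear system. A minimal sketch (the well-posedness assumption, e.g., terminal states reachable, is mine, not the paper's):

```python
import numpy as np

def partition_values(states, actions, next_state, reward, terminal, T=1.0):
    """Solve Z(s) = sum_a exp(r(s,a)/T) * Z(next(s,a)), with Z = 1 at terminal
    states; then V(s) = T * log Z(s). actions(s) lists the actions in s."""
    idx = {s: i for i, s in enumerate(states)}
    n = len(states)
    M, b = np.zeros((n, n)), np.zeros(n)
    for s in states:
        i = idx[s]
        M[i, i] = 1.0
        if s in terminal:
            b[i] = 1.0          # boundary condition Z(terminal) = 1
            continue
        for a in actions(s):    # row encodes Z(s) - sum_a exp(r/T) Z(next) = 0
            M[i, idx[next_state(s, a)]] -= np.exp(reward(s, a) / T)
    Z = np.linalg.solve(M, b)
    return {s: T * np.log(Z[idx[s]]) for s in states}
```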
Showing results 1-15 of 14,006