MDPs with Non-Deterministic Policies
2009
Advances in Neural Information Processing Systems
In this paper we introduce the new concept of non-deterministic MDP policies, and address the question of finding near-optimal non-deterministic policies. ...
Markov Decision Processes (MDPs) have been extensively studied and used in the context of planning and decision-making, and many methods exist to find the optimal policy for problems modelled as MDPs. ...
A non-augmentable ε-optimal non-deterministic policy Π on an MDP M is a policy that is not augmentable according to the constraint in Eqn 6. ...
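The constraint itself (Eqn 6) is not reproduced in the snippet; as a hedged reconstruction from the paper's setup, where a non-deterministic policy Π maps each state to a non-empty set of allowed actions, ε-optimality can be read as a worst-case guarantee:

```latex
% Plausible form of the augmentation constraint (reconstruction, not Eqn 6 verbatim):
% \Pi is \epsilon-optimal when every deterministic policy \pi consistent with it
% (\pi(s) \in \Pi(s) for all s) stays within a (1-\epsilon) factor of optimal:
\min_{\pi :\, \pi(s) \in \Pi(s)\ \forall s} V^{\pi}(s) \;\ge\; (1-\epsilon)\, V^{*}(s)
\qquad \forall s \in S
```

Non-augmentability then means that adding any action to any Π(s) would break this inequality.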
pmid:21625292
pmcid:PMC3103230
fatcat:loud5sq3zrh4hkqgucsrjvvkze
Stationary Deterministic Policies for Constrained MDPs with Multiple Rewards, Costs, and Discount Factors
2005
International Joint Conference on Artificial Intelligence
We consider the problem of policy optimization for a resource-limited agent with multiple time-dependent objectives, represented as an MDP with multiple discount factors in the objective function and constraints ...
We show that limiting search to stationary deterministic policies, coupled with a novel problem reduction to mixed integer programming, yields an algorithm for finding such policies that is computationally ...
deterministic policies provides the first practical approach to dealing with constrained MDPs with multiple discount factors. ...
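As a sketch of the problem shape described above (my notation, not necessarily the paper's exact formulation):

```latex
% Generic constrained-MDP objective with per-criterion discount factors:
\max_{\pi}\ \mathbb{E}^{\pi}\Big[\sum_{t=0}^{\infty} \gamma_0^{\,t}\, r_0(s_t, a_t)\Big]
\quad \text{s.t.} \quad
\mathbb{E}^{\pi}\Big[\sum_{t=0}^{\infty} \gamma_i^{\,t}\, c_i(s_t, a_t)\Big] \le \hat{c}_i,
\qquad i = 1, \dots, k
```

with π restricted to stationary deterministic policies, which is what makes the mixed-integer reduction necessary.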
dblp:conf/ijcai/DolgovD05
fatcat:gxtw5mmmg5e4bixvazvn2hffqm
Non-Deterministic Policies in Markovian Decision Processes
2011
The Journal of Artificial Intelligence Research
In an experiment with human subjects, we show that humans assisted by hints based on non-deterministic policies outperform both human-only and computer-only agents in a web navigation task. ...
We provide two algorithms to compute non-deterministic policies in discrete domains. We study the output and running time of these methods on a set of synthetic and real-world problems. ...
A non-augmentable ε-optimal non-deterministic policy Π on an MDP M is a policy that is non-augmentable according to the constraint in Eqn 23. ...
doi:10.1613/jair.3175
fatcat:arqm4qzs5vcnvbpjgsf53bzrnu
Markov Decision Petri Nets with Uncertainty
[chapter]
2015
Lecture Notes in Computer Science
Markov Decision Processes (MDPs) are a well known mathematical formalism that combines probabilities with decisions and allows one to compute optimal sequences of decisions, denoted as policies, for fairly ...
However, the practical application of MDPs is often faced with two problems: the specification of large models in an efficient and understandable way, which has to be combined with algorithms to generate ...
Every path starting with a non-deterministic state followed by a "composite action" σ and by a (maximal) sequence of probabilistic states ending with a non-deterministic state is substituted in the BMDP ...
doi:10.1007/978-3-319-23267-6_12
fatcat:vwjyvq2fkzgmzcinv3rljnmkim
Saturated Path-Constrained MDP: Planning under Uncertainty and Deterministic Model-Checking Constraints
2014
Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence
This paper presents Saturated Path-Constrained Markov Decision Processes (SPC MDPs), a new MDP type for planning under uncertainty with deterministic model-checking constraints, e.g., "state s must be ...
We present a mathematical analysis of SPC MDPs, showing that although SPC MDPs generally have no optimal policies, every instance of this class has an ε-optimal randomized policy for any ε > 0. ...
Although PC MDPs are more general than SPC MDPs (they allow probabilistic, or non-saturated, constraints of the kind "the solution policy must visit state s before s′ with probability at least p"), their ...
doi:10.1609/aaai.v28i1.9041
fatcat:74vbkkvg55c3tel5s7ku4wbvjm
The 10,000 Facets of MDP Model Checking
[chapter]
2019
Lecture Notes in Computer Science
We focus on Markov decision processes (MDPs, for short). ...
We survey the basic ingredients of MDP model checking and discuss its enormous developments since the seminal works by Courcoubetis and Yannakakis in the early 1990s. ...
Randomized positional policies select μ ∈ D(s_i) with a certain probability. Deterministic policies select a fixed distribution from D(s_i). ...
doi:10.1007/978-3-319-91908-9_21
fatcat:yjsuwb5ibjff3cq3niatu6sbxq
LCRL: Certified Policy Synthesis via Logically-Constrained Reinforcement Learning
[article]
2022
arXiv
pre-print
with maximal probability. ...
LCRL is a software tool that implements model-free Reinforcement Learning (RL) algorithms over unknown Markov Decision Processes (MDPs), synthesising policies that satisfy a given linear temporal specification ...
Namely, when there exists a non-deterministic transition in an LDBA state, the MDP action space is augmented with the non-deterministic transition predicate of the LDBA. ...
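A minimal sketch of that augmentation step, with all names (`eps_moves`, the action encoding) being illustrative assumptions rather than LCRL's actual API:

```python
# Sketch only: augmenting the product-MDP action set with the LDBA's
# non-deterministic epsilon-transitions. Names are hypothetical, not LCRL's API.
def product_actions(mdp_actions, ldba_state, ldba):
    """Actions available at a product state (mdp_state, ldba_state).

    For each non-deterministic epsilon-transition ldba_state -> q of the
    LDBA, one extra action ("eps", q) is exposed, letting the learner decide
    when the automaton takes its silent jump.
    """
    eps_actions = [("eps", q) for q in ldba.eps_moves(ldba_state)]
    return list(mdp_actions) + eps_actions
```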
arXiv:2209.10341v1
fatcat:e4vhttrwsvcqdpbfexdlo2w2im
Sample Efficient Model-free Reinforcement Learning from LTL Specifications with Optimality Guarantees
[article]
2023
arXiv
pre-print
policy that maximizes the probability of satisfying a given LTL specification with optimality guarantees. ...
Linear Temporal Logic (LTL) is widely used to specify high-level objectives for system policies, and it is highly desirable for autonomous systems to learn the optimal policy with respect to such specifications ...
The common choices of automata include deterministic Rabin automata and non-deterministic Büchi automata. ...
arXiv:2305.01381v2
fatcat:navp6kp3pfdkpc7mssqlrybn2a
Learning Without State-Estimation in Partially Observable Markovian Decision Processes
[chapter]
1994
Machine Learning Proceedings 1994
In this paper we consider only partially observable MDPs (POMDPs), a useful class of non-Markovian decision processes. ...
., 1983, for an exception) of RL is limited to Markovian decision processes (MDPs). ...
Therefore, in MDPs the agent can restrict its search to the finite set of stationary deterministic policies. ...
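For a finite MDP this search space is indeed finite; a one-line count, under the usual identification of a stationary deterministic policy with a map from states to actions:

```latex
% Number of stationary deterministic policies in a finite MDP:
\big|\{\pi : S \to A\}\big| \;=\; |A|^{|S|}
```

which is exponential in |S| but finite, so an optimal member always exists.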
doi:10.1016/b978-1-55860-335-6.50042-8
dblp:conf/icml/SinghJJ94
fatcat:22vp5lktdzbbxbe7vslfeb35qi
Extending Classical Planning Heuristics to Probabilistic Planning with Dead-Ends
2011
Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence
the winners of previous competitions (FF-Replan, FPG, RFF), and show that our discounted heuristics solve more problems than non-discounted ones, with better criteria values. ...
Previous attempts like mGPT applied classical planning heuristics to an all-outcome determinization of MDPs without a discount factor; yet, discounted optimization is required to solve problems with potential ...
A problem is considered solved by a planner if its policy reaches a goal state with non-zero probability. Candidate MDP heuristic algorithms. ...
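A minimal sketch of the all-outcome determinization mentioned above (generic construction; mGPT's own data structures will differ):

```python
# Sketch: all-outcome determinization of a probabilistic transition model.
# P maps (state, action) -> {next_state: probability}; each possible outcome
# becomes its own deterministic action (action, outcome_index).
def all_outcome_determinization(P):
    det = {}
    for (s, a), outcomes in P.items():
        for i, (s_next, p) in enumerate(outcomes.items()):
            if p > 0:                      # keep only outcomes that can occur
                det[(s, (a, i))] = s_next  # deterministic successor
    return det
```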
doi:10.1609/aaai.v25i1.8016
fatcat:arvs5sclfzfkvb6lcmuzo3nv6m
Reinforcement Learning for Temporal Logic Control Synthesis with Probabilistic Satisfaction Guarantees
2019
2019 IEEE 58th Conference on Decision and Control (CDC)
We first translate the LTL specification into a Limit Deterministic Büchi Automaton (LDBA), which is then used in an on-the-fly product with the PL-MDP. ...
Uncertainty is considered in the workspace properties, the structure of the workspace, and the agent actions, giving rise to a Probabilistically-Labeled Markov Decision Process (PL-MDP) with unknown graph ...
The following theorem shows that the optimal policy produced by Algorithm 1 satisfies the given LTL property with non-zero probability. ...
doi:10.1109/cdc40024.2019.9028919
dblp:conf/cdc/HasanbeigKAKPL19
fatcat:w4yfiw5vkrdnnhmiqp6aopc2ya
Strong Polynomiality of the Value Iteration Algorithm for Computing Nearly Optimal Policies for Discounted Dynamic Programming
[article]
2020
arXiv
pre-print
This note provides upper bounds on the number of operations required to compute by value iterations a nearly optimal policy for an infinite-horizon discounted Markov decision process with a finite number ...
reward function, and desired closeness to optimality, these upper bounds are strongly polynomial in the number of state-action pairs, and one of the provided upper bounds has the property that it is a non-decreasing ...
As discussed above, γ = 1 for this MDP with deterministic transitions. ...
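For context, a minimal value-iteration loop with the standard ε-optimality stopping rule (the generic textbook algorithm, not the note's complexity analysis):

```python
import numpy as np

def value_iteration(P, R, gamma, eps):
    """Generic value iteration; P has shape (A, S, S), R has shape (S, A).

    Stops once the sup-norm residual is below eps * (1 - gamma) / (2 * gamma),
    which by the standard contraction argument makes the greedy policy
    eps-optimal.
    """
    V = np.zeros(R.shape[0])
    threshold = eps * (1.0 - gamma) / (2.0 * gamma)
    while True:
        Q = R + gamma * np.einsum("ast,t->sa", P, V)  # Q(s, a)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < threshold:
            return Q.argmax(axis=1), V_new            # greedy policy and values
        V = V_new
```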
arXiv:2001.10174v1
fatcat:gss3vsncgvd3tcx2nu4xj2hjqa
Heuristics, Answer Set Programming and Markov Decision Process for Solving a Set of Spatial Puzzles
[article]
2019
arXiv
pre-print
Experiments were performed on deterministic, non-deterministic and non-stationary versions of the puzzles. ...
ASP is applied to represent the domain as an MDP, while a Reinforcement Learning algorithm (Q-Learning) is used to find the optimal policies. ...
(c) t-test comparing oASP(MDP) with Heuristic and the traditional oASP(MDP).
Fig. 6: Number of Steps and Return results for the Non-Deterministic Fisherman's Folly puzzle. ...
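The Q-Learning component referenced in this entry admits a one-line tabular update; the following is a generic sketch, not the oASP(MDP) implementation:

```python
# Generic tabular Q-learning update (illustrative, not oASP(MDP)'s code).
def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.95):
    best_next = max(Q.get((s_next, b), 0.0) for b in actions)
    td_target = r + gamma * best_next
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (td_target - Q.get((s, a), 0.0))
```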
arXiv:1903.03411v1
fatcat:h7uudj43zzavzocg2kshefcuma
Synthesis of Discounted-Reward Optimal Policies for Markov Decision Processes Under Linear Temporal Logic Specifications
[article]
2021
arXiv
pre-print
We present a method to find an optimal policy with respect to a reward function for a discounted Markov decision process under general linear temporal logic (LTL) specifications. ...
These occupancy measures are then connected to a single policy via a novel reduction resulting in a mixed integer linear program whose solution provides an optimal policy. ...
s ∈ S^× visited with non-zero probability, policy π^× is deterministic and, further, y_sa > 0 if and only if Δ_sa = 1. ...
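The y_sa / Δ_sa coupling in the snippet is the classic big-M linking device for forcing a deterministic policy out of occupancy measures; schematically (my notation, not the paper's exact program):

```latex
% Generic occupancy-measure MILP sketch (illustrative only):
\max_{y,\,\Delta}\ \sum_{s,a} y_{sa}\, R(s,a)
\quad \text{s.t.}\quad
\sum_{a} y_{sa} - \gamma \sum_{s',a'} P(s \mid s', a')\, y_{s'a'} = \mu_0(s)\ \ \forall s,
\qquad
0 \le y_{sa} \le M\, \Delta_{sa},\quad
\sum_{a} \Delta_{sa} = 1,\quad
\Delta_{sa} \in \{0,1\}
```

so y_sa > 0 forces Δ_sa = 1, and each state commits to a single action.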
arXiv:2011.00632v2
fatcat:d6nzjaccrvgu3fkmgttaxlefo4
A Theoretical Connection Between Statistical Physics and Reinforcement Learning
[article]
2021
arXiv
pre-print
Moreover, when the MDP dynamics are deterministic, the Bellman equation for 𝒵 is linear, allowing direct solutions that are unavailable for the nonlinear equations associated with traditional value functions ...
The policies learned via these 𝒵-based Bellman updates are tightly linked to Boltzmann-like policy parameterizations. ...
By working with partition functions we transformed a non-linear problem into a linear one. This remarkable result is reminiscent of linearly solvable MDPs (Todorov, 2007). ...
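For reference, the linearity alluded to is the same phenomenon as in Todorov's linearly solvable MDPs, where an exponential change of variables turns the Bellman equation into a linear (eigenvector) problem:

```latex
% Todorov (2007), first-exit formulation: with state cost q(s), passive
% dynamics p(s' | s), and temperature \lambda, the desirability
% z(s) = e^{-v(s)/\lambda} satisfies a linear equation:
z(s) \;=\; e^{-q(s)/\lambda} \sum_{s'} p(s' \mid s)\, z(s'),
\qquad \text{i.e.}\qquad
z = \operatorname{diag}\!\big(e^{-q/\lambda}\big)\, P\, z
```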
arXiv:1906.10228v2
fatcat:fkt4i2kjhrarxipjm4up3vriea