A copy of this work was available on the public web and has been preserved in the Wayback Machine; the capture dates from 2017. The file type is application/pdf.
On the Convergence of Stochastic Iterative Dynamic Programming Algorithms
1994
Neural Computation
Funding: NSF-ASC-9217041, N00014-90-J-1942
Indeed, real-time dynamic programming is arguably a form of learning algorithm as it stands. ...
doi:10.1162/neco.1994.6.6.1185
fatcat:db7tg7xngzhhtlgvawtk6oo2qu
Page 1171 of Mathematical Reviews Vol. , Issue 88b
[page]
1988
Mathematical Reviews
Convergence and optimality of quasi-Newtonian algorithms of stochastic optimization. (Russian)
Dinamika Sistem 1985, Adapt. i Optim., 3-21, 151. ...
Two broad categories are considered: (i) finite methods based on a pivoting procedure and (ii) infinite iterative convergent algorithms. ...
Stochastic iterative dynamic programming: a Monte Carlo approach to dual control
2005
Automatica
Also, being a generalization of iterative dynamic programming (IDP) to the stochastic domain, the new algorithm exhibits reduced sensitivity to the hyper-state dimension and, consequently, is particularly ...
This paper presents a new stochastic dynamic programming algorithm that uses a Monte Carlo approach to circumvent the need for numerical integration, thereby dramatically reducing computational requirements ...
Algorithms such as value iteration, 2 policy iteration, Q-learning and neuro-dynamic programming are well-known dynamic programming approaches that employ Monte Carlo sampling in stochastic settings ( ...
doi:10.1016/j.automatica.2004.12.003
fatcat:licxql2z75cbreyqbmcjnrmyku
Modified Dynamic Programming Algorithms for GLOSA Systems with Stochastic Signal Switching Times
[article]
2022
arXiv
pre-print
The present work considers a different modified version of Dynamic Programming, known as Differential Dynamic Programming (DDP). ...
To overcome the computation time bottleneck, as a first attempt, a modified version of Dynamic Programming, known as Discrete Differential Dynamic Programming (DDDP) was recently employed for the numerical ...
Discrete Differential Dynamic Programming (DDDP) algorithm (Heidari et al., 1971). ...
arXiv:2211.12159v1
fatcat:ib63nfrby5bu5bcamq6o3vskzi
Empirical Dynamic Programming
[article]
2013
arXiv
pre-print
Preliminary experimental results suggest a faster rate of convergence than stochastic approximation algorithms. ...
Thus, these empirical dynamic programming algorithms involve iteration of a random operator, the empirical Bellman operator. ...
Empirical Algorithms for Dynamic Programming We now present empirical variants of dynamic programming algorithms. Our focus will be on value and policy iteration. ...
arXiv:1311.5918v1
fatcat:c6f723vqdvgpheorj35cc54muy
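The abstract above describes iterating a random operator, the empirical Bellman operator, in which the expectation over next states is replaced by a Monte Carlo average. A minimal sketch of one such update, assuming a tabular MDP and a user-supplied transition sampler (all names and shapes here are illustrative, not the paper's notation):

```python
import numpy as np

def empirical_bellman_update(V, rewards, sample_next, gamma=0.9,
                             n_samples=20, rng=None):
    """One application of an empirical Bellman operator.

    Instead of the exact expectation over next states, each Q-value is
    estimated from a Monte Carlo average over n_samples sampled
    transitions, in the spirit of empirical value iteration.
    rewards has shape (S, A); sample_next(s, a, size, rng) draws
    next states from P(. | s, a).
    """
    rng = np.random.default_rng(rng)
    n_states, n_actions = rewards.shape
    V_new = np.empty(n_states)
    for s in range(n_states):
        q = np.empty(n_actions)
        for a in range(n_actions):
            ns = sample_next(s, a, n_samples, rng)
            q[a] = rewards[s, a] + gamma * np.mean(V[ns])
        V_new[s] = q.max()
    return V_new
```

On a deterministic MDP the empirical average is exact, so repeated application converges to the same fixed point as classical value iteration; with genuine sampling noise, the iterates converge only probabilistically, which is what the paper's random-operator analysis addresses.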
How does a stochastic optimization/approximation algorithm adapt to a randomly evolving optimum/root with jump Markov sample paths
2007
Mathematical Programming
Using stochastic averaging, we prove convergence of the algorithm. Rate of convergence of the algorithm is obtained via bounds on the estimation errors and diffusion approximations. ...
Remarks on improving the convergence rates through iterate averaging, and limit mean dynamics represented by differential inclusions are also presented. ...
Due to the small step size used in the recursive computation of the sequence of iterates (parameter estimates), the stochastic optimization and approximation algorithms can be considered as a slow dynamical ...
doi:10.1007/s10107-007-0145-1
fatcat:6eqrggbeqrannlaifxxnox6aga
Stable Linear Approximations to Dynamic Programming for Stochastic Control Problems with Local Transitions
1995
Neural Information Processing Systems
We consider the solution to large stochastic control problems by means of methods that rely on compact representations and a variant of the value iteration algorithm to compute approximate cost-to-go functions ...
This class involves linear parameterizations of the cost-to-go function together with an assumption that the dynamic programming operator is a contraction with respect to the Euclidean norm when applied ...
APPROXIMATIONS TO DYNAMIC PROGRAMMING Classical dynamic programming algorithms such as value iteration require that we maintain and update a vector V of dimension n. ...
dblp:conf/nips/RoyT95
fatcat:u2g7ry6nqzdirmivrogg2cysry
Empirical Dynamic Programming
2016
Mathematics of Operations Research
Preliminary experimental results suggest a faster rate of convergence than stochastic approximation algorithms. ...
Thus, these empirical dynamic programming algorithms involve iteration of a random operator, the empirical Bellman operator. ...
Empirical Algorithms for Dynamic Programming We now present empirical variants of dynamic programming algorithms. Our focus will be on value and policy iteration. ...
doi:10.1287/moor.2015.0733
fatcat:jjt6s3jbyvcxjlmhpzadbcsw5m
Page 2780 of Mathematical Reviews Vol. , Issue 87e
[page]
1987
Mathematical Reviews
87e:90121
Summary: “We report on the computational aspects of high level algorithms developed for efficiently processing the diverging and converging branch systems in nonserial dynamic programming. ...
V. 87e:90121 Necessary and sufficient conditions of optimality for problems of linear dynamic programming. ...
A review of stochastic algorithms with continuous value function approximation and some new approximate policy iteration algorithms for multidimensional continuous applications
2011
Journal of Control Theory and Applications
We review the literature on approximate dynamic programming, with the goal of better understanding the theory behind practical algorithms for solving dynamic programs with continuous and vector-valued ...
We then describe some recent research by the authors on approximate policy iteration algorithms that offer convergence guarantees (with technical assumptions) for both parametric and nonparametric architectures ...
[49] considers a value-iteration-based approximate dynamic programming algorithm without knowledge of the internal dynamics of the system. ...
doi:10.1007/s11768-011-0313-y
fatcat:ea6l7fzscjdbflgrft3b33b7ve
Efficient Parallelization of the Stochastic Dual Dynamic Programming Algorithm Applied to Hydropower Scheduling
2015
Energies
Stochastic dual dynamic programming (SDDP) has become a popular algorithm used in practical long-term scheduling of hydropower systems. ...
This paper presents a novel parallel scheme for the SDDP algorithm, where the stage-wise synchronization point traditionally used in the backward iteration of the SDDP algorithm is partially relaxed. ...
Conflicts of Interest: The authors declare no conflict of interest. ...
doi:10.3390/en81212431
fatcat:rbk6szhr5fdqnd7ke6w74nqdiq
Suboptimality Bounds for Stochastic Shortest Path Problems
[article]
2012
arXiv
pre-print
We consider how to use the Bellman residual of the dynamic programming operator to compute suboptimality bounds for solutions to stochastic shortest path problems. ...
Such bounds have been previously established only in the special case that "all policies are proper," in which case the dynamic programming operator is known to be a contraction, and have been shown to ...
algorithms: value iteration and policy iteration. ...
arXiv:1202.3729v1
fatcat:vf4q5afr6be7vf3flc2pgyd2jm
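The abstract above concerns bounding suboptimality via the Bellman residual. In the classical contraction regime it generalizes (a gamma-contraction in the sup norm), the standard bound is ||V* - V|| ≤ ||TV - V|| / (1 - gamma); a minimal sketch, with illustrative names:

```python
import numpy as np

def bellman_residual_bound(V, bellman_op, gamma):
    """Sup-norm suboptimality bound from the Bellman residual.

    For a gamma-contraction T (the classical regime the paper
    generalizes beyond), ||V* - V||_inf <= ||T V - V||_inf / (1 - gamma).
    """
    residual = np.max(np.abs(bellman_op(V) - V))
    return residual / (1.0 - gamma)
```

The paper's contribution is precisely the harder case where not all policies are proper, so T need not be a contraction and this simple formula no longer applies directly.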
Page 1185 of Neural Computation Vol. 6, Issue 6
[page]
1994
Neural Computation
Communicated by Steven Whitehead
On the Convergence of Stochastic Iterative Dynamic Programming Algorithms
Tommi Jaakkola
Michael I. ...
These algorithms, including the TD(λ) algorithm of Sutton (1988) and the Q-learning algorithm of Watkins (1989), can be motivated heuristically as approximations to dynamic programming (DP). ...
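The snippet above refers to Watkins' Q-learning as a stochastic approximation to DP. As a reminder of the update it means, here is a minimal tabular sketch (parameter names and array layout are illustrative):

```python
import numpy as np

def q_learning_step(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One Watkins-style Q-learning update on a tabular Q array.

    Q[s, a] moves a step of size alpha toward the sampled Bellman
    target r + gamma * max_a' Q[s_next, a'].
    """
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
    return Q
```

The convergence results in the paper above treat exactly such stochastic updates as iterative DP with noisy, sample-based backups.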
Gradient-Bounded Dynamic Programming with Submodular and Concave Extensible Value Functions
[article]
2020
arXiv
pre-print
For the case that the value function of the dynamic program is concave extensible and submodular in its state-space, we present a new algorithm that computes deterministic upper and stochastic lower bounds ...
We then show that the proposed algorithm terminates after a finite number of iterations. ...
ACKNOWLEDGEMENTS We gratefully acknowledge the helpful discussions with Michael Garstka, Department of Engineering Science, University of Oxford, on the Julia implementation of our Algorithm. ...
arXiv:2005.11213v1
fatcat:mrbqw3yz5ngkdlb4jqufmgf33y
New prioritized value iteration for Markov decision processes
2011
Artificial Intelligence Review
Here, we propose an improved value iteration algorithm based on Dijkstra's algorithm for solving shortest path Markov decision processes. ...
For instance, the convergence properties of current solution methods depend, to a great extent, on the order of backup operations. ...
Value iteration is a dynamic programming algorithm (Bellman 1957) for solving MDPs, but it is usually not considered because of its slow convergence (Littman 1995) . ...
doi:10.1007/s10462-011-9224-z
fatcat:jteuazrrpnep7lvn4eagqbse7m
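The entry above contrasts prioritized backup orders with plain value iteration (Bellman 1957). For reference, a minimal synchronous value-iteration sketch on a tabular MDP (array layout is illustrative; prioritized variants differ only in the order in which backups are applied):

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-10):
    """Synchronous value iteration for a tabular MDP.

    P has shape (A, S, S) with P[a, s, t] = Pr(t | s, a);
    R has shape (S, A). Iterates the Bellman optimality backup
    until the sup-norm change falls below tol.
    """
    V = np.zeros(R.shape[0])
    while True:
        Q = R + gamma * np.einsum('ast,t->sa', P, V)  # Q[s, a]
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
```

The slow convergence noted in the abstract comes from this full synchronous sweep; Dijkstra-style prioritization backs up states in an order that propagates value changes faster.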
Showing results 1 — 15 out of 72,677 results