72,677 Hits in 3.9 sec

On the Convergence of Stochastic Iterative Dynamic Programming Algorithms

Tommi Jaakkola, Michael I. Jordan, Satinder P. Singh
1994 Neural Computation  
FUNDING NUMBERS: NSF-ASC-9217041, N00014-90-J-1942  ...  Indeed, real-time dynamic programming is arguably a form of learning algorithm as it stands.  ... 
doi:10.1162/neco.1994.6.6.1185 fatcat:db7tg7xngzhhtlgvawtk6oo2qu
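This entry is the classic convergence analysis of Q-learning and TD(λ) viewed as stochastic iterative dynamic programming. As a quick illustration, here is a minimal, hedged sketch of the kind of stochastic DP update the paper analyzes; the tabular representation, step size, and discount factor below are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha, gamma):
    """One stochastic iterative DP step: move Q(s, a) toward the
    sampled Bellman target r + gamma * max_a' Q(s', a').

    Q is a (n_states, n_actions) array; alpha is the step size and
    gamma the discount factor (both illustrative assumptions)."""
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
    return Q
```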

Page 1171 of Mathematical Reviews Vol. , Issue 88b [page]

1988 Mathematical Reviews  
Convergence and optimality of quasi-Newtonian algorithms of stochastic optimization. (Russian) Dinamika Sistem 1985, Adapt. i Optim., 3-21, 151.  ...  Two broad categories are considered: (i) finite methods based on a pivoting procedure and (ii) infinite iterative convergent algorithms.  ... 

Stochastic iterative dynamic programming: a Monte Carlo approach to dual control

Adrian M. Thompson, William R. Cluett
2005 Automatica  
Also, being a generalization of iterative dynamic programming (IDP) to the stochastic domain, the new algorithm exhibits reduced sensitivity to the hyper-state dimension and, consequently, is particularly  ...  This paper presents a new stochastic dynamic programming algorithm that uses a Monte Carlo approach to circumvent the need for numerical integration, thereby dramatically reducing computational requirements  ...  Algorithms such as value iteration, 2 policy iteration, Q-learning and neuro-dynamic programming are well-known dynamic programming approaches that employ Monte Carlo sampling in stochastic settings (  ... 
doi:10.1016/j.automatica.2004.12.003 fatcat:licxql2z75cbreyqbmcjnrmyku
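The core idea in this entry's abstract is to replace numerical integration in a stochastic Bellman backup with a sample average. The sketch below illustrates that substitution under stated assumptions: the dynamics function `step`, stage cost `cost`, and Gaussian disturbance law are hypothetical placeholders, not the paper's model.

```python
import numpy as np

def mc_backup(x, u, V, step, cost, n_samples=100, rng=None):
    """Estimate E[cost(x, u, w) + V(f(x, u, w))] by Monte Carlo
    sampling of the disturbance w, instead of integrating over
    its density numerically."""
    rng = rng if rng is not None else np.random.default_rng()
    total = 0.0
    for _ in range(n_samples):
        w = rng.standard_normal()      # assumed disturbance law
        x_next = step(x, u, w)         # hypothetical dynamics
        total += cost(x, u, w) + V(x_next)
    return total / n_samples
```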

Modified Dynamic Programming Algorithms for GLOSA Systems with Stochastic Signal Switching Times [article]

Panagiotis Typaldos, Markos Papageorgiou
2022 arXiv   pre-print
The present work considers a different modified version of Dynamic Programming, known as Differential Dynamic Programming (DDP).  ...  To overcome the computation time bottleneck, as a first attempt, a modified version of Dynamic Programming, known as Discrete Differential Dynamic Programming (DDDP), was recently employed for the numerical  ...  Discrete Differential Dynamic Programming (DDDP) algorithm (Heidari et al., 1971).  ... 
arXiv:2211.12159v1 fatcat:ib63nfrby5bu5bcamq6o3vskzi

Empirical Dynamic Programming [article]

William B. Haskell, Rahul Jain, Dileep Kalathil
2013 arXiv   pre-print
Preliminary experimental results suggest a faster rate of convergence than stochastic approximation algorithms.  ...  Thus, these empirical dynamic programming algorithms involve iteration of a random operator, the empirical Bellman operator.  ...  Empirical Algorithms for Dynamic Programming We now present empirical variants of dynamic programming algorithms. Our focus will be on value and policy iteration.  ... 
arXiv:1311.5918v1 fatcat:c6f723vqdvgpheorj35cc54muy
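The snippet describes "empirical" variants of value and policy iteration, in which each sweep applies an empirical Bellman operator built from a fresh batch of samples rather than an exact expectation. A minimal sketch of empirical value iteration in that spirit follows; the sampler, cost function, and batch size are illustrative assumptions.

```python
import numpy as np

def empirical_value_iteration(n_states, n_actions, sample_next,
                              cost, gamma=0.9, n=50, iters=100,
                              rng=None):
    """Iterate a random operator: at each sweep, replace E[V(s')]
    with a sample average over n draws of the next state."""
    rng = rng if rng is not None else np.random.default_rng()
    V = np.zeros(n_states)
    for _ in range(iters):
        V_new = np.empty(n_states)
        for s in range(n_states):
            q = np.empty(n_actions)
            for a in range(n_actions):
                # empirical Bellman operator: sample average in
                # place of the exact expectation
                nxt = [sample_next(s, a, rng) for _ in range(n)]
                q[a] = cost(s, a) + gamma * np.mean([V[j] for j in nxt])
            V_new[s] = q.min()
        V = V_new
    return V
```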

How does a stochastic optimization/approximation algorithm adapt to a randomly evolving optimum/root with jump Markov sample paths

G. Yin, C. Ion, V. Krishnamurthy
2007 Mathematical programming  
Using stochastic averaging, we prove convergence of the algorithm. Rate of convergence of the algorithm is obtained via bounds on the estimation errors and diffusion approximations.  ...  Remarks on improving the convergence rates through iterate averaging, and limit mean dynamics represented by differential inclusions are also presented.  ...  Due to the small step size used in the recursive computation of the sequence of iterates (parameter estimates), the stochastic optimization and approximation algorithms can be considered as a slow dynamical  ... 
doi:10.1007/s10107-007-0145-1 fatcat:6eqrggbeqrannlaifxxnox6aga
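This entry studies constant-step-size stochastic approximation recursions that track a randomly jumping optimum, with iterate averaging to improve convergence rates. The following is a rough sketch of that recursion shape under stated assumptions; the noisy-gradient oracle and step size are hypothetical, and the jump-Markov structure of the moving optimum is not modeled here.

```python
import numpy as np

def track(theta0, grad_sample, mu=0.01, n_iter=10_000):
    """Constant-step stochastic approximation with iterate averaging.

    grad_sample(theta, k) is an assumed noisy-gradient oracle; a
    small mu lets the iterate slowly track a drifting optimum."""
    theta = np.asarray(theta0, dtype=float).copy()
    avg = np.zeros_like(theta)
    for k in range(n_iter):
        theta = theta - mu * grad_sample(theta, k)  # noisy update
        avg += (theta - avg) / (k + 1)              # iterate averaging
    return theta, avg
```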

Stable Linear Approximations to Dynamic Programming for Stochastic Control Problems with Local Transitions

Benjamin Van Roy, John N. Tsitsiklis
1995 Neural Information Processing Systems  
We consider the solution to large stochastic control problems by means of methods that rely on compact representations and a variant of the value iteration algorithm to compute approximate cost-to-go functions  ...  This class involves linear parameterizations of the cost-to-go function together with an assumption that the dynamic programming operator is a contraction with respect to the Euclidean norm when applied  ...  APPROXIMATIONS TO DYNAMIC PROGRAMMING Classical dynamic programming algorithms such as value iteration require that we maintain and update a vector V of dimension n.  ... 
dblp:conf/nips/RoyT95 fatcat:u2g7ry6nqzdirmivrogg2cysry
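The abstract describes approximating the cost-to-go with a linear parameterization and iterating a value-iteration variant on the weights. Here is a hedged sketch of that scheme: represent V = Φw for a fixed feature matrix Φ, apply a Bellman backup, and project back onto the span of Φ by least squares. The `bellman_backup` callable and Φ are illustrative assumptions; the paper's stability result additionally relies on local transitions making the composed operator a Euclidean-norm contraction.

```python
import numpy as np

def approx_value_iteration(Phi, bellman_backup, iters=100):
    """Projected value iteration with a linear architecture.

    Phi: (n_states, k) feature matrix (assumed given).
    bellman_backup: maps a value vector V to TV (assumed exact)."""
    n, k = Phi.shape
    w = np.zeros(k)
    for _ in range(iters):
        TV = bellman_backup(Phi @ w)                  # exact backup
        w, *_ = np.linalg.lstsq(Phi, TV, rcond=None)  # project onto span(Phi)
    return Phi @ w
```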

Empirical Dynamic Programming

William B. Haskell, Rahul Jain, Dileep Kalathil
2016 Mathematics of Operations Research  
Preliminary experimental results suggest a faster rate of convergence than stochastic approximation algorithms.  ...  Thus, these empirical dynamic programming algorithms involve iteration of a random operator, the empirical Bellman operator.  ...  Empirical Algorithms for Dynamic Programming We now present empirical variants of dynamic programming algorithms. Our focus will be on value and policy iteration.  ... 
doi:10.1287/moor.2015.0733 fatcat:jjt6s3jbyvcxjlmhpzadbcsw5m

Page 2780 of Mathematical Reviews Vol. , Issue 87e [page]

1987 Mathematical Reviews  
87e:90121 Summary: “We report on the computational aspects of high level algorithms developed for efficiently processing the diverging and converging branch systems in nonserial dynamic programming.  ...  V. 87e:90121 Necessary and sufficient conditions of optimality for problems of linear dynamic programming.  ... 

A review of stochastic algorithms with continuous value function approximation and some new approximate policy iteration algorithms for multidimensional continuous applications

Warren B. Powell, Jun Ma
2011 Journal of Control Theory and Applications  
We review the literature on approximate dynamic programming, with the goal of better understanding the theory behind practical algorithms for solving dynamic programs with continuous and vector-valued  ...  We then describe some recent research by the authors on approximate policy iteration algorithms that offer convergence guarantees (with technical assumptions) for both parametric and nonparametric architectures  ...  [49] considers a value-iteration-based approximate dynamic programming algorithm without knowledge of the internal dynamics of the system.  ... 
doi:10.1007/s11768-011-0313-y fatcat:ea6l7fzscjdbflgrft3b33b7ve
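The review's focus is approximate policy iteration with parametric (and nonparametric) architectures. As a rough structural sketch only: evaluate the current policy with a linear architecture, then improve greedily, and repeat. The `eval_policy` and `improve` callables below are hypothetical placeholders; the convergence guarantees discussed in the paper depend on technical assumptions not reflected here.

```python
import numpy as np

def approx_policy_iteration(Phi, eval_policy, improve, iters=20):
    """Skeleton of approximate policy iteration.

    eval_policy(policy, Phi): fits weights w so that V_pi ~ Phi @ w
        (e.g., an LSTD-style fit; assumed provided).
    improve(V): returns a policy greedy with respect to V."""
    k = Phi.shape[1]
    w = np.zeros(k)
    policy = improve(Phi @ w)          # start greedy w.r.t. V = 0
    for _ in range(iters):
        w = eval_policy(policy, Phi)   # approximate policy evaluation
        policy = improve(Phi @ w)      # greedy policy improvement
    return policy, Phi @ w
```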

Efficient Parallelization of the Stochastic Dual Dynamic Programming Algorithm Applied to Hydropower Scheduling

Arild Helseth, Hallvard Braaten
2015 Energies  
Stochastic dual dynamic programming (SDDP) has become a popular algorithm used in practical long-term scheduling of hydropower systems.  ...  This paper presents a novel parallel scheme for the SDDP algorithm, where the stage-wise synchronization point traditionally used in the backward iteration of the SDDP algorithm is partially relaxed.  ... 
doi:10.3390/en81212431 fatcat:rbk6szhr5fdqnd7ke6w74nqdiq
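For context on the object SDDP iterates over: each stage's cost-to-go is under-approximated by a growing collection of Benders cuts, and backward passes add new cuts. The sketch below shows only that cut-based lower approximation, under stated assumptions; the paper's contribution, relaxing the synchronization between backward stage computations across parallel workers, is not modeled here, and all names are illustrative.

```python
import numpy as np

class CutApproximation:
    """Piecewise-linear lower bound V(x) >= max_k (alpha_k + beta_k @ x),
    the value-function surrogate maintained by SDDP at each stage."""

    def __init__(self, dim):
        self.alphas, self.betas = [], []
        self.dim = dim

    def add_cut(self, alpha, beta):
        # In SDDP, (alpha, beta) would come from stage-(t+1) duals
        # computed in the backward pass.
        self.alphas.append(float(alpha))
        self.betas.append(np.asarray(beta, dtype=float))

    def value(self, x):
        if not self.alphas:
            return 0.0
        x = np.asarray(x, dtype=float)
        return max(a + b @ x for a, b in zip(self.alphas, self.betas))
```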

Suboptimality Bounds for Stochastic Shortest Path Problems [article]

Eric A. Hansen
2012 arXiv   pre-print
We consider how to use the Bellman residual of the dynamic programming operator to compute suboptimality bounds for solutions to stochastic shortest path problems.  ...  Such bounds have been previously established only in the special case that "all policies are proper," in which case the dynamic programming operator is known to be a contraction, and have been shown to  ...  algorithms: value iteration and policy iteration.  ... 
arXiv:1202.3729v1 fatcat:vf4q5afr6be7vf3flc2pgyd2jm
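For orientation, the discounted-case analogue of the bound this paper extends is standard: if the dynamic programming operator T is a sup-norm contraction with modulus γ (which, for stochastic shortest path problems, the snippet notes holds when all policies are proper), the Bellman residual controls suboptimality. The LaTeX below states that textbook analogue as an illustration, not the paper's generalized result.

```latex
% Discounted-case analogue (assumption: T is a gamma-contraction
% in the sup norm):
\[
  \|TV - V\|_\infty \le \varepsilon
  \quad\Longrightarrow\quad
  \|V - V^*\|_\infty \le \frac{\varepsilon}{1 - \gamma}.
\]
```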

Page 1185 of Neural Computation Vol. 6, Issue 6 [page]

1994 Neural Computation  
Communicated by Steven Whitehead.  On the Convergence of Stochastic Iterative Dynamic Programming Algorithms.  Tommi Jaakkola, Michael I.  ...  These algorithms, including the TD(λ) algorithm of Sutton (1988) and the Q-learning algorithm of Watkins (1989), can be motivated heuristically as approximations to dynamic programming (DP).  ... 

Gradient-Bounded Dynamic Programming with Submodular and Concave Extensible Value Functions [article]

Denis Lebedev, Paul Goulart, Kostas Margellos
2020 arXiv   pre-print
For the case that the value function of the dynamic program is concave extensible and submodular in its state-space, we present a new algorithm that computes deterministic upper and stochastic lower bounds  ...  We then show that the proposed algorithm terminates after a finite number of iterations.  ...  ACKNOWLEDGEMENTS We gratefully acknowledge the helpful discussions with Michael Garstka, Department of Engineering Science, University of Oxford, on the Julia implementation of our Algorithm.  ... 
arXiv:2005.11213v1 fatcat:mrbqw3yz5ngkdlb4jqufmgf33y

New prioritized value iteration for Markov decision processes

Ma. de Guadalupe Garcia-Hernandez, Jose Ruiz-Pinales, Eva Onaindia, J. Gabriel Aviña-Cervantes, Sergio Ledesma-Orozco, Edgar Alvarado-Mendez, Alberto Reyes-Ballesteros
2011 Artificial Intelligence Review  
Here, we propose an improved value iteration algorithm based on Dijkstra's algorithm for solving shortest path Markov decision processes.  ...  For instance, the convergence properties of current solution methods depend, to a great extent, on the order of backup operations.  ...  Value iteration is a dynamic programming algorithm (Bellman 1957) for solving MDPs, but it is usually not considered because of its slow convergence (Littman 1995).  ... 
doi:10.1007/s10462-011-9224-z fatcat:jteuazrrpnep7lvn4eagqbse7m
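The entry's point is that backup order matters: processing states in a Dijkstra-style priority order can converge much faster than fixed sweeps. Below is a hedged sketch of a prioritized backup loop in that spirit (closer to generic prioritized sweeping than to the paper's specific algorithm); the `backup` and `predecessors` accessors are hypothetical placeholders.

```python
import heapq

def prioritized_value_iteration(states, backup, predecessors, tol=1e-6):
    """Process states from a max-priority queue (implemented via
    negated keys in Python's min-heap), pushing predecessors back
    whenever a backup changes a value by more than tol."""
    V = {s: 0.0 for s in states}
    heap = [(0.0, s) for s in states]      # initial pass, equal priority
    heapq.heapify(heap)
    while heap:
        _, s = heapq.heappop(heap)
        new_v = backup(s, V)               # Bellman backup at state s
        delta = abs(new_v - V[s])
        V[s] = new_v
        if delta > tol:
            for p in predecessors(s):      # reprioritize affected parents
                heapq.heappush(heap, (-delta, p))
    return V
```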
Showing results 1 — 15 out of 72,677 results