Algorithms for sequential decision-making
Publisher:
  • Brown University
  • Department of Computer Science, Box 1910, Providence, RI
  • United States
ISBN: 978-0-591-16350-6
Order Number: AAI9709069
Pages: 263
Abstract

Sequential decision making is a fundamental task faced by any intelligent agent in an extended interaction with its environment; it is the act of answering the question "What should I do now?" In this thesis, I show how to answer this question when "now" is one of a finite set of states, "do" is one of a finite set of actions, "should" is maximize a long-run measure of reward, and "I" is an automated planning or learning system (agent). In particular, I collect basic results concerning methods for finding optimal (or near-optimal) behavior in several different kinds of model environments: Markov decision processes, in which the agent always knows its state; partially observable Markov decision processes (POMDPs), in which the agent must piece together its state on the basis of observations it makes; and Markov games, in which the agent is in direct competition with an opponent. The thesis is written from a computer-science perspective, meaning that many mathematical details are not discussed, and descriptions of algorithms and the complexity of problems are emphasized. New results include an improved algorithm for solving POMDPs exactly over finite horizons, a method for learning minimax-optimal policies for Markov games, a pseudopolynomial bound for policy iteration, and a complete complexity theory for finding zero-reward POMDP policies.
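
To make the setting concrete, the following is a minimal value-iteration sketch for a finite MDP, the simplest of the three model classes the thesis covers. The two-state, two-action model and every name below are illustrative assumptions, not code or data from the thesis itself.

import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    # P[a][s, s'] : transition probabilities for action a
    # R[a][s]     : expected immediate reward for action a in state s
    n_states = P[0].shape[0]
    V = np.zeros(n_states)
    while True:
        # Bellman optimality backup:
        # V(s) <- max_a [ R(s, a) + gamma * sum_{s'} P(s'|s, a) V(s') ]
        Q = np.array([R[a] + gamma * P[a] @ V for a in range(len(P))])
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)  # optimal values and a greedy policy
        V = V_new

# Hypothetical two-state, two-action MDP, for illustration only.
P = [np.array([[0.8, 0.2], [0.1, 0.9]]),   # transitions under action 0
     np.array([[0.5, 0.5], [0.6, 0.4]])]   # transitions under action 1
R = [np.array([1.0, 0.0]),                 # rewards under action 0
     np.array([0.0, 2.0])]                 # rewards under action 1

V_star, policy = value_iteration(P, R)
print(V_star, policy)

Policy iteration, whose pseudopolynomial bound is among the new results listed above, replaces the repeated Bellman sweeps with alternating exact policy evaluation and greedy improvement; the POMDP and Markov-game algorithms generalize these backups to belief states and to minimax values, respectively.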

Cited By

  1. Zhang J, Zheng Y, Zhang C, Zhao L, Song L, Zhou Y and Bian J Robust situational reinforcement learning in face of context disturbances Proceedings of the 40th International Conference on Machine Learning, (41973-41989)
  2. Huang D, Xu D, Zhu Y, Garg A, Savarese S, Fei-Fei L and Niebles J Continuous Relaxation of Symbolic Planner for One-Shot Imitation Learning 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), (2635-2642)
  3. Sridharan M, Gelfond M, Zhang S and Wyatt J (2019). REBA, Journal of Artificial Intelligence Research, 65:1, (87-180), Online publication date: 1-May-2019.
  4. Chatterjee K, Elgyütt A, Novotný P and Rouillé O Expectation optimization with probabilistic guarantees in POMDPs with discounted-sum objectives Proceedings of the 27th International Joint Conference on Artificial Intelligence, (4692-4699)
  5. Horák K, Bošanský B and Chatterjee K Goal-HSVI Proceedings of the 27th International Joint Conference on Artificial Intelligence, (4764-4770)
  6. Dezhabad N and Sharifian S (2018). Learning-based dynamic scalable load-balanced firewall as a service in network function-virtualized cloud computing environments, The Journal of Supercomputing, 74:7, (3329-3358), Online publication date: 1-Jul-2018.
  7. Chen Y, Kochenderfer M and Spaan M Improving Offline Value-Function Approximations for POMDPs by Reducing Discount Factors 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), (3531-3536)
  8. Boros E, Elbassioni K, Fouz M, Gurvich V, Makino K and Manthey B (2018). Approximation Schemes for Stochastic Mean Payoff Games with Perfect Information and Few Random Positions, Algorithmica, 80:11, (3132-3157), Online publication date: 1-Nov-2018.
  9. Zhang T, Xie S and Rose O Real-time job shop scheduling based on simulation and Markov decision processes Proceedings of the 2017 Winter Simulation Conference, (1-9)
  10. Nachum O, Norouzi M, Xu K and Schuurmans D Bridging the gap between value and policy based reinforcement learning Proceedings of the 31st International Conference on Neural Information Processing Systems, (2772-2782)
  11. Walraven E and Spaan M Accelerated vector pruning for optimal POMDP solvers Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, (3672-3678)
  12. Chatterjee K, Novotný P, Perez G, Raskin J and Zikelic D Optimizing expectation with guarantees in POMDPs Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, (3725-3732)
  13. Asadi K and Littman M An alternative softmax operator for reinforcement learning Proceedings of the 34th International Conference on Machine Learning - Volume 70, (243-252)
  14. Bharadwaj S, Le Roux S, Pérez G and Topcu U Reduction techniques for model checking and learning in MDPs Proceedings of the 26th International Joint Conference on Artificial Intelligence, (4273-4279)
  15. Whitney D, Rosen E, MacGlashan J, Wong L and Tellex S Reducing errors in object-fetching interactions through social feedback 2017 IEEE International Conference on Robotics and Automation (ICRA), (1006-1013)
  16. Sun J (2016). Marginal quality-based long-term incentive mechanisms for crowd sensing, International Journal of Communication Systems, 29:5, (942-958), Online publication date: 25-Mar-2016.
  17. Grzes M and Poupart P POMDP planning and execution in an augmented space Proceedings of the 2014 international conference on Autonomous agents and multi-agent systems, (757-764)
  18. Hansen T, Miltersen P and Zwick U (2013). Strategy Iteration Is Strongly Polynomial for 2-Player Turn-Based Stochastic Games with a Constant Discount Factor, Journal of the ACM, 60:1, (1-16), Online publication date: 1-Feb-2013.
  19. Simari G, Dickerson J, Sliva A and Subrahmanian V (2013). Parallel Abductive Query Answering in Probabilistic Logic Programs, ACM Transactions on Computational Logic, 14:2, (1-39), Online publication date: 1-Jun-2013.
  20. Baier C, Grösser M and Bertrand N (2012). Probabilistic ω-automata, Journal of the ACM, 59:1, (1-52), Online publication date: 1-Feb-2012.
  21. Reddi S and Brunskill E Incentive decision processes Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence, (418-427)
  22. Vassev E, Hinchey M, Gaudin B and Nixon P Requirements and initial model for KnowLang Proceedings of The Fourth International C* Conference on Computer Science and Software Engineering, (35-42)
  23. Bigus J, Campbell M, Carmeli B, Cefkin M, Chang H, Chen-Ritzo C, Cody W, Ebadollahi S, Evfimievski A, Farkash A, Glissmann S, Gotz D, Grandison T, Gruhl D, Haas P, Hsiao M, Hsueh P, Hu J, Jasinski J, Kaufman J, Kieliszewski C, Kohn M, Knoop S, Maglio P, Mak R, Nelken H, Neti C, Neuvirth H, Pan Y, Peres Y, Ramakrishnan S, Rosen-Zvi M, Renly S, Selinger P, Shabo A, Sorrentino R, Sun J, Syeda-Mahmood T, Tan W, Tao Y, Yaesoubi R and Zhu X (2011). Information technology for healthcare transformation, IBM Journal of Research and Development, 55:5, (492-505), Online publication date: 1-Sep-2011.
  24. Madani O, Thorup M and Zwick U (2010). Discounted deterministic Markov decision processes and discounted all-pairs shortest paths, ACM Transactions on Algorithms (TALG), 6:2, (1-25), Online publication date: 1-Mar-2010.
  25. Besse C and Chaib-draa B Quasi deterministic POMDPs and DecPOMDPs Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems - Volume 1, (1393-1394)
  26. Fern A and Tadepalli P A computational decision theory for interactive assistants Proceedings of the 23rd International Conference on Neural Information Processing Systems - Volume 1, (577-585)
  27. Boros E, Elbassioni K, Gurvich V and Makino K A pumping algorithm for ergodic stochastic mean payoff games with perfect information Proceedings of the 14th international conference on Integer Programming and Combinatorial Optimization, (341-354)
  28. Sharma R and Gopal M (2010). Review article, Applied Soft Computing, 10:3, (675-688), Online publication date: 1-Jun-2010.
  29. Chatterjee K, Doyen L and Henzinger T Qualitative analysis of partially-observable Markov decision processes Proceedings of the 35th international conference on Mathematical foundations of computer science, (258-269)
  30. Simari G, Dickerson J and Subrahmanian V Cost-based query answering in action probabilistic logic programs Proceedings of the 4th international conference on Scalable uncertainty management, (319-332)
  31. Baier C On model checking techniques for randomized distributed systems Proceedings of the 8th international conference on Integrated formal methods, (1-11)
  32. Walsh T, Nouri A, Li L and Littman M (2009). Learning and planning in environments with delayed feedback, Autonomous Agents and Multi-Agent Systems, 18:1, (83-105), Online publication date: 1-Feb-2009.
  33. Madani O, Thorup M and Zwick U Discounted deterministic Markov decision processes and discounted all-pairs shortest paths Proceedings of the twentieth annual ACM-SIAM symposium on Discrete algorithms, (958-967)
  34. Bonet B Deterministic POMDPs revisited Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, (59-66)
  35. Andersson D and Miltersen P The Complexity of Solving Stochastic Games on Graphs Proceedings of the 20th International Symposium on Algorithms and Computation, (112-121)
  36. Hajishirzi H, Shirazi A, Choi J and Amir E Greedy algorithms for sequential sensing decisions Proceedings of the 21st International Joint Conference on Artificial Intelligence, (1908-1915)
  37. Adam A, Rabinovich Z and Rosenschein J Dynamics based control with PSRs Proceedings of the 7th international joint conference on Autonomous agents and multiagent systems - Volume 1, (387-394)
  38. An X, Xiang Y and Cercone N (2008). Dynamic multiagent probabilistic inference, International Journal of Approximate Reasoning, 48:1, (185-213), Online publication date: 1-Apr-2008.
  39. Ross S, Pineau J, Paquet S and Chaib-draa B (2008). Online planning algorithms for POMDPs, Journal of Artificial Intelligence Research, 32:1, (663-704), Online publication date: 1-May-2008.
  40. Ross S and Chaib-Draa B AEMS Proceedings of the 20th international joint conference on Artifical intelligence, (2592-2598)
  41. Walsh T, Nouri A, Li L and Littman M Planning and Learning in Environments with Delayed Feedback Proceedings of the 18th European conference on Machine Learning, (442-453)
  42. Shahaf D and Amir E Learning partially observable action schemas Proceedings of the 21st national conference on Artificial intelligence - Volume 1, (913-919)
  43. Chatterjee K, Doyen L, Henzinger T and Raskin J Algorithms for omega-regular games with imperfect information Proceedings of the 20th international conference on Computer Science Logic, (287-302)
  44. Pineau J, Gordon G and Thrun S (2006). Anytime point-based approximations for large POMDPs, Journal of Artificial Intelligence Research, 27:1, (335-380), Online publication date: 1-Sep-2006.
  45. Paquet S, Tobin L and Chaib-draa B An online POMDP algorithm for complex multiagent environments Proceedings of the fourth international joint conference on Autonomous agents and multiagent systems, (970-977)
  46. Amir E Learning partially observable deterministic action models Proceedings of the 19th international joint conference on Artificial intelligence, (1433-1439)
  47. Gosavi A (2004). A Reinforcement Learning Algorithm Based on Policy Iteration for Average Reward, Machine Learning, 55:1, (5-29), Online publication date: 1-Apr-2004.
  48. Izadi M and Precup D A planning algorithm for predictive state representations Proceedings of the 18th international joint conference on Artificial intelligence, (1520-1521)
  49. Madani O, Hanks S and Condon A (2003). On the undecidability of probabilistic planning and related stochastic optimization problems, Artificial Intelligence, 147:1-2, (5-34), Online publication date: 1-Jul-2003.
  50. Lőrincz A, Pólik I and Szita I (2003). Event-learning and robust policy heuristics, Cognitive Systems Research, 4:4, (319-337), Online publication date: 1-Dec-2003.
  51. Chades I, Scherrer B and Charpillet F A heuristic approach for solving decentralized-POMDP Proceedings of the 2002 ACM symposium on Applied computing, (57-62)
  52. Madani O On policy iteration as a Newton's method and polynomial policy iteration algorithms Eighteenth national conference on Artificial intelligence, (273-278)
  53. Madani O Polynomial value iteration algorithms for deterministic MDPs Proceedings of the Eighteenth conference on Uncertainty in artificial intelligence, (311-318)
  54. Shelton C Reinforcement learning with partially known world dynamics Proceedings of the Eighteenth conference on Uncertainty in artificial intelligence, (461-468)
  55. Mansour Y Reinforcement learning and mistake bounded algorithms Proceedings of the twelfth annual conference on Computational learning theory, (183-192)
  56. Mansour Y and Singh S On the complexity of policy iteration Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence, (401-408)
  57. McAllester D and Singh S Approximate planning for factored POMDPs using belief state simplification Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence, (409-416)
  58. Szepesvári C and Littman M (1999). A Unified Analysis of Value-Function-Based Reinforcement Learning Algorithms, Neural Computation, 11:8, (2017-2060), Online publication date: 1-Nov-1999.
  59. Pollack J and Blair A (1998). Co-Evolution in the Successful Learning of Backgammon Strategy, Machine Learning, 32:3, (225-240), Online publication date: 1-Sep-1998.
  60. Kalmár Z, Szepesvári C and Lőrincz A (1998). Module-Based Reinforcement Learning, Machine Learning, 31:1-3, (55-85), Online publication date: 1-Apr-1998.
  61. Crites R and Barto A (1998). Elevator Group Control Using Multiple Reinforcement Learning Agents, Machine Learning, 33:2-3, (235-262), Online publication date: 1-Dec-1998.
  62. Sheppard J (1998). Colearning in Differential Games, Machine Learning, 33:2-3, (201-233), Online publication date: 1-Dec-1998.
  63. Kalmár Z, Szepesvári C and Lőrincz A (1998). Module-Based Reinforcement Learning, Autonomous Robots, 5:3-4, (273-295), Online publication date: 1-Jul-1998.
Contributors
  • Brown University
