3,833 Hits in 3.6 sec

Learning Pessimism for Robust and Efficient Off-Policy Reinforcement Learning [article]

Edoardo Cetin, Oya Celiktutan
2023 arXiv   pre-print
In particular, we propose to learn this penalty alongside the critic with dual TD-learning, a new procedure to estimate and minimize the magnitude of the target returns bias with trivial computational  ...  Off-policy deep reinforcement learning algorithms commonly compensate for overestimation bias during temporal-difference learning by utilizing pessimistic estimates of the expected target returns.  ...  We learn β using dual TD-learning with the same optimizer used for adjusting the value of α. Baseline results.  ... 
arXiv:2110.03375v2 fatcat:qd27c3czcre2tm7klnixhbsypi
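The snippet above mentions pessimistic estimates of target returns with a learned penalty β. A minimal sketch of that idea, assuming a twin-critic setup where the penalty scales with critic disagreement (the function name, the disagreement proxy, and the fixed β are illustrative assumptions; the paper adapts β via dual TD-learning):

```python
def pessimistic_target(r, q1_next, q2_next, beta=0.5, gamma=0.99):
    """Pessimistic TD target sketch: take the minimum of two critic
    estimates and subtract a penalty proportional to their disagreement,
    an uncertainty proxy. Here beta is fixed; the cited work instead
    learns it alongside the critic."""
    pessimism = beta * abs(q1_next - q2_next)  # disagreement-based penalty
    return r + gamma * (min(q1_next, q2_next) - pessimism)
```

When the critics agree, the target reduces to the ordinary min-clipped TD target; the larger the disagreement, the stronger the pessimistic correction.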

Neural mechanisms of learning and control

2001 IEEE Control Systems  
Acknowledgments We thank Raju Bapi, Hiroaki Gomi, Okihide Hikosaka, Hiroshi Imamizu, Jun Morimoto, Hiroyuki Nakahara, and Kazuyuki Samejima for their collaboration on this article.  ...  Thus, in the TD learning framework, the TD error δ(t) plays the dual role of the teaching signal for reward prediction (V) and action selection (Q).  ...  This is consistent with the TD learning model where learning is based on the TD error δ, as in (3) and (4).  ... 
doi:10.1109/37.939943 fatcat:htguzzrusjb3neuztxk4kv54su
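The dual role of the TD error described in the snippet above can be sketched in a few lines: the same scalar δ updates both the state value V (reward prediction) and the action value Q (action selection). The function name and tabular dictionaries are illustrative assumptions, not the article's implementation:

```python
def td_update(V, Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One temporal-difference step: a single TD error delta serves as
    the teaching signal for both the state-value table V and the
    action-value table Q."""
    delta = r + gamma * V[s_next] - V[s]  # TD error
    V[s] += alpha * delta                 # reward prediction update
    Q[(s, a)] += alpha * delta            # action selection update
    return delta
```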

Deep Residual Reinforcement Learning [article]

Shangtong Zhang, Wendelin Boehmer, Shimon Whiteson
2020 arXiv   pre-print
We revisit residual algorithms in both model-free and model-based reinforcement learning settings.  ...  Compared with the existing TD(k) method, our residual-based method makes weaker assumptions about the model and yields a greater performance boost.  ...  Besides RL, learned models are also used in other control methods, e.g., model predictive control (MPC, [17] ). Nagabandi et al. [35] learn deterministic models via neural networks for MPC.  ... 
arXiv:1905.01072v3 fatcat:x46i7xwbgbdnxdembxejerii3u

Multi-Agent Reinforcement Learning via Distributed MPC as a Function Approximator [article]

Samuel Mallick, Filippo Airaldi, Azita Dabiri, Bart De Schutter
2024 arXiv   pre-print
Existing work on RL has demonstrated the use of model predictive control (MPC) as a function approximator for the policy and value functions.  ...  This paper presents a novel approach to multi-agent reinforcement learning (RL) for linear systems with convex polytopic constraints.  ...  In addition, thanks to the MPC-based approximation, insights into the policy are provided by the physical meaning of the learned components, e.g., the prediction model and constraints.  ... 
arXiv:2312.05166v3 fatcat:uwgxxi636rbehhjmmeqdtzzw6y

Reinforcement learning algorithms with function approximation: Recent advances and applications

Xin Xu, Lei Zuo, Zhenhua Huang
2014 Information Sciences  
In recent years, the research on reinforcement learning (RL) has focused on function approximation in learning prediction and control of Markov decision processes (MDPs).  ...  From an empirical aspect, the performance of different RL algorithms was evaluated and compared in several benchmark learning prediction and learning control tasks.  ...  RL algorithms with function approximation for learning prediction In RL, there are two basic tasks. One is called learning prediction and the other is called learning control.  ... 
doi:10.1016/j.ins.2013.08.037 fatcat:ki77nykp6rabdmq2jxk3zvwlpm

Practical Reinforcement Learning of Stabilizing Economic MPC [article]

Mario Zanon, Sébastien Gros, Alberto Bemporad
2019 arXiv   pre-print
Reinforcement Learning (RL) has demonstrated a huge potential in learning optimal policies without any prior knowledge of the process to be controlled.  ...  Model Predictive Control (MPC) is a popular control technique which is able to deal with nonlinear dynamics and state and input constraints.  ...  Model Predictive Control (MPC) is a model-based technique which exploits a model of the system dynamics to predict the system's future behavior and optimize a given performance index, possibly subject  ... 
arXiv:1904.04614v1 fatcat:r7boibxqhrga7pynxva4zbzodi
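The snippet above summarizes MPC as optimizing a performance index over model-predicted trajectories. A toy receding-horizon sketch for a scalar linear model x' = a·x + b·u with a quadratic stage cost (the discrete candidate set and brute-force enumeration are simplifying assumptions; real MPC solves a constrained optimization):

```python
from itertools import product

def mpc_action(x0, a=1.0, b=1.0, horizon=3, candidates=(-1.0, 0.0, 1.0)):
    """Receding-horizon sketch: enumerate short input sequences, simulate
    the model forward, score a quadratic cost, and apply only the first
    input of the best sequence."""
    best_cost, best_u0 = float('inf'), 0.0
    for seq in product(candidates, repeat=horizon):
        x, cost = x0, 0.0
        for u in seq:
            x = a * x + b * u            # model prediction
            cost += x * x + 0.1 * u * u  # stage cost on state and input
        if cost < best_cost:
            best_cost, best_u0 = cost, seq[0]
    return best_u0
```

Re-solving this optimization at every step, from the newly measured state, is what makes the scheme receding-horizon.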

A Computational Framework for Motor Skill Acquisition [article]

Krishn Bera, Tejas Savalia, Bapi Raju
2019 arXiv   pre-print
The fundamental premise of sequential decision making for skill learning is based on interacting model-based (MB) and model-free (MF) reinforcement learning (RL) processes.  ...  In this context, for a discrete sequence production (DSP) task, one of the most insightful models is Verwey's Dual Processor Model (DPM).  ...  For an optimal state-action-reward policy, model-based learning guides the model-free learning at periodic intervals that ultimately leads to the goal state.  ... 
arXiv:1901.01856v1 fatcat:efgspkufxfbxhmr6cppit3xf6y

Sim-to-Real Learning of Footstep-Constrained Bipedal Dynamic Walking [article]

Helei Duan, Ashish Malik, Jeremy Dao, Aseem Saxena, Kevin Green, Jonah Siekmann, Alan Fern, Jonathan Hurst
2022 arXiv   pre-print
In addition, we use supervised learning to induce a transition model for accurately predicting the next touchdown locations that the controller can achieve given the robot's proprioceptive observations  ...  This model paves the way for integrating the learned controller into a full-order robot locomotion planner that robustly satisfies both balance and environmental constraints.  ...  ACKNOWLEDGMENTS We thank Intel for providing vLab resources and students at Dynamic Robotics Laboratory for helpful discussions.  ... 
arXiv:2203.07589v2 fatcat:joln2ptxbfdxff7du4knqw2y2q

Soft policy optimization using dual-track advantage estimator [article]

Yubo Huang, Xuechun Wang, Luobao Zou, Zhiwei Zhuang, Weidong Zhang
2020 arXiv   pre-print
Integrating the temporal-difference (TD) method and the general advantage estimator (GAE), we propose the dual-track advantage estimator (DTAE) to accelerate the convergence of value functions and further  ...  Based on this principle, in this paper, we soften the proximal policy optimization by introducing the entropy and dynamically setting the temperature coefficient to balance the opportunity of exploration  ...  We can divide them into two categories: model-based or model-free RL. In model-based RL, we should learn not only the policy but the model in the optimization.  ... 
arXiv:2009.06858v1 fatcat:qdgrkuaaxne33ebbwtuu5noaha
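The snippet above combines the TD method with the generalized advantage estimator (GAE). A minimal sketch of standard GAE, which the paper's dual-track estimator builds on (the backward-recursion form A_t = δ_t + γλ·A_{t+1} is the usual formulation; the function name is an assumption):

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimation: accumulate discounted TD errors
    backwards, A_t = sum_l (gamma*lam)^l * delta_{t+l}. `values` carries
    one extra bootstrap entry for the state after the last reward."""
    advantages = [0.0] * len(rewards)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * lam * gae  # backward recursion
        advantages[t] = gae
    return advantages
```

Setting λ = 0 recovers the one-step TD error; λ = 1 recovers the Monte Carlo advantage, so λ trades bias against variance.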

Active Learning: Approaches and Issues

T.R. Chaudhur, L.G.C. Hamey
1997 Journal of Intelligent Systems  
to select learning policy.  ...  While the first is largely a meta-level symbolic approach, the second is more a class of problems employing a policy-based approach to learning in non-deterministic dynamic environments; the third is based  ...  Learning regular sets from queries and examples,  ... 
doi:10.1515/jisys.1997.7.3-4.205 fatcat:stjvzotm7renhnty3ktfhzvvgm

Improving Robot Dual-System Motor Learning with Intrinsically Motivated Meta-Control and Latent-Space Experience Imagination [article]

Muhammad Burhan Hafez, Cornelius Weber, Matthias Kerzel, Stefan Wermter
2020 arXiv   pre-print
In this paper, we present a novel dual-system motor learning approach where a meta-controller arbitrates online between model-based and model-free decisions based on an estimate of the local reliability  ...  However, dual-system approaches fail to consider the reliability of the learned model when it is applied to make multiple-step predictions, resulting in a compounding of prediction errors and performance  ...  Dual-System Deep RL for Robot Control. Deep RL approaches are broadly classified into model-based and model-free ones.  ... 
arXiv:2004.08830v2 fatcat:zk5xb2szdbafbhvpdgbeftbwwu

A kernel based true online Sarsa(λ) for continuous space control problems

Fei Zhu, Haijun Zhu, Yuchen Fu, Donghuo Chen, Xiaoke Zhou
2017 Computer Science and Information Systems  
Reinforcement learning is an efficient learning method for the control problem by interacting with the environment to get an optimal policy.  ...  Moreover, conventional reinforcement learning algorithms could hardly solve continuous control problems.  ...  It updates the model by estimation based on part of learning rather than final results of the learning.  ... 
doi:10.2298/csis170107029z fatcat:uywbpxyijzgkbdmmooe3hxc7ea
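The entry above extends Sarsa(λ) with kernels for continuous spaces; the tabular backbone it generalizes can be sketched as follows (accumulating traces and the dictionary representation are illustrative assumptions, and this is classic Sarsa(λ) rather than the paper's true-online kernel variant):

```python
def sarsa_lambda_step(Q, E, s, a, r, s_next, a_next,
                      alpha=0.1, gamma=0.9, lam=0.8):
    """One tabular Sarsa(lambda) update: the TD error is broadcast to all
    state-action pairs in proportion to their eligibility trace, so credit
    flows back along the visited trajectory instead of only one step."""
    delta = r + gamma * Q.get((s_next, a_next), 0.0) - Q.get((s, a), 0.0)
    E[(s, a)] = E.get((s, a), 0.0) + 1.0  # accumulating trace
    for sa in list(E):
        Q[sa] = Q.get(sa, 0.0) + alpha * delta * E[sa]
        E[sa] *= gamma * lam              # decay every trace
    return delta
```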

Dual-System Learning Models and Drugs of Abuse [chapter]

Dylan A. Simon, Nathaniel D. Daw
2012 Computational Neuroscience of Drug Addiction  
In computational terms, the latter is prominently associated with model-free reinforcement learning algorithms such as temporal-difference learning, and the former with model-based approaches.  ...  and bias of transition selection for prioritized value sweeping.  ...  (in this case, model-based) control.  ... 
doi:10.1007/978-1-4614-0751-5_5 fatcat:bvti3wyfanhnpmmkd7mh2szaze

Deep Reinforcement Learning: An Overview [article]

Yuxi Li
2018 arXiv   pre-print
Next we discuss core RL elements, including value function, in particular, Deep Q-Network (DQN), policy, reward, model, planning, and exploration.  ...  Please see Deep Reinforcement Learning, arXiv:1810.06339, for a significant update.  ...  A RL problem may be formulated as a prediction, control or planning problem, and solution methods may be model-free or model-based, with value function and/or policy.  ... 
arXiv:1701.07274v6 fatcat:x2es3yf3crhqblbbskhxelxf2q

Extending the Service Lifetime of a Battery-Powered System Supporting Multiple Active Modes

Maryam Triki
2017 IOSR Journal of Electrical and Electronics Engineering  
This paper presents a Reinforcement Learning-based dynamic power management framework for extending the battery service lifetime of a system with multiple active modes.  ...  The proposed algorithms are experimented on both single and dual-battery powered systems and the obtained results confirm their excellent performance for defining the optimal power management policy particularly  ...  For learning the optimal timeout value, since the system functions in an event driven manner, continuous-time learning based on TD(λ) learning method for SMDP is used.  ... 
doi:10.9790/1676-1203023446 fatcat:diafsqs36rhz5f6wx64ng5qsk4
Showing results 1 — 15 out of 3,833 results