3,833 Hits in 3.6 sec

Learning Pessimism for Robust and Efficient Off-Policy Reinforcement Learning [article]

Edoardo Cetin, Oya Celiktutan
2023 arXiv   pre-print
In particular, we propose to learn this penalty alongside the critic with dual TD-learning, a new procedure to estimate and minimize the magnitude of the target returns bias with trivial computational  ...  Off-policy deep reinforcement learning algorithms commonly compensate for overestimation bias during temporal-difference learning by utilizing pessimistic estimates of the expected target returns.  ...  We learn β using dual TD-learning with the same optimizer used for adjusting the value of α. Baseline results.  ... 
arXiv:2110.03375v2 fatcat:qd27c3czcre2tm7klnixhbsypi
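The snippet above mentions pessimistic estimates of target returns with a learned penalty β. A minimal sketch of that idea, assuming a twin-critic setup where the penalty scales with critic disagreement (the function name, the disagreement proxy, and the fixed β are illustrative assumptions; the paper adapts β via dual TD-learning):

```python
def pessimistic_target(r, q1_next, q2_next, beta=0.5, gamma=0.99):
    """Pessimistic TD target sketch: take the minimum of two critic
    estimates and subtract a penalty proportional to their disagreement,
    an uncertainty proxy. Here beta is fixed; the cited work instead
    learns it alongside the critic."""
    pessimism = beta * abs(q1_next - q2_next)  # disagreement-based penalty
    return r + gamma * (min(q1_next, q2_next) - pessimism)
```

When the critics agree, the target reduces to the ordinary min-clipped TD target; the larger the disagreement, the stronger the pessimistic correction.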

Neural mechanisms of learning and control

2001 IEEE Control Systems  
Acknowledgments We thank Raju Bapi, Hiroaki Gomi, Okihide Hikosaka, Hiroshi Imamizu, Jun Morimoto, Hiroyuki Nakahara, and Kazuyuki Samejima for their collaboration on this article.  ...  Thus, in the TD learning framework, the TD error δ(t) plays the dual role of the teaching signal for reward prediction (V) and action selection (Q).  ...  This is consistent with the TD learning model where learning is based on the TD error δ, as in (3) and (4).  ... 
doi:10.1109/37.939943 fatcat:htguzzrusjb3neuztxk4kv54su
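The dual role of the TD error described in the snippet above can be sketched in a few lines: the same scalar δ updates both the state value V (reward prediction) and the action value Q (action selection). The function name and tabular dictionaries are illustrative assumptions, not the article's implementation:

```python
def td_update(V, Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One temporal-difference step: a single TD error delta serves as
    the teaching signal for both the state-value table V and the
    action-value table Q."""
    delta = r + gamma * V[s_next] - V[s]  # TD error
    V[s] += alpha * delta                 # reward prediction update
    Q[(s, a)] += alpha * delta            # action selection update
    return delta
```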

Deep Residual Reinforcement Learning [article]

Shangtong Zhang, Wendelin Boehmer, Shimon Whiteson
2020 arXiv   pre-print
We revisit residual algorithms in both model-free and model-based reinforcement learning settings.  ...  Compared with the existing TD(k) method, our residual-based method makes weaker assumptions about the model and yields a greater performance boost.  ...  Besides RL, learned models are also used in other control methods, e.g., model predictive control (MPC, [17] ). Nagabandi et al. [35] learn deterministic models via neural networks for MPC.  ... 
arXiv:1905.01072v3 fatcat:x46i7xwbgbdnxdembxejerii3u

Multi-Agent Reinforcement Learning via Distributed MPC as a Function Approximator [article]

Samuel Mallick, Filippo Airaldi, Azita Dabiri, Bart De Schutter
2024 arXiv   pre-print
Existing work on RL has demonstrated the use of model predictive control (MPC) as a function approximator for the policy and value functions.  ...  This paper presents a novel approach to multi-agent reinforcement learning (RL) for linear systems with convex polytopic constraints.  ...  In addition, thanks to the MPC-based approximation, insights into the policy are provided by the physical meaning of the learned components, e.g., the prediction model and constraints.  ... 
arXiv:2312.05166v3 fatcat:uwgxxi636rbehhjmmeqdtzzw6y

Reinforcement learning algorithms with function approximation: Recent advances and applications

Xin Xu, Lei Zuo, Zhenhua Huang
2014 Information Sciences  
In recent years, the research on reinforcement learning (RL) has focused on function approximation in learning prediction and control of Markov decision processes (MDPs).  ...  From an empirical aspect, the performance of different RL algorithms was evaluated and compared in several benchmark learning prediction and learning control tasks.  ...  RL algorithms with function approximation for learning prediction In RL, there are two basic tasks. One is called learning prediction and the other is called learning control.  ... 
doi:10.1016/j.ins.2013.08.037 fatcat:ki77nykp6rabdmq2jxk3zvwlpm

Practical Reinforcement Learning of Stabilizing Economic MPC [article]

Mario Zanon, Sébastien Gros, Alberto Bemporad
2019 arXiv   pre-print
Reinforcement Learning (RL) has demonstrated a huge potential in learning optimal policies without any prior knowledge of the process to be controlled.  ...  Model Predictive Control (MPC) is a popular control technique which is able to deal with nonlinear dynamics and state and input constraints.  ...  Model Predictive Control (MPC) is a model-based technique which exploits a model of the system dynamics to predict the system's future behavior and optimize a given performance index, possibly subject  ... 
arXiv:1904.04614v1 fatcat:r7boibxqhrga7pynxva4zbzodi
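The snippet above summarizes MPC as optimizing a performance index over model-predicted trajectories. A toy receding-horizon sketch for a scalar linear model x' = a·x + b·u with a quadratic stage cost (the discrete candidate set and brute-force enumeration are simplifying assumptions; real MPC solves a constrained optimization):

```python
from itertools import product

def mpc_action(x0, a=1.0, b=1.0, horizon=3, candidates=(-1.0, 0.0, 1.0)):
    """Receding-horizon sketch: enumerate short input sequences, simulate
    the model forward, score a quadratic cost, and apply only the first
    input of the best sequence."""
    best_cost, best_u0 = float('inf'), 0.0
    for seq in product(candidates, repeat=horizon):
        x, cost = x0, 0.0
        for u in seq:
            x = a * x + b * u            # model prediction
            cost += x * x + 0.1 * u * u  # stage cost on state and input
        if cost < best_cost:
            best_cost, best_u0 = cost, seq[0]
    return best_u0
```

Re-solving this optimization at every step, from the newly measured state, is what makes the scheme receding-horizon.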

A Computational Framework for Motor Skill Acquisition [article]

Krishn Bera, Tejas Savalia, Bapi Raju
2019 arXiv   pre-print
The fundamental premise of sequential decision making for skill learning is based on interacting model-based (MB) and model-free (MF) reinforcement learning (RL) processes.  ...  In this context, for a discrete sequence production (DSP) task, one of the most insightful models is Verwey's Dual Processor Model (DPM).  ...  For an optimal state-action-reward policy, model-based learning guides the model-free learning at periodic intervals that ultimately leads to the goal state.  ... 
arXiv:1901.01856v1 fatcat:efgspkufxfbxhmr6cppit3xf6y

Sim-to-Real Learning of Footstep-Constrained Bipedal Dynamic Walking [article]

Helei Duan, Ashish Malik, Jeremy Dao, Aseem Saxena, Kevin Green, Jonah Siekmann, Alan Fern, Jonathan Hurst
2022 arXiv   pre-print
In addition, we use supervised learning to induce a transition model for accurately predicting the next touchdown locations that the controller can achieve given the robot's proprioceptive observations  ...  This model paves the way for integrating the learned controller into a full-order robot locomotion planner that robustly satisfies both balance and environmental constraints.  ...  ACKNOWLEDGMENTS We thank Intel for providing vLab resources and students at Dynamic Robotics Laboratory for helpful discussions.  ... 
arXiv:2203.07589v2 fatcat:joln2ptxbfdxff7du4knqw2y2q

Soft policy optimization using dual-track advantage estimator [article]

Yubo Huang, Xuechun Wang, Luobao Zou, Zhiwei Zhuang, Weidong Zhang
2020 arXiv   pre-print
Integrating the temporal-difference (TD) method and the general advantage estimator (GAE), we propose the dual-track advantage estimator (DTAE) to accelerate the convergence of value functions and further  ...  Based on this principle, in this paper, we soften the proximal policy optimization by introducing the entropy and dynamically setting the temperature coefficient to balance the opportunity of exploration  ...  We can divide them into two categories: model-based or model-free RL. In model-based RL, we should learn not only the policy but the model in the optimization.  ... 
arXiv:2009.06858v1 fatcat:qdgrkuaaxne33ebbwtuu5noaha
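The snippet above combines the TD method with the generalized advantage estimator (GAE). A minimal sketch of standard GAE, which the paper's dual-track estimator builds on (the backward-recursion form A_t = δ_t + γλ·A_{t+1} is the usual formulation; the function name is an assumption):

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimation: accumulate discounted TD errors
    backwards, A_t = sum_l (gamma*lam)^l * delta_{t+l}. `values` carries
    one extra bootstrap entry for the state after the last reward."""
    advantages = [0.0] * len(rewards)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * lam * gae  # backward recursion
        advantages[t] = gae
    return advantages
```

Setting λ = 0 recovers the one-step TD error; λ = 1 recovers the Monte Carlo advantage, so λ trades bias against variance.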

Active Learning: Approaches and Issues

T.R. Chaudhur, L.G.C. Hamey
1997 Journal of Intelligent Systems  
to select learning policy.  ...  While the first is largely a meta-level symbolic approach, the second is more a class of problems employing a policy-based approach to learning in non-deterministic dynamic environments; the third is based  ...  Learning regular sets from queries and examples,  ... 
doi:10.1515/jisys.1997.7.3-4.205 fatcat:stjvzotm7renhnty3ktfhzvvgm

Improving Robot Dual-System Motor Learning with Intrinsically Motivated Meta-Control and Latent-Space Experience Imagination [article]

Muhammad Burhan Hafez, Cornelius Weber, Matthias Kerzel, Stefan Wermter
2020 arXiv   pre-print
In this paper, we present a novel dual-system motor learning approach where a meta-controller arbitrates online between model-based and model-free decisions based on an estimate of the local reliability  ...  However, dual-system approaches fail to consider the reliability of the learned model when it is applied to make multiple-step predictions, resulting in a compounding of prediction errors and performance  ...  Dual-System Deep RL for Robot Control. Deep RL approaches are broadly classified into model-based and model-free ones.  ... 
arXiv:2004.08830v2 fatcat:zk5xb2szdbafbhvpdgbeftbwwu

A kernel based true online Sarsa(λ) for continuous space control problems

Fei Zhu, Haijun Zhu, Yuchen Fu, Donghuo Chen, Xiaoke Zhou
2017 Computer Science and Information Systems  
Reinforcement learning is an efficient learning method for the control problem by interacting with the environment to get an optimal policy.  ...  Moreover, conventional reinforcement learning algorithms could hardly solve continuous control problems.  ...  It updates the model by estimation based on part of learning rather than final results of the learning.  ... 
doi:10.2298/csis170107029z fatcat:uywbpxyijzgkbdmmooe3hxc7ea
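The entry above extends Sarsa(λ) with kernels for continuous spaces; the tabular backbone it generalizes can be sketched as follows (accumulating traces and the dictionary representation are illustrative assumptions, and this is classic Sarsa(λ) rather than the paper's true-online kernel variant):

```python
def sarsa_lambda_step(Q, E, s, a, r, s_next, a_next,
                      alpha=0.1, gamma=0.9, lam=0.8):
    """One tabular Sarsa(lambda) update: the TD error is broadcast to all
    state-action pairs in proportion to their eligibility trace, so credit
    flows back along the visited trajectory instead of only one step."""
    delta = r + gamma * Q.get((s_next, a_next), 0.0) - Q.get((s, a), 0.0)
    E[(s, a)] = E.get((s, a), 0.0) + 1.0  # accumulating trace
    for sa in list(E):
        Q[sa] = Q.get(sa, 0.0) + alpha * delta * E[sa]
        E[sa] *= gamma * lam              # decay every trace
    return delta
```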

Dual-System Learning Models and Drugs of Abuse [chapter]

Dylan A. Simon, Nathaniel D. Daw
2012 Computational Neuroscience of Drug Addiction  
In computational terms, the latter is prominently associated with model-free reinforcement learning algorithms such as temporal-difference learning, and the former with model-based approaches.  ...  and bias of transition selection for prioritized value sweeping.  ...  (in this case, model-based) control.  ... 
doi:10.1007/978-1-4614-0751-5_5 fatcat:bvti3wyfanhnpmmkd7mh2szaze

Deep Reinforcement Learning: An Overview [article]

Yuxi Li
2018 arXiv   pre-print
Next we discuss core RL elements, including value function, in particular, Deep Q-Network (DQN), policy, reward, model, planning, and exploration.  ...  Please see Deep Reinforcement Learning, arXiv:1810.06339, for a significant update.  ...  A RL problem may be formulated as a prediction, control or planning problem, and solution methods may be model-free or model-based, with value function and/or policy.  ... 
arXiv:1701.07274v6 fatcat:x2es3yf3crhqblbbskhxelxf2q

Extending the Service Lifetime of a Battery-Powered System Supporting Multiple Active Modes

Maryam Triki
2017 IOSR Journal of Electrical and Electronics Engineering  
This paper presents a Reinforcement Learning-based dynamic power management framework for extending the battery service lifetime of a system with multiple active modes.  ...  The proposed algorithms are experimented on both single and dual-battery powered systems and the obtained results confirm their excellent performance for defining the optimal power management policy particularly  ...  For learning the optimal timeout value, since the system functions in an event driven manner, continuous-time learning based on TD(λ) learning method for SMDP is used.  ... 
doi:10.9790/1676-1203023446 fatcat:diafsqs36rhz5f6wx64ng5qsk4
Showing results 1 — 15 out of 3,833 results