Learning Pessimism for Robust and Efficient Off-Policy Reinforcement Learning
[article]
2023
arXiv
pre-print
In particular, we propose to learn this penalty alongside the critic with dual TD-learning, a new procedure to estimate and minimize the magnitude of the target returns bias with trivial computational ...
Off-policy deep reinforcement learning algorithms commonly compensate for overestimation bias during temporal-difference learning by utilizing pessimistic estimates of the expected target returns. ...
We learn β using dual TD-learning with the same optimizer used for adjusting the value of α. Baseline results. ...
arXiv:2110.03375v2
fatcat:qd27c3czcre2tm7klnixhbsypi
Neural mechanisms of learning and control
2001
IEEE Control Systems
Acknowledgments We thank Raju Bapi, Hiroaki Gomi, Okihide Hikosaka, Hiroshi Imamizu, Jun Morimoto, Hiroyuki Nakahara, and Kazuyuki Samejima for their collaboration on this article. ...
Thus, in the TD learning framework, the TD error δ(t) plays the dual role of the teaching signal for reward prediction (V) and action selection (Q). ...
This is consistent with the TD learning model where learning is based on the TD error δ, as in (3) and (4) . ...
doi:10.1109/37.939943
fatcat:htguzzrusjb3neuztxk4kv54su
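The snippet above describes the TD error driving both value prediction and action selection. A minimal tabular sketch of that dual role (illustrative only, not the article's model; the update rule and constants are assumptions):

```python
import numpy as np

# The TD error delta_t = r_t + gamma * V(s_{t+1}) - V(s_t) serves as
# the teaching signal for both the state values V (reward prediction)
# and the action values Q (action selection).
def td_update(V, Q, s, a, r, s_next, gamma=0.9, alpha=0.5):
    delta = r + gamma * V[s_next] - V[s]
    V[s] += alpha * delta      # update reward prediction
    Q[s, a] += alpha * delta   # update action preference
    return delta

V = np.zeros(2)
Q = np.zeros((2, 2))
delta = td_update(V, Q, s=0, a=1, r=1.0, s_next=1)
```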
Deep Residual Reinforcement Learning
[article]
2020
arXiv
pre-print
We revisit residual algorithms in both model-free and model-based reinforcement learning settings. ...
Compared with the existing TD(k) method, our residual-based method makes weaker assumptions about the model and yields a greater performance boost. ...
Besides RL, learned models are also used in other control methods, e.g., model predictive control (MPC, [17] ). Nagabandi et al. [35] learn deterministic models via neural networks for MPC. ...
arXiv:1905.01072v3
fatcat:x46i7xwbgbdnxdembxejerii3u
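For context on what distinguishes residual algorithms from ordinary TD: the residual update differentiates the full squared Bellman error, including the bootstrapped target, rather than holding the target fixed. A sketch with linear value functions (names and constants are illustrative, not taken from the paper):

```python
import numpy as np

# Semi-gradient TD treats the bootstrap target r + gamma * V(s') as a
# constant; the residual gradient differentiates through it as well.
def td_grad(w, phi, phi2, r, gamma=0.9):
    delta = r + gamma * w @ phi2 - w @ phi
    return delta * phi                    # target held fixed

def residual_grad(w, phi, phi2, r, gamma=0.9):
    delta = r + gamma * w @ phi2 - w @ phi
    return delta * (phi - gamma * phi2)   # true gradient of the residual

w = np.zeros(2)
phi, phi2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
g_td = td_grad(w, phi, phi2, r=1.0)
g_res = residual_grad(w, phi, phi2, r=1.0)
```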
Multi-Agent Reinforcement Learning via Distributed MPC as a Function Approximator
[article]
2024
arXiv
pre-print
Existing work on RL has demonstrated the use of model predictive control (MPC) as a function approximator for the policy and value functions. ...
This paper presents a novel approach to multi-agent reinforcement learning (RL) for linear systems with convex polytopic constraints. ...
In addition, thanks to the MPC-based approximation, insights into the policy are provided by the physical meaning of the learned components, e.g., the prediction model and constraints. ...
arXiv:2312.05166v3
fatcat:uwgxxi636rbehhjmmeqdtzzw6y
Reinforcement learning algorithms with function approximation: Recent advances and applications
2014
Information Sciences
In recent years, the research on reinforcement learning (RL) has focused on function approximation in learning prediction and control of Markov decision processes (MDPs). ...
From an empirical aspect, the performance of different RL algorithms was evaluated and compared in several benchmark learning prediction and learning control tasks. ...
RL algorithms with function approximation for learning prediction In RL, there are two basic tasks. One is called learning prediction and the other is called learning control. ...
doi:10.1016/j.ins.2013.08.037
fatcat:ki77nykp6rabdmq2jxk3zvwlpm
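The snippet above distinguishes learning prediction from learning control. The prediction task with function approximation can be sketched as TD(0) with a linear value function V(s) ≈ w · φ(s); the function and step sizes below are illustrative, not from the survey:

```python
import numpy as np

# One TD(0) step with linear function approximation: move the weight
# vector toward the bootstrapped target along the feature vector phi_s.
def td0_step(w, phi_s, phi_next, r, gamma=0.9, alpha=0.1):
    delta = r + gamma * (w @ phi_next) - (w @ phi_s)
    return w + alpha * delta * phi_s

w = np.zeros(2)
w = td0_step(w, np.array([1.0, 0.0]), np.array([0.0, 1.0]), r=1.0)
```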
Practical Reinforcement Learning of Stabilizing Economic MPC
[article]
2019
arXiv
pre-print
Reinforcement Learning (RL) has demonstrated a huge potential in learning optimal policies without any prior knowledge of the process to be controlled. ...
Model Predictive Control (MPC) is a popular control technique which is able to deal with nonlinear dynamics and state and input constraints. ...
Model Predictive Control (MPC) is a model-based technique which exploits a model of the system dynamics to predict the system's future behavior and optimize a given performance index, possibly subject ...
arXiv:1904.04614v1
fatcat:r7boibxqhrga7pynxva4zbzodi
A Computational Framework for Motor Skill Acquisition
[article]
2019
arXiv
pre-print
The fundamental premise of sequential decision making for skill learning is based on interacting model-based (MB) and model-free (MF) reinforcement learning (RL) processes. ...
In this context, for a discrete sequence production (DSP) task, one of the most insightful models is Verwey's Dual Processor Model (DPM). ...
For an optimal state-action-reward policy, model-based learning guides the model-free learning at periodic intervals that ultimately leads to the goal state. ...
arXiv:1901.01856v1
fatcat:efgspkufxfbxhmr6cppit3xf6y
Sim-to-Real Learning of Footstep-Constrained Bipedal Dynamic Walking
[article]
2022
arXiv
pre-print
In addition, we use supervised learning to induce a transition model for accurately predicting the next touchdown locations that the controller can achieve given the robot's proprioceptive observations ...
This model paves the way for integrating the learned controller into a full-order robot locomotion planner that robustly satisfies both balance and environmental constraints. ...
ACKNOWLEDGMENTS We thank Intel for providing vLab resources and students at Dynamic Robotics Laboratory for helpful discussions. ...
arXiv:2203.07589v2
fatcat:joln2ptxbfdxff7du4knqw2y2q
Soft policy optimization using dual-track advantage estimator
[article]
2020
arXiv
pre-print
Integrating the temporal-difference (TD) method and the general advantage estimator (GAE), we propose the dual-track advantage estimator (DTAE) to accelerate the convergence of value functions and further ...
Based on this principle, in this paper, we soften the proximal policy optimization by introducing the entropy and dynamically setting the temperature coefficient to balance the opportunity of exploration ...
We can divide them into two categories: model-based or model-free RL. In model-based RL, we should learn not only the policy but the model in the optimization. ...
arXiv:2009.06858v1
fatcat:qdgrkuaaxne33ebbwtuu5noaha
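One of the two tracks the DTAE integrates is the generalized advantage estimator (GAE), which can be sketched as an exponentially weighted sum of TD errors computed backward in time (the coefficients below are common defaults, assumed rather than taken from the paper):

```python
# GAE: A_t = sum_k (gamma * lam)^k * delta_{t+k}, accumulated backward.
def gae(rewards, values, gamma=0.99, lam=0.95):
    # values has len(rewards) + 1 entries (bootstrap value at the end)
    advantages, adv = [], 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        adv = delta + gamma * lam * adv
        advantages.append(adv)
    return advantages[::-1]

advs = gae([1.0, 1.0], [0.0, 0.0, 0.0])
```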
Active Learning: Approaches and Issues
1997
Journal of Intelligent Systems
to select learning policy. ...
While the first is largely a meta-level symbolic approach, the second is more a class of problems employing a policy-based approach to learning in non-deterministic dynamic environments; the third is based ...
Learning regular sets from queries and examples, ...
doi:10.1515/jisys.1997.7.3-4.205
fatcat:stjvzotm7renhnty3ktfhzvvgm
Improving Robot Dual-System Motor Learning with Intrinsically Motivated Meta-Control and Latent-Space Experience Imagination
[article]
2020
arXiv
pre-print
In this paper, we present a novel dual-system motor learning approach where a meta-controller arbitrates online between model-based and model-free decisions based on an estimate of the local reliability ...
However, dual-system approaches fail to consider the reliability of the learned model when it is applied to make multiple-step predictions, resulting in a compounding of prediction errors and performance ...
Dual-System Deep RL for Robot Control. Deep RL approaches are broadly classified into model-based and model-free ones. ...
arXiv:2004.08830v2
fatcat:zk5xb2szdbafbhvpdgbeftbwwu
A kernel based true online Sarsa(λ) for continuous space control problems
2017
Computer Science and Information Systems
Reinforcement learning is an efficient method for control problems: by interacting with the environment, it learns an optimal policy. ...
Moreover, conventional reinforcement learning algorithms can hardly solve continuous control problems. ...
It updates the model by estimation based on part of learning rather than final results of the learning. ...
doi:10.2298/csis170107029z
fatcat:uywbpxyijzgkbdmmooe3hxc7ea
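A simplified stand-in for the method in this entry is tabular Sarsa(λ) with accumulating eligibility traces; the kernel machinery and the "true online" correction are omitted here, so this is only an illustrative sketch:

```python
import numpy as np

# One Sarsa(lambda) step: the TD error credits all recently visited
# state-action pairs in proportion to their eligibility trace E.
def sarsa_lambda_step(Q, E, s, a, r, s2, a2,
                      gamma=0.9, alpha=0.5, lam=0.8):
    delta = r + gamma * Q[s2, a2] - Q[s, a]
    E[s, a] += 1.0           # accumulate trace for the visited pair
    Q += alpha * delta * E   # update all eligible pairs
    E *= gamma * lam         # decay traces toward zero
    return delta

Q = np.zeros((2, 2))
E = np.zeros((2, 2))
delta = sarsa_lambda_step(Q, E, 0, 0, 1.0, 1, 1)
```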
Dual-System Learning Models and Drugs of Abuse
[chapter]
2012
Computational Neuroscience of Drug Addiction
In computational terms, the latter is prominently associated with model-free reinforcement learning algorithms such as temporal-difference learning, and the former with model-based approaches. ...
and bias of transition selection for prioritized value sweeping. ...
(in this case, model-based) control. ...
doi:10.1007/978-1-4614-0751-5_5
fatcat:bvti3wyfanhnpmmkd7mh2szaze
Deep Reinforcement Learning: An Overview
[article]
2018
arXiv
pre-print
Next we discuss core RL elements, including value function, in particular, Deep Q-Network (DQN), policy, reward, model, planning, and exploration. ...
Please see Deep Reinforcement Learning, arXiv:1810.06339, for a significant update. ...
A RL problem may be formulated as a prediction, control or planning problem, and solution methods may be model-free or model-based, with value function and/or policy. ...
arXiv:1701.07274v6
fatcat:x2es3yf3crhqblbbskhxelxf2q
Extending the Service Lifetime of a Battery-Powered System Supporting Multiple Active Modes
2017
IOSR Journal of Electrical and Electronics Engineering
This paper presents a Reinforcement Learning-based dynamic power management framework for extending the battery service lifetime of a system with multiple active modes. ...
The proposed algorithms are experimented on both single and dual-battery powered systems and the obtained results confirm their excellent performance for defining the optimal power management policy particularly ...
For learning the optimal timeout value, since the system functions in an event driven manner, continuous-time learning based on TD(λ) learning method for SMDP is used. ...
doi:10.9790/1676-1203023446
fatcat:diafsqs36rhz5f6wx64ng5qsk4