# Residual Policy Learning

**arXiv:** 1812.06298
**Authors:** Tom Silver, Kelsey Allen, Josh Tenenbaum, Leslie Kaelbling
**Fetched:** 2026-02-13
**Type:** Research Paper (Seminal Work)

---

**Note:** There are two concurrent seminal papers on residual policy/reinforcement learning (both December 2018). This file archives the Silver et al. paper. The companion paper is "Residual Reinforcement Learning for Robot Control" by Johannink et al. (arXiv 1812.03201, ICRA 2019), which focuses more on real-robot experiments. Both are foundational to the concept.

## Abstract

We present Residual Policy Learning (RPL), a simple method for improving nondifferentiable policies using model-free deep reinforcement learning. RPL thrives in complex robotic manipulation tasks where good but imperfect controllers are available. In these tasks, reinforcement learning from scratch is data-inefficient, and the initial controller can be hard to improve upon. The key idea is to learn a residual on top of the initial controller. The method is tested across six challenging MuJoCo environments featuring partial observability, noise, model inaccuracies, and calibration issues.
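The core mechanism can be sketched in a few lines: the final action is the base controller's action plus a learned correction, and the residual network's last layer starts at zero so the combined policy initially behaves exactly like the base controller. The snippet below is a minimal illustrative sketch, not the paper's implementation; the toy P-controller and network shapes are invented for the example.

```python
import numpy as np

def base_controller(obs):
    """Hypothetical hand-crafted P-controller: push the end-effector
    toward the goal. Stands in for any imperfect initial policy."""
    position, goal = obs[:2], obs[2:]
    return 1.0 * (goal - position)

class ResidualPolicy:
    """Final action = base_controller(obs) + f_theta(obs).

    Only the residual f_theta is trained (e.g., with a model-free RL
    algorithm); the base controller stays fixed and nondifferentiable.
    """
    def __init__(self, obs_dim, act_dim, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (obs_dim, hidden))
        self.b1 = np.zeros(hidden)
        # Output layer initialized to zero: the residual is initially
        # the zero function, so learning starts from the base behavior.
        self.W2 = np.zeros((hidden, act_dim))
        self.b2 = np.zeros(act_dim)

    def residual(self, obs):
        h = np.tanh(obs @ self.W1 + self.b1)
        return h @ self.W2 + self.b2

    def act(self, obs):
        return base_controller(obs) + self.residual(obs)

obs = np.array([0.0, 0.0, 1.0, -1.0])  # position (0, 0), goal (1, -1)
policy = ResidualPolicy(obs_dim=4, act_dim=2)
# Before any training the residual is zero, so the combined policy
# reproduces the base controller exactly.
print(np.allclose(policy.act(obs), base_controller(obs)))  # True
```

The zero-initialized output layer is the key design choice: it guarantees the agent never starts worse than the controller it is improving.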
## Key Contributions

- **Residual learning paradigm:** Instead of learning policies from scratch, learns a corrective residual on top of an existing (imperfect) controller, combining the strengths of classical control with deep RL
- **Data efficiency:** Addresses the data inefficiency inherent in learning robotic manipulation from scratch by bootstrapping from existing controllers
- **Consistent improvements:** Substantially outperforms the initial controllers across all tested scenarios
- **Flexible initialization:** Works with both hand-crafted policies and model-predictive controllers using known or learned dynamics models
- **Enabling long-horizon tasks:** The hybrid approach solves long-horizon sparse-reward tasks that RL alone cannot
- **Complementary strengths:** Demonstrates that marrying learning with classical control extends capabilities beyond what either approach achieves independently

## Companion Paper: Residual Reinforcement Learning for Robot Control

**arXiv:** 1812.03201
**Authors:** Tobias Johannink, Shikhar Bahl, Ashvin Nair, Jianlan Luo, Avinash Kumar, Matthias Loskyll, Juan Aparicio Ojea, Eugen Solowjow, Sergey Levine
**Venue:** ICRA 2019

This concurrent work independently proposes the same core idea of learning residual corrections to existing controllers, but focuses on real-robot manipulation experiments. Together, the two papers established residual policy learning as a key technique in robot learning.

## G1 Relevance

Residual policy learning is highly relevant to the Unitree G1 as a practical approach for improving existing controllers. Rather than training whole-body control policies entirely from scratch (which is data-hungry and risky), one can start with a conventional controller (e.g., a model-based walking controller, inverse kinematics solver, or PD controller) and learn a residual correction policy via RL.
This is particularly valuable for the G1 because:

- Unitree provides baseline controllers that could serve as the initial policy
- The approach reduces sim-to-real gap issues, since the base controller already handles the fundamental dynamics
- It allows incremental improvement of locomotion and manipulation without discarding existing engineering effort
- Many G1 whole-body control papers (including SoFTA and H2O) implicitly build on this concept of combining learned and classical control

## References

- arXiv (Silver et al.): https://arxiv.org/abs/1812.06298
- arXiv (Johannink et al.): https://arxiv.org/abs/1812.03201