---
id: push-recovery-balance
title: "Push Recovery & Robust Balance"
status: established
source_sections: "reference/sources/paper-gait-conditioned-rl.md, reference/sources/paper-getting-up-policies.md, reference/sources/paper-safe-control-cluttered.md, reference/sources/paper-residual-policy.md, reference/sources/paper-cbf-humanoid.md"
related_topics: [locomotion-control, whole-body-control, safety-limits, equations-and-bounds, learning-and-ai, simulation]
key_equations: [com, zmp, inverse_dynamics]
key_terms: [push_recovery, ankle_strategy, hip_strategy, stepping_strategy, residual_policy, control_barrier_function, support_polygon, perturbation_curriculum]
images: []
examples: []
open_questions:
  - "What is the max recoverable push force for the stock G1 controller?"
  - "Does residual policy overlay work with the proprietary locomotion computer, or does it require full replacement?"
  - "What is the minimum viable sensor set for push detection (IMU only vs. IMU + F/T)?"
  - "What perturbation force ranges should be used in training curriculum?"
---

# Push Recovery & Robust Balance

Making the G1 robust to external pushes and maintaining balance during all activities — the "always-on" stability layer.
## 1. Push Recovery Strategies

When a humanoid is pushed, it can respond with progressively more aggressive strategies depending on perturbation magnitude: [T1 — Established biomechanics/robotics]

### Ankle Strategy (Small Perturbations)

- **Mechanism:** Ankle torque adjusts the center of pressure (CoP) within the foot
- **Range:** Small pushes that don't move the CoM outside the foot support area
- **Speed:** Fastest response (~50ms)
- **G1 applicability:** Yes — G1 has ankle pitch and roll joints [T1]
- **Limitation:** Only works for small perturbations; the foot must remain flat

### Hip Strategy (Medium Perturbations)

- **Mechanism:** Rapid hip flexion/extension shifts the CoM back over the support
- **Range:** Pushes that exceed ankle authority but don't require stepping
- **Speed:** Medium (~100-200ms)
- **G1 applicability:** Yes — G1 hip has 3 DOF with ±154° pitch [T1]
- **Often combined with:** Upper-body countermotion (arms swing opposite to the push direction)

### Stepping Strategy (Large Perturbations)

- **Mechanism:** Take a recovery step to create a new support polygon under the shifted CoM
- **Range:** Large pushes where the CoM exits the current support polygon
- **Speed:** Slowest (~300-500ms; must plan and execute a step)
- **G1 applicability:** Yes — requires whole-body coordination [T1]
- **Most complex:** Needs free space to step into and foot placement planning

### Combined/Learned Strategy

Modern RL-based controllers learn a blended strategy that seamlessly transitions between ankle, hip, and stepping responses based on perturbation magnitude. This is the approach used by the G1's stock controller and by research push-recovery policies. [T1]
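The magnitude-dependent escalation above can be sketched as a simple dispatcher. The impulse thresholds below are illustrative placeholders, not measured G1 limits; learned policies blend these strategies continuously rather than switching discretely.

```python
# Illustrative strategy selection by perturbation magnitude.
# Thresholds are hypothetical assumptions, NOT measured G1 values.
ANKLE_LIMIT_NS = 10.0   # impulse (N·s) recoverable by ankle torque alone
HIP_LIMIT_NS = 25.0     # impulse recoverable by ankle + hip without stepping

def select_strategy(impulse_ns: float) -> str:
    """Pick the least aggressive recovery strategy that can absorb the push."""
    if impulse_ns <= ANKLE_LIMIT_NS:
        return "ankle"   # shift CoP within the foot (~50 ms)
    if impulse_ns <= HIP_LIMIT_NS:
        return "hip"     # rapid hip flexion/extension (~100-200 ms)
    return "step"        # recovery step to a new support polygon (~300-500 ms)
```

A discrete switch like this is only a mental model; an RL controller trained with a perturbation curriculum effectively learns a smooth version of the same mapping.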
## 2. What the Stock Controller Already Does

The G1's proprietary RL-based locomotion controller (running on the locomotion computer at 192.168.123.161) already handles basic push recovery: [T1]

- **Light push recovery** during standing and walking — confirmed in arXiv:2505.20619
- **Gait-conditioned policy** implicitly learns balance through training
- **500 Hz control loop** provides fast response to perturbations

However, the stock controller's push recovery limits are not documented. Key unknowns:

- Maximum recoverable impulse (N·s) during standing
- Maximum recoverable impulse during walking
- Whether it uses stepping recovery or only ankle/hip strategies
- How it performs when the upper body is doing something unexpected (e.g., mocap)

## 3. Enhancing Push Recovery

### 3a. Perturbation Curriculum Training

The most validated approach: train an RL policy in simulation with random external forces applied during training. [T1 — Multiple G1 papers]

```
Training Loop (in sim):
1. Run locomotion policy
2. At random intervals, apply external force to robot torso
   - Direction: random (forward, backward, lateral)
   - Magnitude: curriculum (start small, increase as policy improves)
   - Duration: 0.1-0.5s impulse
3. Reward: stay upright, track velocity command, minimize energy
4. Penalty: falling, excessive joint acceleration
```
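The "start small, increase as policy improves" schedule can be realized as a linear ramp keyed to training progress. This sketch uses the typical force/duration ranges quoted in this section; the function name and the one-third cutover to omnidirectional pushes are assumptions, not values from the cited papers.

```python
import random

# Sketch of a linear perturbation curriculum. Ranges follow the typical
# values in this section; the schedule shape itself is an assumption.
def sample_push(progress: float) -> dict:
    """Sample a random push given training progress in [0, 1]."""
    progress = min(max(progress, 0.0), 1.0)
    max_force = 20.0 + progress * (200.0 - 20.0)   # 20 N ramping to 200 N
    duration = 0.1 + progress * (0.5 - 0.1)        # 0.1 s ramping to 0.5 s
    # Forward-only early on; omnidirectional later (hypothetical cutover)
    heading = 0.0 if progress < 1.0 / 3.0 else random.uniform(0.0, 360.0)
    return {
        "force_n": random.uniform(0.0, max_force),
        "duration_s": duration,
        "heading_deg": heading,
    }
```

In practice the ramp is usually gated on policy performance (advance only when the success rate clears a threshold) rather than on raw iteration count.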
**Key papers validated on G1:**

| Paper | Approach | Validated? | Key Finding |
|---|---|---|---|
| arXiv:2505.20619 | Gait-conditioned RL with perturbations | Yes (real G1) | Push robustness during walking |
| arXiv:2511.07407 | Unified fall prevention + mitigation + recovery | Yes (zero-shot) | Combined strategy from sparse demos |
| arXiv:2502.12152 | Two-stage recovery (supine + prone) | Yes (real G1) | Get-up after falling |
| arXiv:2502.08378 | HoST multi-critic RL | Yes (real G1) | Diverse posture recovery |

### Perturbation Curriculum Parameters (Typical)

| Parameter | Start | End | Notes |
|---|---|---|---|
| Max force (N) | 20 | 100-200 | Ramp over training |
| Force duration (s) | 0.1 | 0.5 | Short impulses to sustained pushes |
| Direction | Forward only | Omnidirectional | Add lateral/backward progressively |
| Frequency | Rare (every 10s) | Frequent (every 2s) | Increase as policy improves |
| Application point | Torso center | Random (torso, shoulders) | Vary to generalize |

[T2 — Ranges from research papers, not G1-specific tuning]

### 3b. Residual Policy Learning

Train a small "correction" policy whose output is added to that of an existing base controller: [T1 — Established technique]

```
Base Controller Output (stock or trained):   a_base
Residual Policy Output (small corrections):  a_residual
Final Action:  a = a_base + α * a_residual   (α < 1 for safety)
```

**Why this matters for the G1:**

- The stock locomotion controller is good but not customizable
- A residual policy can be trained on top of it to improve push recovery
- The scaling factor α limits how much the residual can deviate from the base behavior
- This is the safest path to "enhanced" balance without replacing the stock controller

**Implementation on G1 (Approach A — Overlay):**

1. Read the current lowstate (joint positions/velocities, IMU)
2. Estimate what the stock controller "wants" (by observing lowcmd at the previous timestep)
3. Compute a residual correction based on the detected perturbation
4. Add the correction to the stock controller's output on rt/lowcmd
5. Clamp to joint limits
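The α-scaled blend with a final clamp can be sketched in a few lines. The function name and joint-limit representation are illustrative; real G1 limits come from the robot's joint specification, not this snippet.

```python
# Sketch of residual-overlay blending (Approach A). The authority factor
# and function signature are illustrative assumptions, not SDK API.
ALPHA = 0.2  # residual authority: keep well below 1 so the base policy dominates

def blend_action(a_base, a_residual, q_min, q_max, alpha=ALPHA):
    """Combine base and residual joint targets, then clamp to joint limits."""
    blended = [b + alpha * r for b, r in zip(a_base, a_residual)]
    return [min(max(x, lo), hi) for x, lo, hi in zip(blended, q_min, q_max)]
```

Keeping α small bounds the worst case: even an adversarial residual can only move each joint target by α times its own output range before the clamp applies.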
**Challenge:** The stock controller runs on the locomotion computer; the residual runs on the Jetson. There's a ~2ms DDS round-trip latency between them, which may cause instability if the residual and the stock controller fight each other. [T3 — Architectural inference, not tested]

### 3c. Control Barrier Functions (CBFs)

A formal safety framework that guarantees the robot stays within a "safe set": [T1 — Control theory]

```
Safety constraint:  h(x) ≥ 0             (e.g., CoM is within support polygon)
CBF condition:      ḣ(x,u) + α·h(x) ≥ 0  (safety is maintained over time)
```

At each timestep, solve a QP:

```
minimize    || u - u_desired ||^2    (stay close to desired action)
subject to  ḣ(x,u) + α·h(x) ≥ 0     (CBF safety constraint)
            u_min ≤ u ≤ u_max       (actuator limits)
```

**G1-specific work:**

- arXiv:2502.02858 uses the Projected Safe Set Algorithm (p-SSA) on a real G1 for collision avoidance in cluttered environments
- The same CBF framework can be applied to balance: define h(x) as the distance of the CoM projection from the edge of the support polygon

**Pros:** Formal guarantee (if the model is accurate); minimal modification to the existing controller (just a safety filter)

**Cons:** Requires an accurate dynamics model; computationally expensive real-time QP; conservative (may reject valid actions)
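For a single input with control-affine dynamics, the QP above has a closed-form solution: project the desired action onto the half-line allowed by the CBF condition, then clamp to actuator limits. This sketch assumes ḣ(x,u) = Lf_h + Lg_h·u; all variable names and numbers are illustrative, not from the cited G1 work.

```python
# Minimal 1-D CBF safety filter, assuming control-affine dynamics:
#   h_dot(x, u) = Lf_h + Lg_h * u
# All quantities are illustrative assumptions for this sketch.
def cbf_filter(u_desired, h, Lf_h, Lg_h, alpha, u_min, u_max):
    """Project u_desired onto {u : h_dot + alpha*h >= 0}, then clamp."""
    if Lg_h == 0.0:
        u = u_desired  # control has no effect on h; nothing to filter
    else:
        bound = -(Lf_h + alpha * h) / Lg_h
        if Lg_h > 0:
            u = max(u_desired, bound)  # constraint reads u >= bound
        else:
            u = min(u_desired, bound)  # constraint reads u <= bound
    return min(max(u, u_min), u_max)
```

The multi-input case requires a real QP solver (e.g., OSQP), but the structure is the same: the filter only intervenes when the desired action would violate the barrier condition.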
## 4. Always-On Balance Architecture

How to maintain balance as a background process during other activities (mocap playback, manipulation, teleoperation):

### Option A: Residual Overlay on Stock Controller

```
┌──────────┐  high-level  ┌──────────────┐  rt/lowcmd   ┌──────────┐
│  Task    │  commands    │  Stock Loco  │   (legs)     │  Joint   │
│ (mocap,  │─────────────►│  Controller  │─────────────►│ Actuators│
│  manip)  │              │ (proprietary)│              └──────────┘
│          │  rt/lowcmd   │              │
│          │─(arms only)─►│              │
└──────────┘              └──────────────┘
        + optional residual corrections on leg joints
```

- Stock controller handles balance automatically
- User code controls arms/waist for the task
- Optional: add small residual corrections to leg joints for enhanced stability
- **Risk level:** Low
- **Balance authority:** Whatever the stock controller provides

### Option B: GR00T-WBC (Recommended)

```
┌──────────┐  upper-body  ┌──────────────┐  rt/lowcmd   ┌──────────┐
│  Task    │  targets     │  GR00T-WBC   │ (all joints) │  Joint   │
│ (mocap,  │─────────────►│              │─────────────►│ Actuators│
│  manip)  │              │  Loco Policy │              └──────────┘
│          │              │ (RL, trained │
│          │              │  with pushes)│
└──────────┘              └──────────────┘
```

- Trained locomotion policy handles balance (including push recovery, if perturbation-trained)
- Upper-body targets come from the task (mocap, manipulation, teleoperation)
- WBC coordinator resolves conflicts between task and balance
- **Risk level:** Medium (need to validate the RL locomotion policy)
- **Balance authority:** Full (can be specifically trained for perturbation robustness)

### Option C: Full Custom Policy

```
┌───────────┐  reference  ┌──────────────┐  rt/lowcmd   ┌──────────┐
│  Mocap    │  motion     │  Custom RL   │ (all joints) │  Joint   │
│ Reference │────────────►│  Tracking +  │─────────────►│ Actuators│
│           │             │  Balance     │              └──────────┘
│           │             │  Policy      │
└───────────┘             └──────────────┘
```

- A single RL policy that simultaneously tracks the reference motion AND maintains balance
- BFM-Zero approach — trained on diverse motions with a perturbation curriculum
- **Risk level:** High (full low-level control; must handle everything)
- **Balance authority:** Maximum (policy sees everything, controls everything)
- **Best for:** Production deployment after extensive sim validation

## 5. Fall Recovery (When Push Recovery Fails)

Even with robust push recovery, falls will happen during development. Recovery capability matters: [T1 — Research papers]

| Approach | Paper | Method | G1 Validated? |
|---|---|---|---|
| Two-stage RL | arXiv:2502.12152 | Separate supine/prone recovery policies | Yes |
| HoST | arXiv:2502.08378 | Multi-critic RL, diverse posture recovery | Yes |
| Unified safety | arXiv:2511.07407 | Prevention + mitigation + recovery combined | Yes (zero-shot) |

### Fall Detection

- **IMU-based:** Detect excessive tilt angle (e.g., pitch/roll > 45°) or angular velocity
- **Joint-based:** Detect unexpected ground contact (arm joints hitting torque limits)
- **CoM-based:** Estimate the CoM position and detect when it exits the recoverable region

### Fall Mitigation

arXiv:2511.07407 trains a policy that, when a fall is inevitable, actively reduces impact:

- Tuck arms in
- Rotate to distribute impact
- Reduce angular velocity before ground contact

## 6. Metrics for Push Recovery

Quantitative measures to evaluate balance robustness: [T2 — Research community standards]

| Metric | Definition | Target (Good) | Target (Excellent) |
|---|---|---|---|
| Max recoverable push (standing) | Maximum impulse (N·s) the robot survives while standing | 30 N·s | 60+ N·s |
| Max recoverable push (walking) | Maximum impulse during walking | 20 N·s | 40+ N·s |
| Recovery time | Time from perturbation to return to steady state | < 2 s | < 1 s |
| Success rate | % of randomized pushes survived (test distribution) | > 90% | > 98% |
| CoM deviation | Maximum CoM displacement during recovery | < 0.3 m | < 0.15 m |
| No-step recovery range | Max push recovered without taking a step | 20 N·s | 40 N·s |

[T3 — Targets are estimates based on research papers, not G1-specific benchmarks]
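The IMU-based fall-detection rule from Section 5 can be sketched as a tilt-plus-rate check on the torso orientation. The thresholds, the (w, x, y, z) quaternion convention, and the function name are assumptions for illustration, not stock G1 settings.

```python
import math

# Sketch of IMU-based fall detection. Thresholds are illustrative, not tuned.
TILT_LIMIT_RAD = math.radians(45.0)   # pitch/roll beyond this => falling
GYRO_LIMIT_RAD_S = 4.0                # angular-rate threshold (assumed)

def is_falling(quat_wxyz, gyro_rad_s) -> bool:
    """Flag a fall from excessive torso tilt or angular velocity."""
    w, x, y, z = quat_wxyz
    # Roll and pitch from a unit quaternion (ZYX Euler convention)
    roll = math.atan2(2.0 * (w * x + y * z), 1.0 - 2.0 * (x * x + y * y))
    pitch = math.asin(max(-1.0, min(1.0, 2.0 * (w * y - z * x))))
    tilted = abs(roll) > TILT_LIMIT_RAD or abs(pitch) > TILT_LIMIT_RAD
    spinning = max(abs(g) for g in gyro_rad_s) > GYRO_LIMIT_RAD_S
    return tilted or spinning
```

A production detector would also debounce over several control ticks and fuse the joint-based and CoM-based signals listed above before triggering mitigation.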
## 7. Training Push-Robust Policies for G1

### Recommended Sim Environment

- **Isaac Gym** (via unitree_rl_gym) for massively parallel training
- **MuJoCo** (via MuJoCo Menagerie g1.xml) for validation
- **Domain randomization:** Friction (0.3-1.5), mass (±15%), motor strength (±10%), latency (0-10ms)

### Reward Design for Push Robustness

```python
# Pseudocode — typical reward structure
reward = (
    + w_alive   * alive_bonus          # Stay upright
    + w_track   * velocity_tracking    # Follow commanded velocity
    + w_smooth  * action_smoothness    # Minimize jerk
    - w_energy  * energy_penalty       # Minimize energy use
    - w_fall    * fall_penalty         # Heavy penalty for falling
    - w_slip    * foot_slip_penalty    # Minimize foot sliding
    + w_upright * upright_bonus        # Reward torso verticality
)
```

### Training Stages (Multi-Phase Curriculum)

1. **Phase 1:** Stand without falling (no perturbations)
2. **Phase 2:** Walk on flat terrain (no perturbations)
3. **Phase 3:** Walk with small random pushes (10-30 N)
4. **Phase 4:** Walk with medium pushes (30-80 N) + terrain variation
5. **Phase 5:** Walk with large pushes (80-200 N) + task (upper-body motion)

[T2 — Based on curriculum strategies in published G1 papers]

## 8. Development Roadmap

Recommended progression for achieving "always-on balance during mocap":

```
Phase 1: Evaluate stock controller push limits
  └── Push test on real G1, document max impulse
Phase 2: Train push-robust locomotion policy in sim
  └── unitree_rl_gym + perturbation curriculum
  └── Validate in MuJoCo (Sim2Sim)
Phase 3: Deploy on real G1 (locomotion only)
  └── Start with gentle pushes, increase gradually
Phase 4: Add upper-body mocap tracking
  └── GR00T-WBC or custom WBC layer
  └── Test: can it maintain balance while arms track mocap?
Phase 5: Combined push + mocap testing
  └── Push robot while it replays mocap motion
  └── Iterate on perturbation curriculum if needed
```

## Key Relationships

- Extends: [[locomotion-control]] (enhanced version of stock balance)
- Component of: [[whole-body-control]] (balance as a constraint in WBC)
- Protects: [[motion-retargeting]] (ensures stability during mocap playback)
- Governed by: [[safety-limits]] (fall detection, e-stop integration)
- Trained via: [[learning-and-ai]] (RL with perturbation curriculum)
- Tested in: [[simulation]] (MuJoCo/Isaac with external force application)
- Bounded by: [[equations-and-bounds]] (CoM, ZMP, support polygon)