---
id: push-recovery-balance
title: "Push Recovery & Robust Balance"
status: established
source_sections: "reference/sources/paper-gait-conditioned-rl.md, reference/sources/paper-getting-up-policies.md, reference/sources/paper-safe-control-cluttered.md, reference/sources/paper-residual-policy.md, reference/sources/paper-cbf-humanoid.md"
related_topics: [locomotion-control, whole-body-control, safety-limits, equations-and-bounds, learning-and-ai, simulation]
key_equations: [com, zmp, inverse_dynamics]
key_terms: [push_recovery, ankle_strategy, hip_strategy, stepping_strategy, residual_policy, control_barrier_function, support_polygon, perturbation_curriculum]
images: []
examples: []
open_questions:
  - "What is the max recoverable push force for the stock G1 controller?"
  - "Does residual policy overlay work with the proprietary locomotion computer, or does it require full replacement?"
  - "What is the minimum viable sensor set for push detection (IMU only vs. IMU + F/T)?"
  - "What perturbation force ranges should be used in training curriculum?"
---

# Push Recovery & Robust Balance

Making the G1 robust to external pushes and maintaining balance during all activities — the "always-on" stability layer.
## 1. Push Recovery Strategies

When a humanoid is pushed, it can respond with progressively more aggressive strategies depending on perturbation magnitude: [T1 — Established biomechanics/robotics]

### Ankle Strategy (Small Perturbations)

- **Mechanism:** Ankle torque adjusts the center of pressure (CoP) within the foot
- **Range:** Small pushes that don't move the CoM outside the foot support area
- **Speed:** Fastest response (~50ms)
- **G1 applicability:** Yes — G1 has ankle pitch and roll joints [T1]
- **Limitation:** Only works for small perturbations; the foot must remain flat

### Hip Strategy (Medium Perturbations)

- **Mechanism:** Rapid hip flexion/extension shifts the CoM back over the support
- **Range:** Pushes that exceed ankle authority but don't require stepping
- **Speed:** Medium (~100-200ms)
- **G1 applicability:** Yes — G1 hip has 3 DOF with ±154° pitch [T1]
- **Often combined with:** Upper-body countermotion (arms swing opposite to the push direction)

### Stepping Strategy (Large Perturbations)

- **Mechanism:** Take a recovery step to create a new support polygon under the shifted CoM
- **Range:** Large pushes where the CoM exits the current support polygon
- **Speed:** Slowest (~300-500ms; must plan and execute a step)
- **G1 applicability:** Yes — requires whole-body coordination [T1]
- **Most complex:** Needs free space to step into and foot placement planning

### Combined/Learned Strategy

Modern RL-based controllers learn a blended strategy that seamlessly transitions between ankle, hip, and stepping responses based on perturbation magnitude. This is the approach used by the G1's stock controller and by research push-recovery policies. [T1]
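The magnitude-dependent escalation above can be sketched as a simple dispatcher. The impulse thresholds below are illustrative placeholders, not measured G1 limits; learned policies blend these strategies continuously rather than switching discretely.

```python
# Illustrative strategy selection by perturbation magnitude.
# Thresholds are hypothetical assumptions, NOT measured G1 values.
ANKLE_LIMIT_NS = 10.0   # impulse (N·s) recoverable by ankle torque alone
HIP_LIMIT_NS = 25.0     # impulse recoverable by ankle + hip without stepping

def select_strategy(impulse_ns: float) -> str:
    """Pick the least aggressive recovery strategy that can absorb the push."""
    if impulse_ns <= ANKLE_LIMIT_NS:
        return "ankle"   # shift CoP within the foot (~50 ms)
    if impulse_ns <= HIP_LIMIT_NS:
        return "hip"     # rapid hip flexion/extension (~100-200 ms)
    return "step"        # recovery step to a new support polygon (~300-500 ms)
```

A discrete switch like this is only a mental model; an RL controller trained with a perturbation curriculum effectively learns a smooth version of the same mapping.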
## 2. What the Stock Controller Already Does

The G1's proprietary RL-based locomotion controller (running on the locomotion computer at 192.168.123.161) already handles basic push recovery: [T1]

- **Light push recovery** during standing and walking — confirmed in arXiv:2505.20619
- **Gait-conditioned policy** implicitly learns balance through training
- **500 Hz control loop** provides fast response to perturbations

However, the stock controller's push recovery limits are not documented. Key unknowns:

- Maximum recoverable impulse (N·s) during standing
- Maximum recoverable impulse during walking
- Whether it uses stepping recovery or only ankle/hip strategies
- How it performs when the upper body is doing something unexpected (e.g., mocap)

## 3. Enhancing Push Recovery

### 3a. Perturbation Curriculum Training

The most validated approach: train an RL policy in simulation with random external forces applied during training. [T1 — Multiple G1 papers]

```
Training Loop (in sim):
1. Run locomotion policy
2. At random intervals, apply external force to robot torso
   - Direction: random (forward, backward, lateral)
   - Magnitude: curriculum (start small, increase as policy improves)
   - Duration: 0.1-0.5s impulse
3. Reward: stay upright, track velocity command, minimize energy
4. Penalty: falling, excessive joint acceleration
```
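The "start small, increase as policy improves" schedule can be realized as a linear ramp keyed to training progress. This sketch uses the typical force/duration ranges quoted in this section; the function name and the one-third cutover to omnidirectional pushes are assumptions, not values from the cited papers.

```python
import random

# Sketch of a linear perturbation curriculum. Ranges follow the typical
# values in this section; the schedule shape itself is an assumption.
def sample_push(progress: float) -> dict:
    """Sample a random push given training progress in [0, 1]."""
    progress = min(max(progress, 0.0), 1.0)
    max_force = 20.0 + progress * (200.0 - 20.0)   # 20 N ramping to 200 N
    duration = 0.1 + progress * (0.5 - 0.1)        # 0.1 s ramping to 0.5 s
    # Forward-only early on; omnidirectional later (hypothetical cutover)
    heading = 0.0 if progress < 1.0 / 3.0 else random.uniform(0.0, 360.0)
    return {
        "force_n": random.uniform(0.0, max_force),
        "duration_s": duration,
        "heading_deg": heading,
    }
```

In practice the ramp is usually gated on policy performance (advance only when the success rate clears a threshold) rather than on raw iteration count.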
**Key papers validated on G1:**

| Paper | Approach | Validated? | Key Finding |
|---|---|---|---|
| arXiv:2505.20619 | Gait-conditioned RL with perturbations | Yes (real G1) | Push robustness during walking |
| arXiv:2511.07407 | Unified fall prevention + mitigation + recovery | Yes (zero-shot) | Combined strategy from sparse demos |
| arXiv:2502.12152 | Two-stage recovery (supine + prone) | Yes (real G1) | Get-up after falling |
| arXiv:2502.08378 | HoST multi-critic RL | Yes (real G1) | Diverse posture recovery |

### Perturbation Curriculum Parameters (Typical)

| Parameter | Start | End | Notes |
|---|---|---|---|
| Max force (N) | 20 | 100-200 | Ramp over training |
| Force duration (s) | 0.1 | 0.5 | Short impulses to sustained pushes |
| Direction | Forward only | Omnidirectional | Add lateral/backward progressively |
| Frequency | Rare (every 10s) | Frequent (every 2s) | Increase as policy improves |
| Application point | Torso center | Random (torso, shoulders) | Vary to generalize |

[T2 — Ranges from research papers, not G1-specific tuning]

### 3b. Residual Policy Learning

Train a small "correction" policy whose output is added to that of an existing base controller: [T1 — Established technique]

```
Base Controller Output (stock or trained):   a_base
Residual Policy Output (small corrections):  a_residual
Final Action:  a = a_base + α * a_residual   (α < 1 for safety)
```

**Why this matters for the G1:**

- The stock locomotion controller is good but not customizable
- A residual policy can be trained on top of it to improve push recovery
- The scaling factor α limits how much the residual can deviate from the base behavior
- This is the safest path to "enhanced" balance without replacing the stock controller

**Implementation on G1 (Approach A — Overlay):**

1. Read the current lowstate (joint positions/velocities, IMU)
2. Estimate what the stock controller "wants" (by observing lowcmd at the previous timestep)
3. Compute a residual correction based on the detected perturbation
4. Add the correction to the stock controller's output on rt/lowcmd
5. Clamp to joint limits
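The α-scaled blend with a final clamp can be sketched in a few lines. The function name and joint-limit representation are illustrative; real G1 limits come from the robot's joint specification, not this snippet.

```python
# Sketch of residual-overlay blending (Approach A). The authority factor
# and function signature are illustrative assumptions, not SDK API.
ALPHA = 0.2  # residual authority: keep well below 1 so the base policy dominates

def blend_action(a_base, a_residual, q_min, q_max, alpha=ALPHA):
    """Combine base and residual joint targets, then clamp to joint limits."""
    blended = [b + alpha * r for b, r in zip(a_base, a_residual)]
    return [min(max(x, lo), hi) for x, lo, hi in zip(blended, q_min, q_max)]
```

Keeping α small bounds the worst case: even an adversarial residual can only move each joint target by α times its own output range before the clamp applies.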
**Challenge:** The stock controller runs on the locomotion computer; the residual runs on the Jetson. There's a ~2ms DDS round-trip latency between them, which may cause instability if the residual and the stock controller fight each other. [T3 — Architectural inference, not tested]

### 3c. Control Barrier Functions (CBFs)

A formal safety framework that guarantees the robot stays within a "safe set": [T1 — Control theory]

```
Safety constraint:  h(x) ≥ 0             (e.g., CoM is within support polygon)
CBF condition:      ḣ(x,u) + α·h(x) ≥ 0  (safety is maintained over time)
```

At each timestep, solve a QP:

```
minimize    || u - u_desired ||^2    (stay close to desired action)
subject to  ḣ(x,u) + α·h(x) ≥ 0     (CBF safety constraint)
            u_min ≤ u ≤ u_max       (actuator limits)
```

**G1-specific work:**

- arXiv:2502.02858 uses the Projected Safe Set Algorithm (p-SSA) on a real G1 for collision avoidance in cluttered environments
- The same CBF framework can be applied to balance: define h(x) as the distance of the CoM projection from the edge of the support polygon

**Pros:** Formal guarantee (if the model is accurate); minimal modification to the existing controller (just a safety filter)

**Cons:** Requires an accurate dynamics model; computationally expensive real-time QP; conservative (may reject valid actions)
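For a single input with control-affine dynamics, the QP above has a closed-form solution: project the desired action onto the half-line allowed by the CBF condition, then clamp to actuator limits. This sketch assumes ḣ(x,u) = Lf_h + Lg_h·u; all variable names and numbers are illustrative, not from the cited G1 work.

```python
# Minimal 1-D CBF safety filter, assuming control-affine dynamics:
#   h_dot(x, u) = Lf_h + Lg_h * u
# All quantities are illustrative assumptions for this sketch.
def cbf_filter(u_desired, h, Lf_h, Lg_h, alpha, u_min, u_max):
    """Project u_desired onto {u : h_dot + alpha*h >= 0}, then clamp."""
    if Lg_h == 0.0:
        u = u_desired  # control has no effect on h; nothing to filter
    else:
        bound = -(Lf_h + alpha * h) / Lg_h
        if Lg_h > 0:
            u = max(u_desired, bound)  # constraint reads u >= bound
        else:
            u = min(u_desired, bound)  # constraint reads u <= bound
    return min(max(u, u_min), u_max)
```

The multi-input case requires a real QP solver (e.g., OSQP), but the structure is the same: the filter only intervenes when the desired action would violate the barrier condition.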
## 4. Always-On Balance Architecture

How to maintain balance as a background process during other activities (mocap playback, manipulation, teleoperation):

### Option A: Residual Overlay on Stock Controller

```
┌──────────┐  high-level  ┌──────────────┐  rt/lowcmd   ┌──────────┐
│  Task    │  commands    │  Stock Loco  │   (legs)     │  Joint   │
│ (mocap,  │─────────────►│  Controller  │─────────────►│ Actuators│
│  manip)  │              │ (proprietary)│              └──────────┘
│          │  rt/lowcmd   │              │
│          │─(arms only)─►│              │
└──────────┘              └──────────────┘
        + optional residual corrections on leg joints
```

- Stock controller handles balance automatically
- User code controls arms/waist for the task
- Optional: add small residual corrections to leg joints for enhanced stability
- **Risk level:** Low
- **Balance authority:** Whatever the stock controller provides

### Option B: GR00T-WBC (Recommended)

```
┌──────────┐  upper-body  ┌──────────────┐  rt/lowcmd   ┌──────────┐
│  Task    │  targets     │  GR00T-WBC   │ (all joints) │  Joint   │
│ (mocap,  │─────────────►│              │─────────────►│ Actuators│
│  manip)  │              │  Loco Policy │              └──────────┘
│          │              │ (RL, trained │
│          │              │  with pushes)│
└──────────┘              └──────────────┘
```

- Trained locomotion policy handles balance (including push recovery, if perturbation-trained)
- Upper-body targets come from the task (mocap, manipulation, teleoperation)
- WBC coordinator resolves conflicts between task and balance
- **Risk level:** Medium (need to validate the RL locomotion policy)
- **Balance authority:** Full (can be specifically trained for perturbation robustness)

### Option C: Full Custom Policy

```
┌───────────┐  reference  ┌──────────────┐  rt/lowcmd   ┌──────────┐
│  Mocap    │  motion     │  Custom RL   │ (all joints) │  Joint   │
│ Reference │────────────►│  Tracking +  │─────────────►│ Actuators│
│           │             │  Balance     │              └──────────┘
│           │             │  Policy      │
└───────────┘             └──────────────┘
```

- A single RL policy that simultaneously tracks the reference motion AND maintains balance
- BFM-Zero approach — trained on diverse motions with a perturbation curriculum
- **Risk level:** High (full low-level control; must handle everything)
- **Balance authority:** Maximum (policy sees everything, controls everything)
- **Best for:** Production deployment after extensive sim validation

## 5. Fall Recovery (When Push Recovery Fails)

Even with robust push recovery, falls will happen during development. Recovery capability matters: [T1 — Research papers]

| Approach | Paper | Method | G1 Validated? |
|---|---|---|---|
| Two-stage RL | arXiv:2502.12152 | Separate supine/prone recovery policies | Yes |
| HoST | arXiv:2502.08378 | Multi-critic RL, diverse posture recovery | Yes |
| Unified safety | arXiv:2511.07407 | Prevention + mitigation + recovery combined | Yes (zero-shot) |

### Fall Detection

- **IMU-based:** Detect excessive tilt angle (e.g., pitch/roll > 45°) or angular velocity
- **Joint-based:** Detect unexpected ground contact (arm joints hitting torque limits)
- **CoM-based:** Estimate the CoM position and detect when it exits the recoverable region

### Fall Mitigation

arXiv:2511.07407 trains a policy that, when a fall is inevitable, actively reduces impact:

- Tuck arms in
- Rotate to distribute impact
- Reduce angular velocity before ground contact

## 6. Metrics for Push Recovery

Quantitative measures to evaluate balance robustness: [T2 — Research community standards]

| Metric | Definition | Target (Good) | Target (Excellent) |
|---|---|---|---|
| Max recoverable push (standing) | Maximum impulse (N·s) the robot survives while standing | 30 N·s | 60+ N·s |
| Max recoverable push (walking) | Maximum impulse during walking | 20 N·s | 40+ N·s |
| Recovery time | Time from perturbation to return to steady state | < 2 s | < 1 s |
| Success rate | % of randomized pushes survived (test distribution) | > 90% | > 98% |
| CoM deviation | Maximum CoM displacement during recovery | < 0.3 m | < 0.15 m |
| No-step recovery range | Max push recovered without taking a step | 20 N·s | 40 N·s |

[T3 — Targets are estimates based on research papers, not G1-specific benchmarks]
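The IMU-based fall-detection rule from Section 5 can be sketched as a tilt-plus-rate check on the torso orientation. The thresholds, the (w, x, y, z) quaternion convention, and the function name are assumptions for illustration, not stock G1 settings.

```python
import math

# Sketch of IMU-based fall detection. Thresholds are illustrative, not tuned.
TILT_LIMIT_RAD = math.radians(45.0)   # pitch/roll beyond this => falling
GYRO_LIMIT_RAD_S = 4.0                # angular-rate threshold (assumed)

def is_falling(quat_wxyz, gyro_rad_s) -> bool:
    """Flag a fall from excessive torso tilt or angular velocity."""
    w, x, y, z = quat_wxyz
    # Roll and pitch from a unit quaternion (ZYX Euler convention)
    roll = math.atan2(2.0 * (w * x + y * z), 1.0 - 2.0 * (x * x + y * y))
    pitch = math.asin(max(-1.0, min(1.0, 2.0 * (w * y - z * x))))
    tilted = abs(roll) > TILT_LIMIT_RAD or abs(pitch) > TILT_LIMIT_RAD
    spinning = max(abs(g) for g in gyro_rad_s) > GYRO_LIMIT_RAD_S
    return tilted or spinning
```

A production detector would also debounce over several control ticks and fuse the joint-based and CoM-based signals listed above before triggering mitigation.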
## 7. Training Push-Robust Policies for G1

### Recommended Sim Environment

- **Isaac Gym** (via unitree_rl_gym) for massively parallel training
- **MuJoCo** (via MuJoCo Menagerie g1.xml) for validation
- **Domain randomization:** Friction (0.3-1.5), mass (±15%), motor strength (±10%), latency (0-10ms)

### Reward Design for Push Robustness

```python
# Pseudocode — typical reward structure
reward = (
    + w_alive   * alive_bonus          # Stay upright
    + w_track   * velocity_tracking    # Follow commanded velocity
    + w_smooth  * action_smoothness    # Minimize jerk
    - w_energy  * energy_penalty       # Minimize energy use
    - w_fall    * fall_penalty         # Heavy penalty for falling
    - w_slip    * foot_slip_penalty    # Minimize foot sliding
    + w_upright * upright_bonus        # Reward torso verticality
)
```

### Training Stages (Multi-Phase Curriculum)

1. **Phase 1:** Stand without falling (no perturbations)
2. **Phase 2:** Walk on flat terrain (no perturbations)
3. **Phase 3:** Walk with small random pushes (10-30 N)
4. **Phase 4:** Walk with medium pushes (30-80 N) + terrain variation
5. **Phase 5:** Walk with large pushes (80-200 N) + task (upper-body motion)

[T2 — Based on curriculum strategies in published G1 papers]

## 8. Development Roadmap

Recommended progression for achieving "always-on balance during mocap":

```
Phase 1: Evaluate stock controller push limits
  └── Push test on real G1, document max impulse
Phase 2: Train push-robust locomotion policy in sim
  └── unitree_rl_gym + perturbation curriculum
  └── Validate in MuJoCo (Sim2Sim)
Phase 3: Deploy on real G1 (locomotion only)
  └── Start with gentle pushes, increase gradually
Phase 4: Add upper-body mocap tracking
  └── GR00T-WBC or custom WBC layer
  └── Test: can it maintain balance while arms track mocap?
Phase 5: Combined push + mocap testing
  └── Push robot while it replays mocap motion
  └── Iterate on perturbation curriculum if needed
```

## Key Relationships

- Extends: [[locomotion-control]] (enhanced version of stock balance)
- Component of: [[whole-body-control]] (balance as a constraint in WBC)
- Protects: [[motion-retargeting]] (ensures stability during mocap playback)
- Governed by: [[safety-limits]] (fall detection, e-stop integration)
- Trained via: [[learning-and-ai]] (RL with perturbation curriculum)
- Tested in: [[simulation]] (MuJoCo/Isaac with external force application)
- Bounded by: [[equations-and-bounds]] (CoM, ZMP, support polygon)