
id: learning-and-ai
title: Learning & AI
status: established
source_sections: reference/sources/github-unitree-rl-gym.md, reference/sources/github-unitree-rl-lab.md, reference/sources/github-xr-teleoperate.md, reference/sources/paper-bfm-zero.md, reference/sources/paper-gait-conditioned-rl.md
related_topics: [simulation, locomotion-control, manipulation, sdk-programming, whole-body-control, motion-retargeting, push-recovery-balance]
key_equations: []
key_terms: [gait_conditioned_rl, curriculum_learning, sim_to_real, lerobot, xr_teleoperate, teleoperation]
images: []
examples: []
open_questions:
  • Optimal reward function design for G1 locomotion
  • Training time estimates for different policy types
  • How to fine-tune the stock locomotion policy
  • LLM-based task planning integration status (firmware v3.2+)

Learning & AI

Reinforcement learning, imitation learning, and AI-based control for the G1.

1. Reinforcement Learning

Official RL Frameworks

| Framework | Repository | Base Library | Sim Engine | G1 Support | Tier |
|---|---|---|---|---|---|
| unitree_rl_gym | unitreerobotics/unitree_rl_gym | legged_gym + rsl_rl | Isaac Gym | Yes | T0 |
| unitree_rl_lab | unitreerobotics/unitree_rl_lab | Isaac Lab | Isaac Lab | G1-29dof | T0 |

unitree_rl_gym — Complete RL Pipeline

The primary framework for training locomotion policies: [T0]

  • Supported robots: Go2, H1, H1_2, G1
  • Algorithm: PPO (via rsl_rl)
  • Training: Parallel environments, GPU/CPU device selection, checkpoint management
  • Pipeline: Train → Play → Sim2Sim (MuJoCo validation) → Sim2Real (unitree_sdk2_python)
  • Deployment: Python scripts and C++ binaries with network interface configuration
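
For readers new to PPO, the algorithm named in the list above, its core is a clipped surrogate objective that limits how far each update moves the policy. The following is a minimal standard-library sketch of that objective only. It is not rsl_rl's implementation, and the function name and the default `clip_eps=0.2` are assumptions chosen for illustration.

```python
import math


def ppo_clipped_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate loss over a batch (lower is better).

    All arguments are equal-length lists of floats. clip_eps=0.2 is a
    commonly used value, assumed here rather than taken from rsl_rl.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(log_probs_new, log_probs_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Keep the pessimistic (smaller) of the two surrogate terms.
        total += min(ratio * adv, clipped_ratio * adv)
    return -total / len(advantages)


# A positive-advantage sample whose probability doubled is capped at 1 + clip_eps.
loss = ppo_clipped_loss([math.log(2.0)], [0.0], [1.0])
```

The clipping removes the incentive to push the probability ratio outside `[1 - clip_eps, 1 + clip_eps]`, which is what keeps large-batch parallel training stable.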

unitree_rl_lab — Isaac Lab Integration

Advanced RL training on NVIDIA Isaac Lab: [T0]

  • Supported robots: Go2, H1, G1-29dof
  • Simulation backends: Isaac Lab (NVIDIA) and MuJoCo (cross-sim validation)
  • Deployment: Simulation → Sim-to-sim → Real robot via unitree_sdk2
  • Language mix: Python 65.1%, C++ 31.3%

Key RL Research on G1

| Paper | Contribution | Validated on G1? | Tier |
|---|---|---|---|
| Gait-Conditioned RL (arXiv:2505.20619) | Multi-phase curriculum, gait-specific reward routing | Yes | T1 |
| Getting-Up Policies (arXiv:2502.12152) | Two-stage fall recovery via RL | Yes | T1 |
| HoST (arXiv:2502.08378) | Multi-critic RL for diverse posture recovery | Yes | T1 |
| Fall-Safety (arXiv:2511.07407) | Unified prevention + mitigation + recovery | Yes (zero-shot) | T1 |
| Vision Locomotion (arXiv:2602.06382) | End-to-end depth-based locomotion | Yes | T1 |
| Safe Control (arXiv:2502.02858) | Projected Safe Set for collision avoidance | Yes | T1 |

2. Imitation Learning

Data Collection — Teleoperation

| System | Device | Repository | Features |
|---|---|---|---|
| XR Teleoperate | Vision Pro, PICO 4, Quest 3 | unitreerobotics/xr_teleoperate | Hand tracking, data recording |
| Kinect Teleoperate | Azure Kinect DK | unitreerobotics/kinect_teleoperate | Body tracking, safety wake-up |

Training Frameworks

| Framework | Repository | Purpose |
|---|---|---|
| unitree_IL_lerobot | unitreerobotics/unitree_IL_lerobot | Modified LeRobot for G1 dual-arm training |
| HuggingFace LeRobot | huggingface.co/docs/lerobot/en/unitree_g1 | Standard LeRobot with G1 config |

LeRobot G1 integration: supports both the 29-DOF and 23-DOF versions and includes gr00t_wbc locomotion integration for whole-body control during manipulation tasks. [T1]

Imitation Learning Workflow

1. Teleoperate (XR/Kinect) → record episodes
2. Process data → extract observation-action pairs
3. Train policy (LeRobot / custom) → behavior cloning or diffusion policy
4. Deploy → unitree_sdk2 on real robot
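
Step 2 of the workflow above can be sketched as follows. The episode layout (a dict with aligned `observations` and `actions` lists) and the function name `extract_pairs` are hypothetical and chosen for illustration. This is not the LeRobot dataset format, which has its own loaders.

```python
def extract_pairs(episodes):
    """Flatten recorded episodes into (observation, action) training pairs.

    Each episode is assumed to be a dict with equal-length "observations"
    and "actions" lists (a hypothetical layout, not the LeRobot format).
    """
    pairs = []
    for episode in episodes:
        obs, acts = episode["observations"], episode["actions"]
        if len(obs) != len(acts):
            raise ValueError("observations and actions must align per timestep")
        pairs.extend(zip(obs, acts))
    return pairs


# Two toy episodes: 2-D observations, 1-D actions.
episodes = [
    {"observations": [[0.0, 0.1], [0.1, 0.2]], "actions": [[0.5], [0.4]]},
    {"observations": [[0.3, 0.3]], "actions": [[0.2]]},
]
pairs = extract_pairs(episodes)
```

A behavior cloning trainer would then fit a policy that maps each observation to its paired action. Checking alignment at this stage catches dropped frames before they corrupt training.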

3. Policy Deployment

Deployment Options

| Method | Language | Latency | Use Case |
|---|---|---|---|
| unitree_sdk2_python | Python | Higher | Prototyping, research |
| unitree_sdk2 (C++) | C++ | Lower | Production, real-time control |

Deployment Checklist

  1. Validate in simulation — Run policy in unitree_mujoco or Isaac Lab
  2. Cross-sim validate — Test in a second simulator (Sim2Sim)
  3. Low-gain start — Deploy with reduced gains initially
  4. Tethered testing — Support robot with a safety harness for first real-world tests
  5. Gradual ramp-up — Increase to full gains after verifying stability
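
Steps 3 and 5 of the checklist above amount to scaling the controller gains by a factor that starts low and rises to 1.0. The sketch below shows one way to do this. The time-based linear ramp, the starting scale of 0.3, and the 10 s duration are all assumptions for illustration. In practice an operator may prefer to raise the scale manually after each stability check.

```python
def gain_scale(elapsed_s, ramp_duration_s=10.0, start_scale=0.3):
    """Linearly ramp a gain multiplier from start_scale to 1.0.

    start_scale and ramp_duration_s are placeholder values, not
    recommendations for the G1.
    """
    if ramp_duration_s <= 0:
        return 1.0
    progress = min(max(elapsed_s / ramp_duration_s, 0.0), 1.0)
    return start_scale + (1.0 - start_scale) * progress


def scaled_gains(kp, kd, elapsed_s):
    """Apply the current ramp multiplier to per-joint PD gains."""
    scale = gain_scale(elapsed_s)
    return [k * scale for k in kp], [k * scale for k in kd]


# Halfway through the ramp the gains are at 65% of their nominal values.
kp_now, kd_now = scaled_gains([100.0, 80.0], [2.0, 1.5], elapsed_s=5.0)
```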

Safety Wrappers

When deploying custom policies, add safety layers: [T2 — Best practice]

  • Joint limit clamping (see equations-and-bounds)
  • Torque saturation limits
  • Fall detection with emergency stop
  • Velocity bounds for safe walking speeds
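
The four layers listed above can be sketched as small, independent checks applied to every command before it reaches the robot. All function names and numeric limits below are hypothetical placeholders. Real joint and torque limits should come from equations-and-bounds, and nothing here is part of unitree_sdk2.

```python
def clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(value, high))


def safe_joint_command(q_target, tau_ff, q_limits, tau_max):
    """Clamp per-joint position targets and feedforward torques."""
    q_safe = [clamp(q, lo, hi) for q, (lo, hi) in zip(q_target, q_limits)]
    tau_safe = [clamp(t, -m, m) for t, m in zip(tau_ff, tau_max)]
    return q_safe, tau_safe


def safe_velocity_command(vx, vy, yaw_rate, max_lin=0.5, max_yaw=0.5):
    """Bound walking velocity commands (placeholder limits in m/s and rad/s)."""
    return (clamp(vx, -max_lin, max_lin),
            clamp(vy, -max_lin, max_lin),
            clamp(yaw_rate, -max_yaw, max_yaw))


def has_fallen(roll, pitch, max_tilt_rad=0.8):
    """Flag a fall when torso tilt exceeds a placeholder threshold."""
    return abs(roll) > max_tilt_rad or abs(pitch) > max_tilt_rad


# Targets outside the limits are pulled back before being sent.
q_cmd, tau_cmd = safe_joint_command(
    [2.0, -2.0], [100.0, -5.0], [(-1.0, 1.0), (-1.0, 1.0)], [50.0, 50.0]
)
```

When `has_fallen` returns true, the caller would trigger its emergency stop path instead of sending the next policy action.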

4. Foundation Models

BFM-Zero (arXiv:2511.04131)

First behavioral foundation model for real humanoids: [T1]

  • Key innovation: Promptable control without retraining (reward optimization, pose reaching, motion tracking)
  • Training: Motion capture data regularization + online off-policy unsupervised RL
  • Validation: Deployed on G1 hardware
  • Significance: Enables flexible task specification without policy retraining

Behavior Foundation Model (arXiv:2509.13780)

  • Uses masked online distillation with Conditional Variational Autoencoder (CVAE)
  • Models behavioral distributions from large-scale datasets
  • Tested on G1 (1.3 m, 29-DOF) [T1]

LLM Integration (Firmware v3.2+)

  • Preliminary LLM integration support on EDU models [T2]
  • Natural language task commands via Jetson Orin [T2]
  • Status and capabilities not yet fully documented — see open questions

5. Motion Tracking Policies

RL policies trained to imitate reference motions (from mocap) while maintaining balance: [T1 — Research papers]

| Framework | Paper | Approach | G1 Validated? |
|---|---|---|---|
| BFM-Zero | arXiv:2511.04131 | Foundation model with motion tracking mode | Yes |
| H2O | arXiv:2403.01623 | Real-time human-to-humanoid tracking | Humanoid (not G1 specifically) |
| OmniH2O | arXiv:2406.08858 | Multi-modal input tracking | Humanoid |
| HumanPlus | arXiv:2406.10454 | RGB camera shadow → imitation | Humanoid |

BFM-Zero is the most directly relevant to the G1: it provides a "motion tracking" mode in which the policy receives a reference pose and tracks it while maintaining balance. It generalizes zero-shot to unseen motions and is open-source. See motion-retargeting for the full retargeting pipeline.

Key insight: These policies learn to track the reference motion and maintain balance at the same time, so push recovery is implicit: the same policy handles both. Training with a perturbation curriculum further improves robustness. See push-recovery-balance.

6. Residual Policy Learning

Training a small correction policy on top of an existing base controller: [T1 — Established technique]

a_final = a_base + α * a_residual     (α ∈ [0, 1] for safety scaling)
  • Base policy: Stock G1 controller or a pre-trained locomotion policy
  • Residual policy: Small network trained to improve specific behavior (e.g., push recovery)
  • Scaling factor α: Limits maximum deviation from base behavior
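
The combination rule above can be written as a minimal sketch. The function name `combine_actions` is hypothetical, and actions are assumed to be equal-length lists of per-joint values.

```python
def combine_actions(a_base, a_residual, alpha):
    """Return a_final = a_base + alpha * a_residual, element-wise.

    alpha must lie in [0, 1]; alpha = 0 reproduces the base policy exactly.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(a_base) != len(a_residual):
        raise ValueError("base and residual actions must have the same length")
    return [b + alpha * r for b, r in zip(a_base, a_residual)]


# With alpha = 0.5 the residual contributes half of its proposed correction.
a_final = combine_actions([1.0, 2.0], [0.5, -0.5], alpha=0.5)
```

Starting a deployment at `alpha = 0` and raising it gradually gives a direct fallback to base behavior if the residual misbehaves.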

Use case for G1: Enhance the stock controller's push recovery without replacing it entirely. Train the residual in simulation with a perturbation curriculum, then deploy it as an overlay on the base controller. See push-recovery-balance §3b.

7. Perturbation Curriculum

Training RL policies with progressively increasing external disturbances: [T1 — Multiple G1 papers]

Stage 1: No perturbations (learn basic locomotion)
Stage 2: Small random pushes (10-30 N, occasional)
Stage 3: Medium pushes (30-80 N, more frequent)
Stage 4: Large pushes (80-200 N) + terrain variation
Stage 5: Large pushes + concurrent upper-body task
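
The stages above can be encoded as a lookup table plus a promotion rule, as in the sketch below. The force ranges come from the stage list, with stage 5 assumed to reuse the stage 4 range. The per-step push probabilities, the 0.8 success threshold, and all names are hypothetical and not taken from the cited papers.

```python
import random

# (min, max) push force in newtons per stage, from the stage list above.
# Stage 5 is assumed to reuse the "large push" range of stage 4.
PUSH_RANGES_N = {1: (0.0, 0.0), 2: (10.0, 30.0), 3: (30.0, 80.0),
                 4: (80.0, 200.0), 5: (80.0, 200.0)}
# Per-step probability of applying a push: hypothetical values.
PUSH_PROBABILITY = {1: 0.0, 2: 0.005, 3: 0.01, 4: 0.02, 5: 0.02}
FINAL_STAGE = 5


def sample_push(stage, rng):
    """Return a push magnitude in newtons, or 0.0 if no push this step."""
    if rng.random() >= PUSH_PROBABILITY[stage]:
        return 0.0
    low, high = PUSH_RANGES_N[stage]
    return rng.uniform(low, high)


def next_stage(stage, success_rate, threshold=0.8):
    """Advance one stage once the success rate clears an assumed threshold."""
    if success_rate >= threshold and stage < FINAL_STAGE:
        return stage + 1
    return stage


rng = random.Random(0)  # seeded for reproducible sampling
pushes = [sample_push(4, rng) for _ in range(1000)]
```

Terrain variation (stage 4) and the concurrent upper-body task (stage 5) would be toggled by the same stage index in the environment configuration.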

This is the primary method for achieving the "always-on balance" goal. Papers arXiv:2505.20619 and arXiv:2511.07407 demonstrate this approach on real G1 hardware. See push-recovery-balance §3a for detailed parameters.

Key Relationships