| id | title | status | source_sections | related_topics | key_equations | key_terms | images | examples | open_questions |
|---|---|---|---|---|---|---|---|---|---|
| locomotion-control | Locomotion & Balance Control | established | reference/sources/paper-gait-conditioned-rl.md, reference/sources/paper-getting-up-policies.md, reference/sources/official-product-page.md | [joint-configuration sensors-perception equations-and-bounds learning-and-ai safety-limits whole-body-control push-recovery-balance motion-retargeting] | [zmp com inverse_dynamics] | [gait state_estimation gait_conditioned_rl curriculum_learning sim_to_real] | [] | [] | [Exact RL policy observation/action space dimensions; How to replace the stock locomotion policy with a custom one; Stair climbing capability and limits; Running gait availability (H1-2 can run at 3.3 m/s — can G1?)] |
# Locomotion & Balance Control
Walking, balance, gait generation, and whole-body control for bipedal locomotion.
## 1. Control Architecture
The G1 uses a reinforcement-learning-based locomotion controller running on the proprietary locomotion computer. Users interact with it via high-level commands; the low-level balance and gait control is handled internally. [T1 — Confirmed from RL papers and developer docs]
```
User Commands (high-level API)
          │
          ▼
┌─────────────────────────┐
│  Locomotion Computer    │  (192.168.123.161, proprietary)
│                         │
│  RL Policy (gait-       │ ← IMU, joint encoders (500 Hz)
│  conditioned, multi-    │
│  phase curriculum)      │
│                         │
│  Motor Commands ────────┼──→ Joint Actuators
└─────────────────────────┘
```
### Key Architecture Details
- Framework: Gait-conditioned reinforcement learning with multi-phase curriculum (arXiv:2505.20619) [T1]
- Gait switching: One-hot gait ID enables dynamic switching between gaits [T1]
- Reward design: Gait-specific reward routing mechanism with biomechanically inspired shaping [T1]
- Training: Policies trained in simulation (Isaac Gym / MuJoCo), transferred to physical hardware [T1]
- Biomechanical features: Straight-knee stance promotion, coordinated arm-leg swing, natural motion without motion capture data [T1]
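The one-hot gait conditioning above can be sketched in a few lines. This is an illustrative example, not the stock policy's actual interface: the gait names and observation dimensions are assumptions.

```python
import numpy as np

# Hypothetical gait IDs covering the modes listed in section 2.
GAITS = ["stand", "walk", "transition"]

def gait_one_hot(gait: str) -> np.ndarray:
    """Encode a gait name as a one-hot vector for policy conditioning."""
    vec = np.zeros(len(GAITS), dtype=np.float32)
    vec[GAITS.index(gait)] = 1.0
    return vec

def condition_observation(obs: np.ndarray, gait: str) -> np.ndarray:
    """Append the one-hot gait ID so a single policy can switch gaits at runtime."""
    return np.concatenate([obs, gait_one_hot(gait)])
```

Because the gait ID is part of the observation, switching gaits is just a change of input vector; no separate policy needs to be loaded.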
## 2. Gait Modes
| Mode | Description | Verified | Tier |
|---|---|---|---|
| Standing | Static balance, all feet grounded | Yes | T1 |
| Walking | Dynamic bipedal walking | Yes | T1 |
| Walk-to-stand | Smooth transition from walking to standing | Yes | T1 |
| Stand-to-walk | Smooth transition from standing to walking | Yes | T1 |
[T1 — Validated in arXiv:2505.20619 on real G1 hardware]
## 3. Performance
| Metric | Value | Notes | Tier |
|---|---|---|---|
| Maximum walking speed | 2.0 m/s | 7.2 km/h | T0 |
| Verified terrain | Tile, concrete, carpet | Office-environment surfaces | T1 |
| Balance recovery | Light push recovery | Stable recovery from perturbations | T1 |
| Gait transition | Smooth | No abrupt mode switches | T1 |
For comparison, the H1-2 (larger Unitree humanoid) achieves 3.3 m/s running. Whether the G1 has a running gait is unconfirmed. [T3]
## 4. Balance Control
The RL-based locomotion policy implicitly handles balance through learned behavior rather than explicit ZMP or capture-point controllers: [T1]
- Inputs: IMU data (orientation, angular velocity), joint encoder feedback (position, velocity), gait command
- Outputs: Target joint positions/torques for all leg joints
- Rate: 500 Hz control loop
- Learned behaviors: Center-of-mass tracking, foot placement, push recovery, arm counterbalancing
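The input side of this contract can be sketched as an observation assembler. The dimensions here (4 + 3 + 12 + 12 + 3 = 34) are illustrative only; the stock policy's exact observation layout is an open question.

```python
import numpy as np

def build_observation(imu_quat, imu_gyro, q, dq, gait_cmd) -> np.ndarray:
    """Assemble a policy observation from the inputs listed above:
    IMU orientation and angular velocity, joint encoder feedback,
    and the gait/velocity command. Layout is illustrative."""
    return np.concatenate([
        np.asarray(imu_quat, dtype=np.float32),  # base orientation (quaternion, 4)
        np.asarray(imu_gyro, dtype=np.float32),  # base angular velocity (3)
        np.asarray(q, dtype=np.float32),         # joint positions (e.g. 12 leg joints)
        np.asarray(dq, dtype=np.float32),        # joint velocities
        np.asarray(gait_cmd, dtype=np.float32),  # gait / velocity command
    ])
```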
While classical bipedal control uses explicit ZMP constraints (see equations-and-bounds), the G1's RL policy learns these constraints implicitly during training.
For deep coverage of enhanced push recovery, perturbation training, and always-on balance architectures, see push-recovery-balance.
## 5. Fall Recovery
Multiple research approaches have been validated on the G1: [T1 — Research papers]
- Two-stage RL: Supine and prone recovery policies (arXiv:2502.12152) — overcome limitations of hand-crafted controllers
- HoST framework: Multi-critic RL with curriculum training for diverse posture recovery (arXiv:2502.08378)
- Unified fall-safety: Combined fall prevention + impact mitigation + recovery from sparse demonstrations (arXiv:2511.07407) — zero-shot sim-to-real transfer
## 6. Terrain Adaptation
| Terrain Type | Status | Notes | Tier |
|---|---|---|---|
| Flat tile | Verified | Standard office floor | T1 |
| Concrete | Verified | Indoor/outdoor flat surfaces | T1 |
| Carpet | Verified | Standard office carpet | T1 |
| Stairs | Unconfirmed | Research papers suggest capability | T4 |
| Rough terrain | Sim only | Trained in sim, real-world unconfirmed | T3 |
| Slopes | Unconfirmed | — | T4 |
## 7. User Control Interface
Users control locomotion through the high-level sport mode API: [T0]
- Velocity commands: Set forward/lateral velocity and yaw rate
- Posture commands: Stand, sit, lie down
- Attitude adjustment: Modify body orientation
- Trajectory tracking: Follow waypoint sequences
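As an illustration of the velocity-command pathway, here is a hypothetical wrapper that clamps commands to the spec'd 2.0 m/s forward maximum. The type name is not the SDK's, and the lateral and yaw bounds are assumed conservative values, not published limits.

```python
from dataclasses import dataclass

MAX_FORWARD = 2.0   # m/s, from the performance table (T0 spec)
MAX_LATERAL = 0.5   # m/s, assumed conservative bound
MAX_YAW = 1.0       # rad/s, assumed conservative bound

@dataclass
class VelocityCommand:
    vx: float        # forward velocity, m/s
    vy: float        # lateral velocity, m/s
    yaw_rate: float  # rad/s

def clamp_command(cmd: VelocityCommand) -> VelocityCommand:
    """Clamp a user velocity command to safe bounds before sending it on."""
    clip = lambda v, lim: max(-lim, min(lim, v))
    return VelocityCommand(clip(cmd.vx, MAX_FORWARD),
                           clip(cmd.vy, MAX_LATERAL),
                           clip(cmd.yaw_rate, MAX_YAW))
```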
Low-level joint control is also possible (bypassing the locomotion controller) but requires the user to implement their own balance control. This is advanced and carries significant fall risk. [T2]
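If you do take over low-level control, the minimum viable stabilizing primitive is a joint-space PD law. A sketch, with placeholder gains that are not tuned G1 values:

```python
import numpy as np

def pd_torque(q_target, q, dq, kp=60.0, kd=2.0):
    """Joint-space PD law: torque toward a target position, damped by
    joint velocity. When bypassing the stock controller, the user-side
    balance loop must produce commands of roughly this form per joint."""
    q_target, q, dq = map(np.asarray, (q_target, q, dq))
    return kp * (q_target - q) - kd * dq
```

Real deployments tune `kp`/`kd` per joint and respect the actuator torque limits (see safety-limits).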
## 8. Locomotion Computer Internals
The locomotion computer is a Rockchip RK3588 (8-core ARM Cortex-A76/A55, 8GB LPDDR4X, 32GB eMMC) running Linux kernel 5.10.176-rt86+ (real-time patched). [T1 — Security research papers arXiv:2509.14096, arXiv:2509.14139]
### Software Architecture
A centralized `master_service` orchestrator (9.2 MB binary) supervises 26 daemons: [T1]
| Daemon | Role | Resource Usage |
|---|---|---|
| `ai_sport` | Primary locomotion/balance policy | 145% CPU, 135 MB RAM |
| `state_estimator` | IMU + encoder fusion | ~30% CPU |
| `motion_switcher` | Gait mode management | — |
| `robot_state_service` | State broadcasting | — |
| `dex3_service_l/r` | Left/right hand control | — |
| `webrtc_bridge` | Video streaming | — |
| `ros_bridge` | ROS2 interface | — |
| Others | OTA, BLE, WiFi, telemetry, etc. | — |
The `ai_sport` daemon is the stock RL policy. When you enter debug mode (L2+R2), this daemon is shut down, allowing direct motor control via `rt/lowcmd`.
Configuration files use proprietary FMX encryption (Blowfish-ECB + LCG stream cipher with static keys). This has been partially reverse-engineered by security researchers but not fully cracked. [T1]
### Can You Access the Locomotion Computer?
Root access is technically possible via known BLE exploits (UniPwn, FreeBOT jailbreak), but no one has publicly documented deploying a custom policy to it: [T1]
| Method | Status | Notes |
|---|---|---|
| SSH from network | Blocked | No SSH server exposed by default |
| FreeBOT jailbreak (app WiFi field injection) | Works on firmware ≤1.6.0 | Patched Oct 2025 |
| UniPwn BLE exploit (Bin4ry/UniPwn on GitHub) | Works on unpatched firmware | Hardcoded AES keys + command injection |
| RockUSB physical flash | Blocked by SecureBoot on G1 | Works on Go2 only |
| Replacing `ai_sport` binary after root | Not documented | Nobody has published doing this |
| Extracting stock policy weights | Not documented | Binary analysis not published |
Bottom line: Getting root on the RK3588 is solved. Getting a custom locomotion policy running natively on it is not — the `master_service` orchestrator, FMX encryption, and lack of documentation are barriers nobody has publicly overcome. [T1]
### How Every Research Group Actually Deploys
All published research (BFM-Zero, gait-conditioned RL, fall recovery, etc.) uses the same approach: [T1]
- Enter debug mode (L2+R2) — shuts down `ai_sport`
- Run custom policy on the Jetson Orin NX or an external computer
- Read `rt/lowstate`, compute actions, publish `rt/lowcmd` via DDS
- Motor commands travel over the internal DDS network to the RK3588, which passes them to motor drivers
This works but has inherent limitations:
- DDS network latency (~2ms round trip) vs. native on-board execution
- No access to the RK3588's real-time Linux kernel guarantees
- Policy frequency limited by DDS throughput and compute (typically 200-500 Hz from Jetson)
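The last limitation is simple arithmetic: each off-board control cycle pays the DDS round trip plus inference time, which bounds the achievable loop rate.

```python
def max_loop_rate_hz(dds_round_trip_s: float, compute_s: float) -> float:
    """Upper bound on the off-board closed-loop rate: one cycle costs
    the DDS round trip plus the policy's compute time."""
    return 1.0 / (dds_round_trip_s + compute_s)

# With the ~2 ms round trip above and, say, 1 ms of policy inference,
# the loop tops out near 333 Hz, consistent with the 200-500 Hz range.
```

The 1 ms inference figure is an assumed example, not a measured value.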
## 9. Custom Policy Replacement (Practical)
### When to Replace
- You need whole-body coordination (mocap + balance)
- You need push recovery beyond what the stock controller provides
- You want to run a custom RL policy trained with perturbation curriculum
### How to Replace (Debug Mode)
- Suspend robot on stand or harness
- Enter damping state, press L2+R2 — `ai_sport` shuts down
- Send `MotorCmd_` messages on `rt/lowcmd` from Jetson or external PC
- Read `rt/lowstate` for joint positions, velocities, and IMU data
- Publish at 500 Hz for smooth control (C++ recommended over Python for lower latency)
- To exit debug mode: reboot the robot (no other way)
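The 500 Hz publish step can be paced with a drift-free fixed-rate loop. This sketch uses a stand-in `step` callback where a real deployment would read `rt/lowstate`, run the policy, and publish `rt/lowcmd` via the vendor SDK; those calls are not shown because their exact signatures vary.

```python
import time

def run_control_loop(step, rate_hz=500, duration_s=0.02):
    """Fixed-rate loop skeleton. Scheduling off an absolute deadline
    (next_tick += period) avoids cumulative drift from sleep jitter."""
    period = 1.0 / rate_hz
    next_tick = time.monotonic()
    ticks = 0
    while ticks * period < duration_s:
        step(ticks)          # stand-in for read / compute / publish
        ticks += 1
        next_tick += period
        remaining = next_tick - time.monotonic()
        if remaining > 0:
            time.sleep(remaining)
    return ticks  # a 20 ms demo run at 500 Hz yields 10 ticks
```

Python's `time.sleep` has millisecond-scale jitter on a non-real-time kernel, which is one reason C++ is recommended for the actual control loop.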
### Risks
- Fall risk: If your policy fails, the robot falls immediately — no stock controller safety net
- Hardware damage: Incorrect joint commands can damage actuators
- Always test in simulation first (see simulation)
### Alternative: Residual Overlay
Instead of full replacement, train a residual policy that adds small corrections to the stock controller output. See push-recovery-balance for details.
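A minimal sketch of the overlay idea: clamp the learned correction so a misbehaving residual stays within a small band around the stock output. The 0.05 rad bound is an assumed safety margin, not a documented value.

```python
import numpy as np

def overlay_action(stock_action, residual, limit=0.05):
    """Residual overlay: add a clamped correction to the stock
    controller's joint targets. Clamping bounds the worst case a
    bad residual policy can inflict."""
    residual = np.clip(np.asarray(residual), -limit, limit)
    return np.asarray(stock_action) + residual
```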
### WBC Frameworks
For coordinated whole-body control (balance + task), see whole-body-control, particularly GR00T-WBC which is designed for exactly this use case on G1.
## Key Relationships
- Uses: joint-configuration (leg joints as actuators, 500 Hz commands)
- Uses: sensors-perception (IMU + encoders for state estimation)
- Trained via: learning-and-ai (RL training pipeline)
- Bounded by: equations-and-bounds (ZMP, joint limits)
- Governed by: safety-limits (fall detection, torque limits)
- Extended by: push-recovery-balance (enhanced perturbation robustness)
- Coordinated by: whole-body-control (WBC for combined loco-manipulation)
- Enables: motion-retargeting (balance during mocap playback)