From e982ea4962e0fadfc4de239f98792cdbceca7c4e Mon Sep 17 00:00:00 2001 From: Joe DiPrima Date: Sat, 14 Feb 2026 16:47:17 -0600 Subject: [PATCH] Add Unitree G1 robot integration document Detailed architecture for using GB10 as offboard AI brain for the G1 humanoid robot: connection options (Wi-Fi/10GbE), LLM task planning, VLM inference, RL training (Isaac Lab), imitation learning, latency budgets, communication protocols (REST/DDS/ROS2), and quick start guide. Co-Authored-By: Claude Opus 4.6 --- CLAUDE.md | 11 +- context/g1-integration.md | 299 ++++++++++++++++++++++++++++++++++++++ 2 files changed, 306 insertions(+), 4 deletions(-) create mode 100644 context/g1-integration.md diff --git a/CLAUDE.md b/CLAUDE.md index bc1547d..7a73d56 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -64,6 +64,7 @@ cross-validated data that is more precise and more specific than general knowled | Formulas, bounds, constants, performance numbers | `context/equations-and-bounds.md` | | What we don't know, gaps, unknowns | `context/open-questions.md` | | Term definitions, units, acronyms | `reference/glossary.yaml` | +| Unitree G1 robot, offboard AI, integration | `context/g1-integration.md` | | Worked calculations, example workflows | `examples/*.md` | --- @@ -136,13 +137,15 @@ Dell Pro Max GB10 (product) │ │ │ └── AI Workloads ──── LLM inference (up to 200B), fine-tuning │ - ├── Connectivity ──── USB-C, HDMI 2.1b, 10GbE, ConnectX-7 QSFP + ├── Connectivity ──── USB-C, HDMI 2.1a, 10GbE, ConnectX-7 QSFP │ │ - │ └── Multi-Unit Stacking ──── 2x units via ConnectX-7, up to 400B models + │ ├── Multi-Unit Stacking ──── 2x units via ConnectX-7, up to 400B models + │ │ + │ └── G1 Integration ──── Offboard AI brain for Unitree G1 robot │ - ├── DGX OS 7 ──── Ubuntu 24.04, NVIDIA drivers, CUDA toolkit + ├── DGX OS 7 ──── Ubuntu 24.04, NVIDIA drivers, CUDA 13.0, sm_121 │ - ├── Physical ──── 150x150x51mm, 1.31kg, 280W USB-C PSU + ├── Physical ──── 150x150x51mm, 1.22-1.34kg, 280W USB-C PSU │ └── SKUs ──── 
2TB ($3,699) / 4TB ($3,999)
 ```
diff --git a/context/g1-integration.md b/context/g1-integration.md
new file mode 100644
index 0000000..297b271
--- /dev/null
+++ b/context/g1-integration.md
@@ -0,0 +1,299 @@
---
id: g1-integration
title: "Unitree G1 Robot Integration"
status: proposed
source_sections: "Cross-referenced from git/unitree-g1 context system + Dell Pro Max GB10 context"
related_topics: [connectivity, ai-workloads, ai-frameworks, multi-unit-stacking]
key_equations: []
key_terms: [unitree-g1, jetson-orin-nx, offboard-compute, cyclonedds, dds, teleoperation, isaac-lab, lerobot]
images: []
examples: []
open_questions:
  - "DDS latency over Wi-Fi between GB10 and G1 under realistic conditions"
  - "Optimal LLM size for real-time task planning (latency vs. capability tradeoff)"
  - "Isaac Lab on GB10 vs. dedicated x86 workstation for RL training throughput"
  - "Docker container availability for CycloneDDS 0.10.2 on ARM64 DGX OS"
  - "Power-efficient always-on inference mode for persistent LLM service"
---

# Unitree G1 Robot Integration

Architecture and use cases for pairing the Dell Pro Max GB10 as an offboard AI companion to the Unitree G1 humanoid robot.

## 1. Why Integrate?

The G1's development computer (Jetson Orin NX 16GB, 100 TOPS) is well suited to onboard perception and robot-side processing but severely limited for large-scale AI. The GB10 removes that ceiling.

| Capability | G1 Orin NX | GB10 | Factor |
|---|---|---|---|
| AI compute | 100 TOPS | 1,000 TFLOPS (FP4) | ~10x |
| Memory for models | 16 GB | 128 GB unified | 8x |
| Max LLM (quantized) | ~7B | ~200B | ~30x |
| CUDA architecture | sm_87 (Orin) | sm_121 (Blackwell) | 2 gens newer |
| Storage | 32 GB eMMC (loco) | 2-4 TB NVMe | 60-125x |

The GB10 acts as an **offboard AI brain** — handling reasoning, planning, training, and heavy inference that the Orin NX cannot.

## 2. 
Connection Architecture + +``` +┌──────────────────────────┐ ┌───────────────────────────┐ +│ Unitree G1 │ │ Dell Pro Max GB10 │ +│ │ │ │ +│ ┌─────────────────┐ │ │ 128 GB unified memory │ +│ │ Locomotion CPU │ │ │ 1 PFLOP FP4 │ +│ │ RK3588 │ │ │ DGX OS (Ubuntu 24.04) │ +│ │ 192.168.123.161 │ │ │ │ +│ │ 500 Hz control │ │ │ ┌──────────────────┐ │ +│ └──────┬───────────┘ │ │ │ LLM Server │ │ +│ │ CycloneDDS │ │ │ (llama.cpp/ │ │ +│ ┌──────┴───────────┐ │ Wi-Fi 6/7 │ │ TensorRT-LLM/ │ │ +│ │ Development CPU │◄──┼──────────────┼──►│ Ollama) │ │ +│ │ Jetson Orin NX │ │ or 10GbE │ └──────────────────┘ │ +│ │ 192.168.123.164 │ │ │ │ +│ │ 100 TOPS │ │ │ ┌──────────────────┐ │ +│ └──────────────────┘ │ │ │ Training Server │ │ +│ │ │ │ (Isaac Lab / │ │ +│ Sensors: │ │ │ PyTorch / │ │ +│ - D435i (RGB-D) │ │ │ LeRobot) │ │ +│ - MID360 (LiDAR) │ │ └──────────────────┘ │ +│ - 6-axis IMU │ │ │ +│ - 4-mic array │ │ ┌──────────────────┐ │ +│ - Joint encoders (500Hz)│ │ │ VLM Server │ │ +│ │ │ │ (vision-language) │ │ +└──────────────────────────┘ └───────────────────────────┘ +``` + +### Connection Options + +| Method | Bandwidth | Latency | Best For | +|--------|-----------|---------|----------| +| Wi-Fi (G1 Wi-Fi 6 ↔ GB10 Wi-Fi 7) | ~1 Gbps theoretical | 5-50 ms variable | Untethered operation, API calls | +| 10GbE Ethernet (GB10 RJ45) | 10 Gbps | <1 ms | Lab/tethered, sensor streaming | +| QSFP (via switch) | 200 Gbps | <1 ms | Multi-robot + GB10 cluster (T4) | + +**Recommended:** Wi-Fi for mobile operation; 10GbE for lab/training workflows where the robot is stationary or tethered. + +### Network Configuration + +The G1 uses a fixed 192.168.123.0/24 subnet. The GB10 must join this subnet or use a router/bridge. 
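The addressing plan can be sanity-checked in code before touching the robot. A minimal sketch using Python's standard `ipaddress` module confirms that a candidate GB10 address sits inside the G1 subnet and does not collide with the robot's two fixed hosts (the `.100` address used throughout this document is a suggestion, not a Unitree default):

```python
import ipaddress

G1_SUBNET = ipaddress.ip_network("192.168.123.0/24")
G1_HOSTS = {
    ipaddress.ip_address("192.168.123.161"),  # locomotion computer (RK3588)
    ipaddress.ip_address("192.168.123.164"),  # development computer (Orin NX)
}

def usable_gb10_address(addr: str) -> bool:
    """True if addr is a free, routable host address on the G1 subnet."""
    ip = ipaddress.ip_address(addr)
    return (
        ip in G1_SUBNET
        and ip not in (G1_SUBNET.network_address, G1_SUBNET.broadcast_address)
        and ip not in G1_HOSTS
    )

print(usable_gb10_address("192.168.123.100"))  # True: free host on the subnet
print(usable_gb10_address("192.168.123.161"))  # False: taken by locomotion CPU
print(usable_gb10_address("10.0.0.5"))         # False: not on the G1 subnet
```

The same check applies to any of the options below; only the interface that carries the address changes.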

**Option A — Direct subnet join (simplest):**
```bash
# On GB10, configure 10GbE or Wi-Fi interface
sudo nmcli con add type ethernet ifname eth0 con-name g1-net \
  ip4 192.168.123.100/24
```

**Option B — Dual-network (GB10 keeps internet access):**
- GB10's 10GbE on 192.168.123.0/24 (to G1)
- GB10's Wi-Fi on home network (internet access for model downloads, updates)

**Option C — G1 joins GB10's network:**
- Configure G1's Wi-Fi to connect to the same network as GB10
- G1's internal subnet (192.168.123.0/24) remains separate for internal DDS

## 3. Use Cases

### 3a. LLM-Based Task Planning

The G1's `learning-and-ai.md` lists "LLM-based task planning integration (firmware v3.2+)" as preliminary. The GB10 can serve as the LLM backend.

**Architecture:**
```
User (voice/text) → G1 mic array → STT on Orin NX
  → HTTP/gRPC to GB10 → LLM (Llama 70B+) generates plan
  → Plan sent back to G1 → G1 executes via locomotion policy + manipulation
```

**GB10 serves an OpenAI-compatible API** via llama.cpp or Ollama:
```bash
# On GB10: start LLM server
./llama-server --model ~/models/llama-70b-q4.gguf \
  --host 0.0.0.0 --port 30000 --n-gpu-layers 99

# On G1 Orin NX: call the API
curl http://192.168.123.100:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"llama","messages":[{"role":"user","content":"Walk to the kitchen table and pick up the red cup"}]}'
```

**Recommended models:**
| Model | Size | Speed (est.) | Task Planning Quality |
|-------|------|---------|----------------------|
| Llama 3.2 3B | 3B | ~100 tok/s | Basic commands |
| Nemotron-3-Nano 30B | 3B active | Fast | Good (built-in reasoning) |
| Llama 3.1 70B (Q4) | 70B | ~15-20 tok/s | Strong |
| Llama 3.1 70B (Q8) | 70B | ~10-15 tok/s | Strong, higher quality |

For task planning, a 1-5 second response time is acceptable — the robot doesn't need instant responses for high-level commands.

### 3b. 
Vision-Language Models + +Stream camera frames from the G1's Intel RealSense D435i to the GB10 for analysis by large VLMs that can't run on the Orin NX. + +**Use cases:** +- Scene understanding ("describe what you see") +- Object identification ("find the red cup on the table") +- Spatial reasoning ("is the path ahead clear?") +- Anomaly detection ("does anything look unusual?") + +**Data path:** +``` +G1 D435i (1920x1080 @ 30fps) → RealSense SDK on Orin NX + → JPEG compress → HTTP POST to GB10 + → VLM inference → JSON response back to G1 +``` + +**Bandwidth:** A 1080p JPEG frame is ~100-500 KB. At 5 fps sampling: ~2.5 MB/s (~20 Mbps). Well within Wi-Fi capacity. + +### 3c. RL Policy Training + +Train locomotion and manipulation policies on the GB10 using Isaac Lab or MuJoCo, then deploy to the G1. + +**Why GB10 over the Orin NX:** +- Isaac Lab runs thousands of parallel environments — needs GPU memory +- GB10's 128 GB unified memory holds large batches + simulation state +- Blackwell GPU (6,144 CUDA cores, sm_121) accelerates physics simulation +- CUDA 13.0 + PyTorch container `nvcr.io/nvidia/pytorch:25.11-py3` + +**Workflow:** +``` +1. GB10: Train policy in Isaac Lab (unitree_rl_lab, G1-29dof config) +2. GB10: Validate in MuJoCo (sim2sim cross-validation) +3. GB10 → G1: Transfer trained model file (.pt / .onnx) +4. G1 Orin NX: Deploy via unitree_sdk2_python (low-gain start) +5. G1: Gradual gain ramp-up with tethered safety testing +``` + +**Key: Both systems are ARM64.** Model files trained on GB10 (ARM) deploy directly to Orin NX (ARM) without architecture-related issues. + +### 3d. Imitation Learning Data Processing + +Process teleoperation demonstration data collected on the G1 using larger models on the GB10. + +**Workflow:** +``` +1. G1: Collect demonstrations via XR teleoperation (Vision Pro / Quest 3) +2. Transfer data to GB10 (SCP over network) +3. GB10: Train behavior cloning / diffusion policy (LeRobot, 128GB memory) +4. GB10: Validate in simulation +5. 
Transfer model to G1 for deployment
```

The GB10's 128 GB memory enables training larger LeRobot policies and processing more demonstration episodes than the Orin NX's 16 GB allows.

### 3e. Multi-Modal Interaction

Combine the G1's sensor suite with GB10's compute for rich interaction:

```
G1 Microphones (4-mic array) → Speech-to-Text (Orin NX: Whisper small)
  → Text to GB10 → LLM reasoning → Response text
  → TTS on Orin NX → G1 Speaker (5W stereo)

G1 Camera (D435i) → Frames to GB10 → VLM → Situational awareness
G1 LiDAR (MID360) → Point cloud to GB10 → Semantic mapping
```

### 3f. Simulation Environment

Run full physics simulation of the G1 on the GB10:

- **MuJoCo:** CPU-based simulation for policy validation
- **Isaac Lab:** GPU-accelerated parallel environments for training
- **Isaac Gym:** Legacy GPU simulation (unitree_rl_gym)

The GB10 can run these natively (ARM64 NVIDIA stack) while the G1 operates in the real world, with Sim2Real transfer moving policies from GB10 simulation onto the real G1 hardware.

## 4. Latency Budget

| Operation | Latency | Acceptable? |
|-----------|---------|-------------|
| G1 locomotion control loop | 2 ms (500 Hz) | N/A — stays on-robot |
| DDS internal (loco ↔ dev) | 2 ms | N/A — stays on-robot |
| Wi-Fi round-trip (G1 ↔ GB10) | 5-50 ms | Yes for planning, No for control |
| 10GbE round-trip (G1 ↔ GB10) | <1 ms | Yes for most tasks |
| LLM inference (70B Q4) | 1-5 seconds | Yes for task planning |
| VLM inference (per frame) | 0.5-2 seconds | Yes for scene understanding |

**Critical rule:** Real-time control (500 Hz locomotion, balance, joint commands) MUST stay on the G1's locomotion computer. The GB10 handles only high-level reasoning with relaxed latency requirements.

## 5. Communication Protocol Options

### Option A: REST/gRPC API (Simplest)

GB10 runs an HTTP API server (llama.cpp, Ollama, or custom FastAPI). G1's Orin NX sends requests and receives responses. 
+ +- **Pros:** Simple, stateless, easy to debug, OpenAI-compatible +- **Cons:** Higher latency per request, no streaming sensor data +- **Best for:** Task planning, VLM queries, one-shot inference + +### Option B: CycloneDDS (Native G1 protocol) + +Install CycloneDDS 0.10.2 on the GB10, making it appear as another node on the G1's DDS network. + +- **Pros:** Native integration, low latency, pub/sub for streaming data +- **Cons:** More complex setup, must match exact DDS version (0.10.2) +- **Best for:** Streaming sensor data, continuous perception, tight integration + +```bash +# On GB10: Install CycloneDDS (must be 0.10.2 to match G1) +pip install cyclonedds==0.10.2 + +# Subscribe to G1 state +# (requires unitree_sdk2 IDL definitions) +``` + +### Option C: ROS 2 Bridge + +Both systems support ROS 2 (G1 via unitree_ros2, GB10 via standard Ubuntu packages). + +- **Pros:** Rich ecosystem, visualization (RViz), bag recording +- **Cons:** Additional overhead, ROS 2 setup complexity +- **Best for:** Research workflows, multi-sensor fusion, SLAM + +## 6. What NOT to Use the GB10 For + +- **Real-time joint control** — Too much latency over network. Locomotion stays on RK3588. +- **Balance/stability** — 500 Hz control loop cannot tolerate network jitter. +- **Safety-critical decisions** — Emergency stop must be on-robot, not dependent on network. +- **Field deployment brain** — GB10 requires wall power (280W). It's a lab/indoor companion only. + +## 7. Hardware Compatibility Notes + +| Aspect | G1 | GB10 | Compatible? 
| +|--------|-----|------|-------------| +| CPU architecture | ARM (A76/A55, Orin ARM) | ARM (Cortex-X925/A725) | Yes | +| CUDA | sm_87 (Orin) | sm_121 (Blackwell) | Model files portable | +| Linux | Custom (kernel 5.10 RT) | Ubuntu 24.04 (kernel 6.8) | Yes (userspace) | +| Python | 3.x on Orin NX | 3.x on DGX OS | Yes | +| DDS | CycloneDDS 0.10.2 | Not pre-installed (installable) | Yes (with setup) | +| Docker | Available on Orin NX | Pre-installed + NVIDIA Runtime | Yes | +| Network | Wi-Fi 6, Ethernet | Wi-Fi 7, 10GbE, QSFP | Yes | + +**ARM alignment is a significant advantage.** Both systems are ARM-native, eliminating cross-architecture issues with model files, shared libraries, and container images. + +## 8. Cost Context + +| Component | Price | Role | +|-----------|-------|------| +| G1 EDU Standard | $43,500 | Robot platform | +| G1 EDU Ultimate B | $73,900 | Robot + full DOF + tactile | +| Dell Pro Max GB10 (2TB) | $3,699 | Offboard AI brain | +| Dell Pro Max GB10 (4TB) | $3,999 | Offboard AI brain | + +The GB10 adds <10% to the cost of a G1 EDU while multiplying AI capability by orders of magnitude. Even at the $13,500 base G1 price, the GB10 is a ~27% add-on that transforms the AI capabilities. + +## 9. Quick Start (Proposed) + +1. **Network:** Connect GB10 to G1's subnet (Wi-Fi or Ethernet to 192.168.123.100) +2. **LLM server:** Start llama.cpp or Ollama on GB10 (port 30000 or 12000) +3. **Test connectivity:** `curl http://192.168.123.100:30000/v1/models` from G1 Orin NX +4. **First integration:** Simple Python script on Orin NX that sends voice-to-text to GB10 LLM, receives plan, prints it +5. **Iterate:** Add VLM, sensor streaming, RL training as needed + +## Key Relationships + +- GB10 compute: [[gb10-superchip]], [[ai-frameworks]], [[ai-workloads]] +- GB10 networking: [[connectivity]] +- G1 reference: `git/unitree-g1/context/` (separate knowledge base)