# Kinematic Planner ONNX Model Reference

This page specifies the inputs and outputs of the **Kinematic Planner** ONNX model. The kinematic planner is the core motion generation component of the GEAR-SONIC system. Given the robot's current state and high-level navigation commands, it produces a sequence of future whole-body poses (MuJoCo `qpos` frames), and the low-level whole-body controller then tracks those poses.

The ONNX model is part of the **C++ inference stack** and is called by the deployment runtime during operation. The C++ stack manages input construction, timing, and state. Certain combinations of inputs are invalid, and the C++ layer handles them to keep operation safe. This page is intended for developers who want to understand the model interface in more depth or who are building custom integrations beyond the standard deployment pipeline.

```{admonition} Training Code & Technical Report
:class: note
The kinematic planner training code and technical report will be released soon. This page documents the ONNX model interface for deployment integration.
```

---

## Overview

The planner takes **11 input tensors** and produces **2 output tensors**. The 6 primary inputs are listed below. The remaining 5 are advanced inputs that the C++ stack manages, and in most cases you should not need to modify them.
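A custom integration can catch interface mistakes early by writing the 11 input shapes down in code and checking each feed against them before inference. The sketch below is a minimal example under stated assumptions. `PLANNER_INPUT_SPEC`, `shape_of`, `zeros`, and `check_feed` are hypothetical names and are not part of the deployment stack. The shapes and dtypes are copied from the tables on this page. `K = 11` assumes the default planner (`min_tokens = 6`, `max_tokens = 16`). The helpers work on plain nested lists so they run without dependencies, and they check shapes only, not dtypes.

```python
# Hypothetical helper, not part of the deployment stack: shapes and dtypes
# of all 11 planner inputs, transcribed from this page.
# K = 11 assumes the default planner (min_tokens = 6, max_tokens = 16).
MIN_TOKENS, MAX_TOKENS = 6, 16
K = MAX_TOKENS - MIN_TOKENS + 1

PLANNER_INPUT_SPEC = {
    # Primary inputs
    "context_mujoco_qpos": ((1, 4, 36), "float32"),
    "target_vel": ((1,), "float32"),
    "mode": ((1,), "int64"),
    "movement_direction": ((1, 3), "float32"),
    "facing_direction": ((1, 3), "float32"),
    "height": ((1,), "float32"),
    # Advanced inputs (normally managed by the C++ stack)
    "random_seed": ((1,), "int64"),
    "has_specific_target": ((1, 1), "int64"),
    "specific_target_positions": ((1, 4, 3), "float32"),
    "specific_target_headings": ((1, 4), "float32"),
    "allowed_pred_num_tokens": ((1, K), "int64"),
}


def shape_of(value):
    """Shape of a nested list, read along the first element of each level."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        value = value[0] if value else None
    return tuple(shape)


def zeros(shape):
    """Nested list of zeros with the given shape (placeholder values)."""
    if not shape:
        return 0
    return [zeros(shape[1:]) for _ in range(shape[0])]


def check_feed(feed):
    """List shape problems in `feed`; dtypes are not checked here."""
    problems = []
    for name, (shape, _dtype) in PLANNER_INPUT_SPEC.items():
        if name not in feed:
            problems.append(f"missing input: {name}")
        elif shape_of(feed[name]) != shape:
            problems.append(f"{name}: expected {shape}, got {shape_of(feed[name])}")
    # Report names the model does not expect.
    problems.extend(f"unknown input: {name}"
                    for name in feed if name not in PLANNER_INPUT_SPEC)
    return problems


feed = {name: zeros(shape) for name, (shape, _) in PLANNER_INPUT_SPEC.items()}
assert check_feed(feed) == []
```

In a real integration, you would convert each value to an array of the listed dtype before passing it in the input feed of an ONNX Runtime session's `run` call. For example, use `np.asarray(value, dtype=np.float32)` for `float32` inputs and `np.int64` for `int64` inputs.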
**Primary inputs:**

| Tensor Name | Shape | Dtype | Default |
|-------------|-------|-------|---------|
| `context_mujoco_qpos` | `[1, 4, 36]` | `float32` | Required |
| `target_vel` | `[1]` | `float32` | `-1.0` (use mode default velocity) |
| `mode` | `[1]` | `int64` | Required |
| `movement_direction` | `[1, 3]` | `float32` | Required |
| `facing_direction` | `[1, 3]` | `float32` | Required |
| `height` | `[1]` | `float32` | `-1.0` (disable height control) |

**Outputs:**

| Tensor Name | Shape | Dtype |
|-------------|-------|-------|
| `mujoco_qpos` | `[1, N, 36]` | `float32` |
| `num_pred_frames` | scalar | `int64` |

Where:

- **K** = `max_tokens - min_tokens + 1` (model-dependent): the number of selectable prediction horizons, which is also the length of the `allowed_pred_num_tokens` mask
- **N** = maximum number of output frames (padded); only the first `num_pred_frames` frames are valid

---

## Coordinate System

The model operates in **MuJoCo's Z-up coordinate convention**:

- **X**: forward
- **Y**: left
- **Z**: up

All position and direction vectors in the inputs and outputs follow this convention.

---

## Input Tensors

(context_mujoco_qpos)=
### `context_mujoco_qpos`

| Property | Value |
|----------|-------|
| **Shape** | `[1, 4, 36]` |
| **Dtype** | `float32` |
| **Description** | The planner's context input: 4 consecutive MuJoCo `qpos` frames representing the robot's recent states |

This is the primary context input. It holds 4 consecutive frames of the robot's recent joint configuration, sampled at the simulation frame rate.
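In a custom integration, the 4-frame context can be kept as a rolling window: each new `qpos` frame is appended and the oldest frame is dropped. The sketch below is minimal and rests on several assumptions. It assumes frames arrive as 36-element sequences at the planner's frame rate, and that the model expects them oldest first. Before 4 frames have been seen, it pre-fills the window with the first frame. `ContextBuffer` is a hypothetical name, and the C++ stack may handle startup differently.

```python
from collections import deque

QPOS_DIM = 36     # root position (3) + root quaternion (4) + 29 joint angles
CONTEXT_LEN = 4   # frames in context_mujoco_qpos


class ContextBuffer:
    """Hypothetical rolling window that yields a [1, 4, 36] context."""

    def __init__(self):
        self._frames = deque(maxlen=CONTEXT_LEN)

    def push(self, qpos):
        if len(qpos) != QPOS_DIM:
            raise ValueError(f"expected {QPOS_DIM} values, got {len(qpos)}")
        frame = [float(v) for v in qpos]
        if not self._frames:
            # Assumption: until 4 real frames exist, repeat the first one.
            self._frames.extend([frame] * CONTEXT_LEN)
        else:
            self._frames.append(frame)  # maxlen drops the oldest frame

    def as_input(self):
        if len(self._frames) < CONTEXT_LEN:
            raise RuntimeError("no frames pushed yet")
        # Oldest frame first (assumed ordering), with a batch dimension of 1.
        return [[list(f) for f in self._frames]]
```

A `deque` with `maxlen` keeps the window update O(1) and discards old frames automatically.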
The 36 dimensions of each frame form the standard MuJoCo `qpos` vector for the Unitree G1 (29-DOF) model:

| Index | Field | Description |
|-------|-------|-------------|
| 0–2 | Root position | `(x, y, z)` in meters, Z-up world frame |
| 3–6 | Root quaternion | `(w, x, y, z)` orientation (MuJoCo convention) |
| 7–35 | DOF positions | 29 joint angles in radians, in MuJoCo body tree order |

```{admonition} Coordinate Frame
:class: note
Provide all spatial inputs in the **world coordinate frame**. This includes `context_mujoco_qpos`, `movement_direction`, `facing_direction`, `specific_target_positions`, and `specific_target_headings`. The root quaternion uses MuJoCo's `(w, x, y, z)` ordering at indices 3 to 6. The model handles canonicalization internally.
```

### `target_vel`

| Property | Value |
|----------|-------|
| **Shape** | `[1]` |
| **Dtype** | `float32` |
| **Description** | Desired locomotion speed override |

Controls the target movement speed. If the value is **zero or below** (e.g., `-1.0`), the model uses the default velocity for the selected mode. If the value is **positive**, it overrides the mode's default speed, in meters per second. The speed the robot actually reaches may differ from the target because of the critically damped spring model and the motion dynamics.

| Value | Behavior |
|-------|----------|
| `<= 0.0` | Use the default velocity for the selected `mode` |
| `> 0.0` | Override with this target velocity (m/s) |

### `mode`

| Property | Value |
|----------|-------|
| **Shape** | `[1]` |
| **Dtype** | `int64` |
| **Description** | Index selecting the motion style/behavior |

Selects the motion style from the pre-loaded clip library. At runtime, the mode index is clamped to the number of available clips.
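The sketch below shows one way to prepare the `mode` and `target_vel` scalars on the caller side. It assumes that "clamped" means the index is restricted to `[0, num_clips - 1]`, and it mirrors that clamping so out-of-range requests are visible before inference. `scalar_inputs` and `num_clips` are hypothetical names. The default planner's 27 modes (indices 0 to 26) are listed in the tables that follow.

```python
def scalar_inputs(mode, num_clips, target_vel=None):
    """Hypothetical helper returning the `mode` and `target_vel` inputs.

    target_vel=None requests the mode's default speed and is encoded as
    -1.0 (any value <= 0.0 has that meaning). A positive value is a
    speed override in m/s.
    """
    if num_clips < 1:
        raise ValueError("num_clips must be at least 1")
    # Assumption: runtime clamping restricts the index to [0, num_clips - 1].
    clamped_mode = min(max(int(mode), 0), num_clips - 1)
    vel = -1.0 if target_vel is None else float(target_vel)
    return {"mode": [clamped_mode], "target_vel": [vel]}
```

For example, `scalar_inputs(2, 27)` selects `walk` at its default speed, and `scalar_inputs(2, 27, 1.2)` asks for about 1.2 m/s.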
The default planner ships with the following modes:

**Locomotion set:**

| Index | Mode | Description |
|-------|------|-------------|
| 0 | `idle` | Standing still |
| 1 | `slowWalk` | Slow forward locomotion |
| 2 | `walk` | Normal walking speed |
| 3 | `run` | Running |

**Squat / ground set:**

| Index | Mode | Description |
|-------|------|-------------|
| 4 | `squat` | Squatting; requires `height` input (range ~0.4–0.8 m) |
| 5 | `kneelTwoLeg` | Kneeling on both knees; requires `height` input (range 0.2–0.4 m) |
| 6 | `kneelOneLeg` | Kneeling on one knee; requires `height` input (range 0.2–0.4 m) |
| 7 | `lyingFacedown` | Lying face down; requires `height` input |
| 8 | `handCrawling` | Crawling on hands and knees |
| 14 | `elbowCrawling` | Crawling on elbows (more likely to cause overheating) |

**Boxing set:**

| Index | Mode | Description |
|-------|------|-------------|
| 9 | `idleBoxing` | Boxing stance (idle) |
| 10 | `walkBoxing` | Walking with boxing guard |
| 11 | `leftJab` | Left jab |
| 12 | `rightJab` | Right jab |
| 13 | `randomPunches` | Random punch sequence |
| 15 | `leftHook` | Left hook |
| 16 | `rightHook` | Right hook |

**Style walks:**

| Index | Mode | Description |
|-------|------|-------------|
| 17 | `happy` | Happy walking |
| 18 | `stealth` | Stealthy walking |
| 19 | `injured` | Limping walk |
| 20 | `careful` | Cautious walking |
| 21 | `objectCarrying` | Walking with hands reaching out |
| 22 | `crouch` | Crouched walking |
| 23 | `happyDance` | Dancing walk (forward movement only) |
| 24 | `zombie` | Zombie walk |
| 25 | `point` | Walking with hands pointing |
| 26 | `scared` | Scared walk |

### `movement_direction`

| Property | Value |
|----------|-------|
| **Shape** | `[1, 3]` |
| **Dtype** | `float32` |
| **Description** | Desired direction of movement in the MuJoCo world frame |

A 3D direction vector `(x, y, z)` in the Z-up world coordinate system that indicates where the robot should move.
Passing a normalized vector is good practice, although the model also normalizes it internally. Speed is set by `target_vel` and `mode`, not by the magnitude of this vector.

- The planner uses the `(x, y)` components (the horizontal plane) to compute the target root trajectory with a critically damped spring model.
- When the magnitude is near zero (`< 1e-5`), the model falls back to `facing_direction`, scaled by a small factor, to turn in place.

### `facing_direction`

| Property | Value |
|----------|-------|
| **Shape** | `[1, 3]` |
| **Dtype** | `float32` |
| **Description** | Desired facing (heading) direction in the MuJoCo world frame |

A 3D direction vector `(x, y, z)` that indicates which direction the robot's torso should face. The model computes the target heading angle from this vector as `atan2(y, x)`. Like `movement_direction`, this vector does not need to be normalized.

It is independent of `movement_direction`, so the robot can walk in one direction while facing another (e.g., strafing).

### `height`

| Property | Value |
|----------|-------|
| **Shape** | `[1]` |
| **Dtype** | `float32` |
| **Description** | Desired root height for height-aware behaviors |

Controls the target pelvis height for modes that support variable height (e.g., `squat`, `kneelTwoLeg`, `kneelOneLeg`, `lyingFacedown`). When height control is enabled, the model searches the reference clip's keyframes and picks the one whose root height is closest to the requested value. That keyframe becomes the target pose for motion generation.

| Value | Behavior |
|-------|----------|
| `< 0.0` | Height control disabled; use the randomly selected keyframe from the reference clip |
| `>= 0.0` | Find the keyframe in the reference clip whose root height is closest to this value (meters) and use it as the target pose |
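The sketch below builds the direction and height inputs from yaw angles. It also shows that `atan2(y, x)` recovers the heading from `facing_direction`. In the right-handed Z-up frame (X forward, Y left), a positive yaw rotates counter-clockwise about +Z from +X, so `yaw = pi/2` points left. `direction_from_yaw`, `heading_of`, and `direction_inputs` are hypothetical helper names. The sketch always emits unit vectors, although the model does not require them.

```python
import math


def direction_from_yaw(yaw):
    """Unit vector in the horizontal plane of the Z-up world frame.

    yaw is measured counter-clockwise about +Z from +X (forward),
    so yaw = pi/2 points along +Y (left).
    """
    return [math.cos(yaw), math.sin(yaw), 0.0]


def heading_of(direction):
    """Heading derived from a facing_direction vector: atan2(y, x)."""
    x, y, _ = direction
    return math.atan2(y, x)


def direction_inputs(move_yaw, face_yaw, height=None):
    """Hypothetical helper: move along move_yaw while facing face_yaw.

    height=None disables height control (encoded as -1.0).
    """
    return {
        "movement_direction": [direction_from_yaw(move_yaw)],
        "facing_direction": [direction_from_yaw(face_yaw)],
        "height": [-1.0 if height is None else float(height)],
    }
```

For example, `direction_inputs(math.pi / 2, 0.0)` strafes left while facing forward. `direction_inputs(0.0, 0.0, height=0.3)` would pair with a kneeling mode, whose listed range is 0.2–0.4 m.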
## Advanced Inputs

The C++ deployment stack manages these inputs internally, and they **should not be modified** under normal operation. They are documented here for completeness and for advanced users who need to build custom integrations.

### `random_seed`

| Property | Value |
|----------|-------|
| **Shape** | `[1]` |
| **Dtype** | `int64` |
| **Description** | Seed controlling the network's randomness |

### `has_specific_target`

| Property | Value |
|----------|-------|
| **Shape** | `[1, 1]` |
| **Dtype** | `int64` |
| **Description** | Flag indicating whether specific waypoint targets are provided |

| Value | Behavior |
|-------|----------|
| `0` | Ignore `specific_target_positions` and `specific_target_headings`; use `movement_direction` / `facing_direction` |
| `1` | Use the provided specific target positions and headings as waypoint constraints |

When the flag is enabled, the values in `specific_target_positions` and `specific_target_headings` replace the spring model's target root position and heading.

### `specific_target_positions`

| Property | Value |
|----------|-------|
| **Shape** | `[1, 4, 3]` |
| **Dtype** | `float32` |
| **Description** | 4 waypoint positions in MuJoCo world coordinates |

Each waypoint is a 3D position `(x, y, z)` in the Z-up world frame. The 4 waypoints give target root positions for 4 frames, which is one token's worth. Only used when `has_specific_target = 1`.

### `specific_target_headings`

| Property | Value |
|----------|-------|
| **Shape** | `[1, 4]` |
| **Dtype** | `float32` |
| **Description** | 4 waypoint heading angles in radians |

Target heading (yaw) angles for each of the 4 waypoint frames. These are absolute angles in the Z-up world frame, measured as rotation about the Z-axis. Only used when `has_specific_target = 1`. The spring model uses the last waypoint's heading (`[:, -1]`) as its primary target heading.
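The sketch below packs the three waypoint inputs into the documented shapes. `pack_specific_targets`, `hold_pose`, and `no_specific_target` are hypothetical names. `hold_pose` assumes that repeating one position and heading across all 4 frames is a reasonable way to express a stationary target. That is an illustration, not necessarily how the C++ stack builds waypoints. When the flag is `0`, the position and heading values are ignored, so zeros are enough.

```python
def pack_specific_targets(waypoints):
    """Hypothetical helper: pack 4 (x, y, z, yaw) waypoints into the inputs.

    Positions are world-frame meters (Z-up); yaw is an absolute angle in
    radians about +Z. The last yaw acts as the primary target heading.
    """
    if len(waypoints) != 4:
        raise ValueError(f"expected 4 waypoints, got {len(waypoints)}")
    positions = [[float(x), float(y), float(z)] for x, y, z, _ in waypoints]
    headings = [float(yaw) for _, _, _, yaw in waypoints]
    return {
        "has_specific_target": [[1]],               # shape [1, 1]
        "specific_target_positions": [positions],   # shape [1, 4, 3]
        "specific_target_headings": [headings],     # shape [1, 4]
    }


def hold_pose(goal_xyz, yaw):
    """Stationary target (assumption): same position and heading in all 4 frames."""
    x, y, z = goal_xyz
    return pack_specific_targets([(x, y, z, yaw)] * 4)


def no_specific_target():
    """Flag off; the waypoint values are ignored, so zeros suffice."""
    return {
        "has_specific_target": [[0]],
        "specific_target_positions": [[[0.0, 0.0, 0.0] for _ in range(4)]],
        "specific_target_headings": [[0.0] * 4],
    }
```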
### `allowed_pred_num_tokens`

| Property | Value |
|----------|-------|
| **Shape** | `[1, K]` where `K = max_tokens - min_tokens + 1` |
| **Dtype** | `int64` |
| **Description** | Binary mask controlling the allowed prediction horizon |

A binary mask in which each element stands for one possible number of predicted tokens: index `i` maps to `min_tokens + i` tokens. A value of `1` allows that prediction length, and `0` disallows it. Each token covers 4 frames, so the prediction horizon in frames is `num_tokens * 4`.

The default planner uses `min_tokens = 6` and `max_tokens = 16`, so `K = 11`:

| Index | Tokens | Frames |
|-------|--------|--------|
| 0 | 6 | 24 |
| 1 | 7 | 28 |
| 2 | 8 | 32 |
| 3 | 9 | 36 |
| 4 | 10 | 40 |
| 5 | 11 | 44 |
| 6 | 12 | 48 |
| 7 | 13 | 52 |
| 8 | 14 | 56 |
| 9 | 15 | 60 |
| 10 | 16 | 64 |

---

## Output Tensors

### `mujoco_qpos`

| Property | Value |
|----------|-------|
| **Shape** | `[1, N, 36]` |
| **Dtype** | `float32` |
| **Description** | Predicted motion sequence as MuJoCo `qpos` frames |

The primary output: a sequence of whole-body pose frames in the same 36-dimensional MuJoCo `qpos` format as the input (see {ref}`context_mujoco_qpos