---
id: multi-unit-stacking
title: "Multi-Unit Stacking"
status: provisional
source_sections: "Web research: WCCFTech, NVIDIA newsroom"
related_topics: [connectivity, gb10-superchip, ai-workloads, memory-and-storage]
key_equations: []
key_terms: [connectx-7, smartnic, qsfp, stacking, nvlink]
images: []
examples: []
open_questions:
  - "Exact cable/interconnect required between units (QSFP type, length limits)"
  - "Software configuration steps for multi-unit mode"
  - "Performance overhead of inter-unit communication vs. single unit"
  - "Does stacking appear as a single device to frameworks or require explicit multi-node code?"
  - "Can more than 2 units be stacked?"
---

# Multi-Unit Stacking

Two Dell Pro Max GB10 units can be connected together to create a more powerful combined system, effectively doubling the available compute and memory.

## 1. How It Works

Each Dell Pro Max GB10 has **2x QSFP 200 Gbps ports** powered by the NVIDIA ConnectX-7 SmartNIC. These ports enable direct unit-to-unit connection:

- **Combined memory:** 256 GB unified (128 GB per unit)
- **Combined compute:** 2 PFLOP FP4 (1 PFLOP per unit)
- **Interconnect bandwidth:** Up to 400 Gbps (2x 200 Gbps QSFP)

## 2. Model Capacity

| Configuration | Memory | Max Model Size (approx) |
|---------------|--------|-------------------------|
| Single unit   | 128 GB | ~200B parameters (FP4)  |
| Dual stacked  | 256 GB | ~400B parameters (FP4)  |

This enables running models like **Llama 3.1 405B** (with quantization) that would not fit in a single unit's memory.

## 3. Physical Configuration

The compact form factor (150x150x51mm per unit) is designed to be **stackable**: two units can sit on top of each other on a desk, connected via short QSFP cables.

## 4. Open Areas

This feature is one of the less-documented aspects of the system. Key unknowns include the exact software configuration, whether it presents as a single logical device, and inter-node communication overhead. See open questions in frontmatter.

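As a rough sanity check on the figures in the Model Capacity table, the weight footprint of a model can be estimated as parameter count times bits per parameter. The sketch below is illustrative only: the function names are hypothetical, it treats 1 GB as 10^9 bytes, and it counts weights alone, ignoring KV cache, activations, and runtime overhead, all of which reduce the usable headroom in practice.

```python
# Rough weight-only memory estimate (assumption: 1 GB = 1e9 bytes).
# Ignores KV cache, activations, and framework overhead.

SINGLE_UNIT_GB = 128   # per-unit unified memory, from the Model Capacity table
DUAL_STACKED_GB = 256  # two stacked units, from the Model Capacity table


def weight_footprint_gb(params_billion: float, bits_per_param: float) -> float:
    """Approximate size of the model weights in GB."""
    total_bytes = params_billion * 1e9 * bits_per_param / 8
    return total_bytes / 1e9


def fits(params_billion: float, bits_per_param: float, memory_gb: float) -> bool:
    """True if the weights alone fit in memory_gb (no headroom reserved)."""
    return weight_footprint_gb(params_billion, bits_per_param) <= memory_gb


if __name__ == "__main__":
    for label, params_b in [("200B", 200), ("405B", 405)]:
        size = weight_footprint_gb(params_b, 4)
        print(
            f"{label} at 4-bit: {size:.1f} GB | "
            f"single unit: {fits(params_b, 4, SINGLE_UNIT_GB)} | "
            f"dual stacked: {fits(params_b, 4, DUAL_STACKED_GB)}"
        )
```

Under these assumptions, a 405B-parameter model at 4 bits per weight needs about 202.5 GB for weights alone, which exceeds one unit's 128 GB but fits within the combined 256 GB. The same model at 16 bits per weight would not fit in either configuration.
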
## Key Relationships

- Connected via: [[connectivity]] (QSFP/ConnectX-7 ports)
- Extends capacity of: [[ai-workloads]]
- Doubles resources from: [[gb10-superchip]], [[memory-and-storage]]