
---
id: gb10-superchip
title: NVIDIA GB10 Grace Blackwell Superchip
status: established
source_sections: "Web research: NVIDIA newsroom, WCCFTech, Phoronix, The Register, Arm"
related_topics: [memory-and-storage, ai-frameworks, ai-workloads, connectivity, physical-specs]
key_equations: [flops-fp4, nvlink-c2c-bandwidth]
key_terms: [gb10, grace-blackwell, superchip, cortex-x925, cortex-a725, blackwell-gpu, tensor-core, cuda-core, nvlink-c2c, soc]
images: []
examples: []
open_questions:
  - Exact clock speeds for CPU and GPU dies under sustained load
  - Detailed per-precision TFLOPS breakdown (FP4/FP8/FP16/FP32/FP64)
  - Thermal throttling behavior and sustained vs. peak performance
---

NVIDIA GB10 Grace Blackwell Superchip

The GB10 is a system-on-a-chip (SoC) that combines an NVIDIA Grace CPU and an NVIDIA Blackwell GPU on a single package, connected via NVLink Chip-to-Chip (NVLink-C2C) interconnect. It is the core silicon in the Dell Pro Max GB10 and the NVIDIA DGX Spark.

1. Architecture Overview

The GB10 is composed of two distinct compute dies:

  • CPU tile: Designed by MediaTek, based on the ARMv9.2 architecture
  • GPU tile: Designed by NVIDIA, based on the Blackwell architecture

These are stitched together using TSMC's 2.5D advanced packaging technology and connected via NVIDIA's proprietary NVLink-C2C interconnect, which provides 600 GB/s of bidirectional bandwidth between the CPU and GPU dies.
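For a rough feel of what 600 GB/s means in practice, the transfer time for a bulk payload follows directly. This is a back-of-envelope sketch; the 40 GB example payload and the `transfer_time_s` helper are illustrative assumptions, not figures from NVIDIA:

```python
# Back-of-envelope: how long a bulk transfer takes at NVLink-C2C's quoted
# 600 GB/s bidirectional bandwidth. The 40 GB payload is an arbitrary
# example (roughly the FP4 weights of a large model), not a spec figure.

NVLINK_C2C_GB_S = 600  # bidirectional bandwidth quoted in this note

def transfer_time_s(payload_gb: float, bandwidth_gb_s: float = NVLINK_C2C_GB_S) -> float:
    """Seconds to move `payload_gb` gigabytes at `bandwidth_gb_s` GB/s."""
    return payload_gb / bandwidth_gb_s

print(f"40 GB over NVLink-C2C: {transfer_time_s(40):.3f} s")
```

Sustained achievable bandwidth will be lower than the peak figure, so treat the result as a lower bound on real transfer time.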

2. CPU: Grace (ARM)

The Grace CPU portion contains 20 cores in a big.LITTLE-style configuration:

  • 10x ARM Cortex-X925 — high-performance cores
  • 10x ARM Cortex-A725 — efficiency cores

Architecture: ARMv9.2

This is the same Grace CPU lineage used in NVIDIA's data center Grace Hopper and Grace Blackwell products, adapted for desktop power envelopes.

3. GPU: Blackwell

The Blackwell GPU portion features:

  • 6,144 CUDA cores (comparable to the RTX 5070 core count)
  • 5th-generation Tensor Cores — optimized for AI inference and training
  • Peak performance: 1 PFLOPS (1,000 TFLOPS) at FP4 precision

The Tensor Cores are the key differentiator for AI workloads, providing hardware acceleration for mixed-precision matrix operations used in deep learning.
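To put the FP4 figure in perspective, a common rule of thumb (an assumption here, not from this note) is that transformer inference costs roughly 2 FLOPs per parameter per token; a compute-bound token rate then falls out directly. The 30% utilization figure is likewise a hypothetical placeholder:

```python
# Rough compute-bound inference estimate from the quoted 1 PFLOP
# (1e15 FLOP/s) FP4 peak. The 2-FLOPs-per-parameter-per-token rule and
# the 30% utilization figure are assumptions, not GB10 measurements.

PEAK_FP4_FLOPS = 1e15

def tokens_per_second(params: float, utilization: float = 0.3) -> float:
    """Upper-bound token rate for a `params`-parameter model at FP4."""
    flops_per_token = 2 * params  # rule-of-thumb cost of one forward pass
    return PEAK_FP4_FLOPS * utilization / flops_per_token

print(f"70B-parameter model: ~{tokens_per_second(70e9):,.0f} tokens/s")
```

In practice, single-stream token generation is usually memory-bandwidth-bound rather than compute-bound, so this is an upper bound, not a prediction.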

4. Interconnect: NVLink-C2C

The CPU and GPU communicate via NVLink Chip-to-Chip (NVLink-C2C):

  • Bidirectional bandwidth: 600 GB/s
  • Enables unified coherent memory — both CPU and GPU see the same 128GB LPDDR5X pool
  • Eliminates the PCIe bottleneck found in traditional discrete GPU systems

This coherent memory architecture means there is no need to explicitly copy data between "host" and "device" memory, simplifying AI development workflows.
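Because both dies see the same pool, a quick capacity check shows what model weights alone could occupy. This is a sketch: the bytes-per-parameter table is a standard assumption, and KV cache, activations, and OS overhead are ignored:

```python
# Weights-only capacity of the 128 GB coherent pool at common precisions.
# Ignores KV cache, activations, and OS overhead (assumptions, see above).

BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "fp4": 0.5}

def max_params_billions(pool_gb: float = 128, precision: str = "fp4") -> float:
    """Largest parameter count (in billions) whose weights fit in `pool_gb`."""
    return pool_gb / BYTES_PER_PARAM[precision]

for precision in BYTES_PER_PARAM:
    print(f"{precision}: ~{max_params_billions(precision=precision):.0f}B parameters")
```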

5. Power Envelope

  • System TDP: ~140 W (from related specifications)
  • External PSU: 280 W USB Type-C adapter (headroom for storage, networking, peripherals)
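Dividing the headline figures gives an implied peak-efficiency number (peak FP4 throughput over system TDP; a sketch, not a measured sustained value):

```python
# Implied peak AI efficiency from this note's figures: 1,000 TFLOPS FP4
# against the ~140 W system TDP. A peak/TDP ratio, not a measurement.

peak_fp4_tflops = 1000.0
system_tdp_w = 140.0

tflops_per_watt = peak_fp4_tflops / system_tdp_w
print(f"~{tflops_per_watt:.1f} FP4 TFLOPS per watt (peak)")
```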

Key Relationships