Hardware to run NVIDIA Nemotron 3 Ultra 550B-A55B (MoE)

June 2026. NVIDIA's frontier hybrid model (Mamba-2 + LatentMoE + attention, with multi-token prediction): 55 B active / 550 B total parameters, native 1 M-token context (RULER@1M 94.7). SWE-bench Verified 71.9, LiveCodeBench v6 89.0, GPQA 87.0, HLE 26.7. License: OpenMDW-1.1 (commercial use OK).

Nemotron · text
550 B params · 300 GB Q4 file · 340 GB min (Q4) · 405 GB min (Q5) · 653 GB min (Q8) · 1000K ctx · OpenMDW-1.1
Cheapest

4× Strix Halo cluster (512 GB unified)

AMD · rack of 4 mini-PCs, 10 GbE fabric
$11,500
tokens / sec (Q4)
120B-MoE 55 t/s
235B-MoE 24 t/s
671B-MoE 5.0 t/s
Memory: 512 GB · 384 usable
Bandwidth: 256 GB/s
Idle / Active: 32 W / 480 W
Sticker: $11,500
Why: Lowest sticker that still fits NVIDIA Nemotron 3 Ultra 550B-A55B (MoE) ($12k USD).
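
The "lowest sticker that still fits" rule can be sketched as a filter followed by a minimum. The prices, usable-memory figures, and per-quantization minimums are copied from this page; the `Build` type and the `cheapest_fitting` helper are hypothetical names for illustration, and the site's actual ranking logic is not shown in the source.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Build:
    name: str
    price_usd: int
    usable_gb: int


# Sticker price and usable memory, copied from the build cards on this page.
BUILDS = [
    Build("4x Strix Halo cluster", 11_500, 384),
    Build("Mac Studio M3 Ultra 512 GB", 14_199, 480),
    Build("Quad RTX Pro 6000 Blackwell build", 38_000, 372),
    Build("8x RTX Pro 6000 Blackwell server", 78_000, 744),
    Build("12x RTX Pro 6000 Blackwell rack", 118_000, 1116),
    Build("DGX H200", 380_000, 1100),
    Build("DGX B200", 475_000, 1404),
]

# Minimum memory per quantization, copied from the spec line.
MIN_GB = {"Q4": 340, "Q5": 405, "Q8": 653}


def cheapest_fitting(builds: Sequence[Build], quant: str) -> Optional[Build]:
    """Return the lowest-priced build whose usable memory covers the model."""
    fitting = [b for b in builds if b.usable_gb >= MIN_GB[quant]]
    return min(fitting, key=lambda b: b.price_usd) if fitting else None


for quant in MIN_GB:
    pick = cheapest_fitting(BUILDS, quant)
    print(quant, pick.name if pick else "no build fits")
```

Under these figures the pick changes with quantization: the 384 GB usable on the Strix Halo cluster clears the 340 GB Q4 minimum but not the 405 GB Q5 minimum, so a Q5 run moves to the next-cheapest build that fits.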
Fastest

12× RTX Pro 6000 Blackwell rack (1152 GB)

NVIDIA · 8U server rack (multi-node, 1-2 chassis)
$118,000
tokens / sec (Q4)
120B-MoE
235B-MoE 260 t/s
671B-MoE 95 t/s
Memory: 1152 GB · 1116 usable
Bandwidth: 1792 GB/s
Idle / Active: 340 W / 7400 W
Sticker: $118,000
Why: Highest measured tg/s — 104 t/s on NVIDIA Nemotron 3 Ultra 550B-A55B (MoE)-class models at Q4.
All-rounder

Quad RTX Pro 6000 Blackwell build (384 GB)

NVIDIA · workstation / 4U pedestal
$38,000
tokens / sec (Q4)
120B-MoE
235B-MoE 145 t/s
671B-MoE 40 t/s
Memory: 384 GB · 372 usable
Bandwidth: 1792 GB/s
Idle / Active: 100 W / 2200 W
Sticker: $38,000
Why: Top quartile across speed, value, memory headroom, and efficiency — the "buy this if unsure" pick.
Best value

8× RTX Pro 6000 Blackwell server (768 GB)

NVIDIA · 4U server (e.g. SuperMicro AS-4125GS-TNRT)
$78,000
tokens / sec (Q4)
120B-MoE
235B-MoE 220 t/s
671B-MoE 75 t/s
Memory: 768 GB · 744 usable
Bandwidth: 1792 GB/s
Idle / Active: 220 W / 4800 W
Sticker: $78,000
Why: Best $/tg-per-second — ~$886 per t/s.
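
The value metric reads as sticker price divided by generation throughput. A minimal sketch, assuming it is a plain ratio (the page does not state its formula); `dollars_per_tps` is a hypothetical helper name. Inverting the quoted ~$886 per t/s against the $78,000 sticker implies roughly 88 t/s on this model, and the same ratio for the 12× rack ($118,000 at 104 t/s) comes to about $1,135 per t/s.

```python
def dollars_per_tps(price_usd: float, tokens_per_sec: float) -> float:
    """Sticker price per unit of generation throughput (lower is better)."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return price_usd / tokens_per_sec


# Throughput implied by the quoted ratio for the 8x RTX Pro 6000 server.
implied_tps = 78_000 / 886

# Same metric for the 12x rack, using the 104 t/s quoted on its card.
rack_ratio = dollars_per_tps(118_000, 104)

print(f"implied throughput: {implied_tps:.0f} t/s")
print(f"12x rack: ${rack_ratio:,.0f} per t/s")
```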
Best CUDA

DGX B200 — 8× B200 server (1.44 TB HBM3e)

NVIDIA · 10U DGX server
$475,000
tokens / sec (Q4)
120B-MoE 330 t/s
235B-MoE 165 t/s
671B-MoE 105 t/s
Memory: 1440 GB · 1404 usable
Bandwidth: 8000 GB/s
Idle / Active: 900 W / 10200 W
Sticker: $475,000
Why: Strongest CUDA-only software stack among fitting builds.
Most VRAM

DGX H200 — 8× H200 server (1.13 TB HBM3e)

NVIDIA · 8U DGX / HGX server rack
$380,000
tokens / sec (Q4)
120B-MoE 310 t/s
235B-MoE 155 t/s
671B-MoE 100 t/s
Memory: 1128 GB · 1100 usable
Bandwidth: 4800 GB/s
Idle / Active: 700 W / 6500 W
Sticker: $380,000
Why: 1100 GB usable — most headroom for batching and longer contexts.
Efficient

Mac Studio M3 Ultra 512 GB

Apple · small desktop
$14,199
tokens / sec (Q4)
120B-MoE 55 t/s
235B-MoE 22 t/s
671B-MoE 8.0 t/s
Memory: 512 GB · 480 usable
Bandwidth: 819 GB/s
Idle / Active: 12 W / 220 W
Sticker: $14,199
Why: 220 W active — lowest power draw of the fitting builds.
Cheapest

Mac Studio M3 Ultra 512 GB

Apple · small desktop
$14,199
tokens / sec (Q4)
120B-MoE 55 t/s
235B-MoE 22 t/s
671B-MoE 8.0 t/s
Memory: 512 GB · 480 usable
Bandwidth: 819 GB/s
Idle / Active: 12 W / 220 W
Sticker: $14,199
Why: Lowest sticker that still fits NVIDIA Nemotron 3 Ultra 550B-A55B (MoE) ($14k USD).

Every other build that runs NVIDIA Nemotron 3 Ultra 550B-A55B (MoE)

5 additional builds fit NVIDIA Nemotron 3 Ultra 550B-A55B (MoE) at Q4_K_M (340 GB usable minimum), sorted by sticker price.

| Build | Vendor · form factor | Price | Memory (total / usable) | Bandwidth | tg/s (Q4) | Active W | 5-yr power |
| --- | --- | --- | --- | --- | --- | --- | --- |
| (not listed) | (not listed) | $20k | 512 / 488 GB | 273 GB/s | 13 t/s | 920 W | $3.4k |
| 8× Strix Halo cluster (1024 GB unified) | AMD · rack of 8 mini-PCs, 10/25 GbE fabric | $23k | 1024 / 768 GB | 256 GB/s | 13 t/s | 960 W | $3.4k |
| 2× Mac Studio M3 Ultra 512 GB cluster (TB5 / MLX) | Apple · two desktops, Thunderbolt 5 RDMA | $28k | 1024 / 960 GB | 819 GB/s | 12 t/s | 440 W | $1.5k |
| 8× DGX Spark cluster (1024 GB unified, CUDA) | NVIDIA · rack of 8 desktops, 200 GbE fabric | $44k | 1024 / 976 GB | 273 GB/s | 17 t/s | 1840 W | $7k |
| 8× H100 80 GB server | NVIDIA · server rack | $280k | 640 / 620 GB | 3350 GB/s | 44 t/s | 5600 W | $20k |
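
The 5-yr power figures can be roughly approximated from the active wattage. This is a sketch under stated assumptions, not the page's own formula: it assumes continuous operation at the listed active draw and a flat electricity rate. The page gives neither its rate nor its duty cycle, so the `usd_per_kwh` default of 0.08 is a placeholder chosen only because it lands in the same range as the table, and `five_year_power_cost` is a hypothetical helper name.

```python
HOURS_PER_YEAR = 24 * 365


def five_year_power_cost(
    active_w: float,
    usd_per_kwh: float = 0.08,  # assumed placeholder rate, not from the page
    duty_cycle: float = 1.0,    # assumed fraction of time spent at active draw
    years: int = 5,
) -> float:
    """Estimate electricity cost in USD over `years` at a constant draw."""
    kwh = active_w / 1000 * HOURS_PER_YEAR * years * duty_cycle
    return kwh * usd_per_kwh


# Active wattage copied from the table above.
for name, watts in [
    ("2x Mac Studio M3 Ultra cluster", 440),
    ("8x Strix Halo cluster", 960),
    ("8x H100 80 GB server", 5600),
]:
    print(f"{name}: ~${five_year_power_cost(watts):,.0f} over 5 years")
```

With these placeholder inputs the estimates fall close to the listed $1.5k, $3.4k, and $20k, but other rows differ by a few hundred dollars, so treat the rate and duty cycle as knobs to set for your own electricity tariff and usage.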
Last updated 2026-06-13