Hardware to run NVIDIA Nemotron 3 Ultra 550B-A55B (MoE)
June 2026. NVIDIA's frontier hybrid model: Mamba-2 + LatentMoE + attention with multi-token prediction (MTP). 55 B active / 550 B total parameters, native 1 M-token context (RULER@1M 94.7). Benchmarks: SWE-bench Verified 71.9, LiveCodeBench v6 89.0, GPQA 87.0, HLE 26.7. License: OpenMDW-1.1 (commercial use OK).
Cheapest
2× Strix Halo cluster (256 GB unified)
AMD · mini-PC pair
$5,600
tokens / sec (Q2)
120B-MoE 48 t/s
235B-MoE 22 t/s
671B-MoE —
Memory: 256 GB · 192 usable
Bandwidth: 256 GB/s
Idle / Active: 16 W / 240 W
Sticker: $5,600
Why: Lowest sticker that still fits NVIDIA Nemotron 3 Ultra 550B-A55B (MoE) ($6k USD).
Fastest
12× RTX Pro 6000 Blackwell rack (1152 GB)
NVIDIA · 8U server rack (multi-node, 1-2 chassis)
$118,000
tokens / sec (Q2)
120B-MoE —
235B-MoE 312 t/s
671B-MoE 114 t/s
Memory: 1152 GB · 1116 usable
Bandwidth: 1792 GB/s
Idle / Active: 340 W / 7400 W
Sticker: $118,000
Why: Highest measured tg/s — 125 t/s on NVIDIA Nemotron 3 Ultra 550B-A55B (MoE)-class models at Q2.
All-rounder
Single AMD Instinct MI355X 288 GB workstation
AMD · 4U server (OAM, liquid-cooled)
$28,000
tokens / sec (Q2)
120B-MoE 384 t/s
235B-MoE 156 t/s
671B-MoE 60 t/s
Memory: 288 GB · 282 usable
Bandwidth: 8000 GB/s
Idle / Active: 140 W / 1400 W
Sticker: $28,000
Why: Top quartile across speed, value, memory headroom, and efficiency — the "buy this if unsure" pick.
Best value
Single AMD Instinct MI325X 256 GB workstation
AMD · workstation / 4U server (OAM)
$25,000
tokens / sec (Q2)
120B-MoE 312 t/s
235B-MoE 126 t/s
671B-MoE 34 t/s
Memory: 256 GB · 250 usable
Bandwidth: 6000 GB/s
Idle / Active: 100 W / 1000 W
Sticker: $25,000
Why: Best $/tg-per-second — ~$496 per t/s.
Best CUDA
8× RTX Pro 6000 Blackwell server (768 GB)
NVIDIA · 4U server (e.g. SuperMicro AS-4125GS-TNRT)
$78,000
tokens / sec (Q2)
120B-MoE —
235B-MoE 264 t/s
671B-MoE 90 t/s
Memory: 768 GB · 744 usable
Bandwidth: 1792 GB/s
Idle / Active: 220 W / 4800 W
Sticker: $78,000
Why: Strongest CUDA-only software stack among fitting builds.
Most VRAM
DGX B200 — 8× B200 server (1.44 TB HBM3e)
NVIDIA · 10U DGX server
$475,000
tokens / sec (Q2)
120B-MoE 396 t/s
235B-MoE 198 t/s
671B-MoE 126 t/s
Memory: 1440 GB · 1404 usable
Bandwidth: 8000 GB/s
Idle / Active: 900 W / 10200 W
Sticker: $475,000
Why: 1404 GB usable — most headroom for batching and longer contexts.
Efficient
Mac Studio M3 Ultra 256 GB
Apple · small desktop
$7,999
tokens / sec (Q2)
120B-MoE 66 t/s
235B-MoE 26 t/s
671B-MoE —
Memory: 256 GB · 232 usable
Bandwidth: 819 GB/s
Idle / Active: 10 W / 180 W
Sticker: $7,999
Why: 180 W active — lowest power draw of the fitting builds.
Cheapest
Mac Studio M3 Ultra 256 GB
Apple · small desktop
$7,999
tokens / sec (Q2)
120B-MoE 66 t/s
235B-MoE 26 t/s
671B-MoE —
Memory: 256 GB · 232 usable
Bandwidth: 819 GB/s
Idle / Active: 10 W / 180 W
Sticker: $7,999
Why: Lowest sticker that still fits NVIDIA Nemotron 3 Ultra 550B-A55B (MoE) ($8k USD).
Fastest
Single AMD Instinct MI355X 288 GB workstation
AMD · 4U server (OAM, liquid-cooled)
$28,000
tokens / sec (Q2)
120B-MoE 384 t/s
235B-MoE 156 t/s
671B-MoE 60 t/s
Memory: 288 GB · 282 usable
Bandwidth: 8000 GB/s
Idle / Active: 140 W / 1400 W
Sticker: $28,000
Why: Highest measured tg/s — 62 t/s on NVIDIA Nemotron 3 Ultra 550B-A55B (MoE)-class models at Q2.
All-rounder
Single AMD Instinct MI325X 256 GB workstation
AMD · workstation / 4U server (OAM)
$25,000
tokens / sec (Q2)
120B-MoE 312 t/s
235B-MoE 126 t/s
671B-MoE 34 t/s
Memory: 256 GB · 250 usable
Bandwidth: 6000 GB/s
Idle / Active: 100 W / 1000 W
Sticker: $25,000
Why: Top quartile across speed, value, memory headroom, and efficiency — the "buy this if unsure" pick.
Best value
Dual RTX Pro 6000 Blackwell build
NVIDIA · workstation
$24,400
tokens / sec (Q2)
120B-MoE —
235B-MoE 108 t/s
671B-MoE —
Memory: 192 GB · 188 usable
Bandwidth: 1792 GB/s
Idle / Active: 50 W / 1100 W
Sticker: $24,400
Why: Best $/tg-per-second — ~$565 per t/s.
Most VRAM
Mac Studio M3 Ultra 512 GB
Apple · small desktop
$14,199
tokens / sec (Q2)
120B-MoE 66 t/s
235B-MoE 26 t/s
671B-MoE 9.6 t/s
Memory: 512 GB · 480 usable
Bandwidth: 819 GB/s
Idle / Active: 12 W / 220 W
Sticker: $14,199
Why: 480 GB usable — most headroom for batching and longer contexts.
Efficient
Single AMD Instinct MI300X 192 GB workstation
AMD · workstation
$30,000
tokens / sec (Q2)
120B-MoE 240 t/s
235B-MoE 96 t/s
671B-MoE —
Memory: 192 GB · 188 usable
Bandwidth: 5300 GB/s
Idle / Active: 90 W / 750 W
Sticker: $30,000
Why: 750 W active — lowest power draw of the fitting builds.
Every other build that runs NVIDIA Nemotron 3 Ultra 550B-A55B (MoE)
10 additional builds fit NVIDIA Nemotron 3 Ultra 550B-A55B (MoE) at Q2_K (187 GB usable minimum), sorted by sticker price.
| Build | Vendor · form factor | Price | Memory (total / usable) | Bandwidth | tg/s (Q2) | Active W | 5-yr power |
|---|---|---|---|---|---|---|---|
| 2× DGX Spark cluster (256 GB unified, CUDA) | NVIDIA · two desktops, 200 G interconnect | $10k | 256 / 240 GB | 273 GB/s | 12 t/s | 460 W | $1.7k |
| Octuple Intel Arc Pro B70 cluster | Intel · rack/large tower | $11k | 256 / 248 GB | 608 GB/s | 17 t/s | 1450 W | $5k |
| 4× Strix Halo cluster (512 GB unified) | AMD · rack of 4 mini-PCs, 10 GbE fabric | $12k | 512 / 384 GB | 256 GB/s | 12 t/s | 480 W | $1.7k |
| 4× DGX Spark cluster (512 GB unified, CUDA) | NVIDIA · rack of 4 desktops | $20k | 512 / 488 GB | 273 GB/s | 15 t/s | 920 W | $3.4k |
| 8× Strix Halo cluster (1024 GB unified) | AMD · rack of 8 mini-PCs, 10/25 GbE fabric | $23k | 1024 / 768 GB | 256 GB/s | 15 t/s | 960 W | $3.4k |
| 2× Mac Studio M3 Ultra 512 GB cluster (TB5 / MLX) | Apple · two desktops, Thunderbolt 5 RDMA | $28k | 1024 / 960 GB | 819 GB/s | 14 t/s | 440 W | $1.5k |
| Quad RTX Pro 6000 Blackwell build (384 GB) | NVIDIA · workstation / 4U pedestal | $38k | 384 / 372 GB | 1792 GB/s | 70 t/s | 2200 W | $8k |
| 8× DGX Spark cluster (1024 GB unified, CUDA) | NVIDIA · rack of 8 desktops, 200 GbE fabric | $44k | 1024 / 976 GB | 273 GB/s | 20 t/s | 1840 W | $7k |
| 8× H100 80 GB server | NVIDIA · server rack | $280k | 640 / 620 GB | 3350 GB/s | 53 t/s | 5600 W | $20k |
| DGX H200, 8× H200 server (1.13 TB HBM3e) | NVIDIA · 8U DGX / HGX server rack | $380k | 1128 / 1100 GB | 4800 GB/s | 74 t/s | 6500 W | $24k |
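The selection rule behind this table (keep builds whose usable memory meets the 187 GB minimum, then sort by sticker price) and the value metric behind the "Best value" pick (sticker price divided by tg/s) can be sketched in a few lines. This is a minimal illustration, not the site's actual ranking code: the `Build` class and both function names are hypothetical, and the three sample rows are copied from the table above using its rounded prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Build:
    name: str
    price_usd: float   # rounded sticker price from the table
    usable_gb: float   # usable memory (second number in the Memory column)
    tg_per_s: float    # token generation speed at Q2
    active_w: float    # active power draw


Q2_MIN_USABLE_GB = 187  # minimum stated on the page for Q2_K

# Three sample rows from the Q2 table (not the full list).
BUILDS = [
    Build("Quad RTX Pro 6000 Blackwell build", 38_000, 372, 70, 2200),
    Build("2x DGX Spark cluster", 10_000, 240, 12, 460),
    Build("8x H100 80 GB server", 280_000, 620, 53, 5600),
]


def fitting_builds(builds, min_usable_gb):
    """Keep builds with enough usable memory, cheapest first."""
    fits = [b for b in builds if b.usable_gb >= min_usable_gb]
    return sorted(fits, key=lambda b: b.price_usd)


def usd_per_tps(build):
    """Sticker price divided by tokens per second (lower is better)."""
    return build.price_usd / build.tg_per_s


for b in fitting_builds(BUILDS, Q2_MIN_USABLE_GB):
    print(f"{b.name}: ${usd_per_tps(b):,.0f} per t/s")
```

Raising the threshold to a higher quant's minimum (for example 340 GB) drops the builds that no longer fit, which is why the lists get shorter at Q4, Q5, and Q8.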
Cheapest
4× Strix Halo cluster (512 GB unified)
AMD · rack of 4 mini-PCs, 10 GbE fabric
$11,500
tokens / sec (Q4)
120B-MoE 55 t/s
235B-MoE 24 t/s
671B-MoE 5.0 t/s
Memory: 512 GB · 384 usable
Bandwidth: 256 GB/s
Idle / Active: 32 W / 480 W
Sticker: $11,500
Why: Lowest sticker that still fits NVIDIA Nemotron 3 Ultra 550B-A55B (MoE) ($12k USD).
Fastest
12× RTX Pro 6000 Blackwell rack (1152 GB)
NVIDIA · 8U server rack (multi-node, 1-2 chassis)
$118,000
tokens / sec (Q4)
120B-MoE —
235B-MoE 260 t/s
671B-MoE 95 t/s
Memory: 1152 GB · 1116 usable
Bandwidth: 1792 GB/s
Idle / Active: 340 W / 7400 W
Sticker: $118,000
Why: Highest measured tg/s — 104 t/s on NVIDIA Nemotron 3 Ultra 550B-A55B (MoE)-class models at Q4.
All-rounder
Quad RTX Pro 6000 Blackwell build (384 GB)
NVIDIA · workstation / 4U pedestal
$38,000
tokens / sec (Q4)
120B-MoE —
235B-MoE 145 t/s
671B-MoE 40 t/s
Memory: 384 GB · 372 usable
Bandwidth: 1792 GB/s
Idle / Active: 100 W / 2200 W
Sticker: $38,000
Why: Top quartile across speed, value, memory headroom, and efficiency — the "buy this if unsure" pick.
Best value
8× RTX Pro 6000 Blackwell server (768 GB)
NVIDIA · 4U server (e.g. SuperMicro AS-4125GS-TNRT)
$78,000
tokens / sec (Q4)
120B-MoE —
235B-MoE 220 t/s
671B-MoE 75 t/s
Memory: 768 GB · 744 usable
Bandwidth: 1792 GB/s
Idle / Active: 220 W / 4800 W
Sticker: $78,000
Why: Best $/tg-per-second — ~$886 per t/s.
Best CUDA
DGX B200 — 8× B200 server (1.44 TB HBM3e)
NVIDIA · 10U DGX server
$475,000
tokens / sec (Q4)
120B-MoE 330 t/s
235B-MoE 165 t/s
671B-MoE 105 t/s
Memory: 1440 GB · 1404 usable
Bandwidth: 8000 GB/s
Idle / Active: 900 W / 10200 W
Sticker: $475,000
Why: Strongest CUDA-only software stack among fitting builds.
Most VRAM
DGX H200 — 8× H200 server (1.13 TB HBM3e)
NVIDIA · 8U DGX / HGX server rack
$380,000
tokens / sec (Q4)
120B-MoE 310 t/s
235B-MoE 155 t/s
671B-MoE 100 t/s
Memory: 1128 GB · 1100 usable
Bandwidth: 4800 GB/s
Idle / Active: 700 W / 6500 W
Sticker: $380,000
Why: 1100 GB usable — most headroom for batching and longer contexts.
Efficient
Mac Studio M3 Ultra 512 GB
Apple · small desktop
$14,199
tokens / sec (Q4)
120B-MoE 55 t/s
235B-MoE 22 t/s
671B-MoE 8.0 t/s
Memory: 512 GB · 480 usable
Bandwidth: 819 GB/s
Idle / Active: 12 W / 220 W
Sticker: $14,199
Why: 220 W active — lowest power draw of the fitting builds.
Cheapest
Mac Studio M3 Ultra 512 GB
Apple · small desktop
$14,199
tokens / sec (Q4)
120B-MoE 55 t/s
235B-MoE 22 t/s
671B-MoE 8.0 t/s
Memory: 512 GB · 480 usable
Bandwidth: 819 GB/s
Idle / Active: 12 W / 220 W
Sticker: $14,199
Why: Lowest sticker that still fits NVIDIA Nemotron 3 Ultra 550B-A55B (MoE) ($14k USD).
Every other build that runs NVIDIA Nemotron 3 Ultra 550B-A55B (MoE)
5 additional builds fit NVIDIA Nemotron 3 Ultra 550B-A55B (MoE) at Q4_K_M (340 GB usable minimum), sorted by sticker price.
| Build | Vendor · form factor | Price | Memory (total / usable) | Bandwidth | tg/s (Q4) | Active W | 5-yr power |
|---|---|---|---|---|---|---|---|
| 4× DGX Spark cluster (512 GB unified, CUDA) | NVIDIA · rack of 4 desktops | $20k | 512 / 488 GB | 273 GB/s | 13 t/s | 920 W | $3.4k |
| 8× Strix Halo cluster (1024 GB unified) | AMD · rack of 8 mini-PCs, 10/25 GbE fabric | $23k | 1024 / 768 GB | 256 GB/s | 13 t/s | 960 W | $3.4k |
| 2× Mac Studio M3 Ultra 512 GB cluster (TB5 / MLX) | Apple · two desktops, Thunderbolt 5 RDMA | $28k | 1024 / 960 GB | 819 GB/s | 12 t/s | 440 W | $1.5k |
| 8× DGX Spark cluster (1024 GB unified, CUDA) | NVIDIA · rack of 8 desktops, 200 GbE fabric | $44k | 1024 / 976 GB | 273 GB/s | 17 t/s | 1840 W | $7k |
| 8× H100 80 GB server | NVIDIA · server rack | $280k | 640 / 620 GB | 3350 GB/s | 44 t/s | 5600 W | $20k |
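The "5-yr power" column turns active wattage into an electricity cost. The page does not state the electricity rate or duty cycle behind its figures, so the sketch below treats both as hypothetical parameters with placeholder defaults; it shows the shape of the calculation and is not expected to reproduce the table's dollar values.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap days


def five_year_power_cost(active_w, usd_per_kwh=0.15, duty_cycle=0.5, years=5):
    """Electricity cost in USD for running at `active_w` part of the time.

    usd_per_kwh and duty_cycle are assumed placeholders, not values taken
    from the page; idle draw during the remaining hours is ignored.
    """
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    kwh = active_w / 1000 * HOURS_PER_YEAR * years * duty_cycle
    return kwh * usd_per_kwh


# Example: the 920 W cluster from the table under the placeholder assumptions.
print(round(five_year_power_cost(920)))
```

The cost scales linearly with wattage, rate, and duty cycle, so the ranking of builds by 5-year power follows the ranking by active watts whatever rate is chosen.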
Cheapest
Mac Studio M3 Ultra 512 GB
Apple · small desktop
$14,199
tokens / sec (Q5)
120B-MoE 46 t/s
235B-MoE 18 t/s
671B-MoE 6.7 t/s
Memory: 512 GB · 480 usable
Bandwidth: 819 GB/s
Idle / Active: 12 W / 220 W
Sticker: $14,199
Why: Lowest sticker that still fits NVIDIA Nemotron 3 Ultra 550B-A55B (MoE) ($14k USD).
Fastest
12× RTX Pro 6000 Blackwell rack (1152 GB)
NVIDIA · 8U server rack (multi-node, 1-2 chassis)
$118,000
tokens / sec (Q5)
120B-MoE —
235B-MoE 218 t/s
671B-MoE 80 t/s
Memory: 1152 GB · 1116 usable
Bandwidth: 1792 GB/s
Idle / Active: 340 W / 7400 W
Sticker: $118,000
Why: Highest measured tg/s — 87 t/s on NVIDIA Nemotron 3 Ultra 550B-A55B (MoE)-class models at Q5.
All-rounder
8× RTX Pro 6000 Blackwell server (768 GB)
NVIDIA · 4U server (e.g. SuperMicro AS-4125GS-TNRT)
$78,000
tokens / sec (Q5)
120B-MoE —
235B-MoE 185 t/s
671B-MoE 63 t/s
Memory: 768 GB · 744 usable
Bandwidth: 1792 GB/s
Idle / Active: 220 W / 4800 W
Sticker: $78,000
Why: Top quartile across speed, value, memory headroom, and efficiency — the "buy this if unsure" pick.
Best value
4× DGX Spark cluster (512 GB unified, CUDA)
NVIDIA · rack of 4 desktops
$19,500
tokens / sec (Q5)
120B-MoE 96 t/s
235B-MoE 27 t/s
671B-MoE 9.2 t/s
Memory: 512 GB · 488 usable
Bandwidth: 273 GB/s
Idle / Active: 110 W / 920 W
Sticker: $19,500
Why: Best $/tg-per-second — ~$1,814 per t/s.
Best CUDA
DGX B200 — 8× B200 server (1.44 TB HBM3e)
NVIDIA · 10U DGX server
$475,000
tokens / sec (Q5)
120B-MoE 277 t/s
235B-MoE 139 t/s
671B-MoE 88 t/s
Memory: 1440 GB · 1404 usable
Bandwidth: 8000 GB/s
Idle / Active: 900 W / 10200 W
Sticker: $475,000
Why: Strongest CUDA-only software stack among fitting builds.
Most VRAM
DGX H200 — 8× H200 server (1.13 TB HBM3e)
NVIDIA · 8U DGX / HGX server rack
$380,000
tokens / sec (Q5)
120B-MoE 260 t/s
235B-MoE 130 t/s
671B-MoE 84 t/s
Memory: 1128 GB · 1100 usable
Bandwidth: 4800 GB/s
Idle / Active: 700 W / 6500 W
Sticker: $380,000
Why: 1100 GB usable — most headroom for batching and longer contexts.
Efficient
2× Mac Studio M3 Ultra 512 GB cluster (TB5 / MLX)
Apple · two desktops, Thunderbolt 5 RDMA
$28,400
tokens / sec (Q5)
120B-MoE 60 t/s
235B-MoE 24 t/s
671B-MoE 7.6 t/s
Memory: 1024 GB · 960 usable
Bandwidth: 819 GB/s
Idle / Active: 24 W / 440 W
Sticker: $28,400
Why: 440 W active — lowest power draw of the fitting builds.
Every other build that runs NVIDIA Nemotron 3 Ultra 550B-A55B (MoE)
3 additional builds fit NVIDIA Nemotron 3 Ultra 550B-A55B (MoE) at Q5_K_M (405 GB usable minimum), sorted by sticker price.
| Build | Vendor · form factor | Price | Memory (total / usable) | Bandwidth | tg/s (Q5) | Active W | 5-yr power |
|---|---|---|---|---|---|---|---|
| 8× Strix Halo cluster (1024 GB unified) | AMD · rack of 8 mini-PCs, 10/25 GbE fabric | $23k | 1024 / 768 GB | 256 GB/s | 11 t/s | 960 W | $3.4k |
| 8× DGX Spark cluster (1024 GB unified, CUDA) | NVIDIA · rack of 8 desktops, 200 GbE fabric | $44k | 1024 / 976 GB | 273 GB/s | 14 t/s | 1840 W | $7k |
| 8× H100 80 GB server | NVIDIA · server rack | $280k | 640 / 620 GB | 3350 GB/s | 37 t/s | 5600 W | $20k |
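The usable-memory minimums quoted for each quant (187 GB at Q2_K, 340 GB at Q4_K_M, 405 GB at Q5_K_M) can be related back to the 550 B total parameter count by simple division. The sketch below computes the effective bits per parameter each minimum implies. It assumes 1 GB = 10^9 bytes and treats the whole minimum as weights; the page does not say how much of each figure is KV cache or runtime overhead, so the function name and the interpretation are illustrative only.

```python
TOTAL_PARAMS = 550e9  # 550 B total parameters, from the model summary

# Usable-memory minimums quoted on the page for each quant level, in GB.
MIN_USABLE_GB = {"Q2_K": 187, "Q4_K_M": 340, "Q5_K_M": 405}


def implied_bits_per_param(min_gb, total_params=TOTAL_PARAMS):
    """Effective bits per parameter if the whole minimum held weights.

    Assumes 1 GB = 1e9 bytes. Any KV-cache or runtime overhead inside the
    minimum makes the true weight-only figure somewhat lower.
    """
    return min_gb * 1e9 * 8 / total_params


for quant, gb in MIN_USABLE_GB.items():
    print(f"{quant}: {implied_bits_per_param(gb):.2f} bits/param")
```

Running the same division in reverse gives a quick first check of whether a machine's usable memory is in the right range before consulting the cards.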
Cheapest
8× Strix Halo cluster (1024 GB unified)
AMD · rack of 8 mini-PCs, 10/25 GbE fabric
$23,200
tokens / sec (Q8)
120B-MoE 36 t/s
235B-MoE 17 t/s
671B-MoE 4.2 t/s
Memory: 1024 GB · 768 usable
Bandwidth: 256 GB/s
Idle / Active: 64 W / 960 W
Sticker: $23,200
Why: Lowest sticker that still fits NVIDIA Nemotron 3 Ultra 550B-A55B (MoE) ($23k USD).
Fastest
12× RTX Pro 6000 Blackwell rack (1152 GB)
NVIDIA · 8U server rack (multi-node, 1-2 chassis)
$118,000
tokens / sec (Q8)
120B-MoE —
235B-MoE 135 t/s
671B-MoE 49 t/s
Memory: 1152 GB · 1116 usable
Bandwidth: 1792 GB/s
Idle / Active: 340 W / 7400 W
Sticker: $118,000
Why: Highest measured tg/s — 54 t/s on NVIDIA Nemotron 3 Ultra 550B-A55B (MoE)-class models at Q8.
All-rounder
8× RTX Pro 6000 Blackwell server (768 GB)
NVIDIA · 4U server (e.g. SuperMicro AS-4125GS-TNRT)
$78,000
tokens / sec (Q8)
120B-MoE —
235B-MoE 114 t/s
671B-MoE 39 t/s
Memory: 768 GB · 744 usable
Bandwidth: 1792 GB/s
Idle / Active: 220 W / 4800 W
Sticker: $78,000
Why: Top quartile across speed, value, memory headroom, and efficiency — the "buy this if unsure" pick.
Best value
2× Mac Studio M3 Ultra 512 GB cluster (TB5 / MLX)
Apple · two desktops, Thunderbolt 5 RDMA
$28,400
tokens / sec (Q8)
120B-MoE 37 t/s
235B-MoE 15 t/s
671B-MoE 4.7 t/s
Memory: 1024 GB · 960 usable
Bandwidth: 819 GB/s
Idle / Active: 24 W / 440 W
Sticker: $28,400
Why: Best $/tg-per-second — ~$4,708 per t/s.
Best CUDA
DGX B200 — 8× B200 server (1.44 TB HBM3e)
NVIDIA · 10U DGX server
$475,000
tokens / sec (Q8)
120B-MoE 172 t/s
235B-MoE 86 t/s
671B-MoE 55 t/s
Memory: 1440 GB · 1404 usable
Bandwidth: 8000 GB/s
Idle / Active: 900 W / 10200 W
Sticker: $475,000
Why: Strongest CUDA-only software stack among fitting builds.
Most VRAM
DGX H200 — 8× H200 server (1.13 TB HBM3e)
NVIDIA · 8U DGX / HGX server rack
$380,000
tokens / sec (Q8)
120B-MoE 161 t/s
235B-MoE 81 t/s
671B-MoE 52 t/s
Memory: 1128 GB · 1100 usable
Bandwidth: 4800 GB/s
Idle / Active: 700 W / 6500 W
Sticker: $380,000
Why: 1100 GB usable — most headroom for batching and longer contexts.
Efficient
8× DGX Spark cluster (1024 GB unified, CUDA)
NVIDIA · rack of 8 desktops, 200 GbE fabric
$43,500
tokens / sec (Q8)
120B-MoE 78 t/s
235B-MoE 22 t/s
671B-MoE 8.3 t/s
Memory: 1024 GB · 976 usable
Bandwidth: 273 GB/s
Idle / Active: 220 W / 1840 W
Sticker: $43,500
Why: 1840 W active — lowest power draw of the fitting builds.
No plug-and-play build fits at Q8_0
Only used / DIY / homelab-cluster rigs fit NVIDIA Nemotron 3 Ultra 550B-A55B (MoE) at this quant. Turn off "Only plug & play" to see them.
Last updated 2026-06-13