Hardware to run NVIDIA Nemotron 3 Ultra 550B-A55B (MoE)

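Jun 2026. NVIDIA's frontier hybrid model combining Mamba-2, LatentMoE, and attention layers with multi-token prediction (MTP): 55 B active / 550 B total parameters, native 1 M-token context (RULER@1M 94.7). Benchmarks: SWE-bench Verified 71.9, LiveCodeBench v6 89.0, GPQA 87.0, HLE 26.7. License: OpenMDW-1.1 (commercial use OK).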

Nemotron · text

550 B params · 300 GB Q4 file · minimum usable memory: 340 GB (Q4), 405 GB (Q5), 653 GB (Q8) · 1000K ctx · OpenMDW-1.1
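
As a rough sanity check on the sizes above, the average storage cost per parameter can be back-computed from the file size. The sketch below assumes 1 GB = 10^9 bytes and that the 300 GB figure covers all 550 B parameters; the function names are illustrative and not part of any tool referenced on this page.

```python
def bits_per_weight(file_gb: float, params_billions: float) -> float:
    """Average bits stored per parameter (assumes 1 GB = 1e9 bytes)."""
    return file_gb * 8 / params_billions


def headroom_gb(min_usable_gb: float, file_gb: float) -> float:
    """Gap between the listed minimum usable memory and the file size."""
    return min_usable_gb - file_gb


print(round(bits_per_weight(300, 550), 2))  # 4.36
print(headroom_gb(340, 300))                # 40
```

The 40 GB gap is whatever the page budgets beyond the weights at Q4; the page does not break it down.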

Cheapest

2× Strix Halo cluster (256 GB unified)

AMD · mini-PC pair

$5,600

tokens / sec (Q2)

120B-MoE 48 t/s

235B-MoE 22 t/s

671B-MoE —

Memory: 256 GB (192 GB usable)

Bandwidth: 256 GB/s

Idle / Active: 16 W / 240 W

Sticker: $5,600

Why: Lowest sticker price that still fits NVIDIA Nemotron 3 Ultra 550B-A55B (MoE) ($6k USD).

Amazon (Strix Halo Mini PC) · GMKtec · Framework

📺 Reviews on YouTube

▶ AMD Strix Halo / Ryzen AI Max+ 395 — honest review (single-node baseline)

▶ Kimi-K2 (1T) / GLM 4.7 (355B) on a 4-node Strix Halo cluster — 512 GB unified memory

Fastest

12× RTX Pro 6000 Blackwell rack (1152 GB)

NVIDIA · 8U server rack (multi-node, 1-2 chassis)

$118,000

tokens / sec (Q2)

120B-MoE —

235B-MoE 312 t/s

671B-MoE 114 t/s

Memory: 1152 GB (1116 GB usable)

Bandwidth: 1792 GB/s

Idle / Active: 340 W / 7400 W

Sticker: $118,000

Why: Highest measured tg/s: 125 t/s on NVIDIA Nemotron 3 Ultra 550B-A55B (MoE)-class models at Q2.

Amazon · CoreWeave · SuperMicro · SHI enterprise

All-rounder

Single AMD Instinct MI355X 288 GB workstation

AMD · 4U server (OAM, liquid-cooled)

$28,000

tokens / sec (Q2)

120B-MoE 384 t/s

235B-MoE 156 t/s

671B-MoE 60 t/s

Memory: 288 GB (282 GB usable)

Bandwidth: 8000 GB/s

Idle / Active: 140 W / 1400 W

Sticker: $28,000

Why: Top quartile across speed, value, memory headroom, and efficiency; the "buy this if unsure" pick.

Dell XE9712 · AMD MI350 series · Supermicro UBB

Best value

Single AMD Instinct MI325X 256 GB workstation

AMD · workstation / 4U server (OAM)

$25,000

tokens / sec (Q2)

120B-MoE 312 t/s

235B-MoE 126 t/s

671B-MoE 34 t/s

Memory: 256 GB (250 GB usable)

Bandwidth: 6000 GB/s

Idle / Active: 100 W / 1000 W

Sticker: $25,000

Why: Best price per unit of generation speed, about $496 per t/s.

AMD direct · Supermicro UBB · TensorWave (cloud)

📺 Reviews on YouTube

▶ Inside the AMD Instinct MI325X — AI / HPC deep dive
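
The value metric quoted for this pick is sticker price divided by generation speed. A minimal sketch of that arithmetic follows, using figures from this page; the helper names are hypothetical, and the implied speed is back-computed here rather than a figure the page states.

```python
def usd_per_tps(sticker_usd: float, tokens_per_sec: float) -> float:
    """Sticker price divided by generation speed."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return sticker_usd / tokens_per_sec


def implied_tps(sticker_usd: float, usd_per_tps_value: float) -> float:
    """Generation speed implied by a quoted price-per-t/s figure."""
    return sticker_usd / usd_per_tps_value


# 12x RTX Pro 6000 rack: $118,000 at 125 t/s (Q2)
print(round(usd_per_tps(118_000, 125)))    # 944
# MI325X: $25,000 at about $496 per t/s
print(round(implied_tps(25_000, 496), 1))  # 50.4
```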

Best CUDA

8× RTX Pro 6000 Blackwell server (768 GB)

NVIDIA · 4U server (e.g. SuperMicro AS-4125GS-TNRT)

$78,000

tokens / sec (Q2)

120B-MoE —

235B-MoE 264 t/s

671B-MoE 90 t/s

Memory: 768 GB (744 GB usable)

Bandwidth: 1792 GB/s

Idle / Active: 220 W / 4800 W

Sticker: $78,000

Why: Strongest CUDA-only software stack among fitting builds.

Amazon · PNY Pro · CoreWeave · SuperMicro

Most VRAM

DGX B200 — 8× B200 server (1.44 TB HBM3e)

NVIDIA · 10U DGX server

$475,000

tokens / sec (Q2)

120B-MoE 396 t/s

235B-MoE 198 t/s

671B-MoE 126 t/s

Memory: 1440 GB (1404 GB usable)

Bandwidth: 8000 GB/s

Idle / Active: 900 W / 10200 W

Sticker: $475,000

Why: 1404 GB usable, the most headroom for batching and longer contexts.

NVIDIA DGX B200 · SuperMicro HGX B200 · Lambda Labs

📺 Reviews on YouTube

▶ Inside a 1.44 TB HBM3e NVIDIA HGX B200 AI Server from ASRock Rack

Efficient

Mac Studio M3 Ultra 256 GB

Apple · small desktop

$7,999

tokens / sec (Q2)

120B-MoE 66 t/s

235B-MoE 26 t/s

671B-MoE —

Memory: 256 GB (232 GB usable)

Bandwidth: 819 GB/s

Idle / Active: 10 W / 180 W

Sticker: $7,999

Why: 180 W active, the lowest power draw of the fitting builds.

Apple · B&H Photo

📺 Reviews on YouTube

▶ M3 Ultra Mac Studio Review

Cheapest

Mac Studio M3 Ultra 256 GB

Apple · small desktop

$7,999

tokens / sec (Q2)

120B-MoE 66 t/s

235B-MoE 26 t/s

671B-MoE —

Memory: 256 GB (232 GB usable)

Bandwidth: 819 GB/s

Idle / Active: 10 W / 180 W

Sticker: $7,999

Why: Lowest sticker price that still fits NVIDIA Nemotron 3 Ultra 550B-A55B (MoE) ($8k USD).

Apple · B&H Photo

📺 Reviews on YouTube

▶ M3 Ultra Mac Studio Review

Fastest

Single AMD Instinct MI355X 288 GB workstation

AMD · 4U server (OAM, liquid-cooled)

$28,000

tokens / sec (Q2)

120B-MoE 384 t/s

235B-MoE 156 t/s

671B-MoE 60 t/s

Memory: 288 GB (282 GB usable)

Bandwidth: 8000 GB/s

Idle / Active: 140 W / 1400 W

Sticker: $28,000

Why: Highest measured tg/s: 62 t/s on NVIDIA Nemotron 3 Ultra 550B-A55B (MoE)-class models at Q2.

Dell XE9712 · AMD MI350 series · Supermicro UBB

All-rounder

Single AMD Instinct MI325X 256 GB workstation

AMD · workstation / 4U server (OAM)

$25,000

tokens / sec (Q2)

120B-MoE 312 t/s

235B-MoE 126 t/s

671B-MoE 34 t/s

Memory: 256 GB (250 GB usable)

Bandwidth: 6000 GB/s

Idle / Active: 100 W / 1000 W

Sticker: $25,000

Why: Top quartile across speed, value, memory headroom, and efficiency; the "buy this if unsure" pick.

AMD direct · Supermicro UBB · TensorWave (cloud)

📺 Reviews on YouTube

▶ Inside the AMD Instinct MI325X — AI / HPC deep dive

Best value

Dual RTX Pro 6000 Blackwell build

NVIDIA · workstation

$24,400

tokens / sec (Q2)

120B-MoE —

235B-MoE 108 t/s

671B-MoE —

Memory: 192 GB (188 GB usable)

Bandwidth: 1792 GB/s

Idle / Active: 50 W / 1100 W

Sticker: $24,400

Why: Best price per unit of generation speed, about $565 per t/s.

Amazon · Newegg · B&H Photo

📺 Reviews on YouTube

▶ RTX Pro 6000 Blackwell — Linus Tech Tips review (single-card baseline)

Most VRAM

Mac Studio M3 Ultra 512 GB

Apple · small desktop

$14,199

tokens / sec (Q2)

120B-MoE 66 t/s

235B-MoE 26 t/s

671B-MoE 9.6 t/s

Memory: 512 GB (480 GB usable)

Bandwidth: 819 GB/s

Idle / Active: 12 W / 220 W

Sticker: $14,199

Why: 480 GB usable, the most headroom for batching and longer contexts.

Apple · B&H Photo

📺 Reviews on YouTube

▶ M3 Ultra Mac Studio Review

Efficient

Single AMD Instinct MI300X 192 GB workstation

AMD · workstation

$30,000

tokens / sec (Q2)

120B-MoE 240 t/s

235B-MoE 96 t/s

671B-MoE —

Memory: 192 GB (188 GB usable)

Bandwidth: 5300 GB/s

Idle / Active: 90 W / 750 W

Sticker: $30,000

Why: 750 W active, the lowest power draw of the fitting builds.

ASRock Rack · Hot Aisle (cloud) · AMD direct

📺 Reviews on YouTube

▶ AMD MI300X server review 8x GPUs | Llama 405b model tested

Every other build that runs NVIDIA Nemotron 3 Ultra 550B-A55B (MoE)

10 additional builds fit NVIDIA Nemotron 3 Ultra 550B-A55B (MoE) at Q2_K (187 GB usable minimum), sorted by sticker price.

Build	Vendor · form factor	Price	Memory (total / usable)	Bandwidth	tg/s (Q2)	Active W	5-yr power
2× DGX Spark cluster (256 GB unified, CUDA)	NVIDIA · two desktops, 200 G interconnect	$10k	256 / 240 GB	273 GB/s	12 t/s	460 W	$1.7k
Octuple Intel Arc Pro B70 cluster	Intel · rack/large tower	$11k	256 / 248 GB	608 GB/s	17 t/s	1450 W	$5k
4× Strix Halo cluster (512 GB unified)	AMD · rack of 4 mini-PCs, 10 GbE fabric	$12k	512 / 384 GB	256 GB/s	12 t/s	480 W	$1.7k
4× DGX Spark cluster (512 GB unified, CUDA)	NVIDIA · rack of 4 desktops	$20k	512 / 488 GB	273 GB/s	15 t/s	920 W	$3.4k
8× Strix Halo cluster (1024 GB unified)	AMD · rack of 8 mini-PCs, 10/25 GbE fabric	$23k	1024 / 768 GB	256 GB/s	15 t/s	960 W	$3.4k
2× Mac Studio M3 Ultra 512 GB cluster (TB5 / MLX)	Apple · two desktops, Thunderbolt 5 RDMA	$28k	1024 / 960 GB	819 GB/s	14 t/s	440 W	$1.5k
Quad RTX Pro 6000 Blackwell build (384 GB)	NVIDIA · workstation / 4U pedestal	$38k	384 / 372 GB	1792 GB/s	70 t/s	2200 W	$8k
8× DGX Spark cluster (1024 GB unified, CUDA)	NVIDIA · rack of 8 desktops, 200 GbE fabric	$44k	1024 / 976 GB	273 GB/s	20 t/s	1840 W	$7k
8× H100 80 GB server	NVIDIA · server rack	$280k	640 / 620 GB	3350 GB/s	53 t/s	5600 W	$20k
DGX H200: 8× H200 server (1.13 TB HBM3e)	NVIDIA · 8U DGX / HGX server rack	$380k	1128 / 1100 GB	4800 GB/s	74 t/s	6500 W	$24k
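
The page does not say how the 5-yr power column is derived. One plausible reading is active draw running continuously for five years at a flat electricity rate; under that assumption, the rate can be back-solved from a table row. The formula, the `duty` parameter, and the function names below are all assumptions for illustration.

```python
HOURS_5YR = 5 * 365 * 24  # 43,800 hours, ignoring leap days


def five_year_cost(active_w: float, usd_per_kwh: float, duty: float = 1.0) -> float:
    """Energy cost over five years at a constant draw and flat rate."""
    return active_w / 1000 * HOURS_5YR * duty * usd_per_kwh


def implied_rate(active_w: float, cost_usd: float, duty: float = 1.0) -> float:
    """Electricity rate (USD/kWh) implied by a listed five-year cost."""
    return cost_usd / (active_w / 1000 * HOURS_5YR * duty)


# 2x DGX Spark row: 460 W, $1.7k
print(round(implied_rate(460, 1700), 3))     # 0.084
# 8x H100 row: 5600 W, $20k
print(round(implied_rate(5600, 20_000), 3))  # 0.082
```

Both rows land near $0.08/kWh, which is consistent with that reading but does not confirm it, since the table's costs are rounded.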

Cheapest

4× Strix Halo cluster (512 GB unified)

AMD · rack of 4 mini-PCs, 10 GbE fabric

$11,500

tokens / sec (Q4)

120B-MoE 55 t/s

235B-MoE 24 t/s

671B-MoE 5.0 t/s

Memory: 512 GB (384 GB usable)

Bandwidth: 256 GB/s

Idle / Active: 32 W / 480 W

Sticker: $11,500

Why: Lowest sticker price that still fits NVIDIA Nemotron 3 Ultra 550B-A55B (MoE) ($12k USD).

Amazon (Strix Halo Mini PC) · GMKtec EVO-X2 · Bosgame M5 · Framework

📺 Reviews on YouTube

▶ Kimi-K2 (1T) / GLM 4.7 (355B) on a 4-node Strix Halo cluster — 512 GB unified memory

▶ AMD Strix Halo / Ryzen AI Max+ 395 — honest review (single-node baseline)

Fastest

12× RTX Pro 6000 Blackwell rack (1152 GB)

NVIDIA · 8U server rack (multi-node, 1-2 chassis)

$118,000

tokens / sec (Q4)

120B-MoE —

235B-MoE 260 t/s

671B-MoE 95 t/s

Memory: 1152 GB (1116 GB usable)

Bandwidth: 1792 GB/s

Idle / Active: 340 W / 7400 W

Sticker: $118,000

Why: Highest measured tg/s: 104 t/s on NVIDIA Nemotron 3 Ultra 550B-A55B (MoE)-class models at Q4.

Amazon · CoreWeave · SuperMicro · SHI enterprise

All-rounder

Quad RTX Pro 6000 Blackwell build (384 GB)

NVIDIA · workstation / 4U pedestal

$38,000

tokens / sec (Q4)

120B-MoE —

235B-MoE 145 t/s

671B-MoE 40 t/s

Memory: 384 GB (372 GB usable)

Bandwidth: 1792 GB/s

Idle / Active: 100 W / 2200 W

Sticker: $38,000

Why: Top quartile across speed, value, memory headroom, and efficiency; the "buy this if unsure" pick.

Amazon · Newegg · B&H Photo · PNY Pro

Best value

8× RTX Pro 6000 Blackwell server (768 GB)

NVIDIA · 4U server (e.g. SuperMicro AS-4125GS-TNRT)

$78,000

tokens / sec (Q4)

120B-MoE —

235B-MoE 220 t/s

671B-MoE 75 t/s

Memory: 768 GB (744 GB usable)

Bandwidth: 1792 GB/s

Idle / Active: 220 W / 4800 W

Sticker: $78,000

Why: Best price per unit of generation speed, about $886 per t/s.

Amazon · PNY Pro · CoreWeave · SuperMicro

Best CUDA

DGX B200 — 8× B200 server (1.44 TB HBM3e)

NVIDIA · 10U DGX server

$475,000

tokens / sec (Q4)

120B-MoE 330 t/s

235B-MoE 165 t/s

671B-MoE 105 t/s

Memory: 1440 GB (1404 GB usable)

Bandwidth: 8000 GB/s

Idle / Active: 900 W / 10200 W

Sticker: $475,000

Why: Strongest CUDA-only software stack among fitting builds.

NVIDIA DGX B200 · SuperMicro HGX B200 · Lambda Labs

📺 Reviews on YouTube

▶ Inside a 1.44 TB HBM3e NVIDIA HGX B200 AI Server from ASRock Rack

Most VRAM

DGX H200 — 8× H200 server (1.13 TB HBM3e)

NVIDIA · 8U DGX / HGX server rack

$380,000

tokens / sec (Q4)

120B-MoE 310 t/s

235B-MoE 155 t/s

671B-MoE 100 t/s

Memory: 1128 GB (1100 GB usable)

Bandwidth: 4800 GB/s

Idle / Active: 700 W / 6500 W

Sticker: $380,000

Why: 1100 GB usable, the most headroom for batching and longer contexts.

NVIDIA DGX H200 · SuperMicro HGX · Lambda Labs

📺 Reviews on YouTube

▶ Inside the SUPER NVIDIA H200 Server From Supermicro (8U HGX H200)

Efficient

Mac Studio M3 Ultra 512 GB

Apple · small desktop

$14,199

tokens / sec (Q4)

120B-MoE 55 t/s

235B-MoE 22 t/s

671B-MoE 8.0 t/s

Memory: 512 GB (480 GB usable)

Bandwidth: 819 GB/s

Idle / Active: 12 W / 220 W

Sticker: $14,199

Why: 220 W active, the lowest power draw of the fitting builds.

Apple · B&H Photo

📺 Reviews on YouTube

▶ M3 Ultra Mac Studio Review

Cheapest

Mac Studio M3 Ultra 512 GB

Apple · small desktop

$14,199

tokens / sec (Q4)

120B-MoE 55 t/s

235B-MoE 22 t/s

671B-MoE 8.0 t/s

Memory: 512 GB (480 GB usable)

Bandwidth: 819 GB/s

Idle / Active: 12 W / 220 W

Sticker: $14,199

Why: Lowest sticker price that still fits NVIDIA Nemotron 3 Ultra 550B-A55B (MoE) ($14k USD).

Apple · B&H Photo

📺 Reviews on YouTube

▶ M3 Ultra Mac Studio Review

Every other build that runs NVIDIA Nemotron 3 Ultra 550B-A55B (MoE)

5 additional builds fit NVIDIA Nemotron 3 Ultra 550B-A55B (MoE) at Q4_K_M (340 GB usable minimum), sorted by sticker price.

Build	Vendor · form factor	Price	Memory (total / usable)	Bandwidth	tg/s (Q4)	Active W	5-yr power
4× DGX Spark cluster (512 GB unified, CUDA)	NVIDIA · rack of 4 desktops	$20k	512 / 488 GB	273 GB/s	13 t/s	920 W	$3.4k
8× Strix Halo cluster (1024 GB unified)	AMD · rack of 8 mini-PCs, 10/25 GbE fabric	$23k	1024 / 768 GB	256 GB/s	13 t/s	960 W	$3.4k
2× Mac Studio M3 Ultra 512 GB cluster (TB5 / MLX)	Apple · two desktops, Thunderbolt 5 RDMA	$28k	1024 / 960 GB	819 GB/s	12 t/s	440 W	$1.5k
8× DGX Spark cluster (1024 GB unified, CUDA)	NVIDIA · rack of 8 desktops, 200 GbE fabric	$44k	1024 / 976 GB	273 GB/s	17 t/s	1840 W	$7k
8× H100 80 GB server	NVIDIA · server rack	$280k	640 / 620 GB	3350 GB/s	44 t/s	5600 W	$20k

Cheapest

Mac Studio M3 Ultra 512 GB

Apple · small desktop

$14,199

tokens / sec (Q5)

120B-MoE 46 t/s

235B-MoE 18 t/s

671B-MoE 6.7 t/s

Memory: 512 GB (480 GB usable)

Bandwidth: 819 GB/s

Idle / Active: 12 W / 220 W

Sticker: $14,199

Why: Lowest sticker price that still fits NVIDIA Nemotron 3 Ultra 550B-A55B (MoE) ($14k USD).

Apple · B&H Photo

📺 Reviews on YouTube

▶ M3 Ultra Mac Studio Review

Fastest

12× RTX Pro 6000 Blackwell rack (1152 GB)

NVIDIA · 8U server rack (multi-node, 1-2 chassis)

$118,000

tokens / sec (Q5)

120B-MoE —

235B-MoE 218 t/s

671B-MoE 80 t/s

Memory: 1152 GB (1116 GB usable)

Bandwidth: 1792 GB/s

Idle / Active: 340 W / 7400 W

Sticker: $118,000

Why: Highest measured tg/s: 87 t/s on NVIDIA Nemotron 3 Ultra 550B-A55B (MoE)-class models at Q5.

Amazon · CoreWeave · SuperMicro · SHI enterprise

All-rounder

8× RTX Pro 6000 Blackwell server (768 GB)

NVIDIA · 4U server (e.g. SuperMicro AS-4125GS-TNRT)

$78,000

tokens / sec (Q5)

120B-MoE —

235B-MoE 185 t/s

671B-MoE 63 t/s

Memory: 768 GB (744 GB usable)

Bandwidth: 1792 GB/s

Idle / Active: 220 W / 4800 W

Sticker: $78,000

Why: Top quartile across speed, value, memory headroom, and efficiency; the "buy this if unsure" pick.

Amazon · PNY Pro · CoreWeave · SuperMicro

Best value

4× DGX Spark cluster (512 GB unified, CUDA)

NVIDIA · rack of 4 desktops

$19,500

tokens / sec (Q5)

120B-MoE 96 t/s

235B-MoE 27 t/s

671B-MoE 9.2 t/s

Memory: 512 GB (488 GB usable)

Bandwidth: 273 GB/s

Idle / Active: 110 W / 920 W

Sticker: $19,500

Why: Best price per unit of generation speed, about $1,814 per t/s.

Amazon (4× DGX Spark) · NVIDIA Marketplace (4 units) · NADDOD

📺 Reviews on YouTube

▶ NVIDIA didn't want me to do this — 8× DGX Spark 1 TB VRAM cluster (AZisk)

Best CUDA

DGX B200 — 8× B200 server (1.44 TB HBM3e)

NVIDIA · 10U DGX server

$475,000

tokens / sec (Q5)

120B-MoE 277 t/s

235B-MoE 139 t/s

671B-MoE 88 t/s

Memory: 1440 GB (1404 GB usable)

Bandwidth: 8000 GB/s

Idle / Active: 900 W / 10200 W

Sticker: $475,000

Why: Strongest CUDA-only software stack among fitting builds.

NVIDIA DGX B200 · SuperMicro HGX B200 · Lambda Labs

📺 Reviews on YouTube

▶ Inside a 1.44 TB HBM3e NVIDIA HGX B200 AI Server from ASRock Rack

Most VRAM

DGX H200 — 8× H200 server (1.13 TB HBM3e)

NVIDIA · 8U DGX / HGX server rack

$380,000

tokens / sec (Q5)

120B-MoE 260 t/s

235B-MoE 130 t/s

671B-MoE 84 t/s

Memory: 1128 GB (1100 GB usable)

Bandwidth: 4800 GB/s

Idle / Active: 700 W / 6500 W

Sticker: $380,000

Why: 1100 GB usable, the most headroom for batching and longer contexts.

NVIDIA DGX H200 · SuperMicro HGX · Lambda Labs

📺 Reviews on YouTube

▶ Inside the SUPER NVIDIA H200 Server From Supermicro (8U HGX H200)

Efficient

2× Mac Studio M3 Ultra 512 GB cluster (TB5 / MLX)

Apple · two desktops, Thunderbolt 5 RDMA

$28,400

tokens / sec (Q5)

120B-MoE 60 t/s

235B-MoE 24 t/s

671B-MoE 7.6 t/s

Memory: 1024 GB (960 GB usable)

Bandwidth: 819 GB/s

Idle / Active: 24 W / 440 W

Sticker: $28,400

Why: 440 W active, the lowest power draw of the fitting builds.

eBay · Apple Refurbished · B&H Photo (used)

📺 Reviews on YouTube

▶ Apple didn't have to go this hard… (4× M3 Ultra TB5 cluster)

▶ Mac Studio CLUSTER vs M3 Ultra

Every other build that runs NVIDIA Nemotron 3 Ultra 550B-A55B (MoE)

3 additional builds fit NVIDIA Nemotron 3 Ultra 550B-A55B (MoE) at Q5_K_M (405 GB usable minimum), sorted by sticker price.

Build	Vendor · form factor	Price	Memory (total / usable)	Bandwidth	tg/s (Q5)	Active W	5-yr power
8× Strix Halo cluster (1024 GB unified)	AMD · rack of 8 mini-PCs, 10/25 GbE fabric	$23k	1024 / 768 GB	256 GB/s	11 t/s	960 W	$3.4k
8× DGX Spark cluster (1024 GB unified, CUDA)	NVIDIA · rack of 8 desktops, 200 GbE fabric	$44k	1024 / 976 GB	273 GB/s	14 t/s	1840 W	$7k
8× H100 80 GB server	NVIDIA · server rack	$280k	640 / 620 GB	3350 GB/s	37 t/s	5600 W	$20k
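
The usable-memory minimums quoted on this page (187 GB at Q2_K, 340 GB at Q4_K_M, 405 GB at Q5_K_M, 653 GB at Q8) reduce the fit check to a simple comparison. A minimal sketch follows, with a hypothetical helper name; it is a memory-only check and does not model context length, batch size, or whether a build counts as plug-and-play.

```python
# Minimum usable memory per quantization, as listed on this page (GB).
MIN_USABLE_GB = {"Q2_K": 187, "Q4_K_M": 340, "Q5_K_M": 405, "Q8_0": 653}


def quants_that_fit(usable_gb: float) -> list[str]:
    """Return the quantization levels whose minimum fits in usable_gb."""
    return [q for q, need in MIN_USABLE_GB.items() if usable_gb >= need]


# Dual RTX Pro 6000 Blackwell build: 188 GB usable
print(quants_that_fit(188))  # ['Q2_K']
# Mac Studio M3 Ultra 512 GB: 480 GB usable
print(quants_that_fit(480))  # ['Q2_K', 'Q4_K_M', 'Q5_K_M']
```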

Last updated 2026-06-13

No plug-and-play build fits at Q8_0