Hardware to run DiffusionGemma 26B-A4B

Jun 10 2026. Google's experimental text-diffusion Gemma is a 25.2 B-parameter MoE (3.8 B active) that denoises 256-token blocks in parallel. It reaches 1,000+ tok/s on an H100 and 700+ tok/s on an RTX 5090, and at Q4 it fits on cards with 18 GB of usable memory. It trades benchmark quality for roughly 4× the generation speed of Gemma 4 26B-A4B.

26 B params · 14 GB Q4 file · 18 GB min (Q4) · 21 GB min (Q5) · 35 GB min (Q8) · 256K ctx · Apache 2.0
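
The minimum-memory figures above amount to a simple fit check: a build qualifies for a quantization when its usable memory is at least that quantization's minimum. Below is a minimal Python sketch of that rule. It assumes the 10 GB Q2_K floor quoted in this page's Q2 build list, alongside the Q4, Q5, and Q8 minimums. The dictionary and function names are illustrative and are not part of the site's picker.

```python
# Minimum usable memory (GB) per quantization, as quoted on this page.
# The "Q8" key is a label chosen here; the page only says "35 GB min Q8".
MIN_USABLE_GB = {
    "Q2_K": 10,
    "Q4_K_M": 18,
    "Q5_K_M": 21,
    "Q8": 35,
}


def quants_that_fit(usable_gb: float) -> list[str]:
    """Return the quantizations whose minimum fits in `usable_gb` of memory."""
    return [name for name, need in MIN_USABLE_GB.items() if usable_gb >= need]


if __name__ == "__main__":
    # Usable-memory figures taken from the build cards on this page.
    for label, usable in [("Tesla P100 16 GB", 15), ("RTX 3090", 23), ("RTX 5090", 31)]:
        print(label, quants_that_fit(usable))
```

For example, a 24 GB RTX 3090 with 23 GB usable clears Q2, Q4, and Q5 but not Q8, which is consistent with it appearing among the Q5 picks on this page.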

Cheapest

Single Tesla P100 16 GB (used) build

NVIDIA · desktop tower

$600

Tokens / sec (Q2): 14B 23 t/s · 30B n/a · 70B n/a

Memory: 16 GB (15 GB usable)

Bandwidth: 732 GB/s

Idle / Active: 25 W / 250 W

Sticker: $600

Why: Lowest sticker that still fits DiffusionGemma 26B-A4B ($600 USD).

Fastest

DGX B200 — 8× B200 server (1.44 TB HBM3e)

NVIDIA · 10U DGX server

$475,000

Tokens / sec (Q2): 14B 504 t/s · 30B 324 t/s · 70B 216 t/s

Memory: 1440 GB (1404 GB usable)

Bandwidth: 8000 GB/s

Idle / Active: 900 W / 10200 W

Sticker: $475,000

Why: Highest measured tg/s — 972 t/s on DiffusionGemma 26B-A4B-class models at Q2.

All-rounder

Mac Studio M3 Ultra 96 GB

Apple · small desktop

$3,999

Tokens / sec (Q2): 14B 84 t/s · 30B 46 t/s · 70B 22 t/s

Memory: 96 GB (80 GB usable)

Bandwidth: 819 GB/s

Idle / Active: 10 W / 180 W

Sticker: $3,999

Why: Top quartile across speed, value, memory headroom, and efficiency — the "buy this if unsure" pick.

Best value

Single RTX 3090 (used) build

NVIDIA · desktop tower

$1,750

Tokens / sec (Q2): 14B 60 t/s · 30B 34 t/s · 70B n/a

Memory: 24 GB (23 GB usable)

Bandwidth: 936 GB/s

Idle / Active: 22 W / 350 W

Sticker: $1,750

Why: Best $/tg-per-second — ~$17 per t/s.
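
The $/t/s figure is the sticker price divided by generation speed. Below is a hedged Python sketch of that arithmetic, using a few rows from the Q2 build table on this page. The table prices are rounded (for example $1.8k), so the results are approximate. `usd_per_tps` is a hypothetical helper name.

```python
def usd_per_tps(price_usd: float, tokens_per_sec: float) -> float:
    """Dollars of sticker price per token/second of generation speed."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return price_usd / tokens_per_sec


# (build, rounded sticker price in USD, tg/s at Q2) copied from the Q2 table.
BUILDS = [
    ("Single Tesla P40 24 GB (used) build", 750, 43),
    ("Single Intel Arc Pro B70 build", 1800, 90),
    ("Mac Studio M4 Max 36 GB", 2000, 101),
    ("Single RTX 4090 build", 3200, 151),
]

# Lower is better: fewer dollars per token/second.
RANKED = sorted(BUILDS, key=lambda b: usd_per_tps(b[1], b[2]))

if __name__ == "__main__":
    for name, price, tps in RANKED:
        print(f"{name}: ${usd_per_tps(price, tps):.1f} per t/s")
```

Lower values mean more generation speed per dollar. The ratio ignores power cost and memory headroom, which the other picks on this page weigh separately.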

Best CUDA

DGX H200 — 8× H200 server (1.13 TB HBM3e)

NVIDIA · 8U DGX / HGX server rack

$380,000

Tokens / sec (Q2): 14B 468 t/s · 30B 300 t/s · 70B 204 t/s

Memory: 1128 GB (1100 GB usable)

Bandwidth: 4800 GB/s

Idle / Active: 700 W / 6500 W

Sticker: $380,000

Why: Strongest CUDA-only software stack among fitting builds.

Most VRAM

12× RTX Pro 6000 Blackwell rack (1152 GB)

NVIDIA · 8U server rack (multi-node, 1-2 chassis)

$118,000

Tokens / sec (Q2): 14B 408 t/s · 30B 300 t/s · 70B 204 t/s

Memory: 1152 GB (1116 GB usable)

Bandwidth: 1792 GB/s

Idle / Active: 340 W / 7400 W

Sticker: $118,000

Why: 1116 GB usable — most headroom for batching and longer contexts.

Efficient

MacBook Air M4 (16 GB)

Apple · laptop

$1,099

Tokens / sec (Q2): 14B 9.6 t/s · 30B n/a · 70B n/a

Memory: 16 GB (11 GB usable)

Bandwidth: 120 GB/s

Idle / Active: 5 W / 30 W

Sticker: $1,099

Why: 30 W active — lowest power draw of the fitting builds.

Cheapest

Mac Mini M4 (16 GB)

Apple · mini desktop

$799

Tokens / sec (Q2): 14B 12 t/s · 30B n/a · 70B n/a

Memory: 16 GB (11 GB usable)

Bandwidth: 120 GB/s

Idle / Active: 4 W / 50 W

Sticker: $799

Why: Lowest sticker that still fits DiffusionGemma 26B-A4B ($799 USD).

Fastest

Single AMD Instinct MI355X 288 GB workstation

AMD · 4U server (OAM, liquid-cooled)

$28,000

Tokens / sec (Q2): 14B 324 t/s · 30B 192 t/s · 70B n/a

Memory: 288 GB (282 GB usable)

Bandwidth: 8000 GB/s

Idle / Active: 140 W / 1400 W

Sticker: $28,000

Why: Highest measured tg/s — 576 t/s on DiffusionGemma 26B-A4B-class models at Q2.

Best value

Single RTX 5090 build

NVIDIA · desktop tower

$4,900

Tokens / sec (Q2): 14B 149 t/s · 30B 84 t/s · 70B n/a

Memory: 32 GB (31 GB usable)

Bandwidth: 1792 GB/s

Idle / Active: 30 W / 520 W

Sticker: $4,900

Why: Best $/tg-per-second — ~$19 per t/s.

Best CUDA

Single B200 180 GB workstation

NVIDIA · workstation / 4U server

$47,000

Tokens / sec (Q2): 14B 270 t/s · 30B 162 t/s · 70B 90 t/s

Memory: 180 GB (176 GB usable)

Bandwidth: 8000 GB/s

Idle / Active: 100 W / 1000 W

Sticker: $47,000

Why: Strongest CUDA-only software stack among fitting builds.

Most VRAM

Mac Studio M3 Ultra 512 GB

Apple · small desktop

$14,199

Tokens / sec (Q2): 14B 84 t/s · 30B 46 t/s · 70B 22 t/s

Memory: 512 GB (480 GB usable)

Bandwidth: 819 GB/s

Idle / Active: 12 W / 220 W

Sticker: $14,199

Why: 480 GB usable — most headroom for batching and longer contexts.

Every other build that runs DiffusionGemma 26B-A4B

61 additional builds fit DiffusionGemma 26B-A4B at Q2_K (10 GB usable minimum), sorted by sticker price.

Build	Vendor · form factor	Price	Memory (total / usable)	Bandwidth	tg/s (Q2)	Active W	5-yr power
Single AMD Instinct MI50 32 GB (used) build	AMD · desktop tower	$700	32 / 31 GB	1024 GB/s	n/a	300 W	$1.0k
Single Tesla P40 24 GB (used) build	NVIDIA · desktop tower	$750	24 / 23 GB	347 GB/s	43 t/s	250 W	$854
RTX 3060 12 GB build	NVIDIA · desktop tower	$900	12 / 11 GB	360 GB/s	n/a	170 W	$598
Tesla V100 32 GB SXM2 mod build	NVIDIA · desktop tower	$900	32 / 31 GB	900 GB/s	40 t/s	300 W	$1.1k
Mac Mini M4 (24 GB)	Apple · mini desktop	$999	24 / 18 GB	120 GB/s	n/a	50 W	$177
MacBook Air M5 (16 GB)	Apple · laptop	$1.1k	16 / 11 GB	153 GB/s	n/a	30 W	$115
Single Intel Arc B580 12 GB build	Intel · desktop tower	$1.1k	12 / 11 GB	456 GB/s	n/a	190 W	$664
Single AMD Radeon RX 9070 XT 16 GB build	AMD · desktop tower	$1.3k	16 / 15 GB	645 GB/s	n/a	304 W	$1.1k
Single Intel Arc Pro B70 build	Intel · desktop tower	$1.8k	32 / 31 GB	608 GB/s	90 t/s	220 W	$782
Mac Studio M4 Max 36 GB	Apple · small desktop	$2.0k	36 / 28 GB	546 GB/s	101 t/s	130 W	$453
Single AMD Radeon AI Pro R9700 32 GB build	AMD · desktop tower	$2.0k	32 / 31 GB	640 GB/s	n/a	300 W	$1.1k
Quad AMD MI50 32 GB (128 GB) homelab build	AMD · rack/large tower	$2.3k	128 / 122 GB	1024 GB/s	45 t/s	1200 W	$4.2k
Quad Tesla P40 (96 GB) homelab build	NVIDIA · rack/large tower	$2.7k	96 / 92 GB	347 GB/s	29 t/s	1000 W	$3.5k
AMD Ryzen AI Max+ 395 (128 GB)	AMD · mini desktop / laptop	$2.8k	128 / 96 GB	256 GB/s	58 t/s	120 W	$420
MacBook Pro M4 Pro 48 GB	Apple · laptop	$2.9k	48 / 40 GB	273 GB/s	50 t/s	70 W	$246
Dual RTX 3090 (used) build	NVIDIA · desktop tower	$3.1k	48 / 46 GB	936 GB/s	126 t/s	700 W	$2.4k
MacBook Pro M5 Pro 48 GB	Apple · laptop	$3.2k	48 / 40 GB	307 GB/s	65 t/s	75 W	$263
Single RTX 4090 build	NVIDIA · desktop tower	$3.2k	24 / 23 GB	1008 GB/s	151 t/s	410 W	$1.4k
Dual Intel Arc Pro B70 build	Intel · desktop tower	$3.2k	64 / 62 GB	608 GB/s	126 t/s	380 W	$1.4k
Single AMD Radeon Pro W7800 32 GB build	AMD · workstation	$3.5k	32 / 31 GB	576 GB/s	65 t/s	260 W	$920
Dual AMD Radeon AI Pro R9700 build (64 GB)	AMD · workstation	$3.7k	64 / 62 GB	640 GB/s	108 t/s	600 W	$2.1k
MacBook Pro M4 Max 64 GB	Apple · laptop	$4.0k	64 / 54 GB	410 GB/s	79 t/s	90 W	$315
Dell Pro Max with GB10 (128 GB)	Dell · small desktop	$4.1k	128 / 119 GB	273 GB/s	108 t/s	240 W	$887
MacBook Pro M5 Max 64 GB	Apple · laptop	$4.1k	64 / 54 GB	614 GB/s	108 t/s	95 W	$332
MSI EdgeXpert MS-C931 (128 GB)	MSI · small desktop	$4.7k	128 / 119 GB	273 GB/s	108 t/s	240 W	$887
Mac Studio M4 Max 128 GB	Apple · small desktop	$4.7k	128 / 112 GB	546 GB/s	108 t/s	130 W	$453
NVIDIA DGX Spark (128 GB)	NVIDIA · small desktop	$4.7k	128 / 119 GB	273 GB/s	108 t/s	240 W	$887
ASUS Ascent GX10 (128 GB)	ASUS · small desktop	$4.7k	128 / 119 GB	273 GB/s	101 t/s	240 W	$903
Single RTX A6000 48 GB (Ampere) build	NVIDIA · workstation	$4.7k	48 / 46 GB	768 GB/s	86 t/s	300 W	$1.0k
Single AMD Radeon Pro W7900 48 GB build	AMD · workstation	$5k	48 / 46 GB	864 GB/s	79 t/s	295 W	$1.0k
Lenovo ThinkStation PGX (128 GB)	Lenovo · small desktop	$5k	128 / 119 GB	273 GB/s	108 t/s	160 W	$650
MacBook Pro M5 Max 128 GB	Apple · laptop	$5k	128 / 108 GB	614 GB/s	108 t/s	95 W	$332
2× Strix Halo cluster (256 GB unified)	AMD · mini-PC pair	$6k	256 / 192 GB	256 GB/s	86 t/s	240 W	$841
Quad Intel Arc Pro B70 build	Intel · rack/large tower	$6k	128 / 124 GB	608 GB/s	137 t/s	700 W	$2.5k
Quad RTX 3090 (used) build	NVIDIA · rack/large tower	$7k	96 / 92 GB	936 GB/s	144 t/s	1400 W	$4.9k
Single RTX Pro 5000 Blackwell 48 GB build	NVIDIA · workstation	$7k	48 / 46 GB	1344 GB/s	n/a	300 W	$1.1k
Single RTX 6000 Ada 48 GB build	NVIDIA · workstation	$8k	48 / 46 GB	960 GB/s	144 t/s	300 W	$1.0k
Mac Studio M3 Ultra 256 GB	Apple · small desktop	$8k	256 / 232 GB	819 GB/s	137 t/s	180 W	$624
Dual AMD Radeon Pro W7900 build	AMD · workstation	$9k	96 / 92 GB	864 GB/s	119 t/s	600 W	$2.1k
2× DGX Spark cluster (256 GB unified, CUDA)	NVIDIA · two desktops, 200 G interconnect	$10k	256 / 240 GB	273 GB/s	180 t/s	460 W	$1.7k
Dual RTX 5090 build	NVIDIA · rack/large tower	$10k	64 / 62 GB	1792 GB/s	324 t/s	1050 W	$3.6k
Octuple Intel Arc Pro B70 cluster	Intel · rack/large tower	$11k	256 / 248 GB	608 GB/s	151 t/s	1450 W	$5k
4× Strix Halo cluster (512 GB unified)	AMD · rack of 4 mini-PCs, 10 GbE fabric	$12k	512 / 384 GB	256 GB/s	115 t/s	480 W	$1.7k
Single RTX Pro 6000 Blackwell 96 GB build	NVIDIA · workstation	$12k	96 / 93 GB	1792 GB/s	306 t/s	600 W	$2.1k
Tinybox Red (6× 7900 XTX, 144 GB)	tinycorp · 12U pedestal	$15k	144 / 138 GB	960 GB/s	198 t/s	1500 W	$5k
4× DGX Spark cluster (512 GB unified, CUDA)	NVIDIA · rack of 4 desktops	$20k	512 / 488 GB	273 GB/s	198 t/s	920 W	$3.4k
8× Strix Halo cluster (1024 GB unified)	AMD · rack of 8 mini-PCs, 10/25 GbE fabric	$23k	1024 / 768 GB	256 GB/s	144 t/s	960 W	$3.4k
Dual RTX Pro 6000 Blackwell build	NVIDIA · workstation	$24k	192 / 188 GB	1792 GB/s	360 t/s	1100 W	$3.8k
Single AMD Instinct MI325X 256 GB workstation	AMD · workstation / 4U server (OAM)	$25k	256 / 250 GB	6000 GB/s	468 t/s	1000 W	$3.6k
Tinybox Green (6× RTX 4090, 144 GB)	tinycorp · 12U pedestal	$25k	144 / 138 GB	1008 GB/s	270 t/s	2200 W	$8k
2× Mac Studio M3 Ultra 512 GB cluster (TB5 / MLX)	Apple · two desktops, Thunderbolt 5 RDMA	$28k	1024 / 960 GB	819 GB/s	180 t/s	440 W	$1.5k
Single AMD Instinct MI300X 192 GB workstation	AMD · workstation	$30k	192 / 188 GB	5300 GB/s	360 t/s	750 W	$2.8k
Single H100 80 GB workstation	NVIDIA · workstation	$32k	80 / 78 GB	3350 GB/s	324 t/s	700 W	$2.5k
Quad RTX Pro 6000 Blackwell build (384 GB)	NVIDIA · workstation / 4U pedestal	$38k	384 / 372 GB	1792 GB/s	540 t/s	2200 W	$8k
Single H200 141 GB workstation	NVIDIA · workstation / 2U server	$40k	141 / 138 GB	4800 GB/s	450 t/s	700 W	$2.5k
Tinybox Pro (8× RTX 4090, 192 GB)	tinycorp · 12U pedestal	$40k	192 / 184 GB	1008 GB/s	342 t/s	3200 W	$11k
8× DGX Spark cluster (1024 GB unified, CUDA)	NVIDIA · rack of 8 desktops, 200 GbE fabric	$44k	1024 / 976 GB	273 GB/s	259 t/s	1840 W	$7k
Tinybox Green v2 (4× RTX 5090, 128 GB)	tinycorp · 12U pedestal	$45k	128 / 124 GB	1792 GB/s	396 t/s	2300 W	$8k
8× RTX Pro 6000 Blackwell server (768 GB)	NVIDIA · 4U server (e.g. SuperMicro AS-4125GS-TNRT)	$78k	768 / 744 GB	1792 GB/s	792 t/s	4800 W	$16k
8× H100 80 GB server	NVIDIA · server rack	$280k	640 / 620 GB	3350 GB/s	648 t/s	5600 W	$20k
NVIDIA RTX Spark (128 GB)	NVIDIA · OEM laptops + small desktops	n/a	128 / 119 GB	300 GB/s	n/a	n/a	$756
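
The page does not say how the 5-yr power column is derived. One assumption that reproduces several listed values is a 50% duty cycle (12 h/day at active draw, 12 h/day at idle draw) at $0.15 per kWh over five 365-day years. Both the duty cycle and the electricity rate are inferred here, not stated by the source. The idle figures come from the recommendation cards for the same machines. A minimal Python sketch under those assumptions:

```python
HOURS_PER_DAY = 24
DAYS_IN_FIVE_YEARS = 365 * 5


def five_year_power_cost(
    idle_w: float,
    active_w: float,
    active_hours_per_day: float = 12.0,  # assumed duty cycle, not from the page
    usd_per_kwh: float = 0.15,           # assumed electricity rate, not from the page
) -> float:
    """Estimate electricity cost in USD over five years of daily use."""
    idle_hours = HOURS_PER_DAY - active_hours_per_day
    wh_per_day = active_w * active_hours_per_day + idle_w * idle_hours
    kwh_total = wh_per_day * DAYS_IN_FIVE_YEARS / 1000.0
    return kwh_total * usd_per_kwh


if __name__ == "__main__":
    # (build, idle W, active W) from the cards on this page.
    for name, idle, active in [
        ("Mac Mini M4 (24 GB)", 4, 50),
        ("MacBook Pro M4 Pro 48 GB", 5, 70),
        ("Single Intel Arc Pro B70 build", 18, 220),
    ]:
        print(f"{name}: ${five_year_power_cost(idle, active):.0f}")
```

Under those assumptions the sketch gives about $177 for the Mac Mini M4 (4 W idle, 50 W active) and about $246 for the MacBook Pro M4 Pro (5 W idle, 70 W active), matching the table. A different duty cycle or local electricity rate would shift every figure proportionally.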

Cheapest

Single AMD Instinct MI50 32 GB (used) build

AMD · desktop tower

$700

Tokens / sec (Q4): 14B 38 t/s · 30B n/a · 70B n/a

Memory: 32 GB (31 GB usable)

Bandwidth: 1024 GB/s

Idle / Active: 18 W / 300 W

Sticker: $700

Why: Lowest sticker that still fits DiffusionGemma 26B-A4B ($700 USD).

Fastest

DGX B200 — 8× B200 server (1.44 TB HBM3e)

NVIDIA · 10U DGX server

$475,000

Tokens / sec (Q4): 14B 420 t/s · 30B 270 t/s · 70B 180 t/s

Memory: 1440 GB (1404 GB usable)

Bandwidth: 8000 GB/s

Idle / Active: 900 W / 10200 W

Sticker: $475,000

Why: Highest measured tg/s — 810 t/s on DiffusionGemma 26B-A4B-class models at Q4.

All-rounder

Mac Studio M3 Ultra 96 GB

Apple · small desktop

$3,999

Tokens / sec (Q4): 14B 70 t/s · 30B 38 t/s · 70B 18 t/s

Memory: 96 GB (80 GB usable)

Bandwidth: 819 GB/s

Idle / Active: 10 W / 180 W

Sticker: $3,999

Why: Top quartile across speed, value, memory headroom, and efficiency — the "buy this if unsure" pick.

Best value

Single RTX 3090 (used) build

NVIDIA · desktop tower

$1,750

Tokens / sec (Q4): 14B 50 t/s · 30B 28 t/s · 70B n/a

Memory: 24 GB (23 GB usable)

Bandwidth: 936 GB/s

Idle / Active: 22 W / 350 W

Sticker: $1,750

Why: Best $/tg-per-second — ~$21 per t/s.

Best CUDA

DGX H200 — 8× H200 server (1.13 TB HBM3e)

NVIDIA · 8U DGX / HGX server rack

$380,000

Tokens / sec (Q4): 14B 390 t/s · 30B 250 t/s · 70B 170 t/s

Memory: 1128 GB (1100 GB usable)

Bandwidth: 4800 GB/s

Idle / Active: 700 W / 6500 W

Sticker: $380,000

Why: Strongest CUDA-only software stack among fitting builds.

Most VRAM

12× RTX Pro 6000 Blackwell rack (1152 GB)

NVIDIA · 8U server rack (multi-node, 1-2 chassis)

$118,000

Tokens / sec (Q4): 14B 340 t/s · 30B 250 t/s · 70B 170 t/s

Memory: 1152 GB (1116 GB usable)

Bandwidth: 1792 GB/s

Idle / Active: 340 W / 7400 W

Sticker: $118,000

Why: 1116 GB usable — most headroom for batching and longer contexts.

Efficient

Mac Mini M4 (24 GB)

Apple · mini desktop

$999

Tokens / sec (Q4): 14B 12 t/s · 30B n/a · 70B n/a

Memory: 24 GB (18 GB usable)

Bandwidth: 120 GB/s

Idle / Active: 4 W / 50 W

Sticker: $999

Why: 50 W active — lowest power draw of the fitting builds.

Cheapest

Mac Mini M4 (24 GB)

Apple · mini desktop

$999

Tokens / sec (Q4): 14B 12 t/s · 30B n/a · 70B n/a

Memory: 24 GB (18 GB usable)

Bandwidth: 120 GB/s

Idle / Active: 4 W / 50 W

Sticker: $999

Why: Lowest sticker that still fits DiffusionGemma 26B-A4B ($999 USD).

Fastest

Single AMD Instinct MI355X 288 GB workstation

AMD · 4U server (OAM, liquid-cooled)

$28,000

Tokens / sec (Q4): 14B 270 t/s · 30B 160 t/s · 70B n/a

Memory: 288 GB (282 GB usable)

Bandwidth: 8000 GB/s

Idle / Active: 140 W / 1400 W

Sticker: $28,000

Why: Highest measured tg/s — 480 t/s on DiffusionGemma 26B-A4B-class models at Q4.

Best value

Single RTX 5090 build

NVIDIA · desktop tower

$4,900

Tokens / sec (Q4): 14B 124 t/s · 30B 70 t/s · 70B n/a

Memory: 32 GB (31 GB usable)

Bandwidth: 1792 GB/s

Idle / Active: 30 W / 520 W

Sticker: $4,900

Why: Best $/tg-per-second — ~$23 per t/s.

Best CUDA

Single B200 180 GB workstation

NVIDIA · workstation / 4U server

$47,000

Tokens / sec (Q4): 14B 225 t/s · 30B 135 t/s · 70B 75 t/s

Memory: 180 GB (176 GB usable)

Bandwidth: 8000 GB/s

Idle / Active: 100 W / 1000 W

Sticker: $47,000

Why: Strongest CUDA-only software stack among fitting builds.

Most VRAM

Mac Studio M3 Ultra 512 GB

Apple · small desktop

$14,199

Tokens / sec (Q4): 14B 70 t/s · 30B 38 t/s · 70B 18 t/s

Memory: 512 GB (480 GB usable)

Bandwidth: 819 GB/s

Idle / Active: 12 W / 220 W

Sticker: $14,199

Why: 480 GB usable — most headroom for batching and longer contexts.

Efficient

MacBook Pro M4 Pro 48 GB

Apple · laptop

$2,899

Tokens / sec (Q4): 14B 28 t/s · 30B 14 t/s · 70B 6.0 t/s

Memory: 48 GB (40 GB usable)

Bandwidth: 273 GB/s

Idle / Active: 5 W / 70 W

Sticker: $2,899

Why: 70 W active — lowest power draw of the fitting builds.

Every other build that runs DiffusionGemma 26B-A4B

54 additional builds fit DiffusionGemma 26B-A4B at Q4_K_M (18 GB usable minimum), sorted by sticker price.

Build	Vendor · form factor	Price	Memory (total / usable)	Bandwidth	tg/s (Q4)	Active W	5-yr power
Single Tesla P40 24 GB (used) build	NVIDIA · desktop tower	$750	24 / 23 GB	347 GB/s	36 t/s	250 W	$854
Tesla V100 32 GB SXM2 mod build	NVIDIA · desktop tower	$900	32 / 31 GB	900 GB/s	33 t/s	300 W	$1.1k
Single Intel Arc Pro B70 build	Intel · desktop tower	$1.8k	32 / 31 GB	608 GB/s	75 t/s	220 W	$782
Mac Studio M4 Max 36 GB	Apple · small desktop	$2.0k	36 / 28 GB	546 GB/s	84 t/s	130 W	$453
Single AMD Radeon AI Pro R9700 32 GB build	AMD · desktop tower	$2.0k	32 / 31 GB	640 GB/s	n/a	300 W	$1.1k
Quad AMD MI50 32 GB (128 GB) homelab build	AMD · rack/large tower	$2.3k	128 / 122 GB	1024 GB/s	38 t/s	1200 W	$4.2k
Quad Tesla P40 (96 GB) homelab build	NVIDIA · rack/large tower	$2.7k	96 / 92 GB	347 GB/s	24 t/s	1000 W	$3.5k
AMD Ryzen AI Max+ 395 (128 GB)	AMD · mini desktop / laptop	$2.8k	128 / 96 GB	256 GB/s	48 t/s	120 W	$420
Dual RTX 3090 (used) build	NVIDIA · desktop tower	$3.1k	48 / 46 GB	936 GB/s	105 t/s	700 W	$2.4k
MacBook Pro M5 Pro 48 GB	Apple · laptop	$3.2k	48 / 40 GB	307 GB/s	54 t/s	75 W	$263
Single RTX 4090 build	NVIDIA · desktop tower	$3.2k	24 / 23 GB	1008 GB/s	126 t/s	410 W	$1.4k
Dual Intel Arc Pro B70 build	Intel · desktop tower	$3.2k	64 / 62 GB	608 GB/s	105 t/s	380 W	$1.4k
Single AMD Radeon Pro W7800 32 GB build	AMD · workstation	$3.5k	32 / 31 GB	576 GB/s	54 t/s	260 W	$920
Dual AMD Radeon AI Pro R9700 build (64 GB)	AMD · workstation	$3.7k	64 / 62 GB	640 GB/s	90 t/s	600 W	$2.1k
MacBook Pro M4 Max 64 GB	Apple · laptop	$4.0k	64 / 54 GB	410 GB/s	66 t/s	90 W	$315
Dell Pro Max with GB10 (128 GB)	Dell · small desktop	$4.1k	128 / 119 GB	273 GB/s	90 t/s	240 W	$887
MacBook Pro M5 Max 64 GB	Apple · laptop	$4.1k	64 / 54 GB	614 GB/s	90 t/s	95 W	$332
MSI EdgeXpert MS-C931 (128 GB)	MSI · small desktop	$4.7k	128 / 119 GB	273 GB/s	90 t/s	240 W	$887
Mac Studio M4 Max 128 GB	Apple · small desktop	$4.7k	128 / 112 GB	546 GB/s	90 t/s	130 W	$453
NVIDIA DGX Spark (128 GB)	NVIDIA · small desktop	$4.7k	128 / 119 GB	273 GB/s	90 t/s	240 W	$887
ASUS Ascent GX10 (128 GB)	ASUS · small desktop	$4.7k	128 / 119 GB	273 GB/s	84 t/s	240 W	$903
Single RTX A6000 48 GB (Ampere) build	NVIDIA · workstation	$4.7k	48 / 46 GB	768 GB/s	72 t/s	300 W	$1.0k
Single AMD Radeon Pro W7900 48 GB build	AMD · workstation	$5k	48 / 46 GB	864 GB/s	66 t/s	295 W	$1.0k
Lenovo ThinkStation PGX (128 GB)	Lenovo · small desktop	$5k	128 / 119 GB	273 GB/s	90 t/s	160 W	$650
MacBook Pro M5 Max 128 GB	Apple · laptop	$5k	128 / 108 GB	614 GB/s	90 t/s	95 W	$332
2× Strix Halo cluster (256 GB unified)	AMD · mini-PC pair	$6k	256 / 192 GB	256 GB/s	72 t/s	240 W	$841
Quad Intel Arc Pro B70 build	Intel · rack/large tower	$6k	128 / 124 GB	608 GB/s	114 t/s	700 W	$2.5k
Quad RTX 3090 (used) build	NVIDIA · rack/large tower	$7k	96 / 92 GB	936 GB/s	120 t/s	1400 W	$4.9k
Single RTX Pro 5000 Blackwell 48 GB build	NVIDIA · workstation	$7k	48 / 46 GB	1344 GB/s	n/a	300 W	$1.1k
Single RTX 6000 Ada 48 GB build	NVIDIA · workstation	$8k	48 / 46 GB	960 GB/s	120 t/s	300 W	$1.0k
Mac Studio M3 Ultra 256 GB	Apple · small desktop	$8k	256 / 232 GB	819 GB/s	114 t/s	180 W	$624
Dual AMD Radeon Pro W7900 build	AMD · workstation	$9k	96 / 92 GB	864 GB/s	99 t/s	600 W	$2.1k
2× DGX Spark cluster (256 GB unified, CUDA)	NVIDIA · two desktops, 200 G interconnect	$10k	256 / 240 GB	273 GB/s	150 t/s	460 W	$1.7k
Dual RTX 5090 build	NVIDIA · rack/large tower	$10k	64 / 62 GB	1792 GB/s	270 t/s	1050 W	$3.6k
Octuple Intel Arc Pro B70 cluster	Intel · rack/large tower	$11k	256 / 248 GB	608 GB/s	126 t/s	1450 W	$5k
4× Strix Halo cluster (512 GB unified)	AMD · rack of 4 mini-PCs, 10 GbE fabric	$12k	512 / 384 GB	256 GB/s	96 t/s	480 W	$1.7k
Single RTX Pro 6000 Blackwell 96 GB build	NVIDIA · workstation	$12k	96 / 93 GB	1792 GB/s	255 t/s	600 W	$2.1k
Tinybox Red (6× 7900 XTX, 144 GB)	tinycorp · 12U pedestal	$15k	144 / 138 GB	960 GB/s	165 t/s	1500 W	$5k
4× DGX Spark cluster (512 GB unified, CUDA)	NVIDIA · rack of 4 desktops	$20k	512 / 488 GB	273 GB/s	165 t/s	920 W	$3.4k
8× Strix Halo cluster (1024 GB unified)	AMD · rack of 8 mini-PCs, 10/25 GbE fabric	$23k	1024 / 768 GB	256 GB/s	120 t/s	960 W	$3.4k
Dual RTX Pro 6000 Blackwell build	NVIDIA · workstation	$24k	192 / 188 GB	1792 GB/s	300 t/s	1100 W	$3.8k
Single AMD Instinct MI325X 256 GB workstation	AMD · workstation / 4U server (OAM)	$25k	256 / 250 GB	6000 GB/s	390 t/s	1000 W	$3.6k
Tinybox Green (6× RTX 4090, 144 GB)	tinycorp · 12U pedestal	$25k	144 / 138 GB	1008 GB/s	225 t/s	2200 W	$8k
2× Mac Studio M3 Ultra 512 GB cluster (TB5 / MLX)	Apple · two desktops, Thunderbolt 5 RDMA	$28k	1024 / 960 GB	819 GB/s	150 t/s	440 W	$1.5k
Single AMD Instinct MI300X 192 GB workstation	AMD · workstation	$30k	192 / 188 GB	5300 GB/s	300 t/s	750 W	$2.8k
Single H100 80 GB workstation	NVIDIA · workstation	$32k	80 / 78 GB	3350 GB/s	270 t/s	700 W	$2.5k
Quad RTX Pro 6000 Blackwell build (384 GB)	NVIDIA · workstation / 4U pedestal	$38k	384 / 372 GB	1792 GB/s	450 t/s	2200 W	$8k
Single H200 141 GB workstation	NVIDIA · workstation / 2U server	$40k	141 / 138 GB	4800 GB/s	375 t/s	700 W	$2.5k
Tinybox Pro (8× RTX 4090, 192 GB)	tinycorp · 12U pedestal	$40k	192 / 184 GB	1008 GB/s	285 t/s	3200 W	$11k
8× DGX Spark cluster (1024 GB unified, CUDA)	NVIDIA · rack of 8 desktops, 200 GbE fabric	$44k	1024 / 976 GB	273 GB/s	216 t/s	1840 W	$7k
Tinybox Green v2 (4× RTX 5090, 128 GB)	tinycorp · 12U pedestal	$45k	128 / 124 GB	1792 GB/s	330 t/s	2300 W	$8k
8× RTX Pro 6000 Blackwell server (768 GB)	NVIDIA · 4U server (e.g. SuperMicro AS-4125GS-TNRT)	$78k	768 / 744 GB	1792 GB/s	660 t/s	4800 W	$16k
8× H100 80 GB server	NVIDIA · server rack	$280k	640 / 620 GB	3350 GB/s	540 t/s	5600 W	$20k
NVIDIA RTX Spark (128 GB)	NVIDIA · OEM laptops + small desktops	n/a	128 / 119 GB	300 GB/s	n/a	n/a	$756

Cheapest

Single AMD Instinct MI50 32 GB (used) build

AMD · desktop tower

$700

Tokens / sec (Q5): 14B 32 t/s · 30B n/a · 70B n/a

Memory: 32 GB (31 GB usable)

Bandwidth: 1024 GB/s

Idle / Active: 18 W / 300 W

Sticker: $700

Why: Lowest sticker that still fits DiffusionGemma 26B-A4B ($700 USD).

Fastest

DGX B200 — 8× B200 server (1.44 TB HBM3e)

NVIDIA · 10U DGX server

$475,000

Tokens / sec (Q5): 14B 353 t/s · 30B 227 t/s · 70B 151 t/s

Memory: 1440 GB (1404 GB usable)

Bandwidth: 8000 GB/s

Idle / Active: 900 W / 10200 W

Sticker: $475,000

Why: Highest measured tg/s — 680 t/s on DiffusionGemma 26B-A4B-class models at Q5.

All-rounder

Mac Studio M3 Ultra 96 GB

Apple · small desktop

$3,999

Tokens / sec (Q5): 14B 59 t/s · 30B 32 t/s · 70B 15 t/s

Memory: 96 GB (80 GB usable)

Bandwidth: 819 GB/s

Idle / Active: 10 W / 180 W

Sticker: $3,999

Why: Top quartile across speed, value, memory headroom, and efficiency — the "buy this if unsure" pick.

Best value

Single RTX 3090 (used) build

NVIDIA · desktop tower

$1,750

Tokens / sec (Q5): 14B 42 t/s · 30B 24 t/s · 70B n/a

Memory: 24 GB (23 GB usable)

Bandwidth: 936 GB/s

Idle / Active: 22 W / 350 W

Sticker: $1,750

Why: Best $/tg-per-second — ~$25 per t/s.

Best CUDA

DGX H200 — 8× H200 server (1.13 TB HBM3e)

NVIDIA · 8U DGX / HGX server rack

$380,000

Tokens / sec (Q5): 14B 328 t/s · 30B 210 t/s · 70B 143 t/s

Memory: 1128 GB (1100 GB usable)

Bandwidth: 4800 GB/s

Idle / Active: 700 W / 6500 W

Sticker: $380,000

Why: Strongest CUDA-only software stack among fitting builds.

Most VRAM

12× RTX Pro 6000 Blackwell rack (1152 GB)

NVIDIA · 8U server rack (multi-node, 1-2 chassis)

$118,000

Tokens / sec (Q5): 14B 286 t/s · 30B 210 t/s · 70B 143 t/s

Memory: 1152 GB (1116 GB usable)

Bandwidth: 1792 GB/s

Idle / Active: 340 W / 7400 W

Sticker: $118,000

Why: 1116 GB usable — most headroom for batching and longer contexts.

Efficient

MacBook Pro M4 Pro 48 GB

Apple · laptop

$2,899

Tokens / sec (Q5): 14B 24 t/s · 30B 12 t/s · 70B 5.0 t/s

Memory: 48 GB (40 GB usable)

Bandwidth: 273 GB/s

Idle / Active: 5 W / 70 W

Sticker: $2,899

Why: 70 W active — lowest power draw of the fitting builds.

Cheapest

Single Intel Arc Pro B70 build

Intel · desktop tower

$1,800

Tokens / sec (Q5): 14B 34 t/s · 30B 21 t/s · 70B n/a

Memory: 32 GB (31 GB usable)

Bandwidth: 608 GB/s

Idle / Active: 18 W / 220 W

Sticker: $1,800

Why: Lowest sticker that still fits DiffusionGemma 26B-A4B ($1.8k USD).

Fastest

Single AMD Instinct MI355X 288 GB workstation

AMD · 4U server (OAM, liquid-cooled)

$28,000

Tokens / sec (Q5): 14B 227 t/s · 30B 134 t/s · 70B n/a

Memory: 288 GB (282 GB usable)

Bandwidth: 8000 GB/s

Idle / Active: 140 W / 1400 W

Sticker: $28,000

Why: Highest measured tg/s — 403 t/s on DiffusionGemma 26B-A4B-class models at Q5.

Best value

Single RTX 5090 build

NVIDIA · desktop tower

$4,900

Tokens / sec (Q5): 14B 104 t/s · 30B 59 t/s · 70B n/a

Memory: 32 GB (31 GB usable)

Bandwidth: 1792 GB/s

Idle / Active: 30 W / 520 W

Sticker: $4,900

Why: Best $/tg-per-second — ~$28 per t/s.

Best CUDA

Single B200 180 GB workstation

NVIDIA · workstation / 4U server

$47,000

Tokens / sec (Q5): 14B 189 t/s · 30B 113 t/s · 70B 63 t/s

Memory: 180 GB (176 GB usable)

Bandwidth: 8000 GB/s

Idle / Active: 100 W / 1000 W

Sticker: $47,000

Why: Strongest CUDA-only software stack among fitting builds.

Most VRAM

Mac Studio M3 Ultra 512 GB

Apple · small desktop

$14,199

Tokens / sec (Q5): 14B 59 t/s · 30B 32 t/s · 70B 15 t/s

Memory: 512 GB (480 GB usable)

Bandwidth: 819 GB/s

Idle / Active: 12 W / 220 W

Sticker: $14,199

Why: 480 GB usable — most headroom for batching and longer contexts.

Every other build that runs DiffusionGemma 26B-A4B

53 additional builds fit DiffusionGemma 26B-A4B at Q5_K_M (21 GB usable minimum), sorted by sticker price.

Build	Vendor · form factor	Price	Memory (total / usable)	Bandwidth	tg/s (Q5)	Active W	5-yr power
Single Tesla P40 24 GB (used) build	NVIDIA · desktop tower	$750	24 / 23 GB	347 GB/s	30 t/s	250 W	$854
Tesla V100 32 GB SXM2 mod build	NVIDIA · desktop tower	$900	32 / 31 GB	900 GB/s	28 t/s	300 W	$1.1k
Mac Studio M4 Max 36 GB	Apple · small desktop	$2.0k	36 / 28 GB	546 GB/s	71 t/s	130 W	$453
Single AMD Radeon AI Pro R9700 32 GB build	AMD · desktop tower	$2.0k	32 / 31 GB	640 GB/s	—	300 W	$1.1k
Quad AMD MI50 32 GB (128 GB) homelab build	AMD · rack/large tower	$2.3k	128 / 122 GB	1024 GB/s	32 t/s	1200 W	$4.2k
Quad Tesla P40 (96 GB) homelab build	NVIDIA · rack/large tower	$2.7k	96 / 92 GB	347 GB/s	20 t/s	1000 W	$3.5k
AMD Ryzen AI Max+ 395 (128 GB)	AMD · mini desktop / laptop	$2.8k	128 / 96 GB	256 GB/s	40 t/s	120 W	$420
Dual RTX 3090 (used) build	NVIDIA · desktop tower	$3.1k	48 / 46 GB	936 GB/s	88 t/s	700 W	$2.4k
MacBook Pro M5 Pro 48 GB	Apple · laptop	$3.2k	48 / 40 GB	307 GB/s	45 t/s	75 W	$263
Single RTX 4090 build	NVIDIA · desktop tower	$3.2k	24 / 23 GB	1008 GB/s	106 t/s	410 W	$1.4k
Dual Intel Arc Pro B70 build	Intel · desktop tower	$3.2k	64 / 62 GB	608 GB/s	88 t/s	380 W	$1.4k
Single AMD Radeon Pro W7800 32 GB build	AMD · workstation	$3.5k	32 / 31 GB	576 GB/s	45 t/s	260 W	$920
Dual AMD Radeon AI Pro R9700 build (64 GB)	AMD · workstation	$3.7k	64 / 62 GB	640 GB/s	76 t/s	600 W	$2.1k
MacBook Pro M4 Max 64 GB	Apple · laptop	$4.0k	64 / 54 GB	410 GB/s	55 t/s	90 W	$315
Dell Pro Max with GB10 (128 GB)	Dell · small desktop	$4.1k	128 / 119 GB	273 GB/s	76 t/s	240 W	$887
MacBook Pro M5 Max 64 GB	Apple · laptop	$4.1k	64 / 54 GB	614 GB/s	76 t/s	95 W	$332
MSI EdgeXpert MS-C931 (128 GB)	MSI · small desktop	$4.7k	128 / 119 GB	273 GB/s	76 t/s	240 W	$887
Mac Studio M4 Max 128 GB	Apple · small desktop	$4.7k	128 / 112 GB	546 GB/s	76 t/s	130 W	$453
NVIDIA DGX Spark (128 GB)	NVIDIA · small desktop	$4.7k	128 / 119 GB	273 GB/s	76 t/s	240 W	$887
ASUS Ascent GX10 (128 GB)	ASUS · small desktop	$4.7k	128 / 119 GB	273 GB/s	71 t/s	240 W	$903
Single RTX A6000 48 GB (Ampere) build	NVIDIA · workstation	$4.7k	48 / 46 GB	768 GB/s	60 t/s	300 W	$1.0k
Single AMD Radeon Pro W7900 48 GB build	AMD · workstation	$5k	48 / 46 GB	864 GB/s	55 t/s	295 W	$1.0k
Lenovo ThinkStation PGX (128 GB)	Lenovo · small desktop	$5k	128 / 119 GB	273 GB/s	76 t/s	160 W	$650
MacBook Pro M5 Max 128 GB	Apple · laptop	$5k	128 / 108 GB	614 GB/s	76 t/s	95 W	$332
2× Strix Halo cluster (256 GB unified)	AMD · mini-PC pair	$6k	256 / 192 GB	256 GB/s	60 t/s	240 W	$841
Quad Intel Arc Pro B70 build	Intel · rack/large tower	$6k	128 / 124 GB	608 GB/s	96 t/s	700 W	$2.5k
Quad RTX 3090 (used) build	NVIDIA · rack/large tower	$7k	96 / 92 GB	936 GB/s	101 t/s	1400 W	$4.9k
Single RTX Pro 5000 Blackwell 48 GB build	NVIDIA · workstation	$7k	48 / 46 GB	1344 GB/s	—	300 W	$1.1k
Single RTX 6000 Ada 48 GB build	NVIDIA · workstation	$8k	48 / 46 GB	960 GB/s	101 t/s	300 W	$1.0k
Mac Studio M3 Ultra 256 GB	Apple · small desktop	$8k	256 / 232 GB	819 GB/s	96 t/s	180 W	$624
Dual AMD Radeon Pro W7900 build	AMD · workstation	$9k	96 / 92 GB	864 GB/s	83 t/s	600 W	$2.1k
2× DGX Spark cluster (256 GB unified, CUDA)	NVIDIA · two desktops, 200 G interconnect	$10k	256 / 240 GB	273 GB/s	126 t/s	460 W	$1.7k
Dual RTX 5090 build	NVIDIA · rack/large tower	$10k	64 / 62 GB	1792 GB/s	227 t/s	1050 W	$3.6k
Octuple Intel Arc Pro B70 cluster	Intel · rack/large tower	$11k	256 / 248 GB	608 GB/s	106 t/s	1450 W	$5k
4× Strix Halo cluster (512 GB unified)	AMD · rack of 4 mini-PCs, 10 GbE fabric	$12k	512 / 384 GB	256 GB/s	81 t/s	480 W	$1.7k
Single RTX Pro 6000 Blackwell 96 GB build	NVIDIA · workstation	$12k	96 / 93 GB	1792 GB/s	214 t/s	600 W	$2.1k
Tinybox Red (6× 7900 XTX, 144 GB)	tinycorp · 12U pedestal	$15k	144 / 138 GB	960 GB/s	139 t/s	1500 W	$5k
4× DGX Spark cluster (512 GB unified, CUDA)	NVIDIA · rack of 4 desktops	$20k	512 / 488 GB	273 GB/s	139 t/s	920 W	$3.4k
8× Strix Halo cluster (1024 GB unified)	AMD · rack of 8 mini-PCs, 10/25 GbE fabric	$23k	1024 / 768 GB	256 GB/s	101 t/s	960 W	$3.4k
Dual RTX Pro 6000 Blackwell build	NVIDIA · workstation	$24k	192 / 188 GB	1792 GB/s	252 t/s	1100 W	$3.8k
Single AMD Instinct MI325X 256 GB workstation	AMD · workstation / 4U server (OAM)	$25k	256 / 250 GB	6000 GB/s	328 t/s	1000 W	$3.6k
Tinybox Green (6× RTX 4090, 144 GB)	tinycorp · 12U pedestal	$25k	144 / 138 GB	1008 GB/s	189 t/s	2200 W	$8k
2× Mac Studio M3 Ultra 512 GB cluster (TB5 / MLX)	Apple · two desktops, Thunderbolt 5 RDMA	$28k	1024 / 960 GB	819 GB/s	126 t/s	440 W	$1.5k
Single AMD Instinct MI300X 192 GB workstation	AMD · workstation	$30k	192 / 188 GB	5300 GB/s	252 t/s	750 W	$2.8k
Single H100 80 GB workstation	NVIDIA · workstation	$32k	80 / 78 GB	3350 GB/s	227 t/s	700 W	$2.5k
Quad RTX Pro 6000 Blackwell build (384 GB)	NVIDIA · workstation / 4U pedestal	$38k	384 / 372 GB	1792 GB/s	378 t/s	2200 W	$8k
Single H200 141 GB workstation	NVIDIA · workstation / 2U server	$40k	141 / 138 GB	4800 GB/s	315 t/s	700 W	$2.5k
Tinybox Pro (8× RTX 4090, 192 GB)	tinycorp · 12U pedestal	$40k	192 / 184 GB	1008 GB/s	239 t/s	3200 W	$11k
8× DGX Spark cluster (1024 GB unified, CUDA)	NVIDIA · rack of 8 desktops, 200 GbE fabric	$44k	1024 / 976 GB	273 GB/s	181 t/s	1840 W	$7k
Tinybox Green v2 (4× RTX 5090, 128 GB)	tinycorp · 12U pedestal	$45k	128 / 124 GB	1792 GB/s	277 t/s	2300 W	$8k
8× RTX Pro 6000 Blackwell server (768 GB)	NVIDIA · 4U server (e.g. SuperMicro AS-4125GS-TNRT)	$78k	768 / 744 GB	1792 GB/s	554 t/s	4800 W	$16k
8× H100 80 GB server	NVIDIA · server rack	$280k	640 / 620 GB	3350 GB/s	454 t/s	5600 W	$20k
NVIDIA RTX Spark (128 GB)	NVIDIA · OEM laptops + small desktops	—	128 / 119 GB	300 GB/s	—	—	$756
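
The fit rule behind this table (usable memory at or above the quantization's minimum, then sorted by sticker price) can be sketched in a few lines of Python. This is an illustrative sketch, not the site's picker code: `Build` and `fitting_builds` are hypothetical names, and the sample rows are copied from the table above with its rounded prices.

```python
from dataclasses import dataclass

# Minimum usable memory (GB) per quantization, as quoted on this page.
MIN_USABLE_GB = {"Q4": 18, "Q5": 21, "Q8": 35}


@dataclass(frozen=True)
class Build:
    """Hypothetical record holding the three fields the fit rule needs."""
    name: str
    price_usd: int
    usable_gb: int


# A few rows copied from the Q5 table above (rounded sticker prices).
BUILDS = [
    Build("Dual RTX 3090 (used) build", 3100, 46),
    Build("Single Tesla P40 24 GB (used) build", 750, 23),
    Build("AMD Ryzen AI Max+ 395 (128 GB)", 2800, 96),
    Build("Mac Studio M4 Max 36 GB", 2000, 28),
    Build("Tesla V100 32 GB SXM2 mod build", 900, 31),
]


def fitting_builds(builds, quant):
    """Return builds whose usable memory meets the minimum, cheapest first."""
    need = MIN_USABLE_GB[quant]
    fits = [b for b in builds if b.usable_gb >= need]
    return sorted(fits, key=lambda b: b.price_usd)


for quant in ("Q5", "Q8"):
    names = [b.name for b in fitting_builds(BUILDS, quant)]
    print(quant, len(names), names)
```

With these sample rows, all five builds pass the 21 GB Q5 threshold, while only the two with at least 35 GB usable pass at Q8.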

Cheapest

Quad AMD MI50 32 GB (128 GB) homelab build

AMD · rack/large tower

$2,300

tokens / sec (Q8)

14B 26 t/s

30B 26 t/s

70B 18 t/s

Memory: 128 GB · 122 usable

Bandwidth: 1024 GB/s

Idle / Active: 75 W / 1200 W

Sticker: $2,300

Why: Lowest sticker that still fits DiffusionGemma 26B-A4B ($2.3k USD).

Amazon ↗eBay (build 4× used) ↗Alibaba ↗

📺 Reviews on YouTube

▶ AMD MI50 32 GB for local AI — Qwen 3.6 & Gemma 4 on llama.cpp / vLLM

▶ Scalable local AI build — Qwen3.6-27B, Gemma4-31B, Qwen3.5-122B

Fastest

DGX B200 — 8× B200 server (1.44 TB HBM3e)

NVIDIA · 10U DGX server

$475,000

tokens / sec (Q8)

14B 218 t/s

30B 140 t/s

70B 94 t/s

Memory: 1440 GB · 1404 usable

Bandwidth: 8000 GB/s

Idle / Active: 900 W / 10200 W

Sticker: $475,000

Why: Highest measured tg/s — 421 t/s on DiffusionGemma 26B-A4B-class models at Q8.

NVIDIA DGX B200 ↗SuperMicro HGX B200 ↗Lambda Labs ↗

📺 Reviews on YouTube

▶ Inside a 1.44 TB HBM3e NVIDIA HGX B200 AI Server from ASRock Rack

All-rounder

Mac Studio M3 Ultra 96 GB

Apple · small desktop

$3,999

tokens / sec (Q8)

14B 36 t/s

30B 20 t/s

70B 9.4 t/s

Memory: 96 GB · 80 usable

Bandwidth: 819 GB/s

Idle / Active: 10 W / 180 W

Sticker: $3,999

Why: Top quartile across speed, value, memory headroom, and efficiency — the "buy this if unsure" pick.

Amazon ↗Apple ↗B&H Photo ↗

📺 Reviews on YouTube

▶ M3 Ultra vs M4 Max — Don't Buy the WRONG Mac Studio

▶ Skip M3 Ultra & RTX 5090 for LLMs | NEW 96GB KING

▶ Ditch 512 GB Monster — this M3 Ultra Just Redefined 'Enough'

Best value

Dual RTX 3090 (used) build

NVIDIA · desktop tower

$3,100

tokens / sec (Q8)

14B 29 t/s

30B 18 t/s

70B 11 t/s

Memory: 48 GB · 46 usable

Bandwidth: 936 GB/s

Idle / Active: 45 W / 700 W

Sticker: $3,100

Why: Best $/tg-per-second — ~$57 per t/s.

Amazon (used) ↗eBay (build 2× used) ↗

📺 Reviews on YouTube

▶ ULTIMATE Local AI Quad 3090 Build (covers dual + quad)

▶ RTX 3090 vs 4090 vs 5090 vs Mac M5 Max — Qwen3.6-27B benchmark
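
The "Best value" cards rank builds by dollars per token-per-second. A minimal Python sketch of that arithmetic follows, assuming the metric is simply sticker price divided by tg/s; the page does not spell out the formula, and `dollars_per_tps` is a hypothetical helper name. The sample rows are copied from the Q8 table on this page, using its rounded prices.

```python
def dollars_per_tps(price_usd: float, tg_per_s: float) -> float:
    """Sticker price divided by generation speed; lower is better."""
    if tg_per_s <= 0:
        raise ValueError("tg/s must be positive")
    return price_usd / tg_per_s


# (name, rounded sticker price in USD, Q8 tg/s) copied from the Q8 table.
rows = [
    ("Single RTX Pro 6000 Blackwell 96 GB build", 12_000, 133),
    ("Quad Intel Arc Pro B70 build", 6_000, 59),
    ("Dual RTX 5090 build", 10_000, 140),
    ("Dual AMD Radeon AI Pro R9700 build (64 GB)", 3_700, 47),
]

# Rank from best (fewest dollars per t/s) to worst.
ranked = sorted(rows, key=lambda r: dollars_per_tps(r[1], r[2]))
for name, price, tps in ranked:
    print(f"{name}: ${dollars_per_tps(price, tps):.0f} per t/s")
```

Under the same assumption, a $3,100 sticker at about $57 per t/s corresponds to roughly 54 t/s at Q8.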

Best CUDA

DGX H200 — 8× H200 server (1.13 TB HBM3e)

NVIDIA · 8U DGX / HGX server rack

$380,000

tokens / sec (Q8)

14B 203 t/s

30B 130 t/s

70B 88 t/s

Memory: 1128 GB · 1100 usable

Bandwidth: 4800 GB/s

Idle / Active: 700 W / 6500 W

Sticker: $380,000

Why: Strongest CUDA-only software stack among fitting builds.

NVIDIA DGX H200 ↗SuperMicro HGX ↗Lambda Labs ↗

📺 Reviews on YouTube

▶ Inside the SUPER NVIDIA H200 Server From Supermicro (8U HGX H200)

Most VRAM

12× RTX Pro 6000 Blackwell rack (1152 GB)

NVIDIA · 8U server rack (multi-node, 1-2 chassis)

$118,000

tokens / sec (Q8)

14B 177 t/s

30B 130 t/s

70B 88 t/s

Memory: 1152 GB · 1116 usable

Bandwidth: 1792 GB/s

Idle / Active: 340 W / 7400 W

Sticker: $118,000

Why: 1116 GB usable — most headroom for batching and longer contexts.

Amazon ↗CoreWeave ↗SuperMicro ↗SHI enterprise ↗

Efficient

MacBook Pro M4 Pro 48 GB

Apple · laptop

$2,899

tokens / sec (Q8)

14B 15 t/s

30B 7.3 t/s

70B 3.1 t/s

Memory: 48 GB · 40 usable

Bandwidth: 273 GB/s

Idle / Active: 5 W / 70 W

Sticker: $2,899

Why: 70 W active — lowest power draw of the fitting builds.

Amazon ↗Apple ↗B&H Photo ↗

📺 Reviews on YouTube

▶ FREE Local LLMs on Apple Silicon — Fast! (AZisk)

Cheapest

AMD Ryzen AI Max+ 395 (128 GB)

AMD · mini desktop / laptop

$2,799

tokens / sec (Q8)

14B 15 t/s

30B 8.3 t/s

70B 2.6 t/s

Memory: 128 GB · 96 usable

Bandwidth: 256 GB/s

Idle / Active: 8 W / 120 W

Sticker: $2,799

Why: Lowest sticker that still fits DiffusionGemma 26B-A4B ($2.8k USD).

Amazon ↗GMKtec EVO-X2 ↗Framework ↗

📺 Reviews on YouTube

▶ AMD Strix Halo / Ryzen AI Max+ 395 — an honest review

▶ Tuning the 128 GB AMD AI mini PC for fast inference

▶ Running vLLM on Strix Halo + ROCm performance updates

Fastest

Single AMD Instinct MI355X 288 GB workstation

AMD · 4U server (OAM, liquid-cooled)

$28,000

tokens / sec (Q8)

14B 140 t/s

30B 83 t/s

70B —

Memory: 288 GB · 282 usable

Bandwidth: 8000 GB/s

Idle / Active: 140 W / 1400 W

Sticker: $28,000

Why: Highest measured tg/s — 250 t/s on DiffusionGemma 26B-A4B-class models at Q8.

Dell XE9712 ↗AMD MI350 series ↗Supermicro UBB ↗

All-rounder

Mac Studio M3 Ultra 96 GB

Apple · small desktop

$3,999

tokens / sec (Q8)

14B 36 t/s

30B 20 t/s

70B 9.4 t/s

Memory: 96 GB · 80 usable

Bandwidth: 819 GB/s

Idle / Active: 10 W / 180 W

Sticker: $3,999

Why: Top quartile across speed, value, memory headroom, and efficiency — the "buy this if unsure" pick.

Amazon ↗Apple ↗B&H Photo ↗

📺 Reviews on YouTube

▶ M3 Ultra vs M4 Max — Don't Buy the WRONG Mac Studio

▶ Skip M3 Ultra & RTX 5090 for LLMs | NEW 96GB KING

▶ Ditch 512 GB Monster — this M3 Ultra Just Redefined 'Enough'

Best value

Dual Intel Arc Pro B70 build

Intel · desktop tower

$3,200

tokens / sec (Q8)

14B 29 t/s

30B 18 t/s

70B 7.3 t/s

Memory: 64 GB · 62 usable

Bandwidth: 608 GB/s

Idle / Active: 35 W / 380 W

Sticker: $3,200

Why: Best $/tg-per-second — ~$59 per t/s.

Amazon ↗Newegg ↗B&H Photo ↗

📺 Reviews on YouTube

▶ Should you buy Intel Arc Pro B70 for Local AI? (covers scaling)

▶ Arc Pro B70 — Did Intel finally get it right? (Country Boy Computers)

▶ Level1 Linux Weekly — How about that Intel Arc B70?

Best CUDA

Single B200 180 GB workstation

NVIDIA · workstation / 4U server

$47,000

tokens / sec (Q8)

14B 117 t/s

30B 70 t/s

70B 39 t/s

Memory: 180 GB · 176 usable

Bandwidth: 8000 GB/s

Idle / Active: 100 W / 1000 W

Sticker: $47,000

Why: Strongest CUDA-only software stack among fitting builds.

NVIDIA B200 partners ↗SHI enterprise ↗Lambda Labs ↗

📺 Reviews on YouTube

▶ AI Lab: NVIDIA B200 vs GB200 explained | GPU architecture for LLMs

Most VRAM

Mac Studio M3 Ultra 512 GB

Apple · small desktop

$14,199

tokens / sec (Q8)

14B 36 t/s

30B 20 t/s

70B 9.4 t/s

Memory: 512 GB · 480 usable

Bandwidth: 819 GB/s

Idle / Active: 12 W / 220 W

Sticker: $14,199

Why: 480 GB usable — most headroom for batching and longer contexts.

Apple ↗B&H Photo ↗

📺 Reviews on YouTube

▶ M3 Ultra Mac Studio Review

Efficient

MacBook Pro M4 Pro 48 GB

Apple · laptop

$2,899

tokens / sec (Q8)

14B 15 t/s

30B 7.3 t/s

70B 3.1 t/s

Memory: 48 GB · 40 usable

Bandwidth: 273 GB/s

Idle / Active: 5 W / 70 W

Sticker: $2,899

Why: 70 W active — lowest power draw of the fitting builds.

Amazon ↗Apple ↗B&H Photo ↗

📺 Reviews on YouTube

▶ FREE Local LLMs on Apple Silicon — Fast! (AZisk)

Every other build that runs DiffusionGemma 26B-A4B

43 additional builds fit DiffusionGemma 26B-A4B at Q8_0 (35 GB usable minimum), sorted by sticker price.

Build	Vendor · form factor	Price	Memory (total / usable)	Bandwidth	tg/s (Q8)	Active W	5-yr power
Quad Tesla P40 (96 GB) homelab build	NVIDIA · rack/large tower	$2.7k	96 / 92 GB	347 GB/s	12 t/s	1000 W	$3.5k
MacBook Pro M5 Pro 48 GB	Apple · laptop	$3.2k	48 / 40 GB	307 GB/s	28 t/s	75 W	$263
Dual AMD Radeon AI Pro R9700 build (64 GB)	AMD · workstation	$3.7k	64 / 62 GB	640 GB/s	47 t/s	600 W	$2.1k
MacBook Pro M4 Max 64 GB	Apple · laptop	$4.0k	64 / 54 GB	410 GB/s	34 t/s	90 W	$315
Dell Pro Max with GB10 (128 GB)	Dell · small desktop	$4.1k	128 / 119 GB	273 GB/s	47 t/s	240 W	$887
MacBook Pro M5 Max 64 GB	Apple · laptop	$4.1k	64 / 54 GB	614 GB/s	47 t/s	95 W	$332
MSI EdgeXpert MS-C931 (128 GB)	MSI · small desktop	$4.7k	128 / 119 GB	273 GB/s	47 t/s	240 W	$887
Mac Studio M4 Max 128 GB	Apple · small desktop	$4.7k	128 / 112 GB	546 GB/s	47 t/s	130 W	$453
NVIDIA DGX Spark (128 GB)	NVIDIA · small desktop	$4.7k	128 / 119 GB	273 GB/s	47 t/s	240 W	$887
ASUS Ascent GX10 (128 GB)	ASUS · small desktop	$4.7k	128 / 119 GB	273 GB/s	44 t/s	240 W	$903
Single RTX A6000 48 GB (Ampere) build	NVIDIA · workstation	$4.7k	48 / 46 GB	768 GB/s	37 t/s	300 W	$1.0k
Single AMD Radeon Pro W7900 48 GB build	AMD · workstation	$5k	48 / 46 GB	864 GB/s	34 t/s	295 W	$1.0k
Lenovo ThinkStation PGX (128 GB)	Lenovo · small desktop	$5k	128 / 119 GB	273 GB/s	47 t/s	160 W	$650
MacBook Pro M5 Max 128 GB	Apple · laptop	$5k	128 / 108 GB	614 GB/s	47 t/s	95 W	$332
2× Strix Halo cluster (256 GB unified)	AMD · mini-PC pair	$6k	256 / 192 GB	256 GB/s	37 t/s	240 W	$841
Quad Intel Arc Pro B70 build	Intel · rack/large tower	$6k	128 / 124 GB	608 GB/s	59 t/s	700 W	$2.5k
Quad RTX 3090 (used) build	NVIDIA · rack/large tower	$7k	96 / 92 GB	936 GB/s	62 t/s	1400 W	$4.9k
Single RTX Pro 5000 Blackwell 48 GB build	NVIDIA · workstation	$7k	48 / 46 GB	1344 GB/s	—	300 W	$1.1k
Single RTX 6000 Ada 48 GB build	NVIDIA · workstation	$8k	48 / 46 GB	960 GB/s	62 t/s	300 W	$1.0k
Mac Studio M3 Ultra 256 GB	Apple · small desktop	$8k	256 / 232 GB	819 GB/s	59 t/s	180 W	$624
Dual AMD Radeon Pro W7900 build	AMD · workstation	$9k	96 / 92 GB	864 GB/s	51 t/s	600 W	$2.1k
2× DGX Spark cluster (256 GB unified, CUDA)	NVIDIA · two desktops, 200 G interconnect	$10k	256 / 240 GB	273 GB/s	78 t/s	460 W	$1.7k
Dual RTX 5090 build	NVIDIA · rack/large tower	$10k	64 / 62 GB	1792 GB/s	140 t/s	1050 W	$3.6k
Octuple Intel Arc Pro B70 cluster	Intel · rack/large tower	$11k	256 / 248 GB	608 GB/s	66 t/s	1450 W	$5k
4× Strix Halo cluster (512 GB unified)	AMD · rack of 4 mini-PCs, 10 GbE fabric	$12k	512 / 384 GB	256 GB/s	50 t/s	480 W	$1.7k
Single RTX Pro 6000 Blackwell 96 GB build	NVIDIA · workstation	$12k	96 / 93 GB	1792 GB/s	133 t/s	600 W	$2.1k
Tinybox Red (6× 7900 XTX, 144 GB)	tinycorp · 12U pedestal	$15k	144 / 138 GB	960 GB/s	86 t/s	1500 W	$5k
4× DGX Spark cluster (512 GB unified, CUDA)	NVIDIA · rack of 4 desktops	$20k	512 / 488 GB	273 GB/s	86 t/s	920 W	$3.4k
8× Strix Halo cluster (1024 GB unified)	AMD · rack of 8 mini-PCs, 10/25 GbE fabric	$23k	1024 / 768 GB	256 GB/s	62 t/s	960 W	$3.4k
Dual RTX Pro 6000 Blackwell build	NVIDIA · workstation	$24k	192 / 188 GB	1792 GB/s	156 t/s	1100 W	$3.8k
Single AMD Instinct MI325X 256 GB workstation	AMD · workstation / 4U server (OAM)	$25k	256 / 250 GB	6000 GB/s	203 t/s	1000 W	$3.6k
Tinybox Green (6× RTX 4090, 144 GB)	tinycorp · 12U pedestal	$25k	144 / 138 GB	1008 GB/s	117 t/s	2200 W	$8k
2× Mac Studio M3 Ultra 512 GB cluster (TB5 / MLX)	Apple · two desktops, Thunderbolt 5 RDMA	$28k	1024 / 960 GB	819 GB/s	78 t/s	440 W	$1.5k
Single AMD Instinct MI300X 192 GB workstation	AMD · workstation	$30k	192 / 188 GB	5300 GB/s	156 t/s	750 W	$2.8k
Single H100 80 GB workstation	NVIDIA · workstation	$32k	80 / 78 GB	3350 GB/s	140 t/s	700 W	$2.5k
Quad RTX Pro 6000 Blackwell build (384 GB)	NVIDIA · workstation / 4U pedestal	$38k	384 / 372 GB	1792 GB/s	234 t/s	2200 W	$8k
Single H200 141 GB workstation	NVIDIA · workstation / 2U server	$40k	141 / 138 GB	4800 GB/s	195 t/s	700 W	$2.5k
Tinybox Pro (8× RTX 4090, 192 GB)	tinycorp · 12U pedestal	$40k	192 / 184 GB	1008 GB/s	148 t/s	3200 W	$11k
8× DGX Spark cluster (1024 GB unified, CUDA)	NVIDIA · rack of 8 desktops, 200 GbE fabric	$44k	1024 / 976 GB	273 GB/s	112 t/s	1840 W	$7k
Tinybox Green v2 (4× RTX 5090, 128 GB)	tinycorp · 12U pedestal	$45k	128 / 124 GB	1792 GB/s	172 t/s	2300 W	$8k
8× RTX Pro 6000 Blackwell server (768 GB)	NVIDIA · 4U server (e.g. SuperMicro AS-4125GS-TNRT)	$78k	768 / 744 GB	1792 GB/s	343 t/s	4800 W	$16k
8× H100 80 GB server	NVIDIA · server rack	$280k	640 / 620 GB	3350 GB/s	281 t/s	5600 W	$20k
NVIDIA RTX Spark (128 GB)	NVIDIA · OEM laptops + small desktops	—	128 / 119 GB	300 GB/s	—	—	$756
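
The page does not say how the "5-yr power" column is derived. The Python sketch below shows one way to estimate it from the idle and active wattages on the cards. The duty cycle (12 hours a day at active power, 12 at idle) and the electricity price ($0.15 per kWh) are assumptions inferred from the published figures, not documented values, and `five_year_power_cost` is a hypothetical helper name.

```python
# All three parameters are assumptions, not values stated on this page.
HOURS_ACTIVE_PER_DAY = 12
HOURS_IDLE_PER_DAY = 12
USD_PER_KWH = 0.15
YEARS = 5


def five_year_power_cost(idle_w: float, active_w: float) -> float:
    """Estimated electricity cost in USD over YEARS for a given idle/active draw."""
    days = 365 * YEARS
    wh_per_day = active_w * HOURS_ACTIVE_PER_DAY + idle_w * HOURS_IDLE_PER_DAY
    kwh = days * wh_per_day / 1000
    return kwh * USD_PER_KWH


# (name, idle W, active W) taken from the pick cards on this page.
examples = [
    ("AMD Ryzen AI Max+ 395 (128 GB)", 8, 120),
    ("Dual Intel Arc Pro B70 build", 35, 380),
    ("Dual RTX 3090 (used) build", 45, 700),
    ("Quad AMD MI50 32 GB (128 GB) homelab build", 75, 1200),
]
for name, idle_w, active_w in examples:
    print(f"{name}: ${five_year_power_cost(idle_w, active_w):,.0f}")
```

Under these assumptions the four estimates come to about $420, $1.4k, $2.4k and $4.2k, in line with the 5-yr power figures listed for the same builds in the Q5 table earlier on the page. Other parameter combinations could produce the same totals, so treat the duty cycle and price as a plausible reading rather than the site's method.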

Last updated 2026-06-13