Hardware to run DiffusionGemma 26B-A4B
Jun 10, 2026. DiffusionGemma is Google's experimental text-diffusion Gemma: a 25.2 B-parameter MoE (3.8 B active) that denoises 256-token blocks in parallel. It generates 1,000+ tok/s on an H100 and 700+ tok/s on an RTX 5090. Quantized, it fits in about 18 GB of usable memory at Q4_K_M (about 10 GB at Q2_K). It trades some benchmark quality for roughly 4× the generation speed of Gemma 4 26B-A4B.
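Much of block diffusion's speed advantage comes from needing fewer forward passes. The sketch below counts passes for an autoregressive decoder (one per token) and for a block-diffusion decoder that refines each 256-token block over a fixed number of denoising steps. The `steps_per_block` values are assumptions for illustration; the page does not state DiffusionGemma's actual schedule. Each diffusion pass also processes a whole block, so it costs more than a single-token decode step.

```python
import math


def autoregressive_passes(n_tokens: int) -> int:
    """One forward pass per generated token."""
    return n_tokens


def block_diffusion_passes(n_tokens: int, block_size: int = 256, steps_per_block: int = 32) -> int:
    """Blocks are produced one after another; every position in a block is
    refined jointly over `steps_per_block` denoising passes.
    `steps_per_block` is an assumed, illustrative value."""
    n_blocks = math.ceil(n_tokens / block_size)
    return n_blocks * steps_per_block


if __name__ == "__main__":
    n = 1024  # tokens to generate
    for steps in (16, 32, 64):
        ar = autoregressive_passes(n)
        diff = block_diffusion_passes(n, steps_per_block=steps)
        print(f"steps/block={steps:>2}: AR {ar} passes, diffusion {diff} passes ({ar / diff:.1f}x fewer)")
```

The pass ratio alone overstates the wall-clock gain; how much of it survives depends on how expensive each block-wide pass is on a given card.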
Cheapest
Single Tesla P100 16 GB (used) build
NVIDIA · desktop tower
$600
Tokens/sec at Q2 by model size: 14B 23 t/s · 30B n/a · 70B n/a
Memory: 16 GB (15 GB usable)
Bandwidth: 732 GB/s
Idle / active power: 25 W / 250 W
Why: Lowest sticker price that still fits DiffusionGemma 26B-A4B ($600 USD).
Fastest
DGX B200 — 8× B200 server (1.44 TB HBM3e)
NVIDIA · 10U DGX server
$475,000
Tokens/sec at Q2 by model size: 14B 504 t/s · 30B 324 t/s · 70B 216 t/s
Memory: 1440 GB (1404 GB usable)
Bandwidth: 8000 GB/s
Idle / active power: 900 W / 10200 W
Why: Highest measured generation speed: 972 t/s on DiffusionGemma 26B-A4B-class models at Q2.
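For ordinary autoregressive decoding with one request in flight, each new token requires streaming the active weights from memory once, so bandwidth divided by bytes read per token gives a rough ceiling on tokens per second. The sketch below computes that ceiling. The 4.85 bits-per-weight figure for Q4_K_M is an approximate llama.cpp average and is an assumption here; the bound ignores KV-cache reads and compute limits, so measured speeds land below it. Block diffusion largely reuses each weight read across a whole block of positions, so this bound does not directly cap DiffusionGemma's throughput.

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float, active_params_b: float, bits_per_weight: float) -> float:
    """Bandwidth-bound upper limit for single-stream autoregressive decoding.

    Assumes every active weight is read once per generated token and nothing else
    (no KV cache, no compute bottleneck). For a MoE model, pass the *active*
    parameter count rather than the total.
    """
    gb_per_token = active_params_b * bits_per_weight / 8  # billions of params * bits / 8 = GB
    return bandwidth_gb_s / gb_per_token


Q4_K_M_BPW = 4.85  # assumed average bits per weight for Q4_K_M

if __name__ == "__main__":
    # Bandwidth figures from the cards on this page; 14B dense reference model at Q4.
    for name, bw in [("RTX 5090", 1792), ("RTX 3090", 936), ("B200", 8000), ("M3 Ultra", 819)]:
        print(f"{name:>8}: <= {decode_ceiling_tok_s(bw, 14, Q4_K_M_BPW):.0f} tok/s (14B, Q4)")
```

For comparison, the Q4 cards list 124 t/s (RTX 5090) and 70 t/s (M3 Ultra) on 14B models, both under the ceilings this prints (about 211 and 96 tok/s).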
All-rounder
Mac Studio M3 Ultra 96 GB
Apple · small desktop
$3,999
Tokens/sec at Q2 by model size: 14B 84 t/s · 30B 46 t/s · 70B 22 t/s
Memory: 96 GB (80 GB usable)
Bandwidth: 819 GB/s
Idle / active power: 10 W / 180 W
Why: Top quartile across speed, value, memory headroom, and efficiency; the "buy this if unsure" pick.
Best value
Single RTX 3090 (used) build
NVIDIA · desktop tower
$1,750
Tokens/sec at Q2 by model size: 14B 60 t/s · 30B 34 t/s · 70B n/a
Memory: 24 GB (23 GB usable)
Bandwidth: 936 GB/s
Idle / active power: 22 W / 350 W
Why: Lowest cost per unit of generation speed, ~$17 per t/s.
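The value pick divides sticker price by generation speed. The sketch below applies that ratio to a few rows of the Q2 table. Prices there are rounded to the nearest $0.1k, and the picks themselves (such as the single RTX 3090) do not appear in the table, so this illustrates the arithmetic rather than reproducing the page's ranking.

```python
def dollars_per_tok_s(price_usd: float, tok_s: float) -> float:
    """Sticker price divided by generation speed (lower is better)."""
    if tok_s <= 0:
        raise ValueError("tok_s must be positive")
    return price_usd / tok_s


# (build, sticker USD, tg/s at Q2) copied from the Q2 table; prices are rounded there.
ROWS = [
    ("Single Tesla P40 24 GB (used) build", 750, 43),
    ("Single Intel Arc Pro B70 build", 1_800, 90),
    ("Mac Studio M4 Max 36 GB", 2_000, 101),
    ("Single RTX 4090 build", 3_200, 151),
    ("Single RTX Pro 6000 Blackwell 96 GB build", 12_000, 306),
]


def rank_by_value(rows):
    """Return (name, $ per t/s) pairs sorted from best to worst value."""
    return sorted(((name, dollars_per_tok_s(p, t)) for name, p, t in rows), key=lambda r: r[1])


if __name__ == "__main__":
    for name, cost in rank_by_value(ROWS):
        print(f"${cost:6.2f} per t/s  {name}")
```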
Best CUDA
DGX H200 — 8× H200 server (1.13 TB HBM3e)
NVIDIA · 8U DGX / HGX server rack
$380,000
Tokens/sec at Q2 by model size: 14B 468 t/s · 30B 300 t/s · 70B 204 t/s
Memory: 1128 GB (1100 GB usable)
Bandwidth: 4800 GB/s
Idle / active power: 700 W / 6500 W
Why: Strongest CUDA-only software stack among fitting builds.
Most VRAM
12× RTX Pro 6000 Blackwell rack (1152 GB)
NVIDIA · 8U server rack (multi-node, 1-2 chassis)
$118,000
Tokens/sec at Q2 by model size: 14B 408 t/s · 30B 300 t/s · 70B 204 t/s
Memory: 1152 GB (1116 GB usable)
Bandwidth: 1792 GB/s
Idle / active power: 340 W / 7400 W
Why: 1116 GB usable, the most headroom for batching and longer contexts.
Efficient
MacBook Air M4 (16 GB)
Apple · laptop
$1,099
Tokens/sec at Q2 by model size: 14B 9.6 t/s · 30B n/a · 70B n/a
Memory: 16 GB (11 GB usable)
Bandwidth: 120 GB/s
Idle / active power: 5 W / 30 W
Why: 30 W under load, the lowest power draw of the fitting builds.
Cheapest
Mac Mini M4 (16 GB)
Apple · mini desktop
$799
Tokens/sec at Q2 by model size: 14B 12 t/s · 30B n/a · 70B n/a
Memory: 16 GB (11 GB usable)
Bandwidth: 120 GB/s
Idle / active power: 4 W / 50 W
Why: Lowest sticker price that still fits DiffusionGemma 26B-A4B ($799 USD).
Fastest
Single AMD Instinct MI355X 288 GB workstation
AMD · 4U server (OAM, liquid-cooled)
$28,000
Tokens/sec at Q2 by model size: 14B 324 t/s · 30B 192 t/s · 70B n/a
Memory: 288 GB (282 GB usable)
Bandwidth: 8000 GB/s
Idle / active power: 140 W / 1400 W
Why: Highest measured generation speed: 576 t/s on DiffusionGemma 26B-A4B-class models at Q2.
All-rounder
Mac Studio M3 Ultra 96 GB
Apple · small desktop
$3,999
Tokens/sec at Q2 by model size: 14B 84 t/s · 30B 46 t/s · 70B 22 t/s
Memory: 96 GB (80 GB usable)
Bandwidth: 819 GB/s
Idle / active power: 10 W / 180 W
Why: Top quartile across speed, value, memory headroom, and efficiency; the "buy this if unsure" pick.
Best value
Single RTX 5090 build
NVIDIA · desktop tower
$4,900
Tokens/sec at Q2 by model size: 14B 149 t/s · 30B 84 t/s · 70B n/a
Memory: 32 GB (31 GB usable)
Bandwidth: 1792 GB/s
Idle / active power: 30 W / 520 W
Why: Lowest cost per unit of generation speed, ~$19 per t/s.
Best CUDA
Single B200 180 GB workstation
NVIDIA · workstation / 4U server
$47,000
Tokens/sec at Q2 by model size: 14B 270 t/s · 30B 162 t/s · 70B 90 t/s
Memory: 180 GB (176 GB usable)
Bandwidth: 8000 GB/s
Idle / active power: 100 W / 1000 W
Why: Strongest CUDA-only software stack among fitting builds.
Most VRAM
Mac Studio M3 Ultra 512 GB
Apple · small desktop
$14,199
Tokens/sec at Q2 by model size: 14B 84 t/s · 30B 46 t/s · 70B 22 t/s
Memory: 512 GB (480 GB usable)
Bandwidth: 819 GB/s
Idle / active power: 12 W / 220 W
Why: 480 GB usable, the most headroom for batching and longer contexts.
Efficient
MacBook Air M4 (16 GB)
Apple · laptop
$1,099
Tokens/sec at Q2 by model size: 14B 9.6 t/s · 30B n/a · 70B n/a
Memory: 16 GB (11 GB usable)
Bandwidth: 120 GB/s
Idle / active power: 5 W / 30 W
Why: 30 W under load, the lowest power draw of the fitting builds.
Every other build that runs DiffusionGemma 26B-A4B
61 additional builds fit DiffusionGemma 26B-A4B at Q2_K (10 GB usable minimum), sorted by sticker price.
| Build | Vendor · form factor | Price | Memory (total / usable) | Bandwidth (per device) | Gen. speed, tg/s (Q2) | Active power | 5-yr power cost |
|---|---|---|---|---|---|---|---|
| Single AMD Instinct MI50 32 GB (used) build | AMD · desktop tower | $700 | 32 / 31 GB | 1024 GB/s | n/a | 300 W | $1.0k |
| Single Tesla P40 24 GB (used) build | NVIDIA · desktop tower | $750 | 24 / 23 GB | 347 GB/s | 43 t/s | 250 W | $854 |
| RTX 3060 12 GB build | NVIDIA · desktop tower | $900 | 12 / 11 GB | 360 GB/s | n/a | 170 W | $598 |
| Tesla V100 32 GB SXM2 mod build | NVIDIA · desktop tower | $900 | 32 / 31 GB | 900 GB/s | 40 t/s | 300 W | $1.1k |
| Mac Mini M4 (24 GB) | Apple · mini desktop | $999 | 24 / 18 GB | 120 GB/s | n/a | 50 W | $177 |
| MacBook Air M5 (16 GB) | Apple · laptop | $1.1k | 16 / 11 GB | 153 GB/s | n/a | 30 W | $115 |
| Single Intel Arc B580 12 GB build | Intel · desktop tower | $1.1k | 12 / 11 GB | 456 GB/s | n/a | 190 W | $664 |
| Single AMD Radeon RX 9070 XT 16 GB build | AMD · desktop tower | $1.3k | 16 / 15 GB | 645 GB/s | n/a | 304 W | $1.1k |
| Single Intel Arc Pro B70 build | Intel · desktop tower | $1.8k | 32 / 31 GB | 608 GB/s | 90 t/s | 220 W | $782 |
| Mac Studio M4 Max 36 GB | Apple · small desktop | $2.0k | 36 / 28 GB | 546 GB/s | 101 t/s | 130 W | $453 |
| Single AMD Radeon AI Pro R9700 32 GB build | AMD · desktop tower | $2.0k | 32 / 31 GB | 640 GB/s | n/a | 300 W | $1.1k |
| Quad AMD MI50 32 GB (128 GB) homelab build | AMD · rack/large tower | $2.3k | 128 / 122 GB | 1024 GB/s | 45 t/s | 1200 W | $4.2k |
| Quad Tesla P40 (96 GB) homelab build | NVIDIA · rack/large tower | $2.7k | 96 / 92 GB | 347 GB/s | 29 t/s | 1000 W | $3.5k |
| AMD Ryzen AI Max+ 395 (128 GB) | AMD · mini desktop / laptop | $2.8k | 128 / 96 GB | 256 GB/s | 58 t/s | 120 W | $420 |
| MacBook Pro M4 Pro 48 GB | Apple · laptop | $2.9k | 48 / 40 GB | 273 GB/s | 50 t/s | 70 W | $246 |
| Dual RTX 3090 (used) build | NVIDIA · desktop tower | $3.1k | 48 / 46 GB | 936 GB/s | 126 t/s | 700 W | $2.4k |
| MacBook Pro M5 Pro 48 GB | Apple · laptop | $3.2k | 48 / 40 GB | 307 GB/s | 65 t/s | 75 W | $263 |
| Single RTX 4090 build | NVIDIA · desktop tower | $3.2k | 24 / 23 GB | 1008 GB/s | 151 t/s | 410 W | $1.4k |
| Dual Intel Arc Pro B70 build | Intel · desktop tower | $3.2k | 64 / 62 GB | 608 GB/s | 126 t/s | 380 W | $1.4k |
| Single AMD Radeon Pro W7800 32 GB build | AMD · workstation | $3.5k | 32 / 31 GB | 576 GB/s | 65 t/s | 260 W | $920 |
| Dual AMD Radeon AI Pro R9700 build (64 GB) | AMD · workstation | $3.7k | 64 / 62 GB | 640 GB/s | 108 t/s | 600 W | $2.1k |
| MacBook Pro M4 Max 64 GB | Apple · laptop | $4.0k | 64 / 54 GB | 410 GB/s | 79 t/s | 90 W | $315 |
| Dell Pro Max with GB10 (128 GB) | Dell · small desktop | $4.1k | 128 / 119 GB | 273 GB/s | 108 t/s | 240 W | $887 |
| MacBook Pro M5 Max 64 GB | Apple · laptop | $4.1k | 64 / 54 GB | 614 GB/s | 108 t/s | 95 W | $332 |
| MSI EdgeXpert MS-C931 (128 GB) | MSI · small desktop | $4.7k | 128 / 119 GB | 273 GB/s | 108 t/s | 240 W | $887 |
| Mac Studio M4 Max 128 GB | Apple · small desktop | $4.7k | 128 / 112 GB | 546 GB/s | 108 t/s | 130 W | $453 |
| NVIDIA DGX Spark (128 GB) | NVIDIA · small desktop | $4.7k | 128 / 119 GB | 273 GB/s | 108 t/s | 240 W | $887 |
| ASUS Ascent GX10 (128 GB) | ASUS · small desktop | $4.7k | 128 / 119 GB | 273 GB/s | 101 t/s | 240 W | $903 |
| Single RTX A6000 48 GB (Ampere) build | NVIDIA · workstation | $4.7k | 48 / 46 GB | 768 GB/s | 86 t/s | 300 W | $1.0k |
| Single AMD Radeon Pro W7900 48 GB build | AMD · workstation | $5k | 48 / 46 GB | 864 GB/s | 79 t/s | 295 W | $1.0k |
| Lenovo ThinkStation PGX (128 GB) | Lenovo · small desktop | $5k | 128 / 119 GB | 273 GB/s | 108 t/s | 160 W | $650 |
| MacBook Pro M5 Max 128 GB | Apple · laptop | $5k | 128 / 108 GB | 614 GB/s | 108 t/s | 95 W | $332 |
| 2× Strix Halo cluster (256 GB unified) | AMD · mini-PC pair | $6k | 256 / 192 GB | 256 GB/s | 86 t/s | 240 W | $841 |
| Quad Intel Arc Pro B70 build | Intel · rack/large tower | $6k | 128 / 124 GB | 608 GB/s | 137 t/s | 700 W | $2.5k |
| Quad RTX 3090 (used) build | NVIDIA · rack/large tower | $7k | 96 / 92 GB | 936 GB/s | 144 t/s | 1400 W | $4.9k |
| Single RTX Pro 5000 Blackwell 48 GB build | NVIDIA · workstation | $7k | 48 / 46 GB | 1344 GB/s | n/a | 300 W | $1.1k |
| Single RTX 6000 Ada 48 GB build | NVIDIA · workstation | $8k | 48 / 46 GB | 960 GB/s | 144 t/s | 300 W | $1.0k |
| Mac Studio M3 Ultra 256 GB | Apple · small desktop | $8k | 256 / 232 GB | 819 GB/s | 137 t/s | 180 W | $624 |
| Dual AMD Radeon Pro W7900 build | AMD · workstation | $9k | 96 / 92 GB | 864 GB/s | 119 t/s | 600 W | $2.1k |
| 2× DGX Spark cluster (256 GB unified, CUDA) | NVIDIA · two desktops, 200 G interconnect | $10k | 256 / 240 GB | 273 GB/s | 180 t/s | 460 W | $1.7k |
| Dual RTX 5090 build | NVIDIA · rack/large tower | $10k | 64 / 62 GB | 1792 GB/s | 324 t/s | 1050 W | $3.6k |
| Octuple Intel Arc Pro B70 cluster | Intel · rack/large tower | $11k | 256 / 248 GB | 608 GB/s | 151 t/s | 1450 W | $5k |
| 4× Strix Halo cluster (512 GB unified) | AMD · rack of 4 mini-PCs, 10 GbE fabric | $12k | 512 / 384 GB | 256 GB/s | 115 t/s | 480 W | $1.7k |
| Single RTX Pro 6000 Blackwell 96 GB build | NVIDIA · workstation | $12k | 96 / 93 GB | 1792 GB/s | 306 t/s | 600 W | $2.1k |
| Tinybox Red (6× 7900 XTX, 144 GB) | tinycorp · 12U pedestal | $15k | 144 / 138 GB | 960 GB/s | 198 t/s | 1500 W | $5k |
| 4× DGX Spark cluster (512 GB unified, CUDA) | NVIDIA · rack of 4 desktops | $20k | 512 / 488 GB | 273 GB/s | 198 t/s | 920 W | $3.4k |
| 8× Strix Halo cluster (1024 GB unified) | AMD · rack of 8 mini-PCs, 10/25 GbE fabric | $23k | 1024 / 768 GB | 256 GB/s | 144 t/s | 960 W | $3.4k |
| Dual RTX Pro 6000 Blackwell build | NVIDIA · workstation | $24k | 192 / 188 GB | 1792 GB/s | 360 t/s | 1100 W | $3.8k |
| Single AMD Instinct MI325X 256 GB workstation | AMD · workstation / 4U server (OAM) | $25k | 256 / 250 GB | 6000 GB/s | 468 t/s | 1000 W | $3.6k |
| Tinybox Green (6× RTX 4090, 144 GB) | tinycorp · 12U pedestal | $25k | 144 / 138 GB | 1008 GB/s | 270 t/s | 2200 W | $8k |
| 2× Mac Studio M3 Ultra 512 GB cluster (TB5 / MLX) | Apple · two desktops, Thunderbolt 5 RDMA | $28k | 1024 / 960 GB | 819 GB/s | 180 t/s | 440 W | $1.5k |
| Single AMD Instinct MI300X 192 GB workstation | AMD · workstation | $30k | 192 / 188 GB | 5300 GB/s | 360 t/s | 750 W | $2.8k |
| Single H100 80 GB workstation | NVIDIA · workstation | $32k | 80 / 78 GB | 3350 GB/s | 324 t/s | 700 W | $2.5k |
| Quad RTX Pro 6000 Blackwell build (384 GB) | NVIDIA · workstation / 4U pedestal | $38k | 384 / 372 GB | 1792 GB/s | 540 t/s | 2200 W | $8k |
| Single H200 141 GB workstation | NVIDIA · workstation / 2U server | $40k | 141 / 138 GB | 4800 GB/s | 450 t/s | 700 W | $2.5k |
| Tinybox Pro (8× RTX 4090, 192 GB) | tinycorp · 12U pedestal | $40k | 192 / 184 GB | 1008 GB/s | 342 t/s | 3200 W | $11k |
| 8× DGX Spark cluster (1024 GB unified, CUDA) | NVIDIA · rack of 8 desktops, 200 GbE fabric | $44k | 1024 / 976 GB | 273 GB/s | 259 t/s | 1840 W | $7k |
| Tinybox Green v2 (4× RTX 5090, 128 GB) | tinycorp · 12U pedestal | $45k | 128 / 124 GB | 1792 GB/s | 396 t/s | 2300 W | $8k |
| 8× RTX Pro 6000 Blackwell server (768 GB) | NVIDIA · 4U server (e.g. SuperMicro AS-4125GS-TNRT) | $78k | 768 / 744 GB | 1792 GB/s | 792 t/s | 4800 W | $16k |
| 8× H100 80 GB server | NVIDIA · server rack | $280k | 640 / 620 GB | 3350 GB/s | 648 t/s | 5600 W | $20k |
| NVIDIA RTX Spark (128 GB) | NVIDIA · OEM laptops + small desktops | n/a | 128 / 119 GB | 300 GB/s | n/a | n/a | $756 |
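The weights alone account for most of each usable-memory minimum on this page (10 GB at Q2_K, 18 GB at Q4_K_M, 21 GB at Q5_K_M). A simple estimate is total parameters times bits per weight divided by 8. The sketch below uses the 25.2 B total-parameter figure from the header, because every expert must be resident, not just the 3.8 B active ones. The bits-per-weight values are approximate llama.cpp averages and are assumptions. The remaining gap presumably covers KV cache and runtime buffers, although the page does not say how its minimums are derived.

```python
TOTAL_PARAMS_B = 25.2  # all experts must be resident in memory, not only the 3.8 B active

# Approximate average bits per weight for llama.cpp quant types (assumed values).
BPW = {"Q2_K": 2.96, "Q4_K_M": 4.85, "Q5_K_M": 5.69}

# Usable-memory minimums stated on this page.
PAGE_MIN_USABLE_GB = {"Q2_K": 10, "Q4_K_M": 18, "Q5_K_M": 21}


def weights_gb(params_b: float, bits_per_weight: float) -> float:
    """Quantized weight size in GB (1e9 bytes)."""
    return params_b * bits_per_weight / 8


if __name__ == "__main__":
    for quant, bpw in BPW.items():
        w = weights_gb(TOTAL_PARAMS_B, bpw)
        headroom = PAGE_MIN_USABLE_GB[quant] - w
        print(f"{quant:>7}: weights ~{w:.1f} GB, page minimum {PAGE_MIN_USABLE_GB[quant]} GB "
              f"(~{headroom:.1f} GB left for KV cache and buffers)")
```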
Cheapest
Single AMD Instinct MI50 32 GB (used) build
AMD · desktop tower
$700
Tokens/sec at Q4 by model size: 14B 38 t/s · 30B n/a · 70B n/a
Memory: 32 GB (31 GB usable)
Bandwidth: 1024 GB/s
Idle / active power: 18 W / 300 W
Why: Lowest sticker price that still fits DiffusionGemma 26B-A4B ($700 USD).
Fastest
DGX B200 — 8× B200 server (1.44 TB HBM3e)
NVIDIA · 10U DGX server
$475,000
Tokens/sec at Q4 by model size: 14B 420 t/s · 30B 270 t/s · 70B 180 t/s
Memory: 1440 GB (1404 GB usable)
Bandwidth: 8000 GB/s
Idle / active power: 900 W / 10200 W
Why: Highest measured generation speed: 810 t/s on DiffusionGemma 26B-A4B-class models at Q4.
All-rounder
Mac Studio M3 Ultra 96 GB
Apple · small desktop
$3,999
Tokens/sec at Q4 by model size: 14B 70 t/s · 30B 38 t/s · 70B 18 t/s
Memory: 96 GB (80 GB usable)
Bandwidth: 819 GB/s
Idle / active power: 10 W / 180 W
Why: Top quartile across speed, value, memory headroom, and efficiency; the "buy this if unsure" pick.
Best value
Single RTX 3090 (used) build
NVIDIA · desktop tower
$1,750
Tokens/sec at Q4 by model size: 14B 50 t/s · 30B 28 t/s · 70B n/a
Memory: 24 GB (23 GB usable)
Bandwidth: 936 GB/s
Idle / active power: 22 W / 350 W
Why: Lowest cost per unit of generation speed, ~$21 per t/s.
Best CUDA
DGX H200 — 8× H200 server (1.13 TB HBM3e)
NVIDIA · 8U DGX / HGX server rack
$380,000
Tokens/sec at Q4 by model size: 14B 390 t/s · 30B 250 t/s · 70B 170 t/s
Memory: 1128 GB (1100 GB usable)
Bandwidth: 4800 GB/s
Idle / active power: 700 W / 6500 W
Why: Strongest CUDA-only software stack among fitting builds.
Most VRAM
12× RTX Pro 6000 Blackwell rack (1152 GB)
NVIDIA · 8U server rack (multi-node, 1-2 chassis)
$118,000
Tokens/sec at Q4 by model size: 14B 340 t/s · 30B 250 t/s · 70B 170 t/s
Memory: 1152 GB (1116 GB usable)
Bandwidth: 1792 GB/s
Idle / active power: 340 W / 7400 W
Why: 1116 GB usable, the most headroom for batching and longer contexts.
Efficient
Mac Mini M4 (24 GB)
Apple · mini desktop
$999
Tokens/sec at Q4 by model size: 14B 12 t/s · 30B n/a · 70B n/a
Memory: 24 GB (18 GB usable)
Bandwidth: 120 GB/s
Idle / active power: 4 W / 50 W
Why: 50 W under load, the lowest power draw of the fitting builds.
Cheapest
Mac Mini M4 (24 GB)
Apple · mini desktop
$999
Tokens/sec at Q4 by model size: 14B 12 t/s · 30B n/a · 70B n/a
Memory: 24 GB (18 GB usable)
Bandwidth: 120 GB/s
Idle / active power: 4 W / 50 W
Why: Lowest sticker price that still fits DiffusionGemma 26B-A4B ($999 USD).
Fastest
Single AMD Instinct MI355X 288 GB workstation
AMD · 4U server (OAM, liquid-cooled)
$28,000
Tokens/sec at Q4 by model size: 14B 270 t/s · 30B 160 t/s · 70B n/a
Memory: 288 GB (282 GB usable)
Bandwidth: 8000 GB/s
Idle / active power: 140 W / 1400 W
Why: Highest measured generation speed: 480 t/s on DiffusionGemma 26B-A4B-class models at Q4.
All-rounder
Mac Studio M3 Ultra 96 GB
Apple · small desktop
$3,999
Tokens/sec at Q4 by model size: 14B 70 t/s · 30B 38 t/s · 70B 18 t/s
Memory: 96 GB (80 GB usable)
Bandwidth: 819 GB/s
Idle / active power: 10 W / 180 W
Why: Top quartile across speed, value, memory headroom, and efficiency; the "buy this if unsure" pick.
Best value
Single RTX 5090 build
NVIDIA · desktop tower
$4,900
Tokens/sec at Q4 by model size: 14B 124 t/s · 30B 70 t/s · 70B n/a
Memory: 32 GB (31 GB usable)
Bandwidth: 1792 GB/s
Idle / active power: 30 W / 520 W
Why: Lowest cost per unit of generation speed, ~$23 per t/s.
Best CUDA
Single B200 180 GB workstation
NVIDIA · workstation / 4U server
$47,000
Tokens/sec at Q4 by model size: 14B 225 t/s · 30B 135 t/s · 70B 75 t/s
Memory: 180 GB (176 GB usable)
Bandwidth: 8000 GB/s
Idle / active power: 100 W / 1000 W
Why: Strongest CUDA-only software stack among fitting builds.
Most VRAM
Mac Studio M3 Ultra 512 GB
Apple · small desktop
$14,199
Tokens/sec at Q4 by model size: 14B 70 t/s · 30B 38 t/s · 70B 18 t/s
Memory: 512 GB (480 GB usable)
Bandwidth: 819 GB/s
Idle / active power: 12 W / 220 W
Why: 480 GB usable, the most headroom for batching and longer contexts.
Efficient
MacBook Pro M4 Pro 48 GB
Apple · laptop
$2,899
Tokens/sec at Q4 by model size: 14B 28 t/s · 30B 14 t/s · 70B 6.0 t/s
Memory: 48 GB (40 GB usable)
Bandwidth: 273 GB/s
Idle / active power: 5 W / 70 W
Why: 70 W under load, the lowest power draw of the fitting builds.
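The page does not document how it chooses its picks. A minimal guess at the simplest version filters builds by usable memory, then takes the minimum or maximum of one column per category. In the sketch below, the `Build` type, the `pick` function, and the category rules are hypothetical; the sample rows are copied from the Q4 table, plus the RTX 3060 row from the Q2 table, which the 18 GB Q4_K_M minimum should filter out. The page's actual rules may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Build:
    name: str
    price_usd: float
    usable_gb: float
    tok_s: Optional[float]  # None where the table shows no speed
    active_w: float


def pick(builds, min_usable_gb):
    """Hypothetical pick logic: filter on memory, then one extreme per column."""
    fits = [b for b in builds if b.usable_gb >= min_usable_gb]
    if not fits:
        return {}
    with_speed = [b for b in fits if b.tok_s is not None]
    return {
        "Cheapest": min(fits, key=lambda b: b.price_usd).name,
        "Fastest": max(with_speed, key=lambda b: b.tok_s).name if with_speed else None,
        "Most VRAM": max(fits, key=lambda b: b.usable_gb).name,
        "Efficient": min(fits, key=lambda b: b.active_w).name,
    }


BUILDS = [
    Build("RTX 3060 12 GB build", 900, 11, None, 170),
    Build("Single Tesla P40 24 GB (used) build", 750, 23, 36, 250),
    Build("Mac Studio M4 Max 36 GB", 2_000, 28, 84, 130),
    Build("AMD Ryzen AI Max+ 395 (128 GB)", 2_800, 96, 48, 120),
    Build("Single RTX 4090 build", 3_200, 23, 126, 410),
    Build("Single H100 80 GB workstation", 32_000, 78, 270, 700),
]

if __name__ == "__main__":
    for category, name in pick(BUILDS, min_usable_gb=18).items():
        print(f"{category:>9}: {name}")
```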
Every other build that runs DiffusionGemma 26B-A4B
54 additional builds fit DiffusionGemma 26B-A4B at Q4_K_M (18 GB usable minimum), sorted by sticker price.
| Build | Vendor · form factor | Price | Memory (total / usable) | Bandwidth (per device) | Gen. speed, tg/s (Q4) | Active power | 5-yr power cost |
|---|---|---|---|---|---|---|---|
| Single Tesla P40 24 GB (used) build | NVIDIA · desktop tower | $750 | 24 / 23 GB | 347 GB/s | 36 t/s | 250 W | $854 |
| Tesla V100 32 GB SXM2 mod build | NVIDIA · desktop tower | $900 | 32 / 31 GB | 900 GB/s | 33 t/s | 300 W | $1.1k |
| Single Intel Arc Pro B70 build | Intel · desktop tower | $1.8k | 32 / 31 GB | 608 GB/s | 75 t/s | 220 W | $782 |
| Mac Studio M4 Max 36 GB | Apple · small desktop | $2.0k | 36 / 28 GB | 546 GB/s | 84 t/s | 130 W | $453 |
| Single AMD Radeon AI Pro R9700 32 GB build | AMD · desktop tower | $2.0k | 32 / 31 GB | 640 GB/s | n/a | 300 W | $1.1k |
| Quad AMD MI50 32 GB (128 GB) homelab build | AMD · rack/large tower | $2.3k | 128 / 122 GB | 1024 GB/s | 38 t/s | 1200 W | $4.2k |
| Quad Tesla P40 (96 GB) homelab build | NVIDIA · rack/large tower | $2.7k | 96 / 92 GB | 347 GB/s | 24 t/s | 1000 W | $3.5k |
| AMD Ryzen AI Max+ 395 (128 GB) | AMD · mini desktop / laptop | $2.8k | 128 / 96 GB | 256 GB/s | 48 t/s | 120 W | $420 |
| Dual RTX 3090 (used) build | NVIDIA · desktop tower | $3.1k | 48 / 46 GB | 936 GB/s | 105 t/s | 700 W | $2.4k |
| MacBook Pro M5 Pro 48 GB | Apple · laptop | $3.2k | 48 / 40 GB | 307 GB/s | 54 t/s | 75 W | $263 |
| Single RTX 4090 build | NVIDIA · desktop tower | $3.2k | 24 / 23 GB | 1008 GB/s | 126 t/s | 410 W | $1.4k |
| Dual Intel Arc Pro B70 build | Intel · desktop tower | $3.2k | 64 / 62 GB | 608 GB/s | 105 t/s | 380 W | $1.4k |
| Single AMD Radeon Pro W7800 32 GB build | AMD · workstation | $3.5k | 32 / 31 GB | 576 GB/s | 54 t/s | 260 W | $920 |
| Dual AMD Radeon AI Pro R9700 build (64 GB) | AMD · workstation | $3.7k | 64 / 62 GB | 640 GB/s | 90 t/s | 600 W | $2.1k |
| MacBook Pro M4 Max 64 GB | Apple · laptop | $4.0k | 64 / 54 GB | 410 GB/s | 66 t/s | 90 W | $315 |
| Dell Pro Max with GB10 (128 GB) | Dell · small desktop | $4.1k | 128 / 119 GB | 273 GB/s | 90 t/s | 240 W | $887 |
| MacBook Pro M5 Max 64 GB | Apple · laptop | $4.1k | 64 / 54 GB | 614 GB/s | 90 t/s | 95 W | $332 |
| MSI EdgeXpert MS-C931 (128 GB) | MSI · small desktop | $4.7k | 128 / 119 GB | 273 GB/s | 90 t/s | 240 W | $887 |
| Mac Studio M4 Max 128 GB | Apple · small desktop | $4.7k | 128 / 112 GB | 546 GB/s | 90 t/s | 130 W | $453 |
| NVIDIA DGX Spark (128 GB) | NVIDIA · small desktop | $4.7k | 128 / 119 GB | 273 GB/s | 90 t/s | 240 W | $887 |
| ASUS Ascent GX10 (128 GB) | ASUS · small desktop | $4.7k | 128 / 119 GB | 273 GB/s | 84 t/s | 240 W | $903 |
| Single RTX A6000 48 GB (Ampere) build | NVIDIA · workstation | $4.7k | 48 / 46 GB | 768 GB/s | 72 t/s | 300 W | $1.0k |
| Single AMD Radeon Pro W7900 48 GB build | AMD · workstation | $5k | 48 / 46 GB | 864 GB/s | 66 t/s | 295 W | $1.0k |
| Lenovo ThinkStation PGX (128 GB) | Lenovo · small desktop | $5k | 128 / 119 GB | 273 GB/s | 90 t/s | 160 W | $650 |
| MacBook Pro M5 Max 128 GB | Apple · laptop | $5k | 128 / 108 GB | 614 GB/s | 90 t/s | 95 W | $332 |
| 2× Strix Halo cluster (256 GB unified) | AMD · mini-PC pair | $6k | 256 / 192 GB | 256 GB/s | 72 t/s | 240 W | $841 |
| Quad Intel Arc Pro B70 build | Intel · rack/large tower | $6k | 128 / 124 GB | 608 GB/s | 114 t/s | 700 W | $2.5k |
| Quad RTX 3090 (used) build | NVIDIA · rack/large tower | $7k | 96 / 92 GB | 936 GB/s | 120 t/s | 1400 W | $4.9k |
| Single RTX Pro 5000 Blackwell 48 GB build | NVIDIA · workstation | $7k | 48 / 46 GB | 1344 GB/s | n/a | 300 W | $1.1k |
| Single RTX 6000 Ada 48 GB build | NVIDIA · workstation | $8k | 48 / 46 GB | 960 GB/s | 120 t/s | 300 W | $1.0k |
| Mac Studio M3 Ultra 256 GB | Apple · small desktop | $8k | 256 / 232 GB | 819 GB/s | 114 t/s | 180 W | $624 |
| Dual AMD Radeon Pro W7900 build | AMD · workstation | $9k | 96 / 92 GB | 864 GB/s | 99 t/s | 600 W | $2.1k |
| 2× DGX Spark cluster (256 GB unified, CUDA) | NVIDIA · two desktops, 200 G interconnect | $10k | 256 / 240 GB | 273 GB/s | 150 t/s | 460 W | $1.7k |
| Dual RTX 5090 build | NVIDIA · rack/large tower | $10k | 64 / 62 GB | 1792 GB/s | 270 t/s | 1050 W | $3.6k |
| Octuple Intel Arc Pro B70 cluster | Intel · rack/large tower | $11k | 256 / 248 GB | 608 GB/s | 126 t/s | 1450 W | $5k |
| 4× Strix Halo cluster (512 GB unified) | AMD · rack of 4 mini-PCs, 10 GbE fabric | $12k | 512 / 384 GB | 256 GB/s | 96 t/s | 480 W | $1.7k |
| Single RTX Pro 6000 Blackwell 96 GB build | NVIDIA · workstation | $12k | 96 / 93 GB | 1792 GB/s | 255 t/s | 600 W | $2.1k |
| Tinybox Red (6× 7900 XTX, 144 GB) | tinycorp · 12U pedestal | $15k | 144 / 138 GB | 960 GB/s | 165 t/s | 1500 W | $5k |
| 4× DGX Spark cluster (512 GB unified, CUDA) | NVIDIA · rack of 4 desktops | $20k | 512 / 488 GB | 273 GB/s | 165 t/s | 920 W | $3.4k |
| 8× Strix Halo cluster (1024 GB unified) | AMD · rack of 8 mini-PCs, 10/25 GbE fabric | $23k | 1024 / 768 GB | 256 GB/s | 120 t/s | 960 W | $3.4k |
| Dual RTX Pro 6000 Blackwell build | NVIDIA · workstation | $24k | 192 / 188 GB | 1792 GB/s | 300 t/s | 1100 W | $3.8k |
| Single AMD Instinct MI325X 256 GB workstation | AMD · workstation / 4U server (OAM) | $25k | 256 / 250 GB | 6000 GB/s | 390 t/s | 1000 W | $3.6k |
| Tinybox Green (6× RTX 4090, 144 GB) | tinycorp · 12U pedestal | $25k | 144 / 138 GB | 1008 GB/s | 225 t/s | 2200 W | $8k |
| 2× Mac Studio M3 Ultra 512 GB cluster (TB5 / MLX) | Apple · two desktops, Thunderbolt 5 RDMA | $28k | 1024 / 960 GB | 819 GB/s | 150 t/s | 440 W | $1.5k |
| Single AMD Instinct MI300X 192 GB workstation | AMD · workstation | $30k | 192 / 188 GB | 5300 GB/s | 300 t/s | 750 W | $2.8k |
| Single H100 80 GB workstation | NVIDIA · workstation | $32k | 80 / 78 GB | 3350 GB/s | 270 t/s | 700 W | $2.5k |
| Quad RTX Pro 6000 Blackwell build (384 GB) | NVIDIA · workstation / 4U pedestal | $38k | 384 / 372 GB | 1792 GB/s | 450 t/s | 2200 W | $8k |
| Single H200 141 GB workstation | NVIDIA · workstation / 2U server | $40k | 141 / 138 GB | 4800 GB/s | 375 t/s | 700 W | $2.5k |
| Tinybox Pro (8× RTX 4090, 192 GB) | tinycorp · 12U pedestal | $40k | 192 / 184 GB | 1008 GB/s | 285 t/s | 3200 W | $11k |
| 8× DGX Spark cluster (1024 GB unified, CUDA) | NVIDIA · rack of 8 desktops, 200 GbE fabric | $44k | 1024 / 976 GB | 273 GB/s | 216 t/s | 1840 W | $7k |
| Tinybox Green v2 (4× RTX 5090, 128 GB) | tinycorp · 12U pedestal | $45k | 128 / 124 GB | 1792 GB/s | 330 t/s | 2300 W | $8k |
| 8× RTX Pro 6000 Blackwell server (768 GB) | NVIDIA · 4U server (e.g. SuperMicro AS-4125GS-TNRT) | $78k | 768 / 744 GB | 1792 GB/s | 660 t/s | 4800 W | $16k |
| 8× H100 80 GB server | NVIDIA · server rack | $280k | 640 / 620 GB | 3350 GB/s | 540 t/s | 5600 W | $20k |
| NVIDIA RTX Spark (128 GB) | NVIDIA · OEM laptops + small desktops | n/a | 128 / 119 GB | 300 GB/s | n/a | n/a | $756 |
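The 5-yr power column depends on how many hours a day the machine is busy and on the electricity price, and the page states neither. The sketch below shows the usual arithmetic under explicitly assumed values (8 busy hours a day at active draw, idle the rest of the day, $0.15/kWh); it is not a reconstruction of the page's figures. Idle draw comes from the cards, since the table lists only active watts.

```python
def five_year_energy_cost(idle_w: float, active_w: float, busy_hours_per_day: float,
                          usd_per_kwh: float, years: int = 5) -> float:
    """Electricity cost of a box that runs busy_hours_per_day at active_w and idles otherwise."""
    if not 0 <= busy_hours_per_day <= 24:
        raise ValueError("busy_hours_per_day must be between 0 and 24")
    wh_per_day = active_w * busy_hours_per_day + idle_w * (24 - busy_hours_per_day)
    kwh_total = wh_per_day / 1000 * 365 * years
    return kwh_total * usd_per_kwh


if __name__ == "__main__":
    # Idle / active watts from the Single RTX 3090 (used) card; duty cycle and price are assumptions.
    cost = five_year_energy_cost(idle_w=22, active_w=350, busy_hours_per_day=8, usd_per_kwh=0.15)
    print(f"~${cost:,.0f} over 5 years")
```

Under these assumptions the 3090 box draws about 3.15 kWh a day; doubling the busy hours or the tariff roughly doubles the result, so local prices matter more than small differences in idle draw.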
Cheapest
Single AMD Instinct MI50 32 GB (used) build
AMD · desktop tower
$700
Tokens/sec at Q5 by model size: 14B 32 t/s · 30B n/a · 70B n/a
Memory: 32 GB (31 GB usable)
Bandwidth: 1024 GB/s
Idle / active power: 18 W / 300 W
Why: Lowest sticker price that still fits DiffusionGemma 26B-A4B ($700 USD).
Fastest
DGX B200 — 8× B200 server (1.44 TB HBM3e)
NVIDIA · 10U DGX server
$475,000
Tokens/sec at Q5 by model size: 14B 353 t/s · 30B 227 t/s · 70B 151 t/s
Memory: 1440 GB (1404 GB usable)
Bandwidth: 8000 GB/s
Idle / active power: 900 W / 10200 W
Why: Highest measured generation speed: 680 t/s on DiffusionGemma 26B-A4B-class models at Q5.
All-rounder
Mac Studio M3 Ultra 96 GB
Apple · small desktop
$3,999
Tokens/sec at Q5 by model size: 14B 59 t/s · 30B 32 t/s · 70B 15 t/s
Memory: 96 GB (80 GB usable)
Bandwidth: 819 GB/s
Idle / active power: 10 W / 180 W
Why: Top quartile across speed, value, memory headroom, and efficiency; the "buy this if unsure" pick.
Best value
Single RTX 3090 (used) build
NVIDIA · desktop tower
$1,750
Tokens/sec at Q5 by model size: 14B 42 t/s · 30B 24 t/s · 70B n/a
Memory: 24 GB (23 GB usable)
Bandwidth: 936 GB/s
Idle / active power: 22 W / 350 W
Why: Lowest cost per unit of generation speed, ~$25 per t/s.
Best CUDA
DGX H200 — 8× H200 server (1.13 TB HBM3e)
NVIDIA · 8U DGX / HGX server rack
$380,000
Tokens/sec at Q5 by model size: 14B 328 t/s · 30B 210 t/s · 70B 143 t/s
Memory: 1128 GB (1100 GB usable)
Bandwidth: 4800 GB/s
Idle / active power: 700 W / 6500 W
Why: Strongest CUDA-only software stack among fitting builds.
Most VRAM
12× RTX Pro 6000 Blackwell rack (1152 GB)
NVIDIA · 8U server rack (multi-node, 1-2 chassis)
$118,000
Tokens/sec at Q5 by model size: 14B 286 t/s · 30B 210 t/s · 70B 143 t/s
Memory: 1152 GB (1116 GB usable)
Bandwidth: 1792 GB/s
Idle / active power: 340 W / 7400 W
Why: 1116 GB usable, the most headroom for batching and longer contexts.
Efficient
MacBook Pro M4 Pro 48 GB
Apple · laptop
$2,899
Tokens/sec at Q5 by model size: 14B 24 t/s · 30B 12 t/s · 70B 5.0 t/s
Memory: 48 GB (40 GB usable)
Bandwidth: 273 GB/s
Idle / active power: 5 W / 70 W
Why: 70 W under load, the lowest power draw of the fitting builds.
Cheapest
Single Intel Arc Pro B70 build
Intel · desktop tower
$1,800
Tokens/sec at Q5 by model size: 14B 34 t/s · 30B 21 t/s · 70B n/a
Memory: 32 GB (31 GB usable)
Bandwidth: 608 GB/s
Idle / active power: 18 W / 220 W
Why: Lowest sticker price that still fits DiffusionGemma 26B-A4B ($1,800 USD).
Fastest
Single AMD Instinct MI355X 288 GB workstation
AMD · 4U server (OAM, liquid-cooled)
$28,000
Tokens/sec at Q5 by model size: 14B 227 t/s · 30B 134 t/s · 70B n/a
Memory: 288 GB (282 GB usable)
Bandwidth: 8000 GB/s
Idle / active power: 140 W / 1400 W
Why: Highest measured generation speed: 403 t/s on DiffusionGemma 26B-A4B-class models at Q5.
All-rounder
Mac Studio M3 Ultra 96 GB
Apple · small desktop
$3,999
Tokens/sec at Q5 by model size: 14B 59 t/s · 30B 32 t/s · 70B 15 t/s
Memory: 96 GB (80 GB usable)
Bandwidth: 819 GB/s
Idle / active power: 10 W / 180 W
Why: Top quartile across speed, value, memory headroom, and efficiency; the "buy this if unsure" pick.
Best value
Single RTX 5090 build
NVIDIA · desktop tower
$4,900
Tokens/sec at Q5 by model size: 14B 104 t/s · 30B 59 t/s · 70B n/a
Memory: 32 GB (31 GB usable)
Bandwidth: 1792 GB/s
Idle / active power: 30 W / 520 W
Why: Lowest cost per unit of generation speed, ~$28 per t/s.
Best CUDA
Single B200 180 GB workstation
NVIDIA · workstation / 4U server
$47,000
Tokens/sec at Q5 by model size: 14B 189 t/s · 30B 113 t/s · 70B 63 t/s
Memory: 180 GB (176 GB usable)
Bandwidth: 8000 GB/s
Idle / active power: 100 W / 1000 W
Why: Strongest CUDA-only software stack among fitting builds.
Most VRAM
Mac Studio M3 Ultra 512 GB
Apple · small desktop
$14,199
Tokens/sec at Q5 by model size: 14B 59 t/s · 30B 32 t/s · 70B 15 t/s
Memory: 512 GB (480 GB usable)
Bandwidth: 819 GB/s
Idle / active power: 12 W / 220 W
Why: 480 GB usable, the most headroom for batching and longer contexts.
Efficient
MacBook Pro M4 Pro 48 GB
Apple · laptop
$2,899
Tokens/sec at Q5 by model size: 14B 24 t/s · 30B 12 t/s · 70B 5.0 t/s
Memory: 48 GB (40 GB usable)
Bandwidth: 273 GB/s
Idle / active power: 5 W / 70 W
Why: 70 W under load, the lowest power draw of the fitting builds.
Every other build that runs DiffusionGemma 26B-A4B
53 additional builds fit DiffusionGemma 26B-A4B at Q5_K_M (21 GB usable minimum), sorted by sticker price.
| Build | Price | Memory | Bandwidth | tg/s (Q5) | Active W | 5-yr power |
|---|---|---|---|---|---|---|
Single Tesla P40 24 GB (used) buildNVIDIA · desktop tower | $750 | 24 / 23 GB | 347 GB/s | 30 t/s | 250 W | $854 |
Tesla V100 32 GB SXM2 mod buildNVIDIA · desktop tower | $900 | 32 / 31 GB | 900 GB/s | 28 t/s | 300 W | $1.1k |
Mac Studio M4 Max 36 GBApple · small desktop | $2.0k | 36 / 28 GB | 546 GB/s | 71 t/s | 130 W | $453 |
Single AMD Radeon AI Pro R9700 32 GB buildAMD · desktop tower | $2.0k | 32 / 31 GB | 640 GB/s | — | 300 W | $1.1k |
Quad AMD MI50 32 GB (128 GB) homelab buildAMD · rack/large tower | $2.3k | 128 / 122 GB | 1024 GB/s | 32 t/s | 1200 W | $4.2k |
Quad Tesla P40 (96 GB) homelab buildNVIDIA · rack/large tower | $2.7k | 96 / 92 GB | 347 GB/s | 20 t/s | 1000 W | $3.5k |
AMD Ryzen AI Max+ 395 (128 GB)AMD · mini desktop / laptop | $2.8k | 128 / 96 GB | 256 GB/s | 40 t/s | 120 W | $420 |
Dual RTX 3090 (used) buildNVIDIA · desktop tower | $3.1k | 48 / 46 GB | 936 GB/s | 88 t/s | 700 W | $2.4k |
MacBook Pro M5 Pro 48 GBApple · laptop | $3.2k | 48 / 40 GB | 307 GB/s | 45 t/s | 75 W | $263 |
Single RTX 4090 buildNVIDIA · desktop tower | $3.2k | 24 / 23 GB | 1008 GB/s | 106 t/s | 410 W | $1.4k |
Dual Intel Arc Pro B70 buildIntel · desktop tower | $3.2k | 64 / 62 GB | 608 GB/s | 88 t/s | 380 W | $1.4k |
Single AMD Radeon Pro W7800 32 GB buildAMD · workstation | $3.5k | 32 / 31 GB | 576 GB/s | 45 t/s | 260 W | $920 |
Dual AMD Radeon AI Pro R9700 build (64 GB)AMD · workstation | $3.7k | 64 / 62 GB | 640 GB/s | 76 t/s | 600 W | $2.1k |
MacBook Pro M4 Max 64 GBApple · laptop | $4.0k | 64 / 54 GB | 410 GB/s | 55 t/s | 90 W | $315 |
Dell Pro Max with GB10 (128 GB)Dell · small desktop | $4.1k | 128 / 119 GB | 273 GB/s | 76 t/s | 240 W | $887 |
MacBook Pro M5 Max 64 GBApple · laptop | $4.1k | 64 / 54 GB | 614 GB/s | 76 t/s | 95 W | $332 |
MSI EdgeXpert MS-C931 (128 GB)MSI · small desktop | $4.7k | 128 / 119 GB | 273 GB/s | 76 t/s | 240 W | $887 |
Mac Studio M4 Max 128 GBApple · small desktop | $4.7k | 128 / 112 GB | 546 GB/s | 76 t/s | 130 W | $453 |
NVIDIA DGX Spark (128 GB)NVIDIA · small desktop | $4.7k | 128 / 119 GB | 273 GB/s | 76 t/s | 240 W | $887 |
ASUS Ascent GX10 (128 GB)ASUS · small desktop | $4.7k | 128 / 119 GB | 273 GB/s | 71 t/s | 240 W | $903 |
Single RTX A6000 48 GB (Ampere) buildNVIDIA · workstation | $4.7k | 48 / 46 GB | 768 GB/s | 60 t/s | 300 W | $1.0k |
Single AMD Radeon Pro W7900 48 GB buildAMD · workstation | $5k | 48 / 46 GB | 864 GB/s | 55 t/s | 295 W | $1.0k |
Lenovo ThinkStation PGX (128 GB)Lenovo · small desktop | $5k | 128 / 119 GB | 273 GB/s | 76 t/s | 160 W | $650 |
MacBook Pro M5 Max 128 GBApple · laptop | $5k | 128 / 108 GB | 614 GB/s | 76 t/s | 95 W | $332 |
2× Strix Halo cluster (256 GB unified)AMD · mini-PC pair | $6k | 256 / 192 GB | 256 GB/s | 60 t/s | 240 W | $841 |
Quad Intel Arc Pro B70 buildIntel · rack/large tower | $6k | 128 / 124 GB | 608 GB/s | 96 t/s | 700 W | $2.5k |
Quad RTX 3090 (used) buildNVIDIA · rack/large tower | $7k | 96 / 92 GB | 936 GB/s | 101 t/s | 1400 W | $4.9k |
Single RTX Pro 5000 Blackwell 48 GB buildNVIDIA · workstation | $7k | 48 / 46 GB | 1344 GB/s | — | 300 W | $1.1k |
Single RTX 6000 Ada 48 GB buildNVIDIA · workstation | $8k | 48 / 46 GB | 960 GB/s | 101 t/s | 300 W | $1.0k |
Mac Studio M3 Ultra 256 GBApple · small desktop | $8k | 256 / 232 GB | 819 GB/s | 96 t/s | 180 W | $624 |
Dual AMD Radeon Pro W7900 buildAMD · workstation | $9k | 96 / 92 GB | 864 GB/s | 83 t/s | 600 W | $2.1k |
| 2× DGX Spark cluster (256 GB unified, CUDA), NVIDIA · two desktops, 200 Gb/s interconnect | $10k | 256 / 240 GB | 273 GB/s | 126 t/s | 460 W | $1.7k |
| Dual RTX 5090 build, NVIDIA · rack/large tower | $10k | 64 / 62 GB | 1792 GB/s | 227 t/s | 1050 W | $3.6k |
| Octuple Intel Arc Pro B70 cluster, Intel · rack/large tower | $11k | 256 / 248 GB | 608 GB/s | 106 t/s | 1450 W | $5k |
| 4× Strix Halo cluster (512 GB unified), AMD · rack of 4 mini-PCs, 10 GbE fabric | $12k | 512 / 384 GB | 256 GB/s | 81 t/s | 480 W | $1.7k |
| Single RTX Pro 6000 Blackwell 96 GB build, NVIDIA · workstation | $12k | 96 / 93 GB | 1792 GB/s | 214 t/s | 600 W | $2.1k |
| Tinybox Red (6× 7900 XTX, 144 GB), tinycorp · 12U pedestal | $15k | 144 / 138 GB | 960 GB/s | 139 t/s | 1500 W | $5k |
| 4× DGX Spark cluster (512 GB unified, CUDA), NVIDIA · rack of 4 desktops | $20k | 512 / 488 GB | 273 GB/s | 139 t/s | 920 W | $3.4k |
| 8× Strix Halo cluster (1024 GB unified), AMD · rack of 8 mini-PCs, 10/25 GbE fabric | $23k | 1024 / 768 GB | 256 GB/s | 101 t/s | 960 W | $3.4k |
| Dual RTX Pro 6000 Blackwell build, NVIDIA · workstation | $24k | 192 / 188 GB | 1792 GB/s | 252 t/s | 1100 W | $3.8k |
| Single AMD Instinct MI325X 256 GB workstation, AMD · workstation / 4U server (OAM) | $25k | 256 / 250 GB | 6000 GB/s | 328 t/s | 1000 W | $3.6k |
| Tinybox Green (6× RTX 4090, 144 GB), tinycorp · 12U pedestal | $25k | 144 / 138 GB | 1008 GB/s | 189 t/s | 2200 W | $8k |
| 2× Mac Studio M3 Ultra 512 GB cluster (TB5 / MLX), Apple · two desktops, Thunderbolt 5 RDMA | $28k | 1024 / 960 GB | 819 GB/s | 126 t/s | 440 W | $1.5k |
| Single AMD Instinct MI300X 192 GB workstation, AMD · workstation | $30k | 192 / 188 GB | 5300 GB/s | 252 t/s | 750 W | $2.8k |
| Single H100 80 GB workstation, NVIDIA · workstation | $32k | 80 / 78 GB | 3350 GB/s | 227 t/s | 700 W | $2.5k |
| Quad RTX Pro 6000 Blackwell build (384 GB), NVIDIA · workstation / 4U pedestal | $38k | 384 / 372 GB | 1792 GB/s | 378 t/s | 2200 W | $8k |
| Single H200 141 GB workstation, NVIDIA · workstation / 2U server | $40k | 141 / 138 GB | 4800 GB/s | 315 t/s | 700 W | $2.5k |
| Tinybox Pro (8× RTX 4090, 192 GB), tinycorp · 12U pedestal | $40k | 192 / 184 GB | 1008 GB/s | 239 t/s | 3200 W | $11k |
| 8× DGX Spark cluster (1024 GB unified, CUDA), NVIDIA · rack of 8 desktops, 200 GbE fabric | $44k | 1024 / 976 GB | 273 GB/s | 181 t/s | 1840 W | $7k |
| Tinybox Green v2 (4× RTX 5090, 128 GB), tinycorp · 12U pedestal | $45k | 128 / 124 GB | 1792 GB/s | 277 t/s | 2300 W | $8k |
| 8× RTX Pro 6000 Blackwell server (768 GB), NVIDIA · 4U server (e.g. Supermicro AS-4125GS-TNRT) | $78k | 768 / 744 GB | 1792 GB/s | 554 t/s | 4800 W | $16k |
| 8× H100 80 GB server, NVIDIA · server rack | $280k | 640 / 620 GB | 3350 GB/s | 454 t/s | 5600 W | $20k |
| NVIDIA RTX Spark (128 GB), NVIDIA · OEM laptops + small desktops | — | 128 / 119 GB | 300 GB/s | — | — | $756 |
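The 5-yr power column turns a build's power draw into five years of electricity cost. The page does not state its electricity price or how idle and active hours are weighted, so the minimal sketch below uses placeholder assumptions (`PRICE_PER_KWH` and `ACTIVE_FRACTION`, both hypothetical). It will not reproduce the listed figures exactly; it only shows the shape of the calculation.

```python
# Hypothetical 5-year electricity cost estimate. PRICE_PER_KWH and
# ACTIVE_FRACTION are placeholder assumptions, not values from this page.
HOURS_PER_YEAR = 24 * 365
PRICE_PER_KWH = 0.15      # assumed USD per kWh
ACTIVE_FRACTION = 0.25    # assumed share of hours spent generating


def five_year_power_cost(idle_w: float, active_w: float,
                         price_per_kwh: float = PRICE_PER_KWH,
                         active_fraction: float = ACTIVE_FRACTION) -> float:
    """Return the estimated electricity cost in USD over five years."""
    # Time-weighted average draw across idle and active hours.
    avg_w = active_w * active_fraction + idle_w * (1 - active_fraction)
    kwh = avg_w * HOURS_PER_YEAR * 5 / 1000
    return kwh * price_per_kwh


# Idle / active figures from the Mac Studio M3 Ultra 96 GB card (10 W / 180 W).
print(f"${five_year_power_cost(10, 180):,.0f}")
```

With these placeholders the Mac Studio averages 52.5 W, about 2,300 kWh over five years, or roughly $345. Setting the active fraction to 1 makes the cost depend on the Active W column alone.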
Cheapest
Quad AMD MI50 32 GB (128 GB) homelab build
AMD · rack/large tower
$2,300
tokens / sec (Q8)
14B 26 t/s
30B 26 t/s
70B 18 t/s
Memory: 128 GB · 122 GB usable
Bandwidth: 1024 GB/s
Idle / Active: 75 W / 1200 W
Sticker: $2,300
Why: Lowest sticker that still fits DiffusionGemma 26B-A4B ($2.3k USD).
Fastest
DGX B200 — 8× B200 server (1.44 TB HBM3e)
NVIDIA · 10U DGX server
$475,000
tokens / sec (Q8)
14B 218 t/s
30B 140 t/s
70B 94 t/s
Memory: 1440 GB · 1404 GB usable
Bandwidth: 8000 GB/s
Idle / Active: 900 W / 10200 W
Sticker: $475,000
Why: Highest measured tg/s — 421 t/s on DiffusionGemma 26B-A4B-class models at Q8.
All-rounder
Mac Studio M3 Ultra 96 GB
Apple · small desktop
$3,999
tokens / sec (Q8)
14B 36 t/s
30B 20 t/s
70B 9.4 t/s
Memory: 96 GB · 80 GB usable
Bandwidth: 819 GB/s
Idle / Active: 10 W / 180 W
Sticker: $3,999
Why: Top quartile across speed, value, memory headroom, and efficiency — the "buy this if unsure" pick.
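The 14B, 30B and 70B rows on each card are reference speeds for models of those sizes. A common rule of thumb for single-stream decoding of a dense model is that every weight is read from memory once per generated token, so bandwidth divided by weight bytes gives an upper bound on tokens per second. The sketch below applies that rule to this card, assuming dense models stored as Q8_0 (34 bytes per 32-weight block). It is a sanity check, not the site's method, and it does not carry over directly to DiffusionGemma, an MoE that denoises 256-token blocks in parallel.

```python
# Rule-of-thumb ceiling for single-stream decode speed of a dense model:
# each generated token reads all weights once, so
#   tokens/s <= memory bandwidth / bytes of weights.
# Assumes Q8_0 storage: 32 int8 weights + one fp16 scale = 34 bytes per block.
Q8_0_BYTES_PER_WEIGHT = 34 / 32


def decode_ceiling_tps(bandwidth_gb_s: float, params_billion: float,
                       bytes_per_weight: float = Q8_0_BYTES_PER_WEIGHT) -> float:
    """Upper bound on tokens/s for a dense model that is purely bandwidth-bound."""
    weights_gb = params_billion * bytes_per_weight  # decimal GB, like GB/s
    return bandwidth_gb_s / weights_gb


# Mac Studio M3 Ultra: 819 GB/s; the card lists 36, 20 and 9.4 t/s at Q8.
for size, listed in [(14, 36), (30, 20), (70, 9.4)]:
    ceiling = decode_ceiling_tps(819, size)
    print(f"{size}B: ceiling ~{ceiling:.0f} t/s, listed {listed} t/s")
```

The ceilings come out at about 55, 26 and 11 t/s, and each listed speed sits below its bound, which is consistent with real decoders reaching only part of peak bandwidth.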
Best value
Dual RTX 3090 (used) build
NVIDIA · desktop tower
$3,100
tokens / sec (Q8)
14B 29 t/s
30B 18 t/s
70B 11 t/s
Memory: 48 GB · 46 GB usable
Bandwidth: 936 GB/s
Idle / Active: 45 W / 700 W
Sticker: $3,100
Why: Lowest cost per unit of generation speed, about $57 per t/s.
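The value metric is sticker price divided by generation speed, so lower is better. The sketch below applies it to a handful of rows from the Q8 table on this page, using that table's rounded prices and tg/s, so the results are approximate; the exact inputs behind the ~$57 figure above are not listed.

```python
# Dollars per token/s (lower is better). Prices and Q8 tg/s are the rounded
# figures from this page's Q8 table, so the results are approximate.
builds = {
    "Dual AMD Radeon AI Pro R9700": (3_700, 47),
    "Quad RTX 3090 (used)": (7_000, 62),
    "Dual RTX 5090": (10_000, 140),
    "Single RTX Pro 6000 Blackwell 96 GB": (12_000, 133),
    "Single H100 80 GB": (32_000, 140),
}


def dollars_per_tps(price_usd: float, tg_per_s: float) -> float:
    """Sticker price divided by generation speed."""
    return price_usd / tg_per_s


# Sort cheapest-per-t/s first.
ranked = sorted(builds.items(), key=lambda kv: dollars_per_tps(*kv[1]))
for name, (price, tps) in ranked:
    print(f"{name}: ${dollars_per_tps(price, tps):.0f} per t/s")
```

Among these rows the Dual RTX 5090 build ranks first at about $71 per t/s, still above the ~$57 quoted for this card.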
Best CUDA
DGX H200 — 8× H200 server (1.13 TB HBM3e)
NVIDIA · 8U DGX / HGX server rack
$380,000
tokens / sec (Q8)
14B 203 t/s
30B 130 t/s
70B 88 t/s
Memory: 1128 GB · 1100 GB usable
Bandwidth: 4800 GB/s
Idle / Active: 700 W / 6500 W
Sticker: $380,000
Why: Strongest CUDA-only software stack among fitting builds.
Most VRAM
12× RTX Pro 6000 Blackwell rack (1152 GB)
NVIDIA · 8U server rack (multi-node, 1-2 chassis)
$118,000
tokens / sec (Q8)
14B 177 t/s
30B 130 t/s
70B 88 t/s
Memory: 1152 GB · 1116 GB usable
Bandwidth: 1792 GB/s
Idle / Active: 340 W / 7400 W
Sticker: $118,000
Why: 1116 GB usable — most headroom for batching and longer contexts.
Efficient
MacBook Pro M4 Pro 48 GB
Apple · laptop
$2,899
tokens / sec (Q8)
14B 15 t/s
30B 7.3 t/s
70B 3.1 t/s
Memory: 48 GB · 40 GB usable
Bandwidth: 273 GB/s
Idle / Active: 5 W / 70 W
Sticker: $2,899
Why: 70 W active — lowest power draw of the fitting builds.
Cheapest
AMD Ryzen AI Max+ 395 (128 GB)
AMD · mini desktop / laptop
$2,799
tokens / sec (Q8)
14B 15 t/s
30B 8.3 t/s
70B 2.6 t/s
Memory: 128 GB · 96 GB usable
Bandwidth: 256 GB/s
Idle / Active: 8 W / 120 W
Sticker: $2,799
Why: Lowest sticker that still fits DiffusionGemma 26B-A4B ($2.8k USD).
Fastest
Single AMD Instinct MI355X 288 GB workstation
AMD · 4U server (OAM, liquid-cooled)
$28,000
tokens / sec (Q8)
14B 140 t/s
30B 83 t/s
70B —
Memory: 288 GB · 282 GB usable
Bandwidth: 8000 GB/s
Idle / Active: 140 W / 1400 W
Sticker: $28,000
Why: Highest measured tg/s — 250 t/s on DiffusionGemma 26B-A4B-class models at Q8.
All-rounder
Mac Studio M3 Ultra 96 GB
Apple · small desktop
$3,999
tokens / sec (Q8)
14B 36 t/s
30B 20 t/s
70B 9.4 t/s
Memory: 96 GB · 80 GB usable
Bandwidth: 819 GB/s
Idle / Active: 10 W / 180 W
Sticker: $3,999
Why: Top quartile across speed, value, memory headroom, and efficiency — the "buy this if unsure" pick.
Best value
Dual Intel Arc Pro B70 build
Intel · desktop tower
$3,200
tokens / sec (Q8)
14B 29 t/s
30B 18 t/s
70B 7.3 t/s
Memory: 64 GB · 62 GB usable
Bandwidth: 608 GB/s
Idle / Active: 35 W / 380 W
Sticker: $3,200
Why: Lowest cost per unit of generation speed, about $59 per t/s.
Best CUDA
Single B200 180 GB workstation
NVIDIA · workstation / 4U server
$47,000
tokens / sec (Q8)
14B 117 t/s
30B 70 t/s
70B 39 t/s
Memory: 180 GB · 176 GB usable
Bandwidth: 8000 GB/s
Idle / Active: 100 W / 1000 W
Sticker: $47,000
Why: Strongest CUDA-only software stack among fitting builds.
Most VRAM
Mac Studio M3 Ultra 512 GB
Apple · small desktop
$14,199
tokens / sec (Q8)
14B 36 t/s
30B 20 t/s
70B 9.4 t/s
Memory: 512 GB · 480 GB usable
Bandwidth: 819 GB/s
Idle / Active: 12 W / 220 W
Sticker: $14,199
Why: 480 GB usable — most headroom for batching and longer contexts.
Efficient
MacBook Pro M4 Pro 48 GB
Apple · laptop
$2,899
tokens / sec (Q8)
14B 15 t/s
30B 7.3 t/s
70B 3.1 t/s
Memory: 48 GB · 40 GB usable
Bandwidth: 273 GB/s
Idle / Active: 5 W / 70 W
Sticker: $2,899
Why: 70 W active — lowest power draw of the fitting builds.
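Three of the picks follow rules stated in their Why lines: lowest sticker among fitting builds, most usable memory, and lowest active power. The sketch below applies those rules to the seven builds on the cards above, using the 35 GB usable floor quoted for Q8_0 on this page. It only checks consistency among these seven builds, not against the full catalogue, and the `Build` type is a hypothetical structure for illustration.

```python
from typing import NamedTuple


class Build(NamedTuple):
    """Hypothetical record holding the card fields the pick rules need."""
    name: str
    sticker_usd: int
    usable_gb: int
    active_w: int


# Figures copied from the second set of pick cards above.
builds = [
    Build("AMD Ryzen AI Max+ 395 (128 GB)", 2_799, 96, 120),
    Build("Single AMD Instinct MI355X 288 GB", 28_000, 282, 1_400),
    Build("Mac Studio M3 Ultra 96 GB", 3_999, 80, 180),
    Build("Dual Intel Arc Pro B70", 3_200, 62, 380),
    Build("Single B200 180 GB", 47_000, 176, 1_000),
    Build("Mac Studio M3 Ultra 512 GB", 14_199, 480, 220),
    Build("MacBook Pro M4 Pro 48 GB", 2_899, 40, 70),
]

USABLE_FLOOR_GB = 35  # Q8_0 minimum quoted on this page
fitting = [b for b in builds if b.usable_gb >= USABLE_FLOOR_GB]

# Each pick is a single min/max over the fitting builds.
picks = {
    "Cheapest": min(fitting, key=lambda b: b.sticker_usd),
    "Most VRAM": max(fitting, key=lambda b: b.usable_gb),
    "Efficient": min(fitting, key=lambda b: b.active_w),
}
for label, build in picks.items():
    print(f"{label}: {build.name}")
```

The three selections match the Cheapest, Most VRAM and Efficient cards. The All-rounder pick combines four rankings and is not reproduced here.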
Every other build that runs DiffusionGemma 26B-A4B
43 additional builds fit DiffusionGemma 26B-A4B at Q8_0 (35 GB usable minimum), sorted by sticker price.
| Build | Price | Memory (total / usable) | Bandwidth | tg/s (Q8) | Active W | 5-yr power cost |
|---|---|---|---|---|---|---|
| Quad Tesla P40 (96 GB) homelab build, NVIDIA · rack/large tower | $2.7k | 96 / 92 GB | 347 GB/s | 12 t/s | 1000 W | $3.5k |
| MacBook Pro M5 Pro 48 GB, Apple · laptop | $3.2k | 48 / 40 GB | 307 GB/s | 28 t/s | 75 W | $263 |
| Dual AMD Radeon AI Pro R9700 build (64 GB), AMD · workstation | $3.7k | 64 / 62 GB | 640 GB/s | 47 t/s | 600 W | $2.1k |
| MacBook Pro M4 Max 64 GB, Apple · laptop | $4.0k | 64 / 54 GB | 410 GB/s | 34 t/s | 90 W | $315 |
| Dell Pro Max with GB10 (128 GB), Dell · small desktop | $4.1k | 128 / 119 GB | 273 GB/s | 47 t/s | 240 W | $887 |
| MacBook Pro M5 Max 64 GB, Apple · laptop | $4.1k | 64 / 54 GB | 614 GB/s | 47 t/s | 95 W | $332 |
| MSI EdgeXpert MS-C931 (128 GB), MSI · small desktop | $4.7k | 128 / 119 GB | 273 GB/s | 47 t/s | 240 W | $887 |
| Mac Studio M4 Max 128 GB, Apple · small desktop | $4.7k | 128 / 112 GB | 546 GB/s | 47 t/s | 130 W | $453 |
| NVIDIA DGX Spark (128 GB), NVIDIA · small desktop | $4.7k | 128 / 119 GB | 273 GB/s | 47 t/s | 240 W | $887 |
| ASUS Ascent GX10 (128 GB), ASUS · small desktop | $4.7k | 128 / 119 GB | 273 GB/s | 44 t/s | 240 W | $903 |
| Single RTX A6000 48 GB (Ampere) build, NVIDIA · workstation | $4.7k | 48 / 46 GB | 768 GB/s | 37 t/s | 300 W | $1.0k |
| Single AMD Radeon Pro W7900 48 GB build, AMD · workstation | $5k | 48 / 46 GB | 864 GB/s | 34 t/s | 295 W | $1.0k |
| Lenovo ThinkStation PGX (128 GB), Lenovo · small desktop | $5k | 128 / 119 GB | 273 GB/s | 47 t/s | 160 W | $650 |
| MacBook Pro M5 Max 128 GB, Apple · laptop | $5k | 128 / 108 GB | 614 GB/s | 47 t/s | 95 W | $332 |
| 2× Strix Halo cluster (256 GB unified), AMD · mini-PC pair | $6k | 256 / 192 GB | 256 GB/s | 37 t/s | 240 W | $841 |
| Quad Intel Arc Pro B70 build, Intel · rack/large tower | $6k | 128 / 124 GB | 608 GB/s | 59 t/s | 700 W | $2.5k |
| Quad RTX 3090 (used) build, NVIDIA · rack/large tower | $7k | 96 / 92 GB | 936 GB/s | 62 t/s | 1400 W | $4.9k |
| Single RTX Pro 5000 Blackwell 48 GB build, NVIDIA · workstation | $7k | 48 / 46 GB | 1344 GB/s | — | 300 W | $1.1k |
| Single RTX 6000 Ada 48 GB build, NVIDIA · workstation | $8k | 48 / 46 GB | 960 GB/s | 62 t/s | 300 W | $1.0k |
| Mac Studio M3 Ultra 256 GB, Apple · small desktop | $8k | 256 / 232 GB | 819 GB/s | 59 t/s | 180 W | $624 |
| Dual AMD Radeon Pro W7900 build, AMD · workstation | $9k | 96 / 92 GB | 864 GB/s | 51 t/s | 600 W | $2.1k |
| 2× DGX Spark cluster (256 GB unified, CUDA), NVIDIA · two desktops, 200 Gb/s interconnect | $10k | 256 / 240 GB | 273 GB/s | 78 t/s | 460 W | $1.7k |
| Dual RTX 5090 build, NVIDIA · rack/large tower | $10k | 64 / 62 GB | 1792 GB/s | 140 t/s | 1050 W | $3.6k |
| Octuple Intel Arc Pro B70 cluster, Intel · rack/large tower | $11k | 256 / 248 GB | 608 GB/s | 66 t/s | 1450 W | $5k |
| 4× Strix Halo cluster (512 GB unified), AMD · rack of 4 mini-PCs, 10 GbE fabric | $12k | 512 / 384 GB | 256 GB/s | 50 t/s | 480 W | $1.7k |
| Single RTX Pro 6000 Blackwell 96 GB build, NVIDIA · workstation | $12k | 96 / 93 GB | 1792 GB/s | 133 t/s | 600 W | $2.1k |
| Tinybox Red (6× 7900 XTX, 144 GB), tinycorp · 12U pedestal | $15k | 144 / 138 GB | 960 GB/s | 86 t/s | 1500 W | $5k |
| 4× DGX Spark cluster (512 GB unified, CUDA), NVIDIA · rack of 4 desktops | $20k | 512 / 488 GB | 273 GB/s | 86 t/s | 920 W | $3.4k |
| 8× Strix Halo cluster (1024 GB unified), AMD · rack of 8 mini-PCs, 10/25 GbE fabric | $23k | 1024 / 768 GB | 256 GB/s | 62 t/s | 960 W | $3.4k |
| Dual RTX Pro 6000 Blackwell build, NVIDIA · workstation | $24k | 192 / 188 GB | 1792 GB/s | 156 t/s | 1100 W | $3.8k |
| Single AMD Instinct MI325X 256 GB workstation, AMD · workstation / 4U server (OAM) | $25k | 256 / 250 GB | 6000 GB/s | 203 t/s | 1000 W | $3.6k |
| Tinybox Green (6× RTX 4090, 144 GB), tinycorp · 12U pedestal | $25k | 144 / 138 GB | 1008 GB/s | 117 t/s | 2200 W | $8k |
| 2× Mac Studio M3 Ultra 512 GB cluster (TB5 / MLX), Apple · two desktops, Thunderbolt 5 RDMA | $28k | 1024 / 960 GB | 819 GB/s | 78 t/s | 440 W | $1.5k |
| Single AMD Instinct MI300X 192 GB workstation, AMD · workstation | $30k | 192 / 188 GB | 5300 GB/s | 156 t/s | 750 W | $2.8k |
| Single H100 80 GB workstation, NVIDIA · workstation | $32k | 80 / 78 GB | 3350 GB/s | 140 t/s | 700 W | $2.5k |
| Quad RTX Pro 6000 Blackwell build (384 GB), NVIDIA · workstation / 4U pedestal | $38k | 384 / 372 GB | 1792 GB/s | 234 t/s | 2200 W | $8k |
| Single H200 141 GB workstation, NVIDIA · workstation / 2U server | $40k | 141 / 138 GB | 4800 GB/s | 195 t/s | 700 W | $2.5k |
| Tinybox Pro (8× RTX 4090, 192 GB), tinycorp · 12U pedestal | $40k | 192 / 184 GB | 1008 GB/s | 148 t/s | 3200 W | $11k |
| 8× DGX Spark cluster (1024 GB unified, CUDA), NVIDIA · rack of 8 desktops, 200 GbE fabric | $44k | 1024 / 976 GB | 273 GB/s | 112 t/s | 1840 W | $7k |
| Tinybox Green v2 (4× RTX 5090, 128 GB), tinycorp · 12U pedestal | $45k | 128 / 124 GB | 1792 GB/s | 172 t/s | 2300 W | $8k |
| 8× RTX Pro 6000 Blackwell server (768 GB), NVIDIA · 4U server (e.g. Supermicro AS-4125GS-TNRT) | $78k | 768 / 744 GB | 1792 GB/s | 343 t/s | 4800 W | $16k |
| 8× H100 80 GB server, NVIDIA · server rack | $280k | 640 / 620 GB | 3350 GB/s | 281 t/s | 5600 W | $20k |
| NVIDIA RTX Spark (128 GB), NVIDIA · OEM laptops + small desktops | — | 128 / 119 GB | 300 GB/s | — | — | $756 |
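The 35 GB usable floor can be sanity-checked from the 25.2 B total parameter count in the page header. In the GGUF Q8_0 format each block of 32 weights stores 32 int8 values plus one fp16 scale, i.e. 34 bytes per 32 weights. The minimal sketch below assumes every parameter is stored that way (real files often keep a few small tensors, such as norms, at higher precision) and checks a few builds listed on this page against the floor.

```python
# Rough Q8_0 weight-size estimate and fit check. Assumes every parameter is
# stored as Q8_0 (32 int8 weights + one fp16 scale = 34 bytes per block).
PARAMS = 25.2e9                 # total parameters (MoE, all experts)
Q8_0_BYTES_PER_WEIGHT = 34 / 32
USABLE_FLOOR_GB = 35            # minimum usable memory quoted on this page

weights_gb = PARAMS * Q8_0_BYTES_PER_WEIGHT / 1e9
print(f"Q8_0 weights: ~{weights_gb:.1f} GB, "
      f"margin left by the floor: ~{USABLE_FLOOR_GB - weights_gb:.1f} GB")

# (total GB, usable GB) pairs taken from builds listed on this page.
builds = {
    "MacBook Pro M5 Pro 48 GB": (48, 40),
    "Single RTX A6000 48 GB": (48, 46),
    "Single RTX 3090 (used)": (24, 23),
}
for name, (_total, usable) in builds.items():
    verdict = "fits" if usable >= USABLE_FLOOR_GB else "does not fit"
    print(f"{name}: {usable} GB usable -> {verdict} at Q8_0")
```

This gives about 26.8 GB of weights, leaving roughly 8 GB of the 35 GB floor for KV cache, activations and runtime overhead; the page does not say how that margin was chosen. Both 48 GB builds clear the floor, while a single RTX 3090 (23 GB usable), which fits at Q2, does not fit at Q8_0.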
Last updated 2026-06-13