Hardware to run DiffusionGemma 26B-A4B

Jun 10 2026. Google's experimental text-diffusion Gemma: a 25.2 B MoE (3.8 B active) that denoises 256-token blocks in parallel, reaching 1,000+ tok/s on H100 and 700+ on a 5090, and running quantized on 18 GB cards. It trades benchmark quality for ~4× generation speed vs Gemma 4 26B-A4B.

Gemma · text
DiffusionGemma 26B-A4B
26 B params · 14 GB Q4 file · 18 GB min Q4 · 21 GB min Q5 · 35 GB min Q8 · 256K ctx · Apache 2.0
Cheapest

Single AMD Instinct MI50 32 GB (used) build

AMD · desktop tower
$700
tokens / sec (Q4)
14B 38 t/s
30B
70B
Memory: 32 GB · 31 usable
Bandwidth: 1024 GB/s
Idle / Active: 18 W / 300 W
Sticker: $700
Why: Lowest sticker that still fits DiffusionGemma 26B-A4B ($700 USD).
Fastest

DGX B200 — 8× B200 server (1.44 TB HBM3e)

NVIDIA · 10U DGX server
$475,000
tokens / sec (Q4)
14B 420 t/s
30B 270 t/s
70B 180 t/s
Memory: 1440 GB · 1404 usable
Bandwidth: 8000 GB/s
Idle / Active: 900 W / 10200 W
Sticker: $475,000
Why: Highest measured tg/s — 810 t/s on DiffusionGemma 26B-A4B-class models at Q4.
All-rounder

Mac Studio M3 Ultra 96 GB

Apple · small desktop
$3,999
tokens / sec (Q4)
14B 70 t/s
30B 38 t/s
70B 18 t/s
Memory: 96 GB · 80 usable
Bandwidth: 819 GB/s
Idle / Active: 10 W / 180 W
Sticker: $3,999
Why: Top quartile across speed, value, memory headroom, and efficiency — the "buy this if unsure" pick.
Best value

Single RTX 3090 (used) build

NVIDIA · desktop tower
$1,750
tokens / sec (Q4)
14B 50 t/s
30B 28 t/s
70B
Memory: 24 GB · 23 usable
Bandwidth: 936 GB/s
Idle / Active: 22 W / 350 W
Sticker: $1,750
Why: Best $/tg-per-second — ~$21 per t/s.
Best CUDA

DGX H200 — 8× H200 server (1.13 TB HBM3e)

NVIDIA · 8U DGX / HGX server rack
$380,000
tokens / sec (Q4)
14B 390 t/s
30B 250 t/s
70B 170 t/s
Memory: 1128 GB · 1100 usable
Bandwidth: 4800 GB/s
Idle / Active: 700 W / 6500 W
Sticker: $380,000
Why: Strongest CUDA-only software stack among fitting builds.
Most VRAM

12× RTX Pro 6000 Blackwell rack (1152 GB)

NVIDIA · 8U server rack (multi-node, 1-2 chassis)
$118,000
tokens / sec (Q4)
14B 340 t/s
30B 250 t/s
70B 170 t/s
Memory: 1152 GB · 1116 usable
Bandwidth: 1792 GB/s
Idle / Active: 340 W / 7400 W
Sticker: $118,000
Why: 1116 GB usable — most headroom for batching and longer contexts.
Efficient

Mac Mini M4 (24 GB)

Apple · mini desktop
$999
tokens / sec (Q4)
14B 12 t/s
30B
70B
Memory: 24 GB · 18 usable
Bandwidth: 120 GB/s
Idle / Active: 4 W / 50 W
Sticker: $999
Why: 50 W active — lowest power draw of the fitting builds.
Cheapest

Mac Mini M4 (24 GB)

Apple · mini desktop
$999
tokens / sec (Q4)
14B 12 t/s
30B
70B
Memory: 24 GB · 18 usable
Bandwidth: 120 GB/s
Idle / Active: 4 W / 50 W
Sticker: $999
Why: Lowest sticker that still fits DiffusionGemma 26B-A4B ($999 USD).
Fastest

Single AMD Instinct MI355X 288 GB workstation

AMD · 4U server (OAM, liquid-cooled)
$28,000
tokens / sec (Q4)
14B 270 t/s
30B 160 t/s
70B
Memory: 288 GB · 282 usable
Bandwidth: 8000 GB/s
Idle / Active: 140 W / 1400 W
Sticker: $28,000
Why: Highest measured tg/s — 480 t/s on DiffusionGemma 26B-A4B-class models at Q4.
All-rounder

Mac Studio M3 Ultra 96 GB

Apple · small desktop
$3,999
tokens / sec (Q4)
14B 70 t/s
30B 38 t/s
70B 18 t/s
Memory: 96 GB · 80 usable
Bandwidth: 819 GB/s
Idle / Active: 10 W / 180 W
Sticker: $3,999
Why: Top quartile across speed, value, memory headroom, and efficiency — the "buy this if unsure" pick.
Best value

Single RTX 5090 build

NVIDIA · desktop tower
$4,900
tokens / sec (Q4)
14B 124 t/s
30B 70 t/s
70B
Memory: 32 GB · 31 usable
Bandwidth: 1792 GB/s
Idle / Active: 30 W / 520 W
Sticker: $4,900
Why: Best $/tg-per-second — ~$23 per t/s.
Best CUDA

Single B200 180 GB workstation

NVIDIA · workstation / 4U server
$47,000
tokens / sec (Q4)
14B 225 t/s
30B 135 t/s
70B 75 t/s
Memory: 180 GB · 176 usable
Bandwidth: 8000 GB/s
Idle / Active: 100 W / 1000 W
Sticker: $47,000
Why: Strongest CUDA-only software stack among fitting builds.
Most VRAM

Mac Studio M3 Ultra 512 GB

Apple · small desktop
$14,199
tokens / sec (Q4)
14B 70 t/s
30B 38 t/s
70B 18 t/s
Memory: 512 GB · 480 usable
Bandwidth: 819 GB/s
Idle / Active: 12 W / 220 W
Sticker: $14,199
Why: 480 GB usable — most headroom for batching and longer contexts.
Efficient

MacBook Pro M4 Pro 48 GB

Apple · laptop
$2,899
tokens / sec (Q4)
14B 28 t/s
30B 14 t/s
70B 6.0 t/s
Memory: 48 GB · 40 usable
Bandwidth: 273 GB/s
Idle / Active: 5 W / 70 W
Sticker: $2,899
Why: 70 W active — lowest power draw of the fitting builds.

Every other build that runs DiffusionGemma 26B-A4B

54 additional builds fit DiffusionGemma 26B-A4B at Q4_K_M (18 GB usable minimum), sorted by sticker price.
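
The fit rule behind this list can be sketched in a few lines of Python. This is a minimal illustration, not the picker's actual logic: it assumes a build "fits" a quantization when its usable memory is at least the minimum listed in the spec line (18 GB for Q4, 21 GB for Q5, 35 GB for Q8). The names `MIN_USABLE_GB` and `fitting_quants` are hypothetical.

```python
# Minimum usable memory (GB) per quantization, copied from the model's spec line.
MIN_USABLE_GB = {"Q4": 18, "Q5": 21, "Q8": 35}


def fitting_quants(usable_gb: float) -> list[str]:
    """Return the quantizations whose minimum fits in the given usable memory.

    Assumption: "fits" means usable memory >= the listed minimum.
    """
    return [quant for quant, need in MIN_USABLE_GB.items() if usable_gb >= need]


# Usable-memory figures taken from the cards on this page.
builds = {
    "Mac Mini M4 (24 GB)": 18,
    "Single RTX 3090 (used) build": 23,
    "Mac Studio M3 Ultra 96 GB": 80,
}

for name, usable in builds.items():
    quants = fitting_quants(usable)
    print(f"{name}: {', '.join(quants) if quants else 'does not fit'}")
```

Under this rule, a build with exactly 18 GB usable qualifies for Q4 only, which matches the Mac Mini M4 (24 GB) appearing among the fitting builds.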

| Build | Vendor · form factor | Price | Memory (total / usable) | Bandwidth | tg/s (Q4) | Active W | 5-yr power |
| --- | --- | --- | --- | --- | --- | --- | --- |
| | | $750 | 24 / 23 GB | 347 GB/s | 36 t/s | 250 W | $854 |
| Tesla V100 32 GB SXM2 mod build | NVIDIA · desktop tower | $900 | 32 / 31 GB | 900 GB/s | 33 t/s | 300 W | $1.1k |
| Single Intel Arc Pro B70 build | Intel · desktop tower | $1.8k | 32 / 31 GB | 608 GB/s | 75 t/s | 220 W | $782 |
| Mac Studio M4 Max 36 GB | Apple · small desktop | $2.0k | 36 / 28 GB | 546 GB/s | 84 t/s | 130 W | $453 |
| | | $2.0k | 32 / 31 GB | 640 GB/s | | 300 W | $1.1k |
| | | $2.3k | 128 / 122 GB | 1024 GB/s | 38 t/s | 1200 W | $4.2k |
| Quad Tesla P40 (96 GB) homelab build | NVIDIA · rack/large tower | $2.7k | 96 / 92 GB | 347 GB/s | 24 t/s | 1000 W | $3.5k |
| AMD Ryzen AI Max+ 395 (128 GB) | AMD · mini desktop / laptop | $2.8k | 128 / 96 GB | 256 GB/s | 48 t/s | 120 W | $420 |
| Dual RTX 3090 (used) build | NVIDIA · desktop tower | $3.1k | 48 / 46 GB | 936 GB/s | 105 t/s | 700 W | $2.4k |
| | | $3.2k | 48 / 40 GB | 307 GB/s | 54 t/s | 75 W | $263 |
| Single RTX 4090 build | NVIDIA · desktop tower | $3.2k | 24 / 23 GB | 1008 GB/s | 126 t/s | 410 W | $1.4k |
| Dual Intel Arc Pro B70 build | Intel · desktop tower | $3.2k | 64 / 62 GB | 608 GB/s | 105 t/s | 380 W | $1.4k |
| | | $3.5k | 32 / 31 GB | 576 GB/s | 54 t/s | 260 W | $920 |
| | | $3.7k | 64 / 62 GB | 640 GB/s | 90 t/s | 600 W | $2.1k |
| | | $4.0k | 64 / 54 GB | 410 GB/s | 66 t/s | 90 W | $315 |
| Dell Pro Max with GB10 (128 GB) | Dell · small desktop | $4.1k | 128 / 119 GB | 273 GB/s | 90 t/s | 240 W | $887 |
| | | $4.1k | 64 / 54 GB | 614 GB/s | 90 t/s | 95 W | $332 |
| MSI EdgeXpert MS-C931 (128 GB) | MSI · small desktop | $4.7k | 128 / 119 GB | 273 GB/s | 90 t/s | 240 W | $887 |
| Mac Studio M4 Max 128 GB | Apple · small desktop | $4.7k | 128 / 112 GB | 546 GB/s | 90 t/s | 130 W | $453 |
| NVIDIA DGX Spark (128 GB) | NVIDIA · small desktop | $4.7k | 128 / 119 GB | 273 GB/s | 90 t/s | 240 W | $887 |
| ASUS Ascent GX10 (128 GB) | ASUS · small desktop | $4.7k | 128 / 119 GB | 273 GB/s | 84 t/s | 240 W | $903 |
| | | $4.7k | 48 / 46 GB | 768 GB/s | 72 t/s | 300 W | $1.0k |
| | | $5k | 48 / 46 GB | 864 GB/s | 66 t/s | 295 W | $1.0k |
| Lenovo ThinkStation PGX (128 GB) | Lenovo · small desktop | $5k | 128 / 119 GB | 273 GB/s | 90 t/s | 160 W | $650 |
| | | $5k | 128 / 108 GB | 614 GB/s | 90 t/s | 95 W | $332 |
| | | $6k | 256 / 192 GB | 256 GB/s | 72 t/s | 240 W | $841 |
| Quad Intel Arc Pro B70 build | Intel · rack/large tower | $6k | 128 / 124 GB | 608 GB/s | 114 t/s | 700 W | $2.5k |
| Quad RTX 3090 (used) build | NVIDIA · rack/large tower | $7k | 96 / 92 GB | 936 GB/s | 120 t/s | 1400 W | $4.9k |
| | | $7k | 48 / 46 GB | 1344 GB/s | | 300 W | $1.1k |
| Single RTX 6000 Ada 48 GB build | NVIDIA · workstation | $8k | 48 / 46 GB | 960 GB/s | 120 t/s | 300 W | $1.0k |
| Mac Studio M3 Ultra 256 GB | Apple · small desktop | $8k | 256 / 232 GB | 819 GB/s | 114 t/s | 180 W | $624 |
| | | $9k | 96 / 92 GB | 864 GB/s | 99 t/s | 600 W | $2.1k |
| 2× DGX Spark cluster (256 GB unified, CUDA) | NVIDIA · two desktops, 200 G interconnect | $10k | 256 / 240 GB | 273 GB/s | 150 t/s | 460 W | $1.7k |
| Dual RTX 5090 build | NVIDIA · rack/large tower | $10k | 64 / 62 GB | 1792 GB/s | 270 t/s | 1050 W | $3.6k |
| Octuple Intel Arc Pro B70 cluster | Intel · rack/large tower | $11k | 256 / 248 GB | 608 GB/s | 126 t/s | 1450 W | $5k |
| 4× Strix Halo cluster (512 GB unified) | AMD · rack of 4 mini-PCs, 10 GbE fabric | $12k | 512 / 384 GB | 256 GB/s | 96 t/s | 480 W | $1.7k |
| | | $12k | 96 / 93 GB | 1792 GB/s | 255 t/s | 600 W | $2.1k |
| Tinybox Red (6× 7900 XTX, 144 GB) | tinycorp · 12U pedestal | $15k | 144 / 138 GB | 960 GB/s | 165 t/s | 1500 W | $5k |
| | | $20k | 512 / 488 GB | 273 GB/s | 165 t/s | 920 W | $3.4k |
| 8× Strix Halo cluster (1024 GB unified) | AMD · rack of 8 mini-PCs, 10/25 GbE fabric | $23k | 1024 / 768 GB | 256 GB/s | 120 t/s | 960 W | $3.4k |
| | | $24k | 192 / 188 GB | 1792 GB/s | 300 t/s | 1100 W | $3.8k |
| Single AMD Instinct MI325X 256 GB workstation | AMD · workstation / 4U server (OAM) | $25k | 256 / 250 GB | 6000 GB/s | 390 t/s | 1000 W | $3.6k |
| Tinybox Green (6× RTX 4090, 144 GB) | tinycorp · 12U pedestal | $25k | 144 / 138 GB | 1008 GB/s | 225 t/s | 2200 W | $8k |
| 2× Mac Studio M3 Ultra 512 GB cluster (TB5 / MLX) | Apple · two desktops, Thunderbolt 5 RDMA | $28k | 1024 / 960 GB | 819 GB/s | 150 t/s | 440 W | $1.5k |
| | | $30k | 192 / 188 GB | 5300 GB/s | 300 t/s | 750 W | $2.8k |
| Single H100 80 GB workstation | NVIDIA · workstation | $32k | 80 / 78 GB | 3350 GB/s | 270 t/s | 700 W | $2.5k |
| Quad RTX Pro 6000 Blackwell build (384 GB) | NVIDIA · workstation / 4U pedestal | $38k | 384 / 372 GB | 1792 GB/s | 450 t/s | 2200 W | $8k |
| Single H200 141 GB workstation | NVIDIA · workstation / 2U server | $40k | 141 / 138 GB | 4800 GB/s | 375 t/s | 700 W | $2.5k |
| Tinybox Pro (8× RTX 4090, 192 GB) | tinycorp · 12U pedestal | $40k | 192 / 184 GB | 1008 GB/s | 285 t/s | 3200 W | $11k |
| 8× DGX Spark cluster (1024 GB unified, CUDA) | NVIDIA · rack of 8 desktops, 200 GbE fabric | $44k | 1024 / 976 GB | 273 GB/s | 216 t/s | 1840 W | $7k |
| | | $45k | 128 / 124 GB | 1792 GB/s | 330 t/s | 2300 W | $8k |
| 8× RTX Pro 6000 Blackwell server (768 GB) | NVIDIA · 4U server (e.g. SuperMicro AS-4125GS-TNRT) | $78k | 768 / 744 GB | 1792 GB/s | 660 t/s | 4800 W | $16k |
| 8× H100 80 GB server | NVIDIA · server rack | $280k | 640 / 620 GB | 3350 GB/s | 540 t/s | 5600 W | $20k |
| NVIDIA RTX Spark (128 GB) | NVIDIA · OEM laptops + small desktops | | 128 / 119 GB | 300 GB/s | | — W | $756 |
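
The "Best value" picks are ranked by dollars per t/s, and the same figure can be worked out for any row in the table. The snippet below is a rough sketch rather than the site's own calculation: it assumes the metric is simply sticker price divided by Q4 tg/s, and it uses the rounded prices shown in the table, so results may differ slightly from the card values. The function name `dollars_per_tps` is hypothetical.

```python
def dollars_per_tps(price_usd: float, tps: float) -> float:
    """Sticker price divided by tokens per second (lower is better value)."""
    if tps <= 0:
        raise ValueError("tps must be positive")
    return price_usd / tps


# (build, rounded sticker price in USD, tg/s at Q4), taken from table rows.
rows = [
    ("Single RTX 4090 build", 3200, 126),
    ("Dual RTX 3090 (used) build", 3100, 105),
    ("Mac Studio M4 Max 36 GB", 2000, 84),
]

# Print the rows from best to worst value.
for name, price, tps in sorted(rows, key=lambda r: dollars_per_tps(r[1], r[2])):
    print(f"{name}: ~${dollars_per_tps(price, tps):.0f} per t/s")
```

Rows without a tg/s figure cannot be ranked this way and are left out of the sample.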

Last updated 2026-06-13