Hardware to run Gemma 4 12B Unified (dense)

Jun 3 2026. Google's encoder-free 12B dense model: a unified decoder-only transformer with no separate vision or audio encoder. Raw image patches and audio waveforms are projected directly into the embedding space. 256K context, 140+ languages, native multimodal (text/image/audio/video), Apache 2.0. Runs on a 16 GB laptop (~8-9 GB at Q4). Strong for its size: AIME 77.5, GPQA 78.8, MMLU-Pro 77.2, LiveCodeBench (LCB) 72.0.

Gemma · text
Gemma 4 12B Unified (dense)
12 B params · 7 GB Q4 file · 8 GB min Q4 · 10 GB min Q5 · 15 GB min Q8 · 256K ctx · Apache 2.0
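
The per-quantization minimums above can be turned into a quick fit check. The following is a minimal sketch, assuming the only criterion is usable memory against the listed minimum (8 GB for Q4, 10 GB for Q5, 15 GB for Q8); the function name `fitting_quants` is hypothetical and not part of any picker API.

```python
# Minimum usable memory (GB) per quantization, taken from the spec line above.
MIN_USABLE_GB = {"Q4": 8, "Q5": 10, "Q8": 15}


def fitting_quants(usable_gb: float) -> list[str]:
    """Return the quantizations whose listed minimum fits in usable_gb.

    Assumption: fit is decided purely by usable memory; context length
    and batch size are ignored in this sketch.
    """
    return [quant for quant, need in MIN_USABLE_GB.items() if usable_gb >= need]


if __name__ == "__main__":
    # Usable-memory figures as listed on the build cards on this page.
    print(fitting_quants(11))  # MacBook Air M4 (16 GB, 11 usable)
    print(fitting_quants(15))  # Tesla P100 16 GB (15 usable)
```

Under these assumptions a 16 GB Apple laptop with 11 GB usable clears Q4 and Q5 but not Q8, while a 16 GB card with 15 GB usable clears all three.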
Cheapest

Single Tesla P100 16 GB (used) build

NVIDIA · desktop tower
$500
tokens / sec (Q4)
8B 33 t/s
14B 19 t/s
30B
Memory: 16 GB · 15 usable
Bandwidth: 732 GB/s
Idle / Active: 25 W / 250 W
Sticker: $500
Why: Lowest sticker that still fits Gemma 4 12B Unified (dense) ($500 USD).
Fastest

DGX B200 — 8× B200 server (1.44 TB HBM3e)

NVIDIA · 10U DGX server
$475,000
tokens / sec (Q4)
8B 600 t/s
14B 420 t/s
30B 270 t/s
Memory: 1440 GB · 1404 usable
Bandwidth: 8000 GB/s
Idle / Active: 900 W / 10200 W
Sticker: $475,000
Why: Highest measured tg/s — 490 t/s on Gemma 4 12B Unified (dense)-class models at Q4.
All-rounder

Tesla V100 32 GB SXM2 mod build

NVIDIA · desktop tower
$900
tokens / sec (Q4)
8B 85 t/s
14B 50 t/s
30B 27 t/s
Memory: 32 GB · 31 usable
Bandwidth: 900 GB/s
Idle / Active: 33 W / 300 W
Sticker: $900
Why: Top quartile across speed, value, memory headroom, and efficiency — the "buy this if unsure" pick.
Best value

Single AMD Instinct MI50 32 GB (used) build

AMD · desktop tower
$900
tokens / sec (Q4)
8B 71 t/s
14B 38 t/s
30B
Memory: 32 GB · 31 usable
Bandwidth: 1024 GB/s
Idle / Active: 18 W / 300 W
Sticker: $900
Why: Best $/tg-per-second — ~$20 per t/s.
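
The dollars-per-token-per-second figure is simply sticker price divided by generation speed. The page does not show the 12B-class speed it used for this card, so the sketch below back-computes it from the quoted ratio; the function names are hypothetical, and the inputs are the sticker prices and approximate ratios quoted on the two "Best value" cards.

```python
def dollars_per_tps(sticker_usd: float, tokens_per_sec: float) -> float:
    """Value metric: sticker price divided by generation speed (lower is better)."""
    return sticker_usd / tokens_per_sec


def implied_tps(sticker_usd: float, usd_per_tps: float) -> float:
    """Invert the metric: the generation speed implied by a quoted $/t/s figure."""
    return sticker_usd / usd_per_tps


if __name__ == "__main__":
    # MI50 32 GB card: $900 sticker, quoted at roughly $20 per t/s.
    print(implied_tps(900, 20))   # 45.0
    # RX 9070 XT card: $1,300 sticker, quoted at roughly $25 per t/s.
    print(implied_tps(1300, 25))  # 52.0
```

Both implied speeds fall between the 14B and 8B figures on their cards (38 to 71 t/s and 44 to 78 t/s respectively), which is where a 12B dense model would be expected to land. Because the quoted ratios are rounded, the implied speeds are approximate.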
Best CUDA

DGX H200 — 8× H200 server (1.13 TB HBM3e)

NVIDIA · 8U DGX / HGX server rack
$380,000
tokens / sec (Q4)
8B 560 t/s
14B 390 t/s
30B 250 t/s
Memory: 1128 GB · 1100 usable
Bandwidth: 4800 GB/s
Idle / Active: 700 W / 6500 W
Sticker: $380,000
Why: Strongest CUDA-only software stack among fitting builds.
Most VRAM

12× RTX Pro 6000 Blackwell rack (1152 GB)

NVIDIA · 8U server rack (multi-node, 1-2 chassis)
$118,000
tokens / sec (Q4)
8B 460 t/s
14B 340 t/s
30B 250 t/s
Memory: 1152 GB · 1116 usable
Bandwidth: 1792 GB/s
Idle / Active: 340 W / 7400 W
Sticker: $118,000
Why: 1116 GB usable — most headroom for batching and longer contexts.
Efficient

MacBook Air M4 (16 GB)

Apple · laptop
$1,099
tokens / sec (Q4)
8B 18 t/s
14B 8.0 t/s
30B
Memory: 16 GB · 11 usable
Bandwidth: 120 GB/s
Idle / Active: 5 W / 30 W
Sticker: $1,099
Why: 30 W active — lowest power draw of the fitting builds.
Cheapest

Mac Mini M4 (16 GB)

Apple · mini desktop
$799
tokens / sec (Q4)
8B 22 t/s
14B 10 t/s
30B
Memory: 16 GB · 11 usable
Bandwidth: 120 GB/s
Idle / Active: 4 W / 50 W
Sticker: $799
Why: Lowest sticker that still fits Gemma 4 12B Unified (dense) ($799 USD).
Fastest

Single AMD Instinct MI355X 288 GB workstation

AMD · 4U server (OAM, liquid-cooled)
$28,000
tokens / sec (Q4)
8B 420 t/s
14B 270 t/s
30B 160 t/s
Memory: 288 GB · 282 usable
Bandwidth: 8000 GB/s
Idle / Active: 140 W / 1400 W
Sticker: $28,000
Why: Highest measured tg/s — 315 t/s on Gemma 4 12B Unified (dense)-class models at Q4.
All-rounder

Mac Studio M3 Ultra 96 GB

Apple · small desktop
$5,299
tokens / sec (Q4)
8B 110 t/s
14B 70 t/s
30B 38 t/s
Memory: 96 GB · 80 usable
Bandwidth: 819 GB/s
Idle / Active: 10 W / 180 W
Sticker: $5,299
Why: Top quartile across speed, value, memory headroom, and efficiency — the "buy this if unsure" pick.
Best value

Single AMD Radeon RX 9070 XT 16 GB build

AMD · desktop tower
$1,300
tokens / sec (Q4)
8B 78 t/s
14B 44 t/s
30B
Memory: 16 GB · 15 usable
Bandwidth: 645 GB/s
Idle / Active: 17 W / 304 W
Sticker: $1,300
Why: Best $/tg-per-second — ~$25 per t/s.
Best CUDA

Single B200 180 GB workstation

NVIDIA · workstation / 4U server
$47,000
tokens / sec (Q4)
8B 360 t/s
14B 225 t/s
30B 135 t/s
Memory: 180 GB · 176 usable
Bandwidth: 8000 GB/s
Idle / Active: 100 W / 1000 W
Sticker: $47,000
Why: Strongest CUDA-only software stack among fitting builds.
Most VRAM

Mac Studio M3 Ultra 512 GB

Apple · small desktop
$14,199
tokens / sec (Q4)
8B 110 t/s
14B 70 t/s
30B 38 t/s
Memory: 512 GB · 480 usable
Bandwidth: 819 GB/s
Idle / Active: 12 W / 220 W
Sticker: $14,199
Why: 480 GB usable — most headroom for batching and longer contexts.
Efficient

MacBook Air M4 (16 GB)

Apple · laptop
$1,099
tokens / sec (Q4)
8B 18 t/s
14B 8.0 t/s
30B
Memory: 16 GB · 11 usable
Bandwidth: 120 GB/s
Idle / Active: 5 W / 30 W
Sticker: $1,099
Why: 30 W active — lowest power draw of the fitting builds.

Every other build that runs Gemma 4 12B Unified (dense)

60 additional builds fit Gemma 4 12B Unified (dense) at Q4_K_M (8 GB usable minimum), sorted by sticker price.

| Build | Vendor · form factor | Price | Memory (total / usable) | Bandwidth | tg/s (Q4) | Active power | 5-yr power |
| --- | --- | --- | --- | --- | --- | --- | --- |
|  |  | $750 | 24 / 23 GB | 347 GB/s | 18 t/s | 250 W | $854 |
| RTX 3060 12 GB build | NVIDIA · desktop tower | $900 | 12 / 11 GB | 360 GB/s | 21 t/s | 170 W | $598 |
| Mac Mini M4 (24 GB) | Apple · mini desktop | $999 | 24 / 18 GB | 120 GB/s | 14 t/s | 50 W | $177 |
| Single Intel Arc B580 12 GB build | Intel · desktop tower | $1.1k | 12 / 11 GB | 456 GB/s | 26 t/s | 190 W | $664 |
| MacBook Air M5 (16 GB) | Apple · laptop | $1.3k | 16 / 11 GB | 153 GB/s | 12 t/s | 30 W | $115 |
| Single RTX 3090 (used) build | NVIDIA · desktop tower | $1.5k | 24 / 23 GB | 936 GB/s | 58 t/s | 350 W | $1.2k |
| Single Intel Arc Pro B70 build | Intel · desktop tower | $1.8k | 32 / 31 GB | 608 GB/s | 47 t/s | 220 W | $782 |
|  |  | $2.0k | 32 / 31 GB | 640 GB/s | 53 t/s | 300 W | $1.1k |
| Mac Studio M4 Max 36 GB | Apple · small desktop | $2.5k | 36 / 28 GB | 546 GB/s | 64 t/s | 130 W | $453 |
|  |  | $2.5k | 128 / 122 GB | 1024 GB/s | 44 t/s | 1200 W | $4.2k |
| Quad Tesla P40 (96 GB) homelab build | NVIDIA · rack/large tower | $2.7k | 96 / 92 GB | 347 GB/s | 19 t/s | 1000 W | $3.5k |
| AMD Ryzen AI Max+ 395 (128 GB) | AMD · mini desktop / laptop | $2.8k | 128 / 96 GB | 256 GB/s | 33 t/s | 120 W | $420 |
| Dual RTX 3090 (used) build | NVIDIA · desktop tower | $2.8k | 48 / 46 GB | 936 GB/s | 64 t/s | 700 W | $2.4k |
|  |  | $2.9k | 48 / 40 GB | 273 GB/s | 33 t/s | 70 W | $246 |
|  |  | $3.2k | 48 / 40 GB | 307 GB/s | 41 t/s | 75 W | $263 |
| Single RTX 4090 build | NVIDIA · desktop tower | $3.2k | 24 / 23 GB | 1008 GB/s | 88 t/s | 410 W | $1.4k |
| Dual Intel Arc Pro B70 build | Intel · desktop tower | $3.2k | 64 / 62 GB | 608 GB/s | 64 t/s | 380 W | $1.4k |
| ASUS Ascent GX10 (128 GB) | ASUS · small desktop | $3.5k | 128 / 119 GB | 273 GB/s | 61 t/s | 240 W | $903 |
|  |  | $3.5k | 32 / 31 GB | 576 GB/s | 37 t/s | 260 W | $920 |
|  |  | $3.7k | 64 / 62 GB | 640 GB/s | 64 t/s | 600 W | $2.1k |
|  |  | $4.0k | 64 / 54 GB | 410 GB/s | 47 t/s | 90 W | $315 |
| Dell Pro Max with GB10 (128 GB) | Dell · small desktop | $4.1k | 128 / 119 GB | 273 GB/s | 64 t/s | 240 W | $887 |
|  |  | $4.1k | 64 / 54 GB | 614 GB/s | 58 t/s | 95 W | $332 |
| MSI EdgeXpert MS-C931 (128 GB) | MSI · small desktop | $4.7k | 128 / 119 GB | 273 GB/s | 64 t/s | 240 W | $887 |
| Mac Studio M4 Max 128 GB | Apple · small desktop | $4.7k | 128 / 112 GB | 546 GB/s | 64 t/s | 130 W | $453 |
| NVIDIA DGX Spark (128 GB) | NVIDIA · small desktop | $4.7k | 128 / 119 GB | 273 GB/s | 64 t/s | 240 W | $887 |
|  |  | $4.7k | 48 / 46 GB | 768 GB/s | 49 t/s | 300 W | $1.0k |
| Single RTX 5090 build | NVIDIA · desktop tower | $4.9k | 32 / 31 GB | 1792 GB/s | 145 t/s | 520 W | $1.8k |
|  |  | $5k | 48 / 46 GB | 864 GB/s | 47 t/s | 295 W | $1.0k |
| Lenovo ThinkStation PGX (128 GB) | Lenovo · small desktop | $5k | 128 / 119 GB | 273 GB/s | 64 t/s | 160 W | $650 |
|  |  | $5k | 128 / 108 GB | 614 GB/s | 58 t/s | 95 W | $332 |
|  |  | $6k | 256 / 192 GB | 256 GB/s | 47 t/s | 240 W | $841 |
| Quad Intel Arc Pro B70 build | Intel · rack/large tower | $6k | 128 / 124 GB | 608 GB/s | 70 t/s | 700 W | $2.5k |
| Quad RTX 3090 (used) build | NVIDIA · rack/large tower | $6k | 96 / 92 GB | 936 GB/s | 70 t/s | 1400 W | $4.9k |
|  |  | $7k | 48 / 46 GB | 1344 GB/s |  | 300 W | $1.1k |
| Single RTX 6000 Ada 48 GB build | NVIDIA · workstation | $8k | 48 / 46 GB | 960 GB/s | 76 t/s | 300 W | $1.0k |
| Mac Studio M3 Ultra 256 GB | Apple · small desktop | $8k | 256 / 232 GB | 819 GB/s | 82 t/s | 180 W | $624 |
|  |  | $9k | 96 / 92 GB | 864 GB/s | 64 t/s | 600 W | $2.1k |
| 2× DGX Spark cluster (256 GB unified, CUDA) | NVIDIA · two desktops, 200 G interconnect | $10k | 256 / 240 GB | 273 GB/s | 105 t/s | 460 W | $1.7k |
| Dual RTX 5090 build | NVIDIA · rack/large tower | $10k | 64 / 62 GB | 1792 GB/s | 163 t/s | 1050 W | $3.6k |
| Octuple Intel Arc Pro B70 cluster | Intel · rack/large tower | $11k | 256 / 248 GB | 608 GB/s | 76 t/s | 1450 W | $5k |
| 4× Strix Halo cluster (512 GB unified) | AMD · rack of 4 mini-PCs, 10 GbE fabric | $12k | 512 / 384 GB | 256 GB/s | 64 t/s | 480 W | $1.7k |
|  |  | $12k | 96 / 93 GB | 1792 GB/s | 163 t/s | 600 W | $2.1k |
| Tinybox Red (6× 7900 XTX, 144 GB) | tinycorp · 12U pedestal | $15k | 144 / 138 GB | 960 GB/s | 111 t/s | 1500 W | $5k |
|  |  | $20k | 512 / 488 GB | 273 GB/s | 117 t/s | 920 W | $3.4k |
| 8× Strix Halo cluster (1024 GB unified) | AMD · rack of 8 mini-PCs, 10/25 GbE fabric | $23k | 1024 / 768 GB | 256 GB/s | 82 t/s | 960 W | $3.4k |
|  |  | $24k | 192 / 188 GB | 1792 GB/s | 187 t/s | 1100 W | $3.8k |
| Single AMD Instinct MI325X 256 GB workstation | AMD · workstation / 4U server (OAM) | $25k | 256 / 250 GB | 6000 GB/s | 257 t/s | 1000 W | $3.6k |
| Tinybox Green (6× RTX 4090, 144 GB) | tinycorp · 12U pedestal | $25k | 144 / 138 GB | 1008 GB/s | 152 t/s | 2200 W | $8k |
| 2× Mac Studio M3 Ultra 512 GB cluster (TB5 / MLX) | Apple · two desktops, Thunderbolt 5 RDMA | $28k | 1024 / 960 GB | 819 GB/s | 107 t/s | 440 W | $1.5k |
|  |  | $30k | 192 / 188 GB | 5300 GB/s | 198 t/s | 750 W | $2.8k |
| Single H100 80 GB workstation | NVIDIA · workstation | $32k | 80 / 78 GB | 3350 GB/s | 175 t/s | 700 W | $2.5k |
| Quad RTX Pro 6000 Blackwell build (384 GB) | NVIDIA · workstation / 4U pedestal | $38k | 384 / 372 GB | 1792 GB/s | 268 t/s | 2200 W | $8k |
| Single H200 141 GB workstation | NVIDIA · workstation / 2U server | $40k | 141 / 138 GB | 4800 GB/s | 245 t/s | 700 W | $2.5k |
| Tinybox Pro (8× RTX 4090, 192 GB) | tinycorp · 12U pedestal | $40k | 192 / 184 GB | 1008 GB/s | 187 t/s | 3200 W | $11k |
| 8× DGX Spark cluster (1024 GB unified, CUDA) | NVIDIA · rack of 8 desktops, 200 GbE fabric | $44k | 1024 / 976 GB | 273 GB/s | 152 t/s | 1840 W | $7k |
|  |  | $45k | 128 / 124 GB | 1792 GB/s | 198 t/s | 2300 W | $8k |
| 8× RTX Pro 6000 Blackwell server (768 GB) | NVIDIA · 4U server (e.g. SuperMicro AS-4125GS-TNRT) | $78k | 768 / 744 GB | 1792 GB/s | 362 t/s | 4800 W | $16k |
| 8× H100 80 GB server | NVIDIA · server rack | $280k | 640 / 620 GB | 3350 GB/s | 327 t/s | 5600 W | $20k |
| NVIDIA RTX Spark (128 GB) | NVIDIA · OEM laptops + small desktops |  | 128 / 119 GB | 300 GB/s |  |  | $756 |
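
The page does not state how the 5-yr power column is computed. As a rough sketch, one set of assumptions that reproduces several of the Apple rows is 24/7 operation for five years, half of that time at active draw and half at idle, at $0.15 per kWh. The duty cycle, the electricity price, and the function name are all assumptions chosen to fit the listed figures, not values from the source. Idle wattages are borrowed from the sibling pick cards (4 W for the Mac Mini M4, 10 W for the Mac Studio M3 Ultra), since the table lists only active power.

```python
HOURS_PER_YEAR = 24 * 365


def five_year_power_cost(
    idle_w: float,
    active_w: float,
    active_fraction: float = 0.5,  # assumed duty cycle, not stated on the page
    usd_per_kwh: float = 0.15,     # assumed electricity price, not stated on the page
    years: int = 5,
) -> float:
    """Estimate electricity cost in USD for a machine left on around the clock."""
    average_w = active_fraction * active_w + (1 - active_fraction) * idle_w
    kwh = average_w * HOURS_PER_YEAR * years / 1000
    return kwh * usd_per_kwh


if __name__ == "__main__":
    # Mac Mini M4 (24 GB): 50 W active in the table, 4 W idle assumed from the 16 GB card.
    print(round(five_year_power_cost(idle_w=4, active_w=50)))    # 177
    # Mac Studio M3 Ultra 256 GB: 180 W active, 10 W idle assumed from the 96 GB card.
    print(round(five_year_power_cost(idle_w=10, active_w=180)))  # 624
```

Both results match the table's $177 and $624, but agreement on a few rows does not confirm the site's actual method. Substitute a local electricity price and a realistic duty cycle before relying on the estimate.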

Last updated 2026-06-27