Hardware to run Gemma 4 12B Unified (dense)

Jun 3, 2026. Google's encoder-free 12B dense model: a unified decoder-only transformer with no separate vision or audio encoder, in which raw image patches and audio waveforms are projected directly into the embedding space. 256K context, 140+ languages, natively multimodal (text, image, audio, video), Apache 2.0 license. Runs on a 16 GB laptop (~8-9 GB at Q4). Strong for its size: AIME 77.5, GPQA 78.8, MMLU-Pro 77.2, LCB 72.0.

Gemma · text

Gemma 4 12B Unified (dense)

12B params · 7 GB Q4 file · minimum memory: 8 GB (Q4), 10 GB (Q5), 15 GB (Q8) · 256K ctx · Apache 2.0
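A minimal sketch of the fit check implied by these minimums. It assumes the page's own figures: 8 GB usable for Q4, 10 GB for Q5, 15 GB for Q8, plus the 4 GB Q2_K minimum quoted in the Q2 build table. The helper name `quants_that_fit` is hypothetical; it only compares a build's usable memory against each minimum.

```python
# Minimum usable memory (GB) per quantization for Gemma 4 12B Unified,
# as quoted on this page (the Q2 value comes from the Q2_K table intro).
MIN_USABLE_GB = {"Q2": 4, "Q4": 8, "Q5": 10, "Q8": 15}


def quants_that_fit(usable_gb: float) -> list[str]:
    """Return the quantizations whose minimum memory fits in usable_gb."""
    return [q for q, need in MIN_USABLE_GB.items() if usable_gb >= need]


# Usable memory as listed on the build cards.
for name, usable in [("MacBook Air M4 (16 GB)", 11),
                     ("Tesla P100 16 GB", 15),
                     ("RTX 3060 12 GB", 11)]:
    print(f"{name}: {', '.join(quants_that_fit(usable))}")
```

These are minimums for the weights plus runtime overhead; long contexts need additional room for the KV cache.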

Quantization: Q2_K

Cheapest

Single Tesla P100 16 GB (used) build

NVIDIA · desktop tower

$500

tokens / sec (Q2)

8B 40 t/s

14B 23 t/s

30B n/a

Memory: 16 GB · 15 usable

Bandwidth: 732 GB/s

Idle / Active: 25 W / 250 W

Sticker: $500

Why: Lowest sticker price ($500) among builds that fit Gemma 4 12B Unified (dense).

Links: Amazon · eBay (used) · Newegg

📺 Reviews on YouTube

▶ Can a 10-Year-Old $5,700 GPU Beat a New $430 GPU? | Tesla P100 Local AI Review

Fastest

DGX B200 — 8× B200 server (1.44 TB HBM3e)

NVIDIA · 10U DGX server

$475,000

tokens / sec (Q2)

8B 720 t/s

14B 504 t/s

30B 324 t/s

Memory: 1440 GB · 1404 usable

Bandwidth: 8000 GB/s

Idle / Active: 900 W / 10200 W

Sticker: $475,000

Why: Highest estimated generation speed, ~588 t/s for Gemma 4 12B Unified (dense) at Q2.

Links: NVIDIA DGX B200 · SuperMicro HGX B200 · Lambda Labs

📺 Reviews on YouTube

▶ Inside a 1.44 TB HBM3e NVIDIA HGX B200 AI Server from ASRock Rack
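The 12B figures in the "Why" lines are not listed directly on the cards. They appear consistent with scaling the 14B-class number by 14/12 (504 × 14 / 12 = 588 for the DGX B200 at Q2; 324 × 14 / 12 = 378 for the MI355X). A minimal sketch of that projection, assuming decode speed is inversely proportional to parameter count at a fixed quantization; this is a reading of the numbers, not a documented method of the page.

```python
def project_tg(tg_ref: float, ref_params_b: float, target_params_b: float) -> float:
    """Project tokens/sec from a reference model size to a target size.

    Assumes a memory-bandwidth-bound decoder, where bytes read per token
    (and therefore time per token) grow linearly with parameter count."""
    return tg_ref * ref_params_b / target_params_b


# 14B-class figures from the "Fastest" cards, projected to 12B.
for build, quant, tg_14b in [("DGX B200", "Q2", 504),
                             ("DGX B200", "Q4", 420),
                             ("MI355X", "Q2", 324)]:
    print(f"{build} {quant}: ~{project_tg(tg_14b, 14, 12):.0f} t/s at 12B")
```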

All-rounder

Tesla V100 32 GB SXM2 mod build

NVIDIA · desktop tower

$900

tokens / sec (Q2)

8B 102 t/s

14B 60 t/s

30B 32 t/s

Memory: 32 GB · 31 usable

Bandwidth: 900 GB/s

Idle / Active: 33 W / 300 W

Sticker: $900

Why: Top quartile across speed, value, memory headroom, and efficiency; the "buy this if unsure" pick.

Links: Amazon · eBay (V100 SXM2 32 GB) · eBay (SXM2 → PCIe adapter)

📺 Reviews on YouTube

▶ Expensive RTX 5090 for LLMs? No — use V100 SXM2 + Z8 G4 instead

▶ I built my own AI — A.I.D.E.N. (32 GB V100 SXM2 + HP Z8 G4)
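The "All-rounder" pick is described as top quartile on speed, value, memory headroom, and efficiency. The page does not define those metrics, so this is a hypothetical sketch of such a filter: each build carries one score per axis (higher is better), and a build qualifies only if it reaches the 75th percentile on every axis. The function name and input shape are assumptions.

```python
import statistics


def top_quartile_all(builds: dict[str, dict[str, float]]) -> list[str]:
    """Return builds at or above the 75th percentile on every metric.

    `builds` maps a build name to {metric: score}; higher must be better for
    every metric, so invert costs such as $ per t/s before passing them in."""
    metrics = list(next(iter(builds.values())).keys())
    cutoffs = {}
    for m in metrics:
        values = [scores[m] for scores in builds.values()]
        # quantiles(n=4) returns the three quartile cut points; index 2 is Q3.
        cutoffs[m] = statistics.quantiles(values, n=4)[2]
    return [name for name, scores in builds.items()
            if all(scores[m] >= cutoffs[m] for m in metrics)]
```

Requiring every axis to clear its cutoff is strict: as more metrics are added, fewer builds qualify, which is consistent with the page naming a single pick.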

Best value

Single AMD Instinct MI50 32 GB (used) build

AMD · desktop tower

$900

tokens / sec (Q2)

8B 85 t/s

14B 46 t/s

30B n/a

Memory: 32 GB · 31 usable

Bandwidth: 1024 GB/s

Idle / Active: 18 W / 300 W

Sticker: $900

Why: Best price per unit of generation speed, ~$17 per t/s.

Links: Amazon · eBay (used) · Alibaba

📺 Reviews on YouTube

▶ AMD MI50 32 GB speed test — Ollama vs llama.cpp (GPT-OSS & Qwen3)

▶ Is the Radeon Instinct MI50 32 GB the ultimate cheap home-AI GPU?
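The "Best value" metric divides sticker price by generation speed. A minimal sketch, assuming the 12B speed is the 14B-class card figure scaled by 14/12; under that assumption the MI50 works out to $900 / (46 × 14 / 12) ≈ $17 per t/s, matching the card. The function name is hypothetical.

```python
def dollars_per_tps(price_usd: float, tg_14b: float) -> float:
    """Sticker price divided by projected 12B tokens/sec.

    Assumes the 12B speed is the 14B-class figure scaled by 14/12."""
    tg_12b = tg_14b * 14 / 12
    return price_usd / tg_12b


# Price and 14B Q2 speed from the "Best value" cards.
print(f"MI50 32 GB: ~${dollars_per_tps(900, 46):.0f} per t/s")
print(f"RX 9070 XT: ~${dollars_per_tps(1300, 53):.0f} per t/s")
```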

Best CUDA

DGX H200 — 8× H200 server (1.13 TB HBM3e)

NVIDIA · 8U DGX / HGX server rack

$380,000

tokens / sec (Q2)

8B 672 t/s

14B 468 t/s

30B 300 t/s

Memory: 1128 GB · 1100 usable

Bandwidth: 4800 GB/s

Idle / Active: 700 W / 6500 W

Sticker: $380,000

Why: Strongest CUDA-only software stack among fitting builds.

Links: NVIDIA DGX H200 · SuperMicro HGX · Lambda Labs

📺 Reviews on YouTube

▶ Inside the SUPER NVIDIA H200 Server From Supermicro (8U HGX H200)

Most VRAM

12× RTX Pro 6000 Blackwell rack (1152 GB)

NVIDIA · 8U server rack (multi-node, 1-2 chassis)

$118,000

tokens / sec (Q2)

8B 552 t/s

14B 408 t/s

30B 300 t/s

Memory: 1152 GB · 1116 usable

Bandwidth: 1792 GB/s

Idle / Active: 340 W / 7400 W

Sticker: $118,000

Why: 1116 GB usable, the most headroom for batching and longer contexts.

Links: Amazon · CoreWeave · SuperMicro · SHI enterprise
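Memory headroom matters for long contexts because the KV cache grows linearly with context length. The standard full-attention estimate is 2 (K and V) × layers × KV heads × head dimension × tokens × bytes per element. Gemma 4 12B Unified's layer count and head layout are not given on this page, so the configuration below is purely hypothetical; substitute the values from the model's config. Sliding-window attention or KV-cache quantization, if used, would shrink the result.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_elem: int = 2,
                   batch: int = 1) -> int:
    """Size of a full-attention KV cache: one K and one V vector per layer,
    per KV head, per token (bytes_per_elem=2 for fp16/bf16)."""
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_elem * batch


# Hypothetical configuration, NOT the published Gemma 4 12B architecture.
size = kv_cache_bytes(layers=48, kv_heads=8, head_dim=128, context_tokens=256_000)
print(f"{size / 1e9:.1f} GB per sequence at 256K tokens")
```

Because the cache scales with `batch`, serving many concurrent long-context requests is where a build like this one pulls ahead.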

Efficient

MacBook Air M4 (16 GB)

Apple · laptop

$1,099

tokens / sec (Q2)

8B 22 t/s

14B 9.6 t/s

30B n/a

Memory: 16 GB · 11 usable

Bandwidth: 120 GB/s

Idle / Active: 5 W / 30 W

Sticker: $1,099

Why: 30 W active, the lowest power draw of the fitting builds.

Links: Amazon · Apple · B&H Photo

📺 Reviews on YouTube

▶ The budget MacBook so stubborn it survived a 44k-token test
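Lowest power draw is not the same as best energy efficiency. One way to compare builds is tokens per joule: generation speed divided by power. A minimal sketch using the 8B Q2 speeds and active wattages from the cards above, assuming the "Active" figure is the draw during generation.

```python
def tokens_per_joule(tg_per_s: float, active_w: float) -> float:
    """Tokens generated per joule: (tokens/s) / (J/s)."""
    return tg_per_s / active_w


# (8B Q2 t/s, active W) as listed on the cards.
cards = {
    "MacBook Air M4 (16 GB)": (22, 30),
    "Tesla P100 16 GB": (40, 250),
    "Tesla V100 32 GB": (102, 300),
}
for name, (tg, watts) in cards.items():
    print(f"{name}: {tokens_per_joule(tg, watts):.2f} tokens/J")
```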

Cheapest

Mac Mini M4 (16 GB)

Apple · mini desktop

$799

tokens / sec (Q2)

8B 26 t/s

14B 12 t/s

30B n/a

Memory: 16 GB · 11 usable

Bandwidth: 120 GB/s

Idle / Active: 4 W / 50 W

Sticker: $799

Why: Lowest sticker price ($799) among builds that fit Gemma 4 12B Unified (dense).

Links: Amazon · Apple · B&H Photo

📺 Reviews on YouTube

▶ Mac Mini M4 vs M3 Pro, 6700XT, 3080Ti — LLM Ollama side by side

▶ Intel Macbook vs Apple Silicon (M1, M3 Pro, M4) running LLM Ollama

▶ Ollama Mac MLX is here — 2× faster t/s for Apple Silicon

Fastest

Single AMD Instinct MI355X 288 GB workstation

AMD · 4U server (OAM, liquid-cooled)

$28,000

tokens / sec (Q2)

8B 504 t/s

14B 324 t/s

30B 192 t/s

Memory: 288 GB · 282 usable

Bandwidth: 8000 GB/s

Idle / Active: 140 W / 1400 W

Sticker: $28,000

Why: Highest estimated generation speed, ~378 t/s for Gemma 4 12B Unified (dense) at Q2.

Links: Dell XE9712 · AMD MI350 series · Supermicro UBB

📺 Reviews on YouTube

▶ AMD's MI350/355X Advancing AI Event Recap

▶ AMD AI Event w/ Craft Computing — MI350/355X Launch, ROCm 7

▶ AMD Visit and Tour! Featuring ROCm 7 and AMD Instinct

All-rounder

Mac Studio M3 Ultra 96 GB

Apple · small desktop

$5,299

tokens / sec (Q2)

8B 132 t/s

14B 84 t/s

30B 46 t/s

Memory: 96 GB · 80 usable

Bandwidth: 819 GB/s

Idle / Active: 10 W / 180 W

Sticker: $5,299

Why: Top quartile across speed, value, memory headroom, and efficiency; the "buy this if unsure" pick.

Links: Amazon · Apple · B&H Photo

📺 Reviews on YouTube

▶ M3 Ultra vs M4 Max — Don't Buy the WRONG Mac Studio

▶ Skip M3 Ultra & RTX 5090 for LLMs | NEW 96GB KING

▶ Ditch 512 GB Monster — this M3 Ultra Just Redefined 'Enough'

Best value

Single AMD Radeon RX 9070 XT 16 GB build

AMD · desktop tower

$1,300

tokens / sec (Q2)

8B 94 t/s

14B 53 t/s

30B n/a

Memory: 16 GB · 15 usable

Bandwidth: 645 GB/s

Idle / Active: 17 W / 304 W

Sticker: $1,300

Why: Best price per unit of generation speed, ~$21 per t/s.

Links: Amazon · Newegg · B&H Photo · AMD

📺 Reviews on YouTube

▶ AMD 9070 XT FULL SEND — Ollama / llama.cpp / vLLM, gpt-oss / qwen3 / trinity / devstral all tested

▶ RX 9070 XT is cheaper AND better… mostly

Best CUDA

Single B200 180 GB workstation

NVIDIA · workstation / 4U server

$47,000

tokens / sec (Q2)

8B 432 t/s

14B 270 t/s

30B 162 t/s

Memory: 180 GB · 176 usable

Bandwidth: 8000 GB/s

Idle / Active: 100 W / 1000 W

Sticker: $47,000

Why: Strongest CUDA-only software stack among fitting builds.

Links: NVIDIA B200 partners · SHI enterprise · Lambda Labs

📺 Reviews on YouTube

▶ AI Lab: NVIDIA B200 vs GB200 explained | GPU architecture for LLMs

▶ Inside a NEW AI Cluster — Tour with NVIDIA B200

Most VRAM

Mac Studio M3 Ultra 512 GB

Apple · small desktop

$14,199

tokens / sec (Q2)

8B 132 t/s

14B 84 t/s

30B 46 t/s

Memory: 512 GB · 480 usable

Bandwidth: 819 GB/s

Idle / Active: 12 W / 220 W

Sticker: $14,199

Why: 480 GB usable, the most headroom for batching and longer contexts.

Links: Apple · B&H Photo

📺 Reviews on YouTube

▶ M3 Ultra Mac Studio Review

▶ M3 Ultra vs RTX 5090 — The Final Battle

Every other build that runs Gemma 4 12B Unified (dense)

60 additional builds fit Gemma 4 12B Unified (dense) at Q2_K (4 GB usable minimum), sorted by sticker price.

Build	Price	Memory (total / usable)	Bandwidth	tg/s (Q2)	Active power	5-yr power cost
Single Tesla P40 24 GB (used) build · NVIDIA · desktop tower	$750	24 / 23 GB	347 GB/s	21 t/s	250 W	$854
RTX 3060 12 GB build · NVIDIA · desktop tower	$900	12 / 11 GB	360 GB/s	25 t/s	170 W	$598
Mac Mini M4 (24 GB) · Apple · mini desktop	$999	24 / 18 GB	120 GB/s	17 t/s	50 W	$177
Single Intel Arc B580 12 GB build · Intel · desktop tower	$1.1k	12 / 11 GB	456 GB/s	31 t/s	190 W	$664
MacBook Air M5 (16 GB) · Apple · laptop	$1.3k	16 / 11 GB	153 GB/s	14 t/s	30 W	$115
Single RTX 3090 (used) build · NVIDIA · desktop tower	$1.5k	24 / 23 GB	936 GB/s	70 t/s	350 W	$1.2k
Single Intel Arc Pro B70 build · Intel · desktop tower	$1.8k	32 / 31 GB	608 GB/s	56 t/s	220 W	$782
Single AMD Radeon AI Pro R9700 32 GB build · AMD · desktop tower	$2.0k	32 / 31 GB	640 GB/s	63 t/s	300 W	$1.1k
Mac Studio M4 Max 36 GB · Apple · small desktop	$2.5k	36 / 28 GB	546 GB/s	77 t/s	130 W	$453
Quad AMD MI50 32 GB (128 GB) homelab build · AMD · rack/large tower	$2.5k	128 / 122 GB	1024 GB/s	53 t/s	1200 W	$4.2k
Quad Tesla P40 (96 GB) homelab build · NVIDIA · rack/large tower	$2.7k	96 / 92 GB	347 GB/s	22 t/s	1000 W	$3.5k
AMD Ryzen AI Max+ 395 (128 GB) · AMD · mini desktop / laptop	$2.8k	128 / 96 GB	256 GB/s	39 t/s	120 W	$420
Dual RTX 3090 (used) build · NVIDIA · desktop tower	$2.8k	48 / 46 GB	936 GB/s	77 t/s	700 W	$2.4k
MacBook Pro M4 Pro 48 GB · Apple · laptop	$2.9k	48 / 40 GB	273 GB/s	39 t/s	70 W	$246
MacBook Pro M5 Pro 48 GB · Apple · laptop	$3.2k	48 / 40 GB	307 GB/s	49 t/s	75 W	$263
Single RTX 4090 build · NVIDIA · desktop tower	$3.2k	24 / 23 GB	1008 GB/s	105 t/s	410 W	$1.4k
Dual Intel Arc Pro B70 build · Intel · desktop tower	$3.2k	64 / 62 GB	608 GB/s	77 t/s	380 W	$1.4k
ASUS Ascent GX10 (128 GB) · ASUS · small desktop	$3.5k	128 / 119 GB	273 GB/s	73 t/s	240 W	$903
Single AMD Radeon Pro W7800 32 GB build · AMD · workstation	$3.5k	32 / 31 GB	576 GB/s	45 t/s	260 W	$920
Dual AMD Radeon AI Pro R9700 build (64 GB) · AMD · workstation	$3.7k	64 / 62 GB	640 GB/s	77 t/s	600 W	$2.1k
MacBook Pro M4 Max 64 GB · Apple · laptop	$4.0k	64 / 54 GB	410 GB/s	56 t/s	90 W	$315
Dell Pro Max with GB10 (128 GB) · Dell · small desktop	$4.1k	128 / 119 GB	273 GB/s	77 t/s	240 W	$887
MacBook Pro M5 Max 64 GB · Apple · laptop	$4.1k	64 / 54 GB	614 GB/s	70 t/s	95 W	$332
MSI EdgeXpert MS-C931 (128 GB) · MSI · small desktop	$4.7k	128 / 119 GB	273 GB/s	77 t/s	240 W	$887
Mac Studio M4 Max 128 GB · Apple · small desktop	$4.7k	128 / 112 GB	546 GB/s	77 t/s	130 W	$453
NVIDIA DGX Spark (128 GB) · NVIDIA · small desktop	$4.7k	128 / 119 GB	273 GB/s	77 t/s	240 W	$887
Single RTX A6000 48 GB (Ampere) build · NVIDIA · workstation	$4.7k	48 / 46 GB	768 GB/s	59 t/s	300 W	$1.0k
Single RTX 5090 build · NVIDIA · desktop tower	$4.9k	32 / 31 GB	1792 GB/s	174 t/s	520 W	$1.8k
Single AMD Radeon Pro W7900 48 GB build · AMD · workstation	$5k	48 / 46 GB	864 GB/s	56 t/s	295 W	$1.0k
Lenovo ThinkStation PGX (128 GB) · Lenovo · small desktop	$5k	128 / 119 GB	273 GB/s	77 t/s	160 W	$650
MacBook Pro M5 Max 128 GB · Apple · laptop	$5k	128 / 108 GB	614 GB/s	70 t/s	95 W	$332
2× Strix Halo cluster (256 GB unified) · AMD · mini-PC pair	$6k	256 / 192 GB	256 GB/s	56 t/s	240 W	$841
Quad Intel Arc Pro B70 build · Intel · rack/large tower	$6k	128 / 124 GB	608 GB/s	84 t/s	700 W	$2.5k
Quad RTX 3090 (used) build · NVIDIA · rack/large tower	$6k	96 / 92 GB	936 GB/s	84 t/s	1400 W	$4.9k
Single RTX Pro 5000 Blackwell 48 GB build · NVIDIA · workstation	$7k	48 / 46 GB	1344 GB/s	n/a	300 W	$1.1k
Single RTX 6000 Ada 48 GB build · NVIDIA · workstation	$8k	48 / 46 GB	960 GB/s	91 t/s	300 W	$1.0k
Mac Studio M3 Ultra 256 GB · Apple · small desktop	$8k	256 / 232 GB	819 GB/s	98 t/s	180 W	$624
Dual AMD Radeon Pro W7900 build · AMD · workstation	$9k	96 / 92 GB	864 GB/s	77 t/s	600 W	$2.1k
2× DGX Spark cluster (256 GB unified, CUDA) · NVIDIA · two desktops, 200 G interconnect	$10k	256 / 240 GB	273 GB/s	126 t/s	460 W	$1.7k
Dual RTX 5090 build · NVIDIA · rack/large tower	$10k	64 / 62 GB	1792 GB/s	196 t/s	1050 W	$3.6k
Octuple Intel Arc Pro B70 cluster · Intel · rack/large tower	$11k	256 / 248 GB	608 GB/s	91 t/s	1450 W	$5k
4× Strix Halo cluster (512 GB unified) · AMD · rack of 4 mini-PCs, 10 GbE fabric	$12k	512 / 384 GB	256 GB/s	77 t/s	480 W	$1.7k
Single RTX Pro 6000 Blackwell 96 GB build · NVIDIA · workstation	$12k	96 / 93 GB	1792 GB/s	196 t/s	600 W	$2.1k
Tinybox Red (6× 7900 XTX, 144 GB) · tinycorp · 12U pedestal	$15k	144 / 138 GB	960 GB/s	133 t/s	1500 W	$5k
4× DGX Spark cluster (512 GB unified, CUDA) · NVIDIA · rack of 4 desktops	$20k	512 / 488 GB	273 GB/s	140 t/s	920 W	$3.4k
8× Strix Halo cluster (1024 GB unified) · AMD · rack of 8 mini-PCs, 10/25 GbE fabric	$23k	1024 / 768 GB	256 GB/s	98 t/s	960 W	$3.4k
Dual RTX Pro 6000 Blackwell build · NVIDIA · workstation	$24k	192 / 188 GB	1792 GB/s	224 t/s	1100 W	$3.8k
Single AMD Instinct MI325X 256 GB workstation · AMD · workstation / 4U server (OAM)	$25k	256 / 250 GB	6000 GB/s	308 t/s	1000 W	$3.6k
Tinybox Green (6× RTX 4090, 144 GB) · tinycorp · 12U pedestal	$25k	144 / 138 GB	1008 GB/s	182 t/s	2200 W	$8k
2× Mac Studio M3 Ultra 512 GB cluster (TB5 / MLX) · Apple · two desktops, Thunderbolt 5 RDMA	$28k	1024 / 960 GB	819 GB/s	129 t/s	440 W	$1.5k
Single AMD Instinct MI300X 192 GB workstation · AMD · workstation	$30k	192 / 188 GB	5300 GB/s	238 t/s	750 W	$2.8k
Single H100 80 GB workstation · NVIDIA · workstation	$32k	80 / 78 GB	3350 GB/s	210 t/s	700 W	$2.5k
Quad RTX Pro 6000 Blackwell build (384 GB) · NVIDIA · workstation / 4U pedestal	$38k	384 / 372 GB	1792 GB/s	322 t/s	2200 W	$8k
Single H200 141 GB workstation · NVIDIA · workstation / 2U server	$40k	141 / 138 GB	4800 GB/s	294 t/s	700 W	$2.5k
Tinybox Pro (8× RTX 4090, 192 GB) · tinycorp · 12U pedestal	$40k	192 / 184 GB	1008 GB/s	224 t/s	3200 W	$11k
8× DGX Spark cluster (1024 GB unified, CUDA) · NVIDIA · rack of 8 desktops, 200 GbE fabric	$44k	1024 / 976 GB	273 GB/s	182 t/s	1840 W	$7k
Tinybox Green v2 (4× RTX 5090, 128 GB) · tinycorp · 12U pedestal	$45k	128 / 124 GB	1792 GB/s	238 t/s	2300 W	$8k
8× RTX Pro 6000 Blackwell server (768 GB) · NVIDIA · 4U server (e.g. SuperMicro AS-4125GS-TNRT)	$78k	768 / 744 GB	1792 GB/s	434 t/s	4800 W	$16k
8× H100 80 GB server · NVIDIA · server rack	$280k	640 / 620 GB	3350 GB/s	392 t/s	5600 W	$20k
NVIDIA RTX Spark (128 GB) · NVIDIA · OEM laptops + small desktops	n/a	128 / 119 GB	300 GB/s	n/a	n/a	$756
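The "5-yr power cost" column is an electricity-cost estimate, but the page does not state the duty cycle or tariff behind it. A sketch of how such a figure can be computed, using a hypothetical usage profile (some hours per day at active draw, idle the rest) and a hypothetical price per kWh; replace both with your own numbers. The idle wattage in the example is also hypothetical.

```python
HOURS_PER_YEAR = 24 * 365


def five_year_power_cost(active_w: float, idle_w: float,
                         active_hours_per_day: float,
                         usd_per_kwh: float, years: int = 5) -> float:
    """Electricity cost over `years`, assuming the machine draws active_w for
    active_hours_per_day and idle_w for the rest of each day.
    The usage profile and tariff are assumptions, not the page's method."""
    active_h = active_hours_per_day * 365 * years
    idle_h = HOURS_PER_YEAR * years - active_h
    kwh = (active_w * active_h + idle_w * idle_h) / 1000
    return kwh * usd_per_kwh


# Hypothetical profile: 350 W active, 25 W idle, 8 h/day under load, $0.15/kWh.
print(f"${five_year_power_cost(350, 25, 8, 0.15):,.0f}")
```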

Quantization: Q4_K_M

Cheapest

Single Tesla P100 16 GB (used) build

NVIDIA · desktop tower

$500

tokens / sec (Q4)

8B 33 t/s

14B 19 t/s

30B n/a

Memory: 16 GB · 15 usable

Bandwidth: 732 GB/s

Idle / Active: 25 W / 250 W

Sticker: $500

Why: Lowest sticker price ($500) among builds that fit Gemma 4 12B Unified (dense).

Links: Amazon · eBay (used) · Newegg

📺 Reviews on YouTube

▶ Can a 10-Year-Old $5,700 GPU Beat a New $430 GPU? | Tesla P100 Local AI Review

Fastest

DGX B200 — 8× B200 server (1.44 TB HBM3e)

NVIDIA · 10U DGX server

$475,000

tokens / sec (Q4)

8B 600 t/s

14B 420 t/s

30B 270 t/s

Memory: 1440 GB · 1404 usable

Bandwidth: 8000 GB/s

Idle / Active: 900 W / 10200 W

Sticker: $475,000

Why: Highest estimated generation speed, ~490 t/s for Gemma 4 12B Unified (dense) at Q4.

Links: NVIDIA DGX B200 · SuperMicro HGX B200 · Lambda Labs

📺 Reviews on YouTube

▶ Inside a 1.44 TB HBM3e NVIDIA HGX B200 AI Server from ASRock Rack

All-rounder

Tesla V100 32 GB SXM2 mod build

NVIDIA · desktop tower

$900

tokens / sec (Q4)

8B 85 t/s

14B 50 t/s

30B 27 t/s

Memory: 32 GB · 31 usable

Bandwidth: 900 GB/s

Idle / Active: 33 W / 300 W

Sticker: $900

Why: Top quartile across speed, value, memory headroom, and efficiency; the "buy this if unsure" pick.

Links: Amazon · eBay (V100 SXM2 32 GB) · eBay (SXM2 → PCIe adapter)

📺 Reviews on YouTube

▶ Expensive RTX 5090 for LLMs? No — use V100 SXM2 + Z8 G4 instead

▶ I built my own AI — A.I.D.E.N. (32 GB V100 SXM2 + HP Z8 G4)

Best value

Single AMD Instinct MI50 32 GB (used) build

AMD · desktop tower

$900

tokens / sec (Q4)

8B 71 t/s

14B 38 t/s

30B n/a

Memory: 32 GB · 31 usable

Bandwidth: 1024 GB/s

Idle / Active: 18 W / 300 W

Sticker: $900

Why: Best price per unit of generation speed, ~$20 per t/s.

Links: Amazon · eBay (used) · Alibaba

📺 Reviews on YouTube

▶ AMD MI50 32 GB speed test — Ollama vs llama.cpp (GPT-OSS & Qwen3)

▶ Is the Radeon Instinct MI50 32 GB the ultimate cheap home-AI GPU?

Best CUDA

DGX H200 — 8× H200 server (1.13 TB HBM3e)

NVIDIA · 8U DGX / HGX server rack

$380,000

tokens / sec (Q4)

8B 560 t/s

14B 390 t/s

30B 250 t/s

Memory: 1128 GB · 1100 usable

Bandwidth: 4800 GB/s

Idle / Active: 700 W / 6500 W

Sticker: $380,000

Why: Strongest CUDA-only software stack among fitting builds.

Links: NVIDIA DGX H200 · SuperMicro HGX · Lambda Labs

📺 Reviews on YouTube

▶ Inside the SUPER NVIDIA H200 Server From Supermicro (8U HGX H200)

Most VRAM

12× RTX Pro 6000 Blackwell rack (1152 GB)

NVIDIA · 8U server rack (multi-node, 1-2 chassis)

$118,000

tokens / sec (Q4)

8B 460 t/s

14B 340 t/s

30B 250 t/s

Memory: 1152 GB · 1116 usable

Bandwidth: 1792 GB/s

Idle / Active: 340 W / 7400 W

Sticker: $118,000

Why: 1116 GB usable, the most headroom for batching and longer contexts.

Links: Amazon · CoreWeave · SuperMicro · SHI enterprise

Efficient

MacBook Air M4 (16 GB)

Apple · laptop

$1,099

tokens / sec (Q4)

8B 18 t/s

14B 8.0 t/s

30B n/a

Memory: 16 GB · 11 usable

Bandwidth: 120 GB/s

Idle / Active: 5 W / 30 W

Sticker: $1,099

Why: 30 W active, the lowest power draw of the fitting builds.

Links: Amazon · Apple · B&H Photo

📺 Reviews on YouTube

▶ The budget MacBook so stubborn it survived a 44k-token test

Cheapest

Mac Mini M4 (16 GB)

Apple · mini desktop

$799

tokens / sec (Q4)

8B 22 t/s

14B 10 t/s

30B n/a

Memory: 16 GB · 11 usable

Bandwidth: 120 GB/s

Idle / Active: 4 W / 50 W

Sticker: $799

Why: Lowest sticker price ($799) among builds that fit Gemma 4 12B Unified (dense).

Links: Amazon · Apple · B&H Photo

📺 Reviews on YouTube

▶ Mac Mini M4 vs M3 Pro, 6700XT, 3080Ti — LLM Ollama side by side

▶ Intel Macbook vs Apple Silicon (M1, M3 Pro, M4) running LLM Ollama

▶ Ollama Mac MLX is here — 2× faster t/s for Apple Silicon

Fastest

Single AMD Instinct MI355X 288 GB workstation

AMD · 4U server (OAM, liquid-cooled)

$28,000

tokens / sec (Q4)

8B 420 t/s

14B 270 t/s

30B 160 t/s

Memory: 288 GB · 282 usable

Bandwidth: 8000 GB/s

Idle / Active: 140 W / 1400 W

Sticker: $28,000

Why: Highest estimated generation speed, ~315 t/s for Gemma 4 12B Unified (dense) at Q4.

Links: Dell XE9712 · AMD MI350 series · Supermicro UBB

📺 Reviews on YouTube

▶ AMD's MI350/355X Advancing AI Event Recap

▶ AMD AI Event w/ Craft Computing — MI350/355X Launch, ROCm 7

▶ AMD Visit and Tour! Featuring ROCm 7 and AMD Instinct

All-rounder

Mac Studio M3 Ultra 96 GB

Apple · small desktop

$5,299

tokens / sec (Q4)

8B 110 t/s

14B 70 t/s

30B 38 t/s

Memory: 96 GB · 80 usable

Bandwidth: 819 GB/s

Idle / Active: 10 W / 180 W

Sticker: $5,299

Why: Top quartile across speed, value, memory headroom, and efficiency; the "buy this if unsure" pick.

Links: Amazon · Apple · B&H Photo

📺 Reviews on YouTube

▶ M3 Ultra vs M4 Max — Don't Buy the WRONG Mac Studio

▶ Skip M3 Ultra & RTX 5090 for LLMs | NEW 96GB KING

▶ Ditch 512 GB Monster — this M3 Ultra Just Redefined 'Enough'

Best value

Single AMD Radeon RX 9070 XT 16 GB build

AMD · desktop tower

$1,300

tokens / sec (Q4)

8B 78 t/s

14B 44 t/s

30B n/a

Memory: 16 GB · 15 usable

Bandwidth: 645 GB/s

Idle / Active: 17 W / 304 W

Sticker: $1,300

Why: Best price per unit of generation speed, ~$25 per t/s.

Links: Amazon · Newegg · B&H Photo · AMD

📺 Reviews on YouTube

▶ AMD 9070 XT FULL SEND — Ollama / llama.cpp / vLLM, gpt-oss / qwen3 / trinity / devstral all tested

▶ RX 9070 XT is cheaper AND better… mostly

Best CUDA

Single B200 180 GB workstation

NVIDIA · workstation / 4U server

$47,000

tokens / sec (Q4)

8B 360 t/s

14B 225 t/s

30B 135 t/s

Memory: 180 GB · 176 usable

Bandwidth: 8000 GB/s

Idle / Active: 100 W / 1000 W

Sticker: $47,000

Why: Strongest CUDA-only software stack among fitting builds.

Links: NVIDIA B200 partners · SHI enterprise · Lambda Labs

📺 Reviews on YouTube

▶ AI Lab: NVIDIA B200 vs GB200 explained | GPU architecture for LLMs

▶ Inside a NEW AI Cluster — Tour with NVIDIA B200

Most VRAM

Mac Studio M3 Ultra 512 GB

Apple · small desktop

$14,199

tokens / sec (Q4)

8B 110 t/s

14B 70 t/s

30B 38 t/s

Memory: 512 GB · 480 usable

Bandwidth: 819 GB/s

Idle / Active: 12 W / 220 W

Sticker: $14,199

Why: 480 GB usable, the most headroom for batching and longer contexts.

Links: Apple · B&H Photo

📺 Reviews on YouTube

▶ M3 Ultra Mac Studio Review

▶ M3 Ultra vs RTX 5090 — The Final Battle

Every other build that runs Gemma 4 12B Unified (dense)

60 additional builds fit Gemma 4 12B Unified (dense) at Q4_K_M (8 GB usable minimum), sorted by sticker price.

Build	Price	Memory (total / usable)	Bandwidth	tg/s (Q4)	Active power	5-yr power cost
Single Tesla P40 24 GB (used) build · NVIDIA · desktop tower	$750	24 / 23 GB	347 GB/s	18 t/s	250 W	$854
RTX 3060 12 GB build · NVIDIA · desktop tower	$900	12 / 11 GB	360 GB/s	21 t/s	170 W	$598
Mac Mini M4 (24 GB) · Apple · mini desktop	$999	24 / 18 GB	120 GB/s	14 t/s	50 W	$177
Single Intel Arc B580 12 GB build · Intel · desktop tower	$1.1k	12 / 11 GB	456 GB/s	26 t/s	190 W	$664
MacBook Air M5 (16 GB) · Apple · laptop	$1.3k	16 / 11 GB	153 GB/s	12 t/s	30 W	$115
Single RTX 3090 (used) build · NVIDIA · desktop tower	$1.5k	24 / 23 GB	936 GB/s	58 t/s	350 W	$1.2k
Single Intel Arc Pro B70 build · Intel · desktop tower	$1.8k	32 / 31 GB	608 GB/s	47 t/s	220 W	$782
Single AMD Radeon AI Pro R9700 32 GB build · AMD · desktop tower	$2.0k	32 / 31 GB	640 GB/s	53 t/s	300 W	$1.1k
Mac Studio M4 Max 36 GB · Apple · small desktop	$2.5k	36 / 28 GB	546 GB/s	64 t/s	130 W	$453
Quad AMD MI50 32 GB (128 GB) homelab build · AMD · rack/large tower	$2.5k	128 / 122 GB	1024 GB/s	44 t/s	1200 W	$4.2k
Quad Tesla P40 (96 GB) homelab build · NVIDIA · rack/large tower	$2.7k	96 / 92 GB	347 GB/s	19 t/s	1000 W	$3.5k
AMD Ryzen AI Max+ 395 (128 GB) · AMD · mini desktop / laptop	$2.8k	128 / 96 GB	256 GB/s	33 t/s	120 W	$420
Dual RTX 3090 (used) build · NVIDIA · desktop tower	$2.8k	48 / 46 GB	936 GB/s	64 t/s	700 W	$2.4k
MacBook Pro M4 Pro 48 GB · Apple · laptop	$2.9k	48 / 40 GB	273 GB/s	33 t/s	70 W	$246
MacBook Pro M5 Pro 48 GB · Apple · laptop	$3.2k	48 / 40 GB	307 GB/s	41 t/s	75 W	$263
Single RTX 4090 build · NVIDIA · desktop tower	$3.2k	24 / 23 GB	1008 GB/s	88 t/s	410 W	$1.4k
Dual Intel Arc Pro B70 build · Intel · desktop tower	$3.2k	64 / 62 GB	608 GB/s	64 t/s	380 W	$1.4k
ASUS Ascent GX10 (128 GB) · ASUS · small desktop	$3.5k	128 / 119 GB	273 GB/s	61 t/s	240 W	$903
Single AMD Radeon Pro W7800 32 GB build · AMD · workstation	$3.5k	32 / 31 GB	576 GB/s	37 t/s	260 W	$920
Dual AMD Radeon AI Pro R9700 build (64 GB) · AMD · workstation	$3.7k	64 / 62 GB	640 GB/s	64 t/s	600 W	$2.1k
MacBook Pro M4 Max 64 GB · Apple · laptop	$4.0k	64 / 54 GB	410 GB/s	47 t/s	90 W	$315
Dell Pro Max with GB10 (128 GB) · Dell · small desktop	$4.1k	128 / 119 GB	273 GB/s	64 t/s	240 W	$887
MacBook Pro M5 Max 64 GB · Apple · laptop	$4.1k	64 / 54 GB	614 GB/s	58 t/s	95 W	$332
MSI EdgeXpert MS-C931 (128 GB) · MSI · small desktop	$4.7k	128 / 119 GB	273 GB/s	64 t/s	240 W	$887
Mac Studio M4 Max 128 GB · Apple · small desktop	$4.7k	128 / 112 GB	546 GB/s	64 t/s	130 W	$453
NVIDIA DGX Spark (128 GB) · NVIDIA · small desktop	$4.7k	128 / 119 GB	273 GB/s	64 t/s	240 W	$887
Single RTX A6000 48 GB (Ampere) build · NVIDIA · workstation	$4.7k	48 / 46 GB	768 GB/s	49 t/s	300 W	$1.0k
Single RTX 5090 build · NVIDIA · desktop tower	$4.9k	32 / 31 GB	1792 GB/s	145 t/s	520 W	$1.8k
Single AMD Radeon Pro W7900 48 GB build · AMD · workstation	$5k	48 / 46 GB	864 GB/s	47 t/s	295 W	$1.0k
Lenovo ThinkStation PGX (128 GB) · Lenovo · small desktop	$5k	128 / 119 GB	273 GB/s	64 t/s	160 W	$650
MacBook Pro M5 Max 128 GB · Apple · laptop	$5k	128 / 108 GB	614 GB/s	58 t/s	95 W	$332
2× Strix Halo cluster (256 GB unified) · AMD · mini-PC pair	$6k	256 / 192 GB	256 GB/s	47 t/s	240 W	$841
Quad Intel Arc Pro B70 build · Intel · rack/large tower	$6k	128 / 124 GB	608 GB/s	70 t/s	700 W	$2.5k
Quad RTX 3090 (used) build · NVIDIA · rack/large tower	$6k	96 / 92 GB	936 GB/s	70 t/s	1400 W	$4.9k
Single RTX Pro 5000 Blackwell 48 GB build · NVIDIA · workstation	$7k	48 / 46 GB	1344 GB/s	n/a	300 W	$1.1k
Single RTX 6000 Ada 48 GB build · NVIDIA · workstation	$8k	48 / 46 GB	960 GB/s	76 t/s	300 W	$1.0k
Mac Studio M3 Ultra 256 GB · Apple · small desktop	$8k	256 / 232 GB	819 GB/s	82 t/s	180 W	$624
Dual AMD Radeon Pro W7900 build · AMD · workstation	$9k	96 / 92 GB	864 GB/s	64 t/s	600 W	$2.1k
2× DGX Spark cluster (256 GB unified, CUDA) · NVIDIA · two desktops, 200 G interconnect	$10k	256 / 240 GB	273 GB/s	105 t/s	460 W	$1.7k
Dual RTX 5090 build · NVIDIA · rack/large tower	$10k	64 / 62 GB	1792 GB/s	163 t/s	1050 W	$3.6k
Octuple Intel Arc Pro B70 cluster · Intel · rack/large tower	$11k	256 / 248 GB	608 GB/s	76 t/s	1450 W	$5k
4× Strix Halo cluster (512 GB unified) · AMD · rack of 4 mini-PCs, 10 GbE fabric	$12k	512 / 384 GB	256 GB/s	64 t/s	480 W	$1.7k
Single RTX Pro 6000 Blackwell 96 GB build · NVIDIA · workstation	$12k	96 / 93 GB	1792 GB/s	163 t/s	600 W	$2.1k
Tinybox Red (6× 7900 XTX, 144 GB) · tinycorp · 12U pedestal	$15k	144 / 138 GB	960 GB/s	111 t/s	1500 W	$5k
4× DGX Spark cluster (512 GB unified, CUDA) · NVIDIA · rack of 4 desktops	$20k	512 / 488 GB	273 GB/s	117 t/s	920 W	$3.4k
8× Strix Halo cluster (1024 GB unified) · AMD · rack of 8 mini-PCs, 10/25 GbE fabric	$23k	1024 / 768 GB	256 GB/s	82 t/s	960 W	$3.4k
Dual RTX Pro 6000 Blackwell build · NVIDIA · workstation	$24k	192 / 188 GB	1792 GB/s	187 t/s	1100 W	$3.8k
Single AMD Instinct MI325X 256 GB workstation · AMD · workstation / 4U server (OAM)	$25k	256 / 250 GB	6000 GB/s	257 t/s	1000 W	$3.6k
Tinybox Green (6× RTX 4090, 144 GB) · tinycorp · 12U pedestal	$25k	144 / 138 GB	1008 GB/s	152 t/s	2200 W	$8k
2× Mac Studio M3 Ultra 512 GB cluster (TB5 / MLX) · Apple · two desktops, Thunderbolt 5 RDMA	$28k	1024 / 960 GB	819 GB/s	107 t/s	440 W	$1.5k
Single AMD Instinct MI300X 192 GB workstation · AMD · workstation	$30k	192 / 188 GB	5300 GB/s	198 t/s	750 W	$2.8k
Single H100 80 GB workstation · NVIDIA · workstation	$32k	80 / 78 GB	3350 GB/s	175 t/s	700 W	$2.5k
Quad RTX Pro 6000 Blackwell build (384 GB) · NVIDIA · workstation / 4U pedestal	$38k	384 / 372 GB	1792 GB/s	268 t/s	2200 W	$8k
Single H200 141 GB workstation · NVIDIA · workstation / 2U server	$40k	141 / 138 GB	4800 GB/s	245 t/s	700 W	$2.5k
Tinybox Pro (8× RTX 4090, 192 GB) · tinycorp · 12U pedestal	$40k	192 / 184 GB	1008 GB/s	187 t/s	3200 W	$11k
8× DGX Spark cluster (1024 GB unified, CUDA) · NVIDIA · rack of 8 desktops, 200 GbE fabric	$44k	1024 / 976 GB	273 GB/s	152 t/s	1840 W	$7k
Tinybox Green v2 (4× RTX 5090, 128 GB) · tinycorp · 12U pedestal	$45k	128 / 124 GB	1792 GB/s	198 t/s	2300 W	$8k
8× RTX Pro 6000 Blackwell server (768 GB) · NVIDIA · 4U server (e.g. SuperMicro AS-4125GS-TNRT)	$78k	768 / 744 GB	1792 GB/s	362 t/s	4800 W	$16k
8× H100 80 GB server · NVIDIA · server rack	$280k	640 / 620 GB	3350 GB/s	327 t/s	5600 W	$20k
NVIDIA RTX Spark (128 GB) · NVIDIA · OEM laptops + small desktops	n/a	128 / 119 GB	300 GB/s	n/a	n/a	$756
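Single-stream token generation is usually limited by memory bandwidth, because each generated token reads roughly all of the weights once. Dividing bandwidth by the weight size therefore gives a rough upper bound on tg/s. A sketch using the 7 GB Q4 file size quoted at the top of the page and single-device bandwidths from the table; it ignores KV-cache reads and kernel efficiency, so real throughput lands below the ceiling.

```python
def tg_ceiling(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper bound on decode tokens/sec if every token streams all weights once."""
    return bandwidth_gb_s / weights_gb


Q4_FILE_GB = 7  # Q4 file size quoted for Gemma 4 12B Unified
for build, bw, listed in [("RTX 3090", 936, 58),
                          ("RTX 5090", 1792, 145),
                          ("Mac Mini M4 (24 GB)", 120, 14)]:
    ceiling = tg_ceiling(bw, Q4_FILE_GB)
    print(f"{build}: ceiling ~{ceiling:.0f} t/s, listed {listed} t/s "
          f"({listed / ceiling:.0%} of ceiling)")
```

The ratio to the ceiling is a quick sanity check: a listed figure above 100% would point to an error, while a very low one suggests a software-stack bottleneck rather than a bandwidth limit.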

Quantization: Q5_K_M

Cheapest

Single Tesla P100 16 GB (used) build

NVIDIA · desktop tower

$500

tokens / sec (Q5)

8B 28 t/s

14B 16 t/s

30B n/a

Memory: 16 GB · 15 usable

Bandwidth: 732 GB/s

Idle / Active: 25 W / 250 W

Sticker: $500

Why: Lowest sticker price ($500) among builds that fit Gemma 4 12B Unified (dense).

Links: Amazon · eBay (used) · Newegg

📺 Reviews on YouTube

▶ Can a 10-Year-Old $5,700 GPU Beat a New $430 GPU? | Tesla P100 Local AI Review

Fastest

DGX B200 — 8× B200 server (1.44 TB HBM3e)

NVIDIA · 10U DGX server

$475,000

tokens / sec (Q5)

8B 504 t/s

14B 353 t/s

30B 227 t/s

Memory: 1440 GB · 1404 usable

Bandwidth: 8000 GB/s

Idle / Active: 900 W / 10200 W

Sticker: $475,000

Why: Highest estimated generation speed, ~412 t/s for Gemma 4 12B Unified (dense) at Q5.

Links: NVIDIA DGX B200 · SuperMicro HGX B200 · Lambda Labs

📺 Reviews on YouTube

▶ Inside a 1.44 TB HBM3e NVIDIA HGX B200 AI Server from ASRock Rack

All-rounder

Tesla V100 32 GB SXM2 mod build

NVIDIA · desktop tower

$900

tokens / sec (Q5)

8B 71 t/s

14B 42 t/s

30B 23 t/s

Memory: 32 GB · 31 usable

Bandwidth: 900 GB/s

Idle / Active: 33 W / 300 W

Sticker: $900

Why: Top quartile across speed, value, memory headroom, and efficiency; the "buy this if unsure" pick.

Links: Amazon · eBay (V100 SXM2 32 GB) · eBay (SXM2 → PCIe adapter)

📺 Reviews on YouTube

▶ Expensive RTX 5090 for LLMs? No — use V100 SXM2 + Z8 G4 instead

▶ I built my own AI — A.I.D.E.N. (32 GB V100 SXM2 + HP Z8 G4)

Best value

Single AMD Instinct MI50 32 GB (used) build

AMD · desktop tower

$900

tokens / sec (Q5)

8B 60 t/s

14B 32 t/s

30B n/a

Memory: 32 GB · 31 usable

Bandwidth: 1024 GB/s

Idle / Active: 18 W / 300 W

Sticker: $900

Why: Best price per unit of generation speed, ~$24 per t/s.

Links: Amazon · eBay (used) · Alibaba

📺 Reviews on YouTube

▶ AMD MI50 32 GB speed test — Ollama vs llama.cpp (GPT-OSS & Qwen3)

▶ Is the Radeon Instinct MI50 32 GB the ultimate cheap home-AI GPU?

Best CUDA

DGX H200 — 8× H200 server (1.13 TB HBM3e)

NVIDIA · 8U DGX / HGX server rack

$380,000

tokens / sec (Q5)

8B 470 t/s

14B 328 t/s

30B 210 t/s

Memory: 1128 GB · 1100 usable

Bandwidth: 4800 GB/s

Idle / Active: 700 W / 6500 W

Sticker: $380,000

Why: Strongest CUDA-only software stack among fitting builds.

Links: NVIDIA DGX H200 · SuperMicro HGX · Lambda Labs

📺 Reviews on YouTube

▶ Inside the SUPER NVIDIA H200 Server From Supermicro (8U HGX H200)

Most VRAM

12× RTX Pro 6000 Blackwell rack (1152 GB)

NVIDIA · 8U server rack (multi-node, 1-2 chassis)

$118,000

tokens / sec (Q5)

8B 386 t/s

14B 286 t/s

30B 210 t/s

Memory: 1152 GB · 1116 usable

Bandwidth: 1792 GB/s

Idle / Active: 340 W / 7400 W

Sticker: $118,000

Why: 1116 GB usable, the most headroom for batching and longer contexts.

Links: Amazon · CoreWeave · SuperMicro · SHI enterprise

Efficient

MacBook Air M4 (16 GB)

Apple · laptop

$1,099

tokens / sec (Q5)

8B 15 t/s

14B 6.7 t/s

30B n/a

Memory: 16 GB · 11 usable

Bandwidth: 120 GB/s

Idle / Active: 5 W / 30 W

Sticker: $1,099

Why: 30 W active, the lowest power draw of the fitting builds.

Links: Amazon · Apple · B&H Photo

📺 Reviews on YouTube

▶ The budget MacBook so stubborn it survived a 44k-token test

Cheapest

Mac Mini M4 (16 GB)

Apple · mini desktop

$799

tokens / sec (Q5)

8B 18 t/s

14B 8.4 t/s

30B n/a

Memory: 16 GB · 11 usable

Bandwidth: 120 GB/s

Idle / Active: 4 W / 50 W

Sticker: $799

Why: Lowest sticker price ($799) among builds that fit Gemma 4 12B Unified (dense).

Links: Amazon · Apple · B&H Photo

📺 Reviews on YouTube

▶ Mac Mini M4 vs M3 Pro, 6700XT, 3080Ti — LLM Ollama side by side

▶ Intel Macbook vs Apple Silicon (M1, M3 Pro, M4) running LLM Ollama

▶ Ollama Mac MLX is here — 2× faster t/s for Apple Silicon

Fastest

Single AMD Instinct MI355X 288 GB workstation

AMD · 4U server (OAM, liquid-cooled)

$28,000

tokens / sec (Q5)

8B 353 t/s

14B 227 t/s

30B 134 t/s

Memory: 288 GB · 282 usable

Bandwidth: 8000 GB/s

Idle / Active: 140 W / 1400 W

Sticker: $28,000

Why: Highest estimated generation speed, ~265 t/s for Gemma 4 12B Unified (dense) at Q5.

Links: Dell XE9712 · AMD MI350 series · Supermicro UBB

📺 Reviews on YouTube

▶ AMD's MI350/355X Advancing AI Event Recap

▶ AMD AI Event w/ Craft Computing — MI350/355X Launch, ROCm 7

▶ AMD Visit and Tour! Featuring ROCm 7 and AMD Instinct

All-rounder

Mac Studio M3 Ultra 96 GB

Apple · small desktop

$5,299

tokens / sec (Q5)

8B 92 t/s

14B 59 t/s

30B 32 t/s

Memory: 96 GB · 80 usable

Bandwidth: 819 GB/s

Idle / Active: 10 W / 180 W

Sticker: $5,299

Why: Top quartile across speed, value, memory headroom, and efficiency; the "buy this if unsure" pick.

Links: Amazon · Apple · B&H Photo

📺 Reviews on YouTube

▶ M3 Ultra vs M4 Max — Don't Buy the WRONG Mac Studio

▶ Skip M3 Ultra & RTX 5090 for LLMs | NEW 96GB KING

▶ Ditch 512 GB Monster — this M3 Ultra Just Redefined 'Enough'

Best value

Single AMD Radeon RX 9070 XT 16 GB build

AMD · desktop tower

$1,300

tokens / sec (Q5)

8B 66 t/s

14B 37 t/s

30B n/a

Memory: 16 GB · 15 usable

Bandwidth: 645 GB/s

Idle / Active: 17 W / 304 W

Sticker: $1,300

Why: Best price per unit of generation speed, ~$30 per t/s.

Links: Amazon · Newegg · B&H Photo · AMD

📺 Reviews on YouTube

▶ AMD 9070 XT FULL SEND — Ollama / llama.cpp / vLLM, gpt-oss / qwen3 / trinity / devstral all tested

▶ RX 9070 XT is cheaper AND better… mostly

Best CUDA

Single B200 180 GB workstation

NVIDIA · workstation / 4U server

$47,000

tokens / sec (Q5)

8B 302 t/s

14B 189 t/s

30B 113 t/s

Memory: 180 GB · 176 usable

Bandwidth: 8000 GB/s

Idle / Active: 100 W / 1000 W

Sticker: $47,000

Why: Strongest CUDA-only software stack among fitting builds.

Links: NVIDIA B200 partners · SHI enterprise · Lambda Labs

📺 Reviews on YouTube

▶ AI Lab: NVIDIA B200 vs GB200 explained | GPU architecture for LLMs

▶ Inside a NEW AI Cluster — Tour with NVIDIA B200

Most VRAM

Mac Studio M3 Ultra 512 GB

Apple · small desktop

$14,199

tokens / sec (Q5)

8B 92 t/s

14B 59 t/s

30B 32 t/s

Memory: 512 GB · 480 usable

Bandwidth: 819 GB/s

Idle / Active: 12 W / 220 W

Sticker: $14,199

Why: 480 GB usable, the most headroom for batching and longer contexts.

Links: Apple · B&H Photo

📺 Reviews on YouTube

▶ M3 Ultra Mac Studio Review

▶ M3 Ultra vs RTX 5090 — The Final Battle

Efficient

MacBook Air M4 (16 GB)

Apple · laptop

$1,099

tokens / sec (Q5)

8B 15 t/s

14B 6.7 t/s

30B —

Memory: 16 GB · 11 usable

Bandwidth: 120 GB/s

Idle / Active: 5 W / 30 W

Sticker: $1,099

Why: 30 W active: the lowest power draw of the fitting builds.

Amazon ↗ · Apple ↗ · B&H Photo ↗

📺 Reviews on YouTube

▶ The budget MacBook so stubborn it survived a 44k-token test

Every other build that runs Gemma 4 12B Unified (dense)

60 additional builds fit Gemma 4 12B Unified (dense) at Q5_K_M (10 GB usable minimum), sorted by sticker price.

Build	Vendor · form factor	Price	Memory (total / usable)	Bandwidth	tg/s (Q5)	Active W	5-yr power cost
Single Tesla P40 24 GB (used) build	NVIDIA · desktop tower	$750	24 / 23 GB	347 GB/s	15 t/s	250 W	$854
RTX 3060 12 GB build	NVIDIA · desktop tower	$900	12 / 11 GB	360 GB/s	18 t/s	170 W	$598
Mac Mini M4 (24 GB)	Apple · mini desktop	$999	24 / 18 GB	120 GB/s	12 t/s	50 W	$177
Single Intel Arc B580 12 GB build	Intel · desktop tower	$1.1k	12 / 11 GB	456 GB/s	22 t/s	190 W	$664
MacBook Air M5 (16 GB)	Apple · laptop	$1.3k	16 / 11 GB	153 GB/s	9.8 t/s	30 W	$115
Single RTX 3090 (used) build	NVIDIA · desktop tower	$1.5k	24 / 23 GB	936 GB/s	49 t/s	350 W	$1.2k
Single Intel Arc Pro B70 build	Intel · desktop tower	$1.8k	32 / 31 GB	608 GB/s	39 t/s	220 W	$782
Single AMD Radeon AI Pro R9700 32 GB build	AMD · desktop tower	$2.0k	32 / 31 GB	640 GB/s	44 t/s	300 W	$1.1k
Mac Studio M4 Max 36 GB	Apple · small desktop	$2.5k	36 / 28 GB	546 GB/s	54 t/s	130 W	$453
Quad AMD MI50 32 GB (128 GB) homelab build	AMD · rack/large tower	$2.5k	128 / 122 GB	1024 GB/s	37 t/s	1200 W	$4.2k
Quad Tesla P40 (96 GB) homelab build	NVIDIA · rack/large tower	$2.7k	96 / 92 GB	347 GB/s	16 t/s	1000 W	$3.5k
AMD Ryzen AI Max+ 395 (128 GB)	AMD · mini desktop / laptop	$2.8k	128 / 96 GB	256 GB/s	27 t/s	120 W	$420
Dual RTX 3090 (used) build	NVIDIA · desktop tower	$2.8k	48 / 46 GB	936 GB/s	54 t/s	700 W	$2.4k
MacBook Pro M4 Pro 48 GB	Apple · laptop	$2.9k	48 / 40 GB	273 GB/s	27 t/s	70 W	$246
MacBook Pro M5 Pro 48 GB	Apple · laptop	$3.2k	48 / 40 GB	307 GB/s	34 t/s	75 W	$263
Single RTX 4090 build	NVIDIA · desktop tower	$3.2k	24 / 23 GB	1008 GB/s	74 t/s	410 W	$1.4k
Dual Intel Arc Pro B70 build	Intel · desktop tower	$3.2k	64 / 62 GB	608 GB/s	54 t/s	380 W	$1.4k
ASUS Ascent GX10 (128 GB)	ASUS · small desktop	$3.5k	128 / 119 GB	273 GB/s	51 t/s	240 W	$903
Single AMD Radeon Pro W7800 32 GB build	AMD · workstation	$3.5k	32 / 31 GB	576 GB/s	31 t/s	260 W	$920
Dual AMD Radeon AI Pro R9700 build (64 GB)	AMD · workstation	$3.7k	64 / 62 GB	640 GB/s	54 t/s	600 W	$2.1k
MacBook Pro M4 Max 64 GB	Apple · laptop	$4.0k	64 / 54 GB	410 GB/s	39 t/s	90 W	$315
Dell Pro Max with GB10 (128 GB)	Dell · small desktop	$4.1k	128 / 119 GB	273 GB/s	54 t/s	240 W	$887
MacBook Pro M5 Max 64 GB	Apple · laptop	$4.1k	64 / 54 GB	614 GB/s	49 t/s	95 W	$332
MSI EdgeXpert MS-C931 (128 GB)	MSI · small desktop	$4.7k	128 / 119 GB	273 GB/s	54 t/s	240 W	$887
Mac Studio M4 Max 128 GB	Apple · small desktop	$4.7k	128 / 112 GB	546 GB/s	54 t/s	130 W	$453
NVIDIA DGX Spark (128 GB)	NVIDIA · small desktop	$4.7k	128 / 119 GB	273 GB/s	54 t/s	240 W	$887
Single RTX A6000 48 GB (Ampere) build	NVIDIA · workstation	$4.7k	48 / 46 GB	768 GB/s	41 t/s	300 W	$1.0k
Single RTX 5090 build	NVIDIA · desktop tower	$4.9k	32 / 31 GB	1792 GB/s	122 t/s	520 W	$1.8k
Single AMD Radeon Pro W7900 48 GB build	AMD · workstation	$5k	48 / 46 GB	864 GB/s	39 t/s	295 W	$1.0k
Lenovo ThinkStation PGX (128 GB)	Lenovo · small desktop	$5k	128 / 119 GB	273 GB/s	54 t/s	160 W	$650
MacBook Pro M5 Max 128 GB	Apple · laptop	$5k	128 / 108 GB	614 GB/s	49 t/s	95 W	$332
2× Strix Halo cluster (256 GB unified)	AMD · mini-PC pair	$6k	256 / 192 GB	256 GB/s	39 t/s	240 W	$841
Quad Intel Arc Pro B70 build	Intel · rack/large tower	$6k	128 / 124 GB	608 GB/s	59 t/s	700 W	$2.5k
Quad RTX 3090 (used) build	NVIDIA · rack/large tower	$6k	96 / 92 GB	936 GB/s	59 t/s	1400 W	$4.9k
Single RTX Pro 5000 Blackwell 48 GB build	NVIDIA · workstation	$7k	48 / 46 GB	1344 GB/s	—	300 W	$1.1k
Single RTX 6000 Ada 48 GB build	NVIDIA · workstation	$8k	48 / 46 GB	960 GB/s	64 t/s	300 W	$1.0k
Mac Studio M3 Ultra 256 GB	Apple · small desktop	$8k	256 / 232 GB	819 GB/s	69 t/s	180 W	$624
Dual AMD Radeon Pro W7900 build	AMD · workstation	$9k	96 / 92 GB	864 GB/s	54 t/s	600 W	$2.1k
2× DGX Spark cluster (256 GB unified, CUDA)	NVIDIA · two desktops, 200 G interconnect	$10k	256 / 240 GB	273 GB/s	88 t/s	460 W	$1.7k
Dual RTX 5090 build	NVIDIA · rack/large tower	$10k	64 / 62 GB	1792 GB/s	137 t/s	1050 W	$3.6k
Octuple Intel Arc Pro B70 cluster	Intel · rack/large tower	$11k	256 / 248 GB	608 GB/s	64 t/s	1450 W	$5k
4× Strix Halo cluster (512 GB unified)	AMD · rack of 4 mini-PCs, 10 GbE fabric	$12k	512 / 384 GB	256 GB/s	54 t/s	480 W	$1.7k
Single RTX Pro 6000 Blackwell 96 GB build	NVIDIA · workstation	$12k	96 / 93 GB	1792 GB/s	137 t/s	600 W	$2.1k
Tinybox Red (6× 7900 XTX, 144 GB)	tinycorp · 12U pedestal	$15k	144 / 138 GB	960 GB/s	93 t/s	1500 W	$5k
4× DGX Spark cluster (512 GB unified, CUDA)	NVIDIA · rack of 4 desktops	$20k	512 / 488 GB	273 GB/s	98 t/s	920 W	$3.4k
8× Strix Halo cluster (1024 GB unified)	AMD · rack of 8 mini-PCs, 10/25 GbE fabric	$23k	1024 / 768 GB	256 GB/s	69 t/s	960 W	$3.4k
Dual RTX Pro 6000 Blackwell build	NVIDIA · workstation	$24k	192 / 188 GB	1792 GB/s	157 t/s	1100 W	$3.8k
Single AMD Instinct MI325X 256 GB workstation	AMD · workstation / 4U server (OAM)	$25k	256 / 250 GB	6000 GB/s	216 t/s	1000 W	$3.6k
Tinybox Green (6× RTX 4090, 144 GB)	tinycorp · 12U pedestal	$25k	144 / 138 GB	1008 GB/s	127 t/s	2200 W	$8k
2× Mac Studio M3 Ultra 512 GB cluster (TB5 / MLX)	Apple · two desktops, Thunderbolt 5 RDMA	$28k	1024 / 960 GB	819 GB/s	90 t/s	440 W	$1.5k
Single AMD Instinct MI300X 192 GB workstation	AMD · workstation	$30k	192 / 188 GB	5300 GB/s	167 t/s	750 W	$2.8k
Single H100 80 GB workstation	NVIDIA · workstation	$32k	80 / 78 GB	3350 GB/s	147 t/s	700 W	$2.5k
Quad RTX Pro 6000 Blackwell build (384 GB)	NVIDIA · workstation / 4U pedestal	$38k	384 / 372 GB	1792 GB/s	225 t/s	2200 W	$8k
Single H200 141 GB workstation	NVIDIA · workstation / 2U server	$40k	141 / 138 GB	4800 GB/s	206 t/s	700 W	$2.5k
Tinybox Pro (8× RTX 4090, 192 GB)	tinycorp · 12U pedestal	$40k	192 / 184 GB	1008 GB/s	157 t/s	3200 W	$11k
8× DGX Spark cluster (1024 GB unified, CUDA)	NVIDIA · rack of 8 desktops, 200 GbE fabric	$44k	1024 / 976 GB	273 GB/s	127 t/s	1840 W	$7k
Tinybox Green v2 (4× RTX 5090, 128 GB)	tinycorp · 12U pedestal	$45k	128 / 124 GB	1792 GB/s	167 t/s	2300 W	$8k
8× RTX Pro 6000 Blackwell server (768 GB)	NVIDIA · 4U server (e.g. SuperMicro AS-4125GS-TNRT)	$78k	768 / 744 GB	1792 GB/s	304 t/s	4800 W	$16k
8× H100 80 GB server	NVIDIA · server rack	$280k	640 / 620 GB	3350 GB/s	274 t/s	5600 W	$20k
NVIDIA RTX Spark (128 GB)	NVIDIA · OEM laptops + small desktops	—	128 / 119 GB	300 GB/s	—	—	$756
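The page does not explain the 5-yr power column, but the rows that can be cross-checked against a card's idle and active figures are consistent with a simple model: 5 years of 24-hour days, half the time at active wattage and half at idle, at $0.15/kWh. The 50/50 duty cycle and the $0.15/kWh price are assumptions inferred from the numbers, not a documented formula. This sketch uses the Mac Mini M4 (4 W / 50 W) and MacBook Pro M4 Pro 48 GB (5 W / 70 W) card figures and reproduces the $177 and $246 shown in the table; `five_year_power_cost` is a hypothetical helper.

```python
HOURS_5_YEARS = 5 * 365 * 24  # ignores leap days

def five_year_power_cost(idle_w: float, active_w: float,
                         duty: float = 0.5, usd_per_kwh: float = 0.15) -> float:
    """Electricity cost in USD over five years.

    duty is the fraction of time spent at active wattage; the rest is idle.
    Both defaults are assumptions inferred from the table, not documented.
    """
    avg_w = duty * active_w + (1 - duty) * idle_w
    kwh = avg_w * HOURS_5_YEARS / 1000
    return kwh * usd_per_kwh

print(round(five_year_power_cost(4, 50)))  # Mac Mini M4 (24 GB)
print(round(five_year_power_cost(5, 70)))  # MacBook Pro M4 Pro 48 GB
```

Cost is linear in `usd_per_kwh`, so substituting a local tariff rescales every row by the same factor.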

Cheapest

Single Tesla P100 16 GB (used) build

NVIDIA · desktop tower

$500

tokens / sec (Q8)

8B 17 t/s

14B 9.9 t/s

30B —

Memory: 16 GB · 15 usable

Bandwidth: 732 GB/s

Idle / Active: 25 W / 250 W

Sticker: $500

Why: Lowest sticker that still fits Gemma 4 12B Unified (dense) ($500 USD).

Amazon ↗ · eBay (used) ↗ · Newegg ↗

📺 Reviews on YouTube

▶ Can a 10-Year-Old $5,700 GPU Beat a New $430 GPU? | Tesla P100 Local AI Review

Fastest

DGX B200 — 8× B200 server (1.44 TB HBM3e)

NVIDIA · 10U DGX server

$475,000

tokens / sec (Q8)

8B 312 t/s

14B 218 t/s

30B 140 t/s

Memory: 1440 GB · 1404 usable

Bandwidth: 8000 GB/s

Idle / Active: 900 W / 10200 W

Sticker: $475,000

Why: Highest measured tg/s: 255 t/s on Gemma 4 12B Unified (dense)-class models at Q8.

NVIDIA DGX B200 ↗ · SuperMicro HGX B200 ↗ · Lambda Labs ↗

📺 Reviews on YouTube

▶ Inside a 1.44 TB HBM3e NVIDIA HGX B200 AI Server from ASRock Rack

All-rounder

Tesla V100 32 GB SXM2 mod build

NVIDIA · desktop tower

$900

tokens / sec (Q8)

8B 44 t/s

14B 26 t/s

30B 14 t/s

Memory: 32 GB · 31 usable

Bandwidth: 900 GB/s

Idle / Active: 33 W / 300 W

Sticker: $900

Why: Top quartile across speed, value, memory headroom, and efficiency: the "buy this if unsure" pick.

Amazon ↗ · eBay V100 SXM2 32 GB ↗ · eBay SXM2 → PCIe adapter ↗

📺 Reviews on YouTube

▶ Expensive RTX 5090 for LLMs? No — use V100 SXM2 + Z8 G4 instead

▶ I built my own AI — A.I.D.E.N. (32 GB V100 SXM2 + HP Z8 G4)

Best value

Single AMD Instinct MI50 32 GB (used) build

AMD · desktop tower

$900

tokens / sec (Q8)

8B 37 t/s

14B 20 t/s

30B —

Memory: 32 GB · 31 usable

Bandwidth: 1024 GB/s

Idle / Active: 18 W / 300 W

Sticker: $900

Why: Best price-to-speed ratio: ~$39 per t/s of token generation.

Amazon ↗ · eBay (used) ↗ · Alibaba ↗

📺 Reviews on YouTube

▶ AMD MI50 32 GB speed test — Ollama vs llama.cpp (GPT-OSS & Qwen3)

▶ Is the Radeon Instinct MI50 32 GB the ultimate cheap home-AI GPU?

Best CUDA

DGX H200 — 8× H200 server (1.13 TB HBM3e)

NVIDIA · 8U DGX / HGX server rack

$380,000

tokens / sec (Q8)

8B 291 t/s

14B 203 t/s

30B 130 t/s

Memory: 1128 GB · 1100 usable

Bandwidth: 4800 GB/s

Idle / Active: 700 W / 6500 W

Sticker: $380,000

Why: Strongest CUDA-only software stack among fitting builds.

NVIDIA DGX H200 ↗ · SuperMicro HGX ↗ · Lambda Labs ↗

📺 Reviews on YouTube

▶ Inside the SUPER NVIDIA H200 Server From Supermicro (8U HGX H200)

Most VRAM

12× RTX Pro 6000 Blackwell rack (1152 GB)

NVIDIA · 8U server rack (multi-node, 1-2 chassis)

$118,000

tokens / sec (Q8)

8B 239 t/s

14B 177 t/s

30B 130 t/s

Memory: 1152 GB · 1116 usable

Bandwidth: 1792 GB/s

Idle / Active: 340 W / 7400 W

Sticker: $118,000

Why: 1116 GB usable, second only to the DGX B200 (1404 GB usable) for batching and longer contexts.

Amazon ↗ · CoreWeave ↗ · SuperMicro ↗ · SHI enterprise ↗

Efficient

Mac Mini M4 (24 GB)

Apple · mini desktop

$999

tokens / sec (Q8)

8B 11 t/s

14B 6.2 t/s

30B —

Memory: 24 GB · 18 usable

Bandwidth: 120 GB/s

Idle / Active: 4 W / 50 W

Sticker: $999

Why: 50 W active: the lowest power draw of the fitting builds.

Amazon ↗ · Apple ↗ · B&H Photo ↗

📺 Reviews on YouTube

▶ New Mac Mini M4 running SD1.5, FLUX, and Ollama (Qwen)
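Four of the card labels follow directly from their "Why" lines: Cheapest is the minimum sticker price, Fastest the maximum measured tg/s, Efficient the minimum active wattage, and Most VRAM the maximum usable memory. A minimal sketch of those four picks over a handful of rows copied from the Q8 table further down; the `Build` record is a hypothetical structure, and the page's own cards may apply further rules that it does not state.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Build:
    name: str
    price_usd: float
    usable_gb: float
    tg_q8: float     # tokens/sec at Q8, from the table's tg/s (Q8) column
    active_w: float

# A few rows from the Q8 table (prices as listed there)
builds = [
    Build("Single Tesla P40 24 GB (used) build", 750, 23, 9.1, 250),
    Build("Mac Studio M4 Max 36 GB", 2500, 28, 33, 130),
    Build("Single RTX 5090 build", 4900, 31, 75, 520),
    Build("2× Mac Studio M3 Ultra 512 GB cluster (TB5 / MLX)", 28000, 960, 56, 440),
    Build("8× H100 80 GB server", 280000, 620, 170, 5600),
]

# Each label is a plain min/max over one column
picks = {
    "Cheapest": min(builds, key=lambda b: b.price_usd),
    "Fastest": max(builds, key=lambda b: b.tg_q8),
    "Efficient": min(builds, key=lambda b: b.active_w),
    "Most VRAM": max(builds, key=lambda b: b.usable_gb),
}
for label, b in picks.items():
    print(f"{label}: {b.name}")
```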

Cheapest

Mac Mini M4 (24 GB)

Apple · mini desktop

$999

tokens / sec (Q8)

8B 11 t/s

14B 6.2 t/s

30B —

Memory: 24 GB · 18 usable

Bandwidth: 120 GB/s

Idle / Active: 4 W / 50 W

Sticker: $999

Why: Lowest sticker that still fits Gemma 4 12B Unified (dense) ($999 USD).

Amazon ↗ · Apple ↗ · B&H Photo ↗

📺 Reviews on YouTube

▶ New Mac Mini M4 running SD1.5, FLUX, and Ollama (Qwen)

Fastest

Single AMD Instinct MI355X 288 GB workstation

AMD · 4U server (OAM, liquid-cooled)

$28,000

tokens / sec (Q8)

8B 218 t/s

14B 140 t/s

30B 83 t/s

Memory: 288 GB · 282 usable

Bandwidth: 8000 GB/s

Idle / Active: 140 W / 1400 W

Sticker: $28,000

Why: Highest measured tg/s: 164 t/s on Gemma 4 12B Unified (dense)-class models at Q8.

Dell XE9712 ↗ · AMD MI350 series ↗ · Supermicro UBB ↗

📺 Reviews on YouTube

▶ AMD's MI350/355X Advancing AI Event Recap

▶ AMD AI Event w/ Craft Computing — MI350/355X Launch, ROCm 7

▶ AMD Visit and Tour! Featuring ROCm 7 and AMD Instinct

All-rounder

Mac Studio M3 Ultra 96 GB

Apple · small desktop

$5,299

tokens / sec (Q8)

8B 57 t/s

14B 36 t/s

30B 20 t/s

Memory: 96 GB · 80 usable

Bandwidth: 819 GB/s

Idle / Active: 10 W / 180 W

Sticker: $5,299

Why: Top quartile across speed, value, memory headroom, and efficiency: the "buy this if unsure" pick.

Amazon ↗ · Apple ↗ · B&H Photo ↗

📺 Reviews on YouTube

▶ M3 Ultra vs M4 Max — Don't Buy the WRONG Mac Studio

▶ Skip M3 Ultra & RTX 5090 for LLMs | NEW 96GB KING

▶ Ditch 512 GB Monster — this M3 Ultra Just Redefined 'Enough'

Best value

Single AMD Radeon RX 9070 XT 16 GB build

AMD · desktop tower

$1,300

tokens / sec (Q8)

8B 41 t/s

14B 23 t/s

30B —

Memory: 16 GB · 15 usable

Bandwidth: 645 GB/s

Idle / Active: 17 W / 304 W

Sticker: $1,300

Why: Best price-to-speed ratio: ~$49 per t/s of token generation.

Amazon ↗ · Newegg ↗ · B&H Photo ↗ · AMD ↗

📺 Reviews on YouTube

▶ AMD 9070 XT FULL SEND — Ollama / llama.cpp / vLLM, gpt-oss / qwen3 / trinity / devstral all tested

▶ RX 9070 XT is cheaper AND better… mostly

Best CUDA

Single B200 180 GB workstation

NVIDIA · workstation / 4U server

$47,000

tokens / sec (Q8)

8B 187 t/s

14B 117 t/s

30B 70 t/s

Memory: 180 GB · 176 usable

Bandwidth: 8000 GB/s

Idle / Active: 100 W / 1000 W

Sticker: $47,000

Why: Strongest CUDA-only software stack among fitting builds.

NVIDIA B200 partners ↗ · SHI enterprise ↗ · Lambda Labs ↗

📺 Reviews on YouTube

▶ AI Lab: NVIDIA B200 vs GB200 explained | GPU architecture for LLMs

▶ Inside a NEW AI Cluster — Tour with NVIDIA B200

Most VRAM

Mac Studio M3 Ultra 512 GB

Apple · small desktop

$14,199

tokens / sec (Q8)

8B 57 t/s

14B 36 t/s

30B 20 t/s

Memory: 512 GB · 480 usable

Bandwidth: 819 GB/s

Idle / Active: 12 W / 220 W

Sticker: $14,199

Why: 480 GB usable: the most headroom for batching and longer contexts.

Apple ↗ · B&H Photo ↗

📺 Reviews on YouTube

▶ M3 Ultra Mac Studio Review

▶ M3 Ultra vs RTX 5090 — The Final Battle

Efficient

MacBook Pro M4 Pro 48 GB

Apple · laptop

$2,899

tokens / sec (Q8)

8B 26 t/s

14B 15 t/s

30B 7.3 t/s

Memory: 48 GB · 40 usable

Bandwidth: 273 GB/s

Idle / Active: 5 W / 70 W

Sticker: $2,899

Why: 70 W active, one of the lowest power draws of the fitting builds (the Mac Mini M4 above draws 50 W).

Amazon ↗ · Apple ↗ · B&H Photo ↗

📺 Reviews on YouTube

▶ FREE Local LLMs on Apple Silicon — Fast! (AZisk)
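The Bandwidth figure matters because single-stream decoding of a dense model reads every weight once per generated token, so memory bandwidth divided by weight size gives a rough ceiling on tg/s. This is a generic back-of-envelope bound, not the method behind the page's numbers, and it ignores KV-cache reads, activations, and compute limits. The sketch assumes llama.cpp's Q8_0 layout (blocks of 32 int8 weights plus one fp16 scale, i.e. 8.5 bits per weight) and treats the model as exactly 12 B parameters; `decode_ceiling_tps` is a hypothetical helper.

```python
def decode_ceiling_tps(bandwidth_gb_s: float, params_b: float,
                       bits_per_weight: float) -> float:
    """Upper bound on batch-1 tokens/sec for a dense model.

    Each token reads all weights once: weight_gb = params * bits / 8.
    Ignores KV-cache traffic, activations, and compute limits.
    """
    weight_gb = params_b * bits_per_weight / 8
    return bandwidth_gb_s / weight_gb

Q8_0_BPW = 8.5  # 32 int8 weights + one fp16 scale = 34 bytes per 32 weights

for name, bw in [("MacBook Pro M4 Pro 48 GB", 273),
                 ("Single RTX 5090 build", 1792)]:
    print(f"{name}: <= {decode_ceiling_tps(bw, 12, Q8_0_BPW):.0f} t/s")
```

The RTX 5090 row in the Q8 table lists 75 t/s, comfortably under its roughly 141 t/s ceiling; the gap reflects the overheads this bound leaves out.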

Every other build that runs Gemma 4 12B Unified (dense)

55 additional builds fit Gemma 4 12B Unified (dense) at Q8_0 (15 GB usable minimum), sorted by sticker price.

Build	Vendor · form factor	Price	Memory (total / usable)	Bandwidth	tg/s (Q8)	Active W	5-yr power cost
Single Tesla P40 24 GB (used) build	NVIDIA · desktop tower	$750	24 / 23 GB	347 GB/s	9.1 t/s	250 W	$854
Single RTX 3090 (used) build	NVIDIA · desktop tower	$1.5k	24 / 23 GB	936 GB/s	30 t/s	350 W	$1.2k
Single Intel Arc Pro B70 build	Intel · desktop tower	$1.8k	32 / 31 GB	608 GB/s	24 t/s	220 W	$782
Single AMD Radeon AI Pro R9700 32 GB build	AMD · desktop tower	$2.0k	32 / 31 GB	640 GB/s	27 t/s	300 W	$1.1k
Mac Studio M4 Max 36 GB	Apple · small desktop	$2.5k	36 / 28 GB	546 GB/s	33 t/s	130 W	$453
Quad AMD MI50 32 GB (128 GB) homelab build	AMD · rack/large tower	$2.5k	128 / 122 GB	1024 GB/s	23 t/s	1200 W	$4.2k
Quad Tesla P40 (96 GB) homelab build	NVIDIA · rack/large tower	$2.7k	96 / 92 GB	347 GB/s	9.7 t/s	1000 W	$3.5k
AMD Ryzen AI Max+ 395 (128 GB)	AMD · mini desktop / laptop	$2.8k	128 / 96 GB	256 GB/s	17 t/s	120 W	$420
Dual RTX 3090 (used) build	NVIDIA · desktop tower	$2.8k	48 / 46 GB	936 GB/s	33 t/s	700 W	$2.4k
MacBook Pro M5 Pro 48 GB	Apple · laptop	$3.2k	48 / 40 GB	307 GB/s	21 t/s	75 W	$263
Single RTX 4090 build	NVIDIA · desktop tower	$3.2k	24 / 23 GB	1008 GB/s	46 t/s	410 W	$1.4k
Dual Intel Arc Pro B70 build	Intel · desktop tower	$3.2k	64 / 62 GB	608 GB/s	33 t/s	380 W	$1.4k
ASUS Ascent GX10 (128 GB)	ASUS · small desktop	$3.5k	128 / 119 GB	273 GB/s	32 t/s	240 W	$903
Single AMD Radeon Pro W7800 32 GB build	AMD · workstation	$3.5k	32 / 31 GB	576 GB/s	19 t/s	260 W	$920
Dual AMD Radeon AI Pro R9700 build (64 GB)	AMD · workstation	$3.7k	64 / 62 GB	640 GB/s	33 t/s	600 W	$2.1k
MacBook Pro M4 Max 64 GB	Apple · laptop	$4.0k	64 / 54 GB	410 GB/s	24 t/s	90 W	$315
Dell Pro Max with GB10 (128 GB)	Dell · small desktop	$4.1k	128 / 119 GB	273 GB/s	33 t/s	240 W	$887
MacBook Pro M5 Max 64 GB	Apple · laptop	$4.1k	64 / 54 GB	614 GB/s	30 t/s	95 W	$332
MSI EdgeXpert MS-C931 (128 GB)	MSI · small desktop	$4.7k	128 / 119 GB	273 GB/s	33 t/s	240 W	$887
Mac Studio M4 Max 128 GB	Apple · small desktop	$4.7k	128 / 112 GB	546 GB/s	33 t/s	130 W	$453
NVIDIA DGX Spark (128 GB)	NVIDIA · small desktop	$4.7k	128 / 119 GB	273 GB/s	33 t/s	240 W	$887
Single RTX A6000 48 GB (Ampere) build	NVIDIA · workstation	$4.7k	48 / 46 GB	768 GB/s	25 t/s	300 W	$1.0k
Single RTX 5090 build	NVIDIA · desktop tower	$4.9k	32 / 31 GB	1792 GB/s	75 t/s	520 W	$1.8k
Single AMD Radeon Pro W7900 48 GB build	AMD · workstation	$5k	48 / 46 GB	864 GB/s	24 t/s	295 W	$1.0k
Lenovo ThinkStation PGX (128 GB)	Lenovo · small desktop	$5k	128 / 119 GB	273 GB/s	33 t/s	160 W	$650
MacBook Pro M5 Max 128 GB	Apple · laptop	$5k	128 / 108 GB	614 GB/s	30 t/s	95 W	$332
2× Strix Halo cluster (256 GB unified)	AMD · mini-PC pair	$6k	256 / 192 GB	256 GB/s	24 t/s	240 W	$841
Quad Intel Arc Pro B70 build	Intel · rack/large tower	$6k	128 / 124 GB	608 GB/s	36 t/s	700 W	$2.5k
Quad RTX 3090 (used) build	NVIDIA · rack/large tower	$6k	96 / 92 GB	936 GB/s	36 t/s	1400 W	$4.9k
Single RTX Pro 5000 Blackwell 48 GB build	NVIDIA · workstation	$7k	48 / 46 GB	1344 GB/s	—	300 W	$1.1k
Single RTX 6000 Ada 48 GB build	NVIDIA · workstation	$8k	48 / 46 GB	960 GB/s	39 t/s	300 W	$1.0k
Mac Studio M3 Ultra 256 GB	Apple · small desktop	$8k	256 / 232 GB	819 GB/s	42 t/s	180 W	$624
Dual AMD Radeon Pro W7900 build	AMD · workstation	$9k	96 / 92 GB	864 GB/s	33 t/s	600 W	$2.1k
2× DGX Spark cluster (256 GB unified, CUDA)	NVIDIA · two desktops, 200 G interconnect	$10k	256 / 240 GB	273 GB/s	55 t/s	460 W	$1.7k
Dual RTX 5090 build	NVIDIA · rack/large tower	$10k	64 / 62 GB	1792 GB/s	85 t/s	1050 W	$3.6k
Octuple Intel Arc Pro B70 cluster	Intel · rack/large tower	$11k	256 / 248 GB	608 GB/s	39 t/s	1450 W	$5k
4× Strix Halo cluster (512 GB unified)	AMD · rack of 4 mini-PCs, 10 GbE fabric	$12k	512 / 384 GB	256 GB/s	33 t/s	480 W	$1.7k
Single RTX Pro 6000 Blackwell 96 GB build	NVIDIA · workstation	$12k	96 / 93 GB	1792 GB/s	85 t/s	600 W	$2.1k
Tinybox Red (6× 7900 XTX, 144 GB)	tinycorp · 12U pedestal	$15k	144 / 138 GB	960 GB/s	58 t/s	1500 W	$5k
4× DGX Spark cluster (512 GB unified, CUDA)	NVIDIA · rack of 4 desktops	$20k	512 / 488 GB	273 GB/s	61 t/s	920 W	$3.4k
8× Strix Halo cluster (1024 GB unified)	AMD · rack of 8 mini-PCs, 10/25 GbE fabric	$23k	1024 / 768 GB	256 GB/s	42 t/s	960 W	$3.4k
Dual RTX Pro 6000 Blackwell build	NVIDIA · workstation	$24k	192 / 188 GB	1792 GB/s	97 t/s	1100 W	$3.8k
Single AMD Instinct MI325X 256 GB workstation	AMD · workstation / 4U server (OAM)	$25k	256 / 250 GB	6000 GB/s	133 t/s	1000 W	$3.6k
Tinybox Green (6× RTX 4090, 144 GB)	tinycorp · 12U pedestal	$25k	144 / 138 GB	1008 GB/s	79 t/s	2200 W	$8k
2× Mac Studio M3 Ultra 512 GB cluster (TB5 / MLX)	Apple · two desktops, Thunderbolt 5 RDMA	$28k	1024 / 960 GB	819 GB/s	56 t/s	440 W	$1.5k
Single AMD Instinct MI300X 192 GB workstation	AMD · workstation	$30k	192 / 188 GB	5300 GB/s	103 t/s	750 W	$2.8k
Single H100 80 GB workstation	NVIDIA · workstation	$32k	80 / 78 GB	3350 GB/s	91 t/s	700 W	$2.5k
Quad RTX Pro 6000 Blackwell build (384 GB)	NVIDIA · workstation / 4U pedestal	$38k	384 / 372 GB	1792 GB/s	140 t/s	2200 W	$8k
Single H200 141 GB workstation	NVIDIA · workstation / 2U server	$40k	141 / 138 GB	4800 GB/s	127 t/s	700 W	$2.5k
Tinybox Pro (8× RTX 4090, 192 GB)	tinycorp · 12U pedestal	$40k	192 / 184 GB	1008 GB/s	97 t/s	3200 W	$11k
8× DGX Spark cluster (1024 GB unified, CUDA)	NVIDIA · rack of 8 desktops, 200 GbE fabric	$44k	1024 / 976 GB	273 GB/s	79 t/s	1840 W	$7k
Tinybox Green v2 (4× RTX 5090, 128 GB)	tinycorp · 12U pedestal	$45k	128 / 124 GB	1792 GB/s	103 t/s	2300 W	$8k
8× RTX Pro 6000 Blackwell server (768 GB)	NVIDIA · 4U server (e.g. SuperMicro AS-4125GS-TNRT)	$78k	768 / 744 GB	1792 GB/s	188 t/s	4800 W	$16k
8× H100 80 GB server	NVIDIA · server rack	$280k	640 / 620 GB	3350 GB/s	170 t/s	5600 W	$20k
NVIDIA RTX Spark (128 GB)	NVIDIA · OEM laptops + small desktops	—	128 / 119 GB	300 GB/s	—	—	$756
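The two table intros state the fit rule: a build is listed when its usable memory meets the quant's minimum, 10 GB for Q5_K_M and 15 GB for Q8_0. A minimal sketch of that filter, using a few rows copied from the Q5 table (prices rounded as the table rounds them); it shows why the RTX 3060, Arc B580, and MacBook Air M5 builds, at 11 GB usable each, appear in the Q5 list but not the Q8 one. The tuple layout and the `fits` helper are hypothetical.

```python
MIN_USABLE_GB = {"Q5_K_M": 10, "Q8_0": 15}

# (name, sticker price in USD, usable GB), copied from the Q5 table
builds = [
    ("Single Tesla P40 24 GB (used) build", 750, 23),
    ("RTX 3060 12 GB build", 900, 11),
    ("Single Intel Arc B580 12 GB build", 1100, 11),
    ("MacBook Air M5 (16 GB)", 1300, 11),
    ("Single RTX 3090 (used) build", 1500, 23),
]

def fits(quant: str) -> list[str]:
    """Names of builds whose usable memory meets the quant's minimum, cheapest first."""
    need = MIN_USABLE_GB[quant]
    ok = [b for b in builds if b[2] >= need]
    return [name for name, _, _ in sorted(ok, key=lambda b: b[1])]

# Builds that fit at Q5_K_M but drop out at Q8_0
dropped = sorted(set(fits("Q5_K_M")) - set(fits("Q8_0")))
print(dropped)
```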



Last updated 2026-06-27