Hardware to run GLM-5.2 753B (MoE)

Z.ai's June 2026 flagship MoE successor to GLM-5.1: 753B total / 39B active, native 1M-token context via IndexShare sparse attention, two thinking-effort levels. MIT-licensed, no benchmarks at launch.

GLM · text

GLM-5.2 753B (MoE)

753B params · 455 GB Q4 file · 470 GB min (Q4) · 559 GB min (Q5) · 902 GB min (Q8) · 1000K ctx · MIT · 🤗
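The minimum-memory figures decide which builds appear under each quantization. A minimal sketch of that fit check follows, assuming a build "fits" when its usable memory is at least the quoted minimum; the function names are hypothetical, and the Q2_K minimum (259 GB) comes from the "Every other build" note further down the page.

```python
# Minimum usable memory in GB per quantization, as quoted on this page.
MIN_USABLE_GB = {"Q2_K": 259, "Q4_K_M": 470, "Q5_K_M": 559, "Q8_0": 902}


def fits(usable_gb, quant):
    """True if a build's usable memory meets the page's minimum for `quant`."""
    return usable_gb >= MIN_USABLE_GB[quant]


def headroom_gb(usable_gb, quant):
    """Memory left for KV cache and batching; negative means it does not fit."""
    return usable_gb - MIN_USABLE_GB[quant]


# Usable-memory figures taken from the build cards on this page.
builds = {
    "4× Strix Halo cluster": 384,
    "Mac Studio M3 Ultra 512 GB": 480,
    "DGX B200": 1404,
}

for name, usable in builds.items():
    ok = [q for q in MIN_USABLE_GB if fits(usable, q)]
    print(f"{name}: fits {', '.join(ok) or 'nothing'}")
```

Under this assumption the 4× Strix Halo cluster qualifies only at Q2_K, which matches it dropping out of the Q4 and Q5 picks.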

Quantization

Cheapest

4× Strix Halo cluster (512 GB unified)

AMD · rack of 4 mini-PCs, 10 GbE fabric

$11,500

tokens / sec (Q2)

235B-MoE 29 t/s

671B-MoE 6.0 t/s

1T-MoE n/a

Memory: 512 GB · 384 usable

Bandwidth: 256 GB/s

Idle / Active: 32 W / 480 W

Sticker: $11,500

Why: Lowest sticker that still fits GLM-5.2 753B (MoE) ($12k USD).

Amazon (Strix Halo Mini PC) ↗ · GMKtec EVO-X2 ↗ · Bosgame M5 ↗ · Framework ↗

📺 Reviews on YouTube

▶ Kimi-K2 (1T) / GLM 4.7 (355B) on a 4-node Strix Halo cluster — 512 GB unified memory

▶ AMD Strix Halo / Ryzen AI Max+ 395 — honest review (single-node baseline)

Fastest

DGX B200 — 8× B200 server (1.44 TB HBM3e)

NVIDIA · 10U DGX server

$475,000

tokens / sec (Q2)

235B-MoE 198 t/s

671B-MoE 126 t/s

1T-MoE 108 t/s

Memory: 1440 GB · 1404 usable

Bandwidth: 8000 GB/s

Idle / Active: 900 W / 10200 W

Sticker: $475,000

Why: Highest measured tg/s: 120 t/s on GLM-5.2 753B (MoE)-class models at Q2.

NVIDIA DGX B200 ↗ · SuperMicro HGX B200 ↗ · Lambda Labs ↗

📺 Reviews on YouTube

▶ Inside a 1.44 TB HBM3e NVIDIA HGX B200 AI Server from ASRock Rack

All-rounder

Single AMD Instinct MI355X 288 GB workstation

AMD · 4U server (OAM, liquid-cooled)

$28,000

tokens / sec (Q2)

235B-MoE 156 t/s

671B-MoE 60 t/s

1T-MoE 36 t/s

Memory: 288 GB · 282 usable

Bandwidth: 8000 GB/s

Idle / Active: 140 W / 1400 W

Sticker: $28,000

Why: Top quartile across speed, value, memory headroom, and efficiency (the "buy this if unsure" pick).

Dell XE9712 ↗ · AMD MI350 series ↗ · Supermicro UBB ↗

📺 Reviews on YouTube

▶ AMD's MI350/355X Advancing AI Event Recap

▶ AMD AI Event w/ Craft Computing — MI350/355X Launch, ROCm 7

▶ AMD Visit and Tour! Featuring ROCm 7 and AMD Instinct

Best value

Quad RTX Pro 6000 Blackwell build (384 GB)

NVIDIA · workstation / 4U pedestal

$38,000

tokens / sec (Q2)

235B-MoE 174 t/s

671B-MoE 48 t/s

1T-MoE n/a

Memory: 384 GB · 372 usable

Bandwidth: 1792 GB/s

Idle / Active: 100 W / 2200 W

Sticker: $38,000

Why: Best $/tg-per-second: ~$834 per t/s.

Amazon ↗ · Newegg ↗ · B&H Photo ↗ · PNY Pro ↗
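The "$ per t/s" figure reads as sticker price divided by generation speed. A small sketch of that arithmetic, assuming exactly that definition (the helper names are hypothetical), shows the speed the ~$834 figure implies for this $38,000 build and how it compares with the 671B-MoE column:

```python
def dollars_per_tps(sticker_usd, tokens_per_sec):
    """Sticker price divided by generation speed."""
    return sticker_usd / tokens_per_sec


def implied_tps(sticker_usd, usd_per_tps):
    """Invert the ratio: the speed a quoted $/t/s figure implies."""
    return sticker_usd / usd_per_tps


sticker = 38_000  # Quad RTX Pro 6000 Blackwell build, from the card above

# Speed implied by the quoted ~$834 per t/s
print(round(implied_tps(sticker, 834), 1))

# Same ratio computed from the card's 671B-MoE figure (48 t/s)
print(round(dollars_per_tps(sticker, 48)))
```

The implied speed sits a little below the 671B-MoE figure, which is what one would expect if the ratio is computed for a 753B-class model rather than the 671B column.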

Best CUDA

DGX H200 — 8× H200 server (1.13 TB HBM3e)

NVIDIA · 8U DGX / HGX server rack

$380,000

tokens / sec (Q2)

235B-MoE 186 t/s

671B-MoE 120 t/s

1T-MoE 102 t/s

Memory: 1128 GB · 1100 usable

Bandwidth: 4800 GB/s

Idle / Active: 700 W / 6500 W

Sticker: $380,000

Why: Strongest CUDA-only software stack among fitting builds.

NVIDIA DGX H200 ↗ · SuperMicro HGX ↗ · Lambda Labs ↗

📺 Reviews on YouTube

▶ Inside the SUPER NVIDIA H200 Server From Supermicro (8U HGX H200)

Most VRAM

12× RTX Pro 6000 Blackwell rack (1152 GB)

NVIDIA · 8U server rack (multi-node, 1-2 chassis)

$118,000

tokens / sec (Q2)

235B-MoE 312 t/s

671B-MoE 114 t/s

1T-MoE 66 t/s

Memory: 1152 GB · 1116 usable

Bandwidth: 1792 GB/s

Idle / Active: 340 W / 7400 W

Sticker: $118,000

Why: 1116 GB usable, the most headroom for batching and longer contexts.

Amazon ↗ · CoreWeave ↗ · SuperMicro ↗ · SHI enterprise ↗

Efficient

Mac Studio M3 Ultra 512 GB

Apple · small desktop

$14,199

tokens / sec (Q2)

235B-MoE 26 t/s

671B-MoE 9.6 t/s

1T-MoE 7.2 t/s

Memory: 512 GB · 480 usable

Bandwidth: 819 GB/s

Idle / Active: 12 W / 220 W

Sticker: $14,199

Why: 220 W active, the lowest power draw of the fitting builds.

Apple ↗ · B&H Photo ↗

📺 Reviews on YouTube

▶ M3 Ultra Mac Studio Review

▶ M3 Ultra vs RTX 5090 — The Final Battle

Cheapest

Mac Studio M3 Ultra 512 GB

Apple · small desktop

$14,199

tokens / sec (Q2)

235B-MoE 26 t/s

671B-MoE 9.6 t/s

1T-MoE 7.2 t/s

Memory: 512 GB · 480 usable

Bandwidth: 819 GB/s

Idle / Active: 12 W / 220 W

Sticker: $14,199

Why: Lowest sticker that still fits GLM-5.2 753B (MoE) ($14k USD).

Apple ↗ · B&H Photo ↗

📺 Reviews on YouTube

▶ M3 Ultra Mac Studio Review

▶ M3 Ultra vs RTX 5090 — The Final Battle

Fastest

Single AMD Instinct MI355X 288 GB workstation

AMD · 4U server (OAM, liquid-cooled)

$28,000

tokens / sec (Q2)

235B-MoE 156 t/s

671B-MoE 60 t/s

1T-MoE 36 t/s

Memory: 288 GB · 282 usable

Bandwidth: 8000 GB/s

Idle / Active: 140 W / 1400 W

Sticker: $28,000

Why: Highest measured tg/s: 57 t/s on GLM-5.2 753B (MoE)-class models at Q2.

Dell XE9712 ↗ · AMD MI350 series ↗ · Supermicro UBB ↗

📺 Reviews on YouTube

▶ AMD's MI350/355X Advancing AI Event Recap

▶ AMD AI Event w/ Craft Computing — MI350/355X Launch, ROCm 7

▶ AMD Visit and Tour! Featuring ROCm 7 and AMD Instinct

Every other build that runs GLM-5.2 753B (MoE)

6 additional builds fit GLM-5.2 753B (MoE) at Q2_K (259 GB usable minimum), sorted by sticker price.

Build	Vendor · form factor	Price	Memory (total / usable)	Bandwidth	tg/s (Q2)	Active W	5-yr power
4× DGX Spark cluster (512 GB unified, CUDA)	NVIDIA · rack of 4 desktops	$20k	512 / 488 GB	273 GB/s	13 t/s	920 W	$3.4k
8× Strix Halo cluster (1024 GB unified)	AMD · rack of 8 mini-PCs, 10/25 GbE fabric	$23k	1024 / 768 GB	256 GB/s	9.1 t/s	960 W	$3.4k
2× Mac Studio M3 Ultra 512 GB cluster (TB5 / MLX)	Apple · two desktops, Thunderbolt 5 RDMA	$28k	1024 / 960 GB	819 GB/s	10 t/s	440 W	$1.5k
8× DGX Spark cluster (1024 GB unified, CUDA)	NVIDIA · rack of 8 desktops, 200 GbE fabric	$44k	1024 / 976 GB	273 GB/s	18 t/s	1840 W	$7k
8× RTX Pro 6000 Blackwell server (768 GB)	NVIDIA · 4U server (e.g. SuperMicro AS-4125GS-TNRT)	$78k	768 / 744 GB	1792 GB/s	85 t/s	4800 W	$16k
8× H100 80 GB server	NVIDIA · server rack	$280k	640 / 620 GB	3350 GB/s	80 t/s	5600 W	$20k
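The page does not say how the 5-yr power column is computed. The sketch below is one plausible reconstruction, not the page's confirmed formula: it assumes a 50% duty cycle between idle and active draw and a hypothetical electricity rate of $0.15/kWh. Both values are assumptions chosen for illustration. Idle figures come from the build cards elsewhere on this page.

```python
HOURS_5Y = 5 * 365 * 24  # 43,800 hours, ignoring leap days


def five_year_cost_usd(idle_w, active_w, duty=0.5, usd_per_kwh=0.15):
    """Estimated 5-year electricity cost.

    `duty` (fraction of time at active draw) and `usd_per_kwh` are
    assumed values, not figures published on the page.
    """
    avg_w = duty * active_w + (1 - duty) * idle_w
    return avg_w / 1000 * HOURS_5Y * usd_per_kwh


# (idle W, active W) as listed on the build cards
builds = {
    "8× Strix Halo cluster": (64, 960),
    "2× Mac Studio M3 Ultra cluster": (24, 440),
    "8× DGX Spark cluster": (220, 1840),
    "8× RTX Pro 6000 Blackwell server": (220, 4800),
}

for name, (idle_w, active_w) in builds.items():
    print(f"{name}: ${five_year_cost_usd(idle_w, active_w):,.0f}")
```

With these assumed inputs the estimates land close to the table's rounded values for the four builds whose idle draw is listed, but a different duty cycle and rate could produce similar numbers.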

Cheapest

Mac Studio M3 Ultra 512 GB

Apple · small desktop

$14,199

tokens / sec (Q4)

235B-MoE 22 t/s

671B-MoE 8.0 t/s

1T-MoE 6.0 t/s

Memory: 512 GB · 480 usable

Bandwidth: 819 GB/s

Idle / Active: 12 W / 220 W

Sticker: $14,199

Why: Lowest sticker that still fits GLM-5.2 753B (MoE) ($14k USD).

Apple ↗ · B&H Photo ↗

📺 Reviews on YouTube

▶ M3 Ultra Mac Studio Review

▶ M3 Ultra vs RTX 5090 — The Final Battle

Fastest

DGX B200 — 8× B200 server (1.44 TB HBM3e)

NVIDIA · 10U DGX server

$475,000

tokens / sec (Q4)

235B-MoE 165 t/s

671B-MoE 105 t/s

1T-MoE 90 t/s

Memory: 1440 GB · 1404 usable

Bandwidth: 8000 GB/s

Idle / Active: 900 W / 10200 W

Sticker: $475,000

Why: Highest measured tg/s: 100 t/s on GLM-5.2 753B (MoE)-class models at Q4.

NVIDIA DGX B200 ↗ · SuperMicro HGX B200 ↗ · Lambda Labs ↗

📺 Reviews on YouTube

▶ Inside a 1.44 TB HBM3e NVIDIA HGX B200 AI Server from ASRock Rack

All-rounder

8× RTX Pro 6000 Blackwell server (768 GB)

NVIDIA · 4U server (e.g. SuperMicro AS-4125GS-TNRT)

$78,000

tokens / sec (Q4)

235B-MoE 220 t/s

671B-MoE 75 t/s

1T-MoE 40 t/s

Memory: 768 GB · 744 usable

Bandwidth: 1792 GB/s

Idle / Active: 220 W / 4800 W

Sticker: $78,000

Why: Top quartile across speed, value, memory headroom, and efficiency (the "buy this if unsure" pick).

Amazon ↗ · PNY Pro ↗ · CoreWeave ↗ · SuperMicro ↗

Best value

12× RTX Pro 6000 Blackwell rack (1152 GB)

NVIDIA · 8U server rack (multi-node, 1-2 chassis)

$118,000

tokens / sec (Q4)

235B-MoE 260 t/s

671B-MoE 95 t/s

1T-MoE 55 t/s

Memory: 1152 GB · 1116 usable

Bandwidth: 1792 GB/s

Idle / Active: 340 W / 7400 W

Sticker: $118,000

Why: Best $/tg-per-second: ~$1,309 per t/s.

Amazon ↗ · CoreWeave ↗ · SuperMicro ↗ · SHI enterprise ↗

Best CUDA

DGX H200 — 8× H200 server (1.13 TB HBM3e)

NVIDIA · 8U DGX / HGX server rack

$380,000

tokens / sec (Q4)

235B-MoE 155 t/s

671B-MoE 100 t/s

1T-MoE 85 t/s

Memory: 1128 GB · 1100 usable

Bandwidth: 4800 GB/s

Idle / Active: 700 W / 6500 W

Sticker: $380,000

Why: Strongest CUDA-only software stack among fitting builds.

NVIDIA DGX H200 ↗ · SuperMicro HGX ↗ · Lambda Labs ↗

📺 Reviews on YouTube

▶ Inside the SUPER NVIDIA H200 Server From Supermicro (8U HGX H200)

Most VRAM

8× DGX Spark cluster (1024 GB unified, CUDA)

NVIDIA · rack of 8 desktops, 200 GbE fabric

$43,500

tokens / sec (Q4)

235B-MoE 42 t/s

671B-MoE 16 t/s

1T-MoE 13 t/s

Memory: 1024 GB · 976 usable

Bandwidth: 273 GB/s

Idle / Active: 220 W / 1840 W

Sticker: $43,500

Why: 976 GB usable, the most headroom for batching and longer contexts.

Amazon (8× DGX Spark) ↗ · Micro Center ↗ · NVIDIA Marketplace ↗ · ServeTheHome guide ↗

📺 Reviews on YouTube

▶ NVIDIA didn't want me to do this — 8× DGX Spark 1 TB VRAM cluster (AZisk)

▶ I built an 8x NVIDIA GB10 cluster for massive Local AI

Efficient

2× Mac Studio M3 Ultra 512 GB cluster (TB5 / MLX)

Apple · two desktops, Thunderbolt 5 RDMA

$28,400

tokens / sec (Q4)

235B-MoE 29 t/s

671B-MoE 9.0 t/s

1T-MoE 7.0 t/s

Memory: 1024 GB · 960 usable

Bandwidth: 819 GB/s

Idle / Active: 24 W / 440 W

Sticker: $28,400

Why: 440 W active, the lowest power draw of the fitting builds.

eBay ↗ · Apple Refurbished ↗ · B&H Photo (used) ↗

📺 Reviews on YouTube

▶ Apple didn't have to go this hard… (4× M3 Ultra TB5 cluster)

▶ Mac Studio CLUSTER vs M3 Ultra

Cheapest

Mac Studio M3 Ultra 512 GB

Apple · small desktop

$14,199

tokens / sec (Q4)

235B-MoE 22 t/s

671B-MoE 8.0 t/s

1T-MoE 6.0 t/s

Memory: 512 GB · 480 usable

Bandwidth: 819 GB/s

Idle / Active: 12 W / 220 W

Sticker: $14,199

Why: Lowest sticker that still fits GLM-5.2 753B (MoE) ($14k USD).

Apple ↗ · B&H Photo ↗

📺 Reviews on YouTube

▶ M3 Ultra Mac Studio Review

▶ M3 Ultra vs RTX 5090 — The Final Battle

Every other build that runs GLM-5.2 753B (MoE)

3 additional builds fit GLM-5.2 753B (MoE) at Q4_K_M (470 GB usable minimum), sorted by sticker price.

Build	Vendor · form factor	Price	Memory (total / usable)	Bandwidth	tg/s (Q4)	Active W	5-yr power
4× DGX Spark cluster (512 GB unified, CUDA)	NVIDIA · rack of 4 desktops	$20k	512 / 488 GB	273 GB/s	10 t/s	920 W	$3.4k
8× Strix Halo cluster (1024 GB unified)	AMD · rack of 8 mini-PCs, 10/25 GbE fabric	$23k	1024 / 768 GB	256 GB/s	7.6 t/s	960 W	$3.4k
8× H100 80 GB server	NVIDIA · server rack	$280k	640 / 620 GB	3350 GB/s	66 t/s	5600 W	$20k
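The table's Active W and tg/s columns can be combined into an energy-per-token figure, which is a more direct efficiency measure than active draw alone. A rough sketch follows, assuming the listed active wattage is sustained during generation and the tg/s figure is single-stream; the helper name is hypothetical.

```python
def joules_per_token(active_watts, tokens_per_sec):
    """Watts are joules per second, so W / (tokens/s) gives joules per token."""
    return active_watts / tokens_per_sec


# (build, active W, tg/s at Q4) copied from the table above
rows = [
    ("4× DGX Spark cluster", 920, 10),
    ("8× Strix Halo cluster", 960, 7.6),
    ("8× H100 80 GB server", 5600, 66),
]

# Print the builds from least to most energy per generated token
for name, watts, tps in sorted(rows, key=lambda r: joules_per_token(r[1], r[2])):
    print(f"{name}: {joules_per_token(watts, tps):.1f} J/token")
```

By this measure a high-wattage server can still come out ahead per token, because its throughput scales faster than its power draw.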

Cheapest

8× Strix Halo cluster (1024 GB unified)

AMD · rack of 8 mini-PCs, 10/25 GbE fabric

$23,200

tokens / sec (Q5)

235B-MoE 27 t/s

671B-MoE 6.7 t/s

1T-MoE 4.2 t/s

Memory: 1024 GB · 768 usable

Bandwidth: 256 GB/s

Idle / Active: 64 W / 960 W

Sticker: $23,200

Why: Lowest sticker that still fits GLM-5.2 753B (MoE) ($23k USD).

Amazon (Strix Halo Mini PC) ↗ · GMKtec EVO-X2 ↗ · Bosgame M5 ↗ · Framework ↗

📺 Reviews on YouTube

▶ Kimi-K2 (1T) / GLM 4.7 (355B) on a 4-node Strix Halo cluster — closest documented baseline

▶ AMD Strix Halo / Ryzen AI Max+ 395 — honest review (single-node baseline)

Fastest

DGX B200 — 8× B200 server (1.44 TB HBM3e)

NVIDIA · 10U DGX server

$475,000

tokens / sec (Q5)

235B-MoE 139 t/s

671B-MoE 88 t/s

1T-MoE 76 t/s

Memory: 1440 GB · 1404 usable

Bandwidth: 8000 GB/s

Idle / Active: 900 W / 10200 W

Sticker: $475,000

Why: Highest measured tg/s: 84 t/s on GLM-5.2 753B (MoE)-class models at Q5.

NVIDIA DGX B200 ↗ · SuperMicro HGX B200 ↗ · Lambda Labs ↗

📺 Reviews on YouTube

▶ Inside a 1.44 TB HBM3e NVIDIA HGX B200 AI Server from ASRock Rack

All-rounder

8× RTX Pro 6000 Blackwell server (768 GB)

NVIDIA · 4U server (e.g. SuperMicro AS-4125GS-TNRT)

$78,000

tokens / sec (Q5)

235B-MoE 185 t/s

671B-MoE 63 t/s

1T-MoE 34 t/s

Memory: 768 GB · 744 usable

Bandwidth: 1792 GB/s

Idle / Active: 220 W / 4800 W

Sticker: $78,000

Why: Top quartile across speed, value, memory headroom, and efficiency (the "buy this if unsure" pick).

Amazon ↗ · PNY Pro ↗ · CoreWeave ↗ · SuperMicro ↗

Best value

12× RTX Pro 6000 Blackwell rack (1152 GB)

NVIDIA · 8U server rack (multi-node, 1-2 chassis)

$118,000

tokens / sec (Q5)

235B-MoE 218 t/s

671B-MoE 80 t/s

1T-MoE 46 t/s

Memory: 1152 GB · 1116 usable

Bandwidth: 1792 GB/s

Idle / Active: 340 W / 7400 W

Sticker: $118,000

Why: Best $/tg-per-second: ~$1,559 per t/s.

Amazon ↗ · CoreWeave ↗ · SuperMicro ↗ · SHI enterprise ↗

Best CUDA

DGX H200 — 8× H200 server (1.13 TB HBM3e)

NVIDIA · 8U DGX / HGX server rack

$380,000

tokens / sec (Q5)

235B-MoE 130 t/s

671B-MoE 84 t/s

1T-MoE 71 t/s

Memory: 1128 GB · 1100 usable

Bandwidth: 4800 GB/s

Idle / Active: 700 W / 6500 W

Sticker: $380,000

Why: Strongest CUDA-only software stack among fitting builds.

NVIDIA DGX H200 ↗ · SuperMicro HGX ↗ · Lambda Labs ↗

📺 Reviews on YouTube

▶ Inside the SUPER NVIDIA H200 Server From Supermicro (8U HGX H200)

Most VRAM

8× DGX Spark cluster (1024 GB unified, CUDA)

NVIDIA · rack of 8 desktops, 200 GbE fabric

$43,500

tokens / sec (Q5)

235B-MoE 35 t/s

671B-MoE 13 t/s

1T-MoE 11 t/s

Memory: 1024 GB · 976 usable

Bandwidth: 273 GB/s

Idle / Active: 220 W / 1840 W

Sticker: $43,500

Why: 976 GB usable, the most headroom for batching and longer contexts.

Amazon (8× DGX Spark) ↗ · Micro Center ↗ · NVIDIA Marketplace ↗ · ServeTheHome guide ↗

📺 Reviews on YouTube

▶ NVIDIA didn't want me to do this — 8× DGX Spark 1 TB VRAM cluster (AZisk)

▶ I built an 8x NVIDIA GB10 cluster for massive Local AI

Efficient

2× Mac Studio M3 Ultra 512 GB cluster (TB5 / MLX)

Apple · two desktops, Thunderbolt 5 RDMA

$28,400

tokens / sec (Q5)

235B-MoE 24 t/s

671B-MoE 7.6 t/s

1T-MoE 5.9 t/s

Memory: 1024 GB · 960 usable

Bandwidth: 819 GB/s

Idle / Active: 24 W / 440 W

Sticker: $28,400

Why: 440 W active, the lowest power draw of the fitting builds.

eBay ↗ · Apple Refurbished ↗ · B&H Photo (used) ↗

📺 Reviews on YouTube

▶ Apple didn't have to go this hard… (4× M3 Ultra TB5 cluster)

▶ Mac Studio CLUSTER vs M3 Ultra

No plug-and-play build fits at Q5_K_M

Only used / DIY / homelab-cluster rigs fit GLM-5.2 753B (MoE) at this quant. Turn off "Only plug & play" to see them.

Every other build that runs GLM-5.2 753B (MoE)

1 additional build fits GLM-5.2 753B (MoE) at Q5_K_M (559 GB usable minimum), sorted by sticker price.

Build	Vendor · form factor	Price	Memory (total / usable)	Bandwidth	tg/s (Q5)	Active W	5-yr power
8× H100 80 GB server	NVIDIA · server rack	$280k	640 / 620 GB	3350 GB/s	56 t/s	5600 W	$20k

Sources

Last updated 2026-06-27

