
Hardware to run GLM-5.2 753B (MoE)

Z.ai's June 2026 flagship MoE successor to GLM-5.1: 753B total / 39B active, native 1M-token context via IndexShare sparse attention, two thinking-effort levels. MIT-licensed, no benchmarks at launch.

GLM · text
GLM-5.2 753B (MoE)
753B params · 455 GB Q4 file · 470 GB min Q4 · 559 GB min Q5 · 902 GB min Q8 · 1000K ctx · MIT
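The minimum-memory figures above decide which quantizations a given machine can hold. A minimal sketch of that check, assuming a build "fits" whenever its usable memory is at least the listed minimum; the function name `fitting_quants` and the dictionary layout are illustrative and not part of the picker itself:

```python
# Minimum usable memory (GB) per quantization, copied from the spec line above.
MIN_USABLE_GB = {"Q4": 470, "Q5": 559, "Q8": 902}


def fitting_quants(usable_gb: float) -> list[str]:
    """Return the quantizations whose listed minimum fits in usable_gb."""
    return [quant for quant, need in MIN_USABLE_GB.items() if usable_gb >= need]


# Usable-memory figures (GB) for three builds listed on this page.
builds = {
    "Mac Studio M3 Ultra 512 GB": 480,
    "8x RTX Pro 6000 Blackwell server": 744,
    "DGX B200": 1404,
}

for name, usable in builds.items():
    quants = fitting_quants(usable)
    print(f"{name}: {', '.join(quants) if quants else 'does not fit'}")
```

Under that assumption the single Mac Studio holds only Q4, the 8× RTX Pro 6000 server reaches Q5, and the DGX B200 has room for Q8.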
Cheapest

Mac Studio M3 Ultra 512 GB

Apple · small desktop
$14,199
tokens / sec (Q4)
235B-MoE 22 t/s
671B-MoE 8.0 t/s
1T-MoE 6.0 t/s
Memory: 512 GB · 480 usable
Bandwidth: 819 GB/s
Idle / Active: 12 W / 220 W
Sticker: $14,199
Why: Lowest sticker price that still fits GLM-5.2 753B (MoE) at Q4 ($14k USD).
Fastest

DGX B200 — 8× B200 server (1.44 TB HBM3e)

NVIDIA · 10U DGX server
$475,000
tokens / sec (Q4)
235B-MoE 165 t/s
671B-MoE 105 t/s
1T-MoE 90 t/s
Memory: 1440 GB · 1404 usable
Bandwidth: 8000 GB/s
Idle / Active: 900 W / 10200 W
Sticker: $475,000
Why: Highest measured tg/s — 100 t/s on GLM-5.2 753B (MoE)-class models at Q4.
All-rounder

8× RTX Pro 6000 Blackwell server (768 GB)

NVIDIA · 4U server (e.g. SuperMicro AS-4125GS-TNRT)
$78,000
tokens / sec (Q4)
235B-MoE 220 t/s
671B-MoE 75 t/s
1T-MoE 40 t/s
Memory: 768 GB · 744 usable
Bandwidth: 1792 GB/s
Idle / Active: 220 W / 4800 W
Sticker: $78,000
Why: Top quartile across speed, value, memory headroom, and efficiency — the "buy this if unsure" pick.
Best value

12× RTX Pro 6000 Blackwell rack (1152 GB)

NVIDIA · 8U server rack (multi-node, 1-2 chassis)
$118,000
tokens / sec (Q4)
235B-MoE 260 t/s
671B-MoE 95 t/s
1T-MoE 55 t/s
Memory: 1152 GB · 1116 usable
Bandwidth: 1792 GB/s
Idle / Active: 340 W / 7400 W
Sticker: $118,000
Why: Best $/tg-per-second — ~$1,309 per t/s.
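The value metric is sticker price divided by generation speed. The page does not say which t/s figure feeds it, so the sketch below is only an illustration under that simple definition; `dollars_per_tps` and `implied_tps` are hypothetical helper names, not the picker's own code:

```python
def dollars_per_tps(sticker_usd: float, tokens_per_sec: float) -> float:
    """Sticker price divided by generation speed, in USD per t/s."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return sticker_usd / tokens_per_sec


def implied_tps(sticker_usd: float, usd_per_tps: float) -> float:
    """Back out the speed that a quoted USD-per-t/s figure implies."""
    if usd_per_tps <= 0:
        raise ValueError("usd_per_tps must be positive")
    return sticker_usd / usd_per_tps


# DGX B200: $475,000 sticker, 100 t/s quoted for this model class at Q4.
print(f"DGX B200: ${dollars_per_tps(475_000, 100):,.0f} per t/s")

# 12x RTX Pro 6000 rack: $118,000 sticker, quoted at about $1,309 per t/s.
print(f"Implied speed: {implied_tps(118_000, 1_309):.1f} t/s")
```

Read this way, the quoted ~$1,309 per t/s corresponds to roughly 90 t/s on the 12× RTX Pro 6000 rack, against $4,750 per t/s for the DGX B200 at its quoted 100 t/s.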
Best CUDA

DGX H200 — 8× H200 server (1.13 TB HBM3e)

NVIDIA · 8U DGX / HGX server rack
$380,000
tokens / sec (Q4)
235B-MoE 155 t/s
671B-MoE 100 t/s
1T-MoE 85 t/s
Memory: 1128 GB · 1100 usable
Bandwidth: 4800 GB/s
Idle / Active: 700 W / 6500 W
Sticker: $380,000
Why: Strongest CUDA-only software stack among fitting builds.
Most VRAM

8× DGX Spark cluster (1024 GB unified, CUDA)

NVIDIA · rack of 8 desktops, 200 GbE fabric
$43,500
tokens / sec (Q4)
235B-MoE 42 t/s
671B-MoE 16 t/s
1T-MoE 13 t/s
Memory: 1024 GB · 976 usable
Bandwidth: 273 GB/s
Idle / Active: 220 W / 1840 W
Sticker: $43,500
Why: 976 GB usable — most headroom for batching and longer contexts.
Efficient

2× Mac Studio M3 Ultra 512 GB cluster (TB5 / MLX)

Apple · two desktops, Thunderbolt 5 RDMA
$28,400
tokens / sec (Q4)
235B-MoE 29 t/s
671B-MoE 9.0 t/s
1T-MoE 7.0 t/s
Memory: 1024 GB · 960 usable
Bandwidth: 819 GB/s
Idle / Active: 24 W / 440 W
Sticker: $28,400
Why: 440 W active — lowest power draw of the fitting builds.

Every other build that runs GLM-5.2 753B (MoE)

3 additional builds fit GLM-5.2 753B (MoE) at Q4_K_M (470 GB usable minimum), sorted by sticker price.

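| Build | Price | Memory (total / usable) | Bandwidth | tg/s (Q4) | Active W | 5-yr power |
| --- | --- | --- | --- | --- | --- | --- |
| (build name missing) | $20k | 512 / 488 GB | 273 GB/s | 10 t/s | 920 W | $3.4k |
| 8× Strix Halo cluster (1024 GB unified), AMD · rack of 8 mini-PCs, 10/25 GbE fabric | $23k | 1024 / 768 GB | 256 GB/s | 7.6 t/s | 960 W | $3.4k |
| 8× H100 80 GB server, NVIDIA · server rack | $280k | 640 / 620 GB | 3350 GB/s | 66 t/s | 5600 W | $20k |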

Last updated 2026-06-27