Hardware to run GLM-5.2 753B (MoE)
Z.ai's June 2026 flagship mixture-of-experts (MoE) model and the successor to GLM-5.1: 753B total parameters with 39B active per token, a native 1M-token context via IndexShare sparse attention, and two thinking-effort levels. It is MIT-licensed and shipped without benchmarks at launch.
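The usable-memory minimums quoted below (259 GB at Q2_K, 470 GB at Q4_K_M, 559 GB at Q5_K_M) can be read through the usual weight-size rule of thumb: bytes ≈ parameters × bits per weight ÷ 8. The page does not say how it derived them, so the sketch below is an estimate under stated assumptions.

- **Q8_0:** 8.5 bits per weight follows from llama.cpp's Q8_0 block layout.
- **Q2_K, Q4_K_M, Q5_K_M:** these bit rates are back-calculated from this page's own minimums. They are not taken from any spec.
- **Active weights:** the same formula applied to the 39B active parameters gives the weight bytes read per generated token.

```python
# Rough GGUF weight footprint: bytes ≈ parameters × bits-per-weight / 8.
TOTAL_PARAMS = 753e9   # GLM-5.2 total parameters
ACTIVE_PARAMS = 39e9   # parameters used per token (MoE active set)

# Effective bits per weight. Q8_0 follows from llama.cpp's block layout
# (32 int8 weights plus one fp16 scale per block = 8.5 bits/weight).
# The other three are ASSUMPTIONS back-calculated from this page's
# usable-memory minimums (minimum_gb * 8 / 753), not published figures.
EFFECTIVE_BPW = {
    "Q2_K": 2.75,
    "Q4_K_M": 4.99,
    "Q5_K_M": 5.94,
    "Q8_0": 8.5,
}


def weight_gb(params: float, bpw: float) -> float:
    """Approximate weight size in decimal GB (1 GB = 1e9 bytes)."""
    return params * bpw / 8 / 1e9


for quant, bpw in EFFECTIVE_BPW.items():
    print(f"{quant:<7} whole model ≈ {weight_gb(TOTAL_PARAMS, bpw):4.0f} GB, "
          f"active weights per token ≈ {weight_gb(ACTIVE_PARAMS, bpw):4.1f} GB")
```

With these inputs the Q8_0 weights alone come to about 800 GB. A Q8_0 build therefore needs roughly that much usable memory, plus headroom for the KV cache.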
Builds that fit GLM-5.2 753B (MoE) at Q2_K:
- 4× Strix Halo cluster (512 GB unified)
- DGX B200: 8× B200 server (1.44 TB HBM3e)
- Single AMD Instinct MI355X 288 GB workstation
- Quad RTX Pro 6000 Blackwell build (384 GB)
- DGX H200: 8× H200 server (1.13 TB HBM3e)
- 12× RTX Pro 6000 Blackwell rack (1152 GB)
- Mac Studio M3 Ultra 512 GB
Every other build that runs GLM-5.2 753B (MoE)
6 additional builds fit GLM-5.2 753B (MoE) at Q2_K (259 GB usable minimum), sorted by sticker price.
| Build | Vendor · form factor | Price | Memory (total / usable) | Bandwidth (per device) | tg/s (Q2_K) | Active power | 5-yr power cost |
|---|---|---|---|---|---|---|---|
| 4× DGX Spark cluster (512 GB unified, CUDA) | NVIDIA · rack of 4 desktops | $20k | 512 / 488 GB | 273 GB/s | 13 t/s | 920 W | $3.4k |
| 8× Strix Halo cluster (1024 GB unified) | AMD · rack of 8 mini-PCs, 10/25 GbE fabric | $23k | 1024 / 768 GB | 256 GB/s | 9.1 t/s | 960 W | $3.4k |
| 2× Mac Studio M3 Ultra 512 GB cluster (TB5 / MLX) | Apple · two desktops, Thunderbolt 5 RDMA | $28k | 1024 / 960 GB | 819 GB/s | 10 t/s | 440 W | $1.5k |
| 8× DGX Spark cluster (1024 GB unified, CUDA) | NVIDIA · rack of 8 desktops, 200 GbE fabric | $44k | 1024 / 976 GB | 273 GB/s | 18 t/s | 1840 W | $7k |
| 8× RTX Pro 6000 Blackwell server (768 GB) | NVIDIA · 4U server (e.g. SuperMicro AS-4125GS-TNRT) | $78k | 768 / 744 GB | 1792 GB/s | 85 t/s | 4800 W | $16k |
| 8× H100 80 GB server | NVIDIA · server rack | $280k | 640 / 620 GB | 3350 GB/s | 80 t/s | 5600 W | $20k |
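The line above the table states the selection rule. A build fits when its usable memory (the second figure in the Memory column) meets the quant's minimum, and the results are sorted by sticker price. Below is a minimal sketch of that filter, assuming "fits" means nothing more than usable memory ≥ minimum. The `Build` type and the `builds_that_fit` helper are illustrative names, not part of any tool the page uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Build:
    name: str
    price_usd: int   # sticker price from the table, $k figures expanded
    usable_gb: int   # second figure in the Memory column


# Rows taken from the Q2_K table above.
BUILDS = [
    Build("4× DGX Spark cluster", 20_000, 488),
    Build("8× Strix Halo cluster", 23_000, 768),
    Build("2× Mac Studio M3 Ultra 512 GB cluster", 28_000, 960),
    Build("8× DGX Spark cluster", 44_000, 976),
    Build("8× RTX Pro 6000 Blackwell server", 78_000, 744),
    Build("8× H100 80 GB server", 280_000, 620),
]


def builds_that_fit(builds: list[Build], min_usable_gb: float) -> list[Build]:
    """Keep builds whose usable memory meets the quant's minimum, cheapest first."""
    fitting = [b for b in builds if b.usable_gb >= min_usable_gb]
    return sorted(fitting, key=lambda b: b.price_usd)


for build in builds_that_fit(BUILDS, 559):  # Q5_K_M minimum quoted on this page
    print(f"{build.name}: ${build.price_usd:,}, {build.usable_gb} GB usable")
```

At the 559 GB Q5_K_M minimum this keeps five of the six rows; only the 4× DGX Spark cluster (488 GB usable) drops out. The sketch does not model which builds the page lists separately before each table.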
Builds that fit GLM-5.2 753B (MoE) at Q4_K_M:
- Mac Studio M3 Ultra 512 GB
- DGX B200: 8× B200 server (1.44 TB HBM3e)
- 8× RTX Pro 6000 Blackwell server (768 GB)
- 12× RTX Pro 6000 Blackwell rack (1152 GB)
- DGX H200: 8× H200 server (1.13 TB HBM3e)
- 8× DGX Spark cluster (1024 GB unified, CUDA)
- 2× Mac Studio M3 Ultra 512 GB cluster (TB5 / MLX)
Every other build that runs GLM-5.2 753B (MoE)
3 additional builds fit GLM-5.2 753B (MoE) at Q4_K_M (470 GB usable minimum), sorted by sticker price.
| Build | Vendor · form factor | Price | Memory (total / usable) | Bandwidth (per device) | tg/s (Q4_K_M) | Active power | 5-yr power cost |
|---|---|---|---|---|---|---|---|
| 4× DGX Spark cluster (512 GB unified, CUDA) | NVIDIA · rack of 4 desktops | $20k | 512 / 488 GB | 273 GB/s | 10 t/s | 920 W | $3.4k |
| 8× Strix Halo cluster (1024 GB unified) | AMD · rack of 8 mini-PCs, 10/25 GbE fabric | $23k | 1024 / 768 GB | 256 GB/s | 7.6 t/s | 960 W | $3.4k |
| 8× H100 80 GB server | NVIDIA · server rack | $280k | 640 / 620 GB | 3350 GB/s | 66 t/s | 5600 W | $20k |
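Single-stream decode speed is bounded by memory bandwidth. Each new token has to read the active expert weights at least once, so tokens/s can be no higher than bandwidth ÷ bytes of active weights. The sketch below computes that ceiling under two assumptions:

- Q4_K_M is taken as 4.99 bits per weight. This is back-calculated from the 470 GB minimum, not a published rate.
- Cluster builds are assumed to split layers across nodes (pipeline parallelism), so a single node's bandwidth bounds each token.

`decode_ceiling_tps` is a hypothetical helper.

```python
def decode_ceiling_tps(bandwidth_gbs: float, active_params: float, bpw: float) -> float:
    """Bandwidth-bound upper limit on single-stream decode speed (tokens/s).

    Assumes each generated token streams the MoE's active weights from memory
    exactly once at the given bandwidth. KV-cache reads, interconnect hops and
    kernel inefficiency are ignored, so real throughput is lower.
    """
    bytes_per_token = active_params * bpw / 8
    return bandwidth_gbs * 1e9 / bytes_per_token


ACTIVE_PARAMS = 39e9  # GLM-5.2 active parameters per token
Q4_K_M_BPW = 4.99     # ASSUMED effective bits/weight (470 GB * 8 / 753B)

# Per-device bandwidths from the tables above. ASSUMPTION: clusters use
# pipeline parallelism, so each token passes through the nodes one after
# another and a single node's bandwidth (not the sum) bounds the rate.
for name, bw in [("DGX Spark node", 273), ("Strix Halo node", 256), ("M3 Ultra", 819)]:
    print(f"{name}: ≤ {decode_ceiling_tps(bw, ACTIVE_PARAMS, Q4_K_M_BPW):.1f} tok/s")
```

At Q4_K_M this gives ceilings of roughly 11 tok/s for a DGX Spark node, 10.5 for a Strix Halo node and 34 for an M3 Ultra. The table figures for these builds are all lower, as expected from a bound that leaves out every overhead.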
Builds that fit GLM-5.2 753B (MoE) at Q5_K_M:
- 8× Strix Halo cluster (1024 GB unified)
- DGX B200: 8× B200 server (1.44 TB HBM3e)
- 8× RTX Pro 6000 Blackwell server (768 GB)
- 12× RTX Pro 6000 Blackwell rack (1152 GB)
- DGX H200: 8× H200 server (1.13 TB HBM3e)
- 8× DGX Spark cluster (1024 GB unified, CUDA)
- 2× Mac Studio M3 Ultra 512 GB cluster (TB5 / MLX)
No plug-and-play build fits at Q5_K_M
Only used / DIY / homelab-cluster rigs fit GLM-5.2 753B (MoE) at this quant.
Every other build that runs GLM-5.2 753B (MoE)
1 additional build fits GLM-5.2 753B (MoE) at Q5_K_M (559 GB usable minimum).
| Build | Vendor · form factor | Price | Memory (total / usable) | Bandwidth (per device) | tg/s (Q5_K_M) | Active power | 5-yr power cost |
|---|---|---|---|---|---|---|---|
| 8× H100 80 GB server | NVIDIA · server rack | $280k | 640 / 620 GB | 3350 GB/s | 56 t/s | 5600 W | $20k |
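The 5-yr power column converts active wattage into an electricity bill: kWh = watts ÷ 1000 × hours, then multiplied by a tariff. The page does not state its tariff or duty cycle. The sketch below therefore uses a hypothetical $0.08/kWh at 24/7 full load, with a hypothetical `energy_cost_usd` helper. Treat its output as an illustration of the arithmetic, not a reproduction of the column.

```python
HOURS_PER_YEAR = 24 * 365


def energy_cost_usd(active_watts: float, usd_per_kwh: float,
                    duty_cycle: float = 1.0, years: float = 5) -> float:
    """Electricity cost of drawing `active_watts` for `duty_cycle` of every hour.

    Idle draw outside the duty cycle is ignored.
    """
    kwh = active_watts / 1000 * HOURS_PER_YEAR * years * duty_cycle
    return kwh * usd_per_kwh


# HYPOTHETICAL tariff and 24/7 duty cycle; the page does not state the ones it used.
TARIFF_USD_PER_KWH = 0.08
for name, watts in [("8× H100 80 GB server", 5600), ("2× Mac Studio cluster", 440)]:
    cost = energy_cost_usd(watts, TARIFF_USD_PER_KWH)
    print(f"{name}: ${cost:,.0f} over 5 years at 24/7 full load")
```

Lowering `duty_cycle` models a machine that runs inference only part of the day; because idle draw is left out, that underestimates the real bill.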
Builds that fit GLM-5.2 753B (MoE) at Q8_0:
- 2× Mac Studio M3 Ultra 512 GB cluster (TB5 / MLX)
- DGX B200: 8× B200 server (1.44 TB HBM3e)
- 12× RTX Pro 6000 Blackwell rack (1152 GB)
- 8× DGX Spark cluster (1024 GB unified, CUDA)
- DGX H200: 8× H200 server (1.13 TB HBM3e)
No plug-and-play build fits at Q8_0
Only used / DIY / homelab-cluster rigs fit GLM-5.2 753B (MoE) at this quant.
Sources
- https://huggingface.co/zai-org/GLM-5.2
- https://huggingface.co/zai-org/GLM-5.2-FP8
- https://recipes.vllm.ai/zai-org/GLM-5.2
- https://huggingface.co/Abiray/GLM-5.2-Q4_K_M-GGUF/tree/main
- https://www.marktechpost.com/2026/06/14/z-ai-launches-glm-5-2-with-a-usable-1m-token-context-two-thinking-effort-levels-and-no-benchmarks-at-launch/
Last updated 2026-06-27