Hardware to run Kimi K2.7 Code 1T (MoE)

Jun 2026 release — code-specialized variant built on K2.6 with ~30% fewer thinking tokens for the same task. Same 1 T total / 32 B active MoE (384 experts top-8 + 1 shared), 256 K context, Modified MIT. Internal Moonshot benchmarks only (Kimi Code Bench v2 62.0, MCP Atlas 76.0, MCP Mark Verified 81.1) — standard SWE-Bench / TB2 cells stay null until third-party leaderboards land.

Moonshot · text

Kimi K2.7 Code 1T (MoE)

1000 B params · 540 GB Q4 file · 600 GB min Q4 · 714 GB min Q5 · 1152 GB min Q8 · 256K ctx · Modified MIT
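The minimum-memory figures can be turned into a quick fit check. The sketch below is illustrative only: the thresholds and usable-memory numbers are copied from this page (the 280 GB Q2 minimum comes from the "Every other build" note), while the function name `fitting_quants` and the short build labels are made up for the example.

```python
# Minimum usable memory (GB) per quant, as listed on this page.
MIN_USABLE_GB = {"Q2": 280, "Q4": 600, "Q5": 714, "Q8": 1152}

# Usable memory (GB) for a few of the builds listed on this page.
BUILDS = {
    "Single MI355X": 282,
    "4x Strix Halo cluster": 384,
    "Mac Studio M3 Ultra 512 GB": 480,
    "8x RTX Pro 6000 Blackwell": 744,
    "DGX H200": 1100,
    "12x RTX Pro 6000 Blackwell": 1116,
    "DGX B200": 1404,
}


def fitting_quants(usable_gb):
    """Return the quants whose minimum memory fits in usable_gb."""
    return [quant for quant, need in MIN_USABLE_GB.items() if usable_gb >= need]


for name, usable in BUILDS.items():
    print(f"{name}: {', '.join(fitting_quants(usable)) or 'none'}")
```

This check compares weights-plus-minimum-overhead against usable memory only; it says nothing about how much room is left for long contexts or batching.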


Cheapest

4× Strix Halo cluster (512 GB unified)

AMD · rack of 4 mini-PCs, 10 GbE fabric

$11,500

tokens / sec (Q2)

235B-MoE 29 t/s

671B-MoE 6.0 t/s

1T-MoE n/a

Memory: 512 GB · 384 GB usable

Bandwidth: 256 GB/s

Idle / Active: 32 W / 480 W

Sticker: $11,500

Why: Lowest sticker that still fits Kimi K2.7 Code 1T (MoE) ($12k USD).

Amazon (Strix Halo Mini PC) · GMKtec EVO-X2 · Bosgame M5 · Framework

📺 Reviews on YouTube

▶ Kimi-K2 (1T) / GLM 4.7 (355B) on a 4-node Strix Halo cluster — 512 GB unified memory

▶ AMD Strix Halo / Ryzen AI Max+ 395 — honest review (single-node baseline)

Fastest

DGX B200 — 8× B200 server (1.44 TB HBM3e)

NVIDIA · 10U DGX server

$475,000

tokens / sec (Q2)

235B-MoE 198 t/s

671B-MoE 126 t/s

1T-MoE 108 t/s

Memory: 1440 GB · 1404 GB usable

Bandwidth: 8000 GB/s

Idle / Active: 900 W / 10200 W

Sticker: $475,000

Why: Highest measured tg/s — 108 t/s on Kimi K2.7 Code 1T (MoE)-class models at Q2.

NVIDIA DGX B200 · SuperMicro HGX B200 · Lambda Labs

📺 Reviews on YouTube

▶ Inside a 1.44 TB HBM3e NVIDIA HGX B200 AI Server from ASRock Rack

All-rounder

Single AMD Instinct MI355X 288 GB workstation

AMD · 4U server (OAM, liquid-cooled)

$28,000

tokens / sec (Q2)

235B-MoE 156 t/s

671B-MoE 60 t/s

1T-MoE 36 t/s

Memory: 288 GB · 282 GB usable

Bandwidth: 8000 GB/s

Idle / Active: 140 W / 1400 W

Sticker: $28,000

Why: Top quartile across speed, value, memory headroom, and efficiency — the "buy this if unsure" pick.

Dell XE9712 · AMD MI350 series · Supermicro UBB

Best value

8× RTX Pro 6000 Blackwell server (768 GB)

NVIDIA · 4U server (e.g. SuperMicro AS-4125GS-TNRT)

$78,000

tokens / sec (Q2)

235B-MoE 264 t/s

671B-MoE 90 t/s

1T-MoE 48 t/s

Memory: 768 GB · 744 GB usable

Bandwidth: 1792 GB/s

Idle / Active: 220 W / 4800 W

Sticker: $78,000

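Why: ~$1,625 per t/s ($78,000 / 48 t/s at Q2). The All-rounder MI355X build works out lower, at ~$778 per t/s ($28,000 / 36 t/s), so this is the best $/tg-per-second only among the remaining builds.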

Amazon · PNY Pro · CoreWeave · SuperMicro
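The dollars-per-t/s figure in the Best value card is simply sticker price divided by 1T-MoE throughput. A minimal sketch of that arithmetic, using the Q2 numbers from the cards on this page; the helper name `dollars_per_tps` and the shortened build labels are illustrative, not part of the picker.

```python
# (sticker price in USD, 1T-MoE tokens/sec at Q2), copied from the cards on this page.
Q2_BUILDS = {
    "DGX B200": (475_000, 108),
    "Single MI355X": (28_000, 36),
    "8x RTX Pro 6000 Blackwell": (78_000, 48),
    "DGX H200": (380_000, 102),
    "12x RTX Pro 6000 Blackwell": (118_000, 66),
    "Mac Studio M3 Ultra 512 GB": (14_199, 7.2),
}


def dollars_per_tps(price_usd, tokens_per_sec):
    """Sticker price divided by generation throughput."""
    return price_usd / tokens_per_sec


# Print builds from cheapest to most expensive per token/sec.
ranked = sorted(Q2_BUILDS.items(), key=lambda item: dollars_per_tps(*item[1]))
for name, (price, tps) in ranked:
    print(f"{name}: ${dollars_per_tps(price, tps):,.0f} per t/s")
```

The metric ignores power, memory headroom, and software stack, which is presumably why the page reports those as separate picks.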

Best CUDA

DGX H200 — 8× H200 server (1.13 TB HBM3e)

NVIDIA · 8U DGX / HGX server rack

$380,000

tokens / sec (Q2)

235B-MoE 186 t/s

671B-MoE 120 t/s

1T-MoE 102 t/s

Memory: 1128 GB · 1100 GB usable

Bandwidth: 4800 GB/s

Idle / Active: 700 W / 6500 W

Sticker: $380,000

Why: Strongest CUDA-only software stack among fitting builds.

NVIDIA DGX H200 · SuperMicro HGX · Lambda Labs

📺 Reviews on YouTube

▶ Inside the SUPER NVIDIA H200 Server From Supermicro (8U HGX H200)

Most VRAM

12× RTX Pro 6000 Blackwell rack (1152 GB)

NVIDIA · 8U server rack (multi-node, 1-2 chassis)

$118,000

tokens / sec (Q2)

235B-MoE 312 t/s

671B-MoE 114 t/s

1T-MoE 66 t/s

Memory: 1152 GB · 1116 GB usable

Bandwidth: 1792 GB/s

Idle / Active: 340 W / 7400 W

Sticker: $118,000

Why: 1116 GB usable — most headroom for batching and longer contexts.

Amazon · CoreWeave · SuperMicro · SHI enterprise

Efficient

Mac Studio M3 Ultra 512 GB

Apple · small desktop

$14,199

tokens / sec (Q2)

235B-MoE 26 t/s

671B-MoE 9.6 t/s

1T-MoE 7.2 t/s

Memory: 512 GB · 480 GB usable

Bandwidth: 819 GB/s

Idle / Active: 12 W / 220 W

Sticker: $14,199

Why: 220 W active — lowest power draw of the fitting builds.

Apple · B&H Photo

📺 Reviews on YouTube

▶ M3 Ultra Mac Studio Review
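The Efficient pick is justified by active wattage alone. Dividing active watts by tokens per second gives energy per generated token, which folds throughput into the comparison. The sketch below assumes each build draws its listed active power while generating at the listed Q2 1T-MoE rate; that assumption, the helper name `joules_per_token`, and the shortened labels are not from the page.

```python
# (active watts, 1T-MoE tokens/sec at Q2), copied from the cards on this page.
Q2_BUILDS = {
    "Mac Studio M3 Ultra 512 GB": (220, 7.2),
    "Single MI355X": (1400, 36),
    "8x RTX Pro 6000 Blackwell": (4800, 48),
    "12x RTX Pro 6000 Blackwell": (7400, 66),
    "DGX H200": (6500, 102),
    "DGX B200": (10200, 108),
}


def joules_per_token(active_w, tokens_per_sec):
    """Watts are joules per second, so W / (tokens/s) is joules per token."""
    return active_w / tokens_per_sec


for name, (watts, tps) in Q2_BUILDS.items():
    print(f"{name}: {joules_per_token(watts, tps):.1f} J/token")
```

Real draw during single-stream decoding is often below the rated active figure, so treat the results as rough upper bounds rather than measurements.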

Cheapest

Mac Studio M3 Ultra 512 GB

Apple · small desktop

$14,199

tokens / sec (Q2)

235B-MoE 26 t/s

671B-MoE 9.6 t/s

1T-MoE 7.2 t/s

Memory: 512 GB · 480 GB usable

Bandwidth: 819 GB/s

Idle / Active: 12 W / 220 W

Sticker: $14,199

Why: Lowest sticker that still fits Kimi K2.7 Code 1T (MoE) ($14k USD).

Apple · B&H Photo

📺 Reviews on YouTube

▶ M3 Ultra Mac Studio Review

Fastest

Single AMD Instinct MI355X 288 GB workstation

AMD · 4U server (OAM, liquid-cooled)

$28,000

tokens / sec (Q2)

235B-MoE 156 t/s

671B-MoE 60 t/s

1T-MoE 36 t/s

Memory: 288 GB · 282 GB usable

Bandwidth: 8000 GB/s

Idle / Active: 140 W / 1400 W

Sticker: $28,000

Why: Highest measured tg/s — 36 t/s on Kimi K2.7 Code 1T (MoE)-class models at Q2.

Dell XE9712 · AMD MI350 series · Supermicro UBB

Every other build that runs Kimi K2.7 Code 1T (MoE)

6 additional builds fit Kimi K2.7 Code 1T (MoE) at Q2_K (280 GB usable minimum), sorted by sticker price.

Build	Vendor · form factor	Price	Memory (total / usable)	Bandwidth	tg/s (Q2)	Active W	5-yr power cost
4× DGX Spark cluster (512 GB unified, CUDA)	NVIDIA · rack of 4 desktops	$20k	512 / 488 GB	273 GB/s	11 t/s	920 W	$3.4k
8× Strix Halo cluster (1024 GB unified)	AMD · rack of 8 mini-PCs, 10/25 GbE fabric	$23k	1024 / 768 GB	256 GB/s	6.0 t/s	960 W	$3.4k
2× Mac Studio M3 Ultra 512 GB cluster (TB5 / MLX)	Apple · two desktops, Thunderbolt 5 RDMA	$28k	1024 / 960 GB	819 GB/s	8.4 t/s	440 W	$1.5k
Quad RTX Pro 6000 Blackwell build (384 GB)	NVIDIA · workstation / 4U pedestal	$38k	384 / 372 GB	1792 GB/s	n/a	2200 W	$8k
8× DGX Spark cluster (1024 GB unified, CUDA)	NVIDIA · rack of 8 desktops, 200 GbE fabric	$44k	1024 / 976 GB	273 GB/s	16 t/s	1840 W	$7k
8× H100 80 GB server	NVIDIA · server rack	$280k	640 / 620 GB	3350 GB/s	72 t/s	5600 W	$20k
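The page does not state the electricity rate or duty cycle behind the 5-yr power column. The sketch below shows one plausible way such a figure could be computed, plus a helper that back-solves the rate implied by a row. The `duty` parameter, the function names, and the idea that cost scales with active watts alone are all assumptions made for illustration.

```python
HOURS_PER_YEAR = 24 * 365  # ignores leap days


def five_year_power_cost(active_w, usd_per_kwh, duty=1.0, years=5):
    """Energy cost in USD for a build drawing active_w watts a fraction `duty` of the time."""
    kwh = active_w / 1000 * HOURS_PER_YEAR * years * duty
    return kwh * usd_per_kwh


def implied_rate(active_w, five_year_cost_usd, duty=1.0, years=5):
    """Back-solve the USD/kWh that would produce a given 5-year cost."""
    kwh = active_w / 1000 * HOURS_PER_YEAR * years * duty
    return five_year_cost_usd / kwh


# (active watts, 5-yr power in USD), copied from the table rows on this page.
ROWS = {
    "4x DGX Spark cluster": (920, 3_400),
    "2x Mac Studio M3 Ultra cluster": (440, 1_500),
    "8x H100 80 GB server": (5600, 20_000),
}

for name, (watts, cost) in ROWS.items():
    print(f"{name}: implied ${implied_rate(watts, cost):.3f}/kWh at 100% duty")
```

Under the continuous-active-draw assumption the rows imply roughly $0.08 per kWh; a higher rate with a lower duty cycle would fit the same numbers, so this is an inference rather than the page's stated method.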

Cheapest

8× Strix Halo cluster (1024 GB unified)

AMD · rack of 8 mini-PCs, 10/25 GbE fabric

$23,200

tokens / sec (Q4)

235B-MoE 32 t/s

671B-MoE 8.0 t/s

1T-MoE 5.0 t/s

Memory: 1024 GB · 768 GB usable

Bandwidth: 256 GB/s

Idle / Active: 64 W / 960 W

Sticker: $23,200

Why: Lowest sticker that still fits Kimi K2.7 Code 1T (MoE) ($23k USD).

Amazon (Strix Halo Mini PC) · GMKtec EVO-X2 · Bosgame M5 · Framework

📺 Reviews on YouTube

▶ Kimi-K2 (1T) / GLM 4.7 (355B) on a 4-node Strix Halo cluster — closest documented baseline

▶ AMD Strix Halo / Ryzen AI Max+ 395 — honest review (single-node baseline)

Fastest

DGX B200 — 8× B200 server (1.44 TB HBM3e)

NVIDIA · 10U DGX server

$475,000

tokens / sec (Q4)

235B-MoE 165 t/s

671B-MoE 105 t/s

1T-MoE 90 t/s

Memory: 1440 GB · 1404 GB usable

Bandwidth: 8000 GB/s

Idle / Active: 900 W / 10200 W

Sticker: $475,000

Why: Highest measured tg/s — 90 t/s on Kimi K2.7 Code 1T (MoE)-class models at Q4.

NVIDIA DGX B200 · SuperMicro HGX B200 · Lambda Labs

📺 Reviews on YouTube

▶ Inside a 1.44 TB HBM3e NVIDIA HGX B200 AI Server from ASRock Rack

All-rounder

2× Mac Studio M3 Ultra 512 GB cluster (TB5 / MLX)

Apple · two desktops, Thunderbolt 5 RDMA

$28,400

tokens / sec (Q4)

235B-MoE 29 t/s

671B-MoE 9.0 t/s

1T-MoE 7.0 t/s

Memory: 1024 GB · 960 GB usable

Bandwidth: 819 GB/s

Idle / Active: 24 W / 440 W

Sticker: $28,400

Why: Top quartile across speed, value, memory headroom, and efficiency — the "buy this if unsure" pick.

eBay · Apple Refurbished · B&H Photo (used)

📺 Reviews on YouTube

▶ Apple didn't have to go this hard… (4× M3 Ultra TB5 cluster)

▶ Mac Studio CLUSTER vs M3 Ultra

Best value

8× RTX Pro 6000 Blackwell server (768 GB)

NVIDIA · 4U server (e.g. SuperMicro AS-4125GS-TNRT)

$78,000

tokens / sec (Q4)

235B-MoE 220 t/s

671B-MoE 75 t/s

1T-MoE 40 t/s

Memory: 768 GB · 744 GB usable

Bandwidth: 1792 GB/s

Idle / Active: 220 W / 4800 W

Sticker: $78,000

Why: Best $/tg-per-second — ~$1,950 per t/s.

Amazon · PNY Pro · CoreWeave · SuperMicro

Best CUDA

DGX H200 — 8× H200 server (1.13 TB HBM3e)

NVIDIA · 8U DGX / HGX server rack

$380,000

tokens / sec (Q4)

235B-MoE 155 t/s

671B-MoE 100 t/s

1T-MoE 85 t/s

Memory: 1128 GB · 1100 GB usable

Bandwidth: 4800 GB/s

Idle / Active: 700 W / 6500 W

Sticker: $380,000

Why: Strongest CUDA-only software stack among fitting builds.

NVIDIA DGX H200 · SuperMicro HGX · Lambda Labs

📺 Reviews on YouTube

▶ Inside the SUPER NVIDIA H200 Server From Supermicro (8U HGX H200)

Most VRAM

12× RTX Pro 6000 Blackwell rack (1152 GB)

NVIDIA · 8U server rack (multi-node, 1-2 chassis)

$118,000

tokens / sec (Q4)

235B-MoE 260 t/s

671B-MoE 95 t/s

1T-MoE 55 t/s

Memory: 1152 GB · 1116 GB usable

Bandwidth: 1792 GB/s

Idle / Active: 340 W / 7400 W

Sticker: $118,000

Why: 1116 GB usable — most headroom for batching and longer contexts.

Amazon · CoreWeave · SuperMicro · SHI enterprise

Efficient

8× DGX Spark cluster (1024 GB unified, CUDA)

NVIDIA · rack of 8 desktops, 200 GbE fabric

$43,500

tokens / sec (Q4)

235B-MoE 42 t/s

671B-MoE 16 t/s

1T-MoE 13 t/s

Memory: 1024 GB · 976 GB usable

Bandwidth: 273 GB/s

Idle / Active: 220 W / 1840 W

Sticker: $43,500

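Why: 1840 W active, the lowest power draw among the fitting builds not already picked above. The 2× Mac Studio cluster (440 W) and the 8× Strix Halo cluster (960 W) both draw less.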

Amazon (8× DGX Spark) · Micro Center · NVIDIA Marketplace · ServeTheHome guide

📺 Reviews on YouTube

▶ NVIDIA didn't want me to do this — 8× DGX Spark 1 TB VRAM cluster (AZisk)

No plug-and-play build fits at Q4_K_M

Only used, DIY, or homelab-cluster rigs fit Kimi K2.7 Code 1T (MoE) at this quant.

Every other build that runs Kimi K2.7 Code 1T (MoE)

1 additional build fits Kimi K2.7 Code 1T (MoE) at Q4_K_M (600 GB usable minimum).

Build	Vendor · form factor	Price	Memory (total / usable)	Bandwidth	tg/s (Q4)	Active W	5-yr power cost
8× H100 80 GB server	NVIDIA · server rack	$280k	640 / 620 GB	3350 GB/s	60 t/s	5600 W	$20k


Sources

https://huggingface.co/moonshotai/Kimi-K2.7-Code

Last updated 2026-06-13
