
Best AI models that run on NVIDIA RTX Spark (128 GB)

Coming soon — DGX Spark silicon in Windows laptops and desktops, announced at Computex 2026.

NVIDIA · OEM laptops + small desktops
NVIDIA RTX Spark (128 GB)
128 GB total · 119 GB usable · 300 GB/s memory bandwidth
Coming soon — not shipping yet. The specs above are NVIDIA's announced figures, not measured numbers. OEM systems are expected later in 2026; we'll fill in pricing, retailers, and real tokens-per-second benchmarks as review units land. Until then, the DGX Spark — built on the same GB10-class silicon — is the closest shipping proxy for what this platform can run.
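Until measured numbers arrive, a common back-of-envelope bound for single-stream decode speed is memory bandwidth divided by the bytes of weights read per generated token. The sketch below applies it to the announced 300 GB/s figure. It assumes ~4.5 bits per weight for a Q4-class quant (an approximation; real Q4 formats vary) and ignores KV-cache reads, compute limits, and software overhead, so treat the results as optimistic ceilings rather than predictions. For an MoE, only the active parameters are read per token, which is why a 3B-active model should decode far faster than a 27B dense one.

```python
# Rough upper bound on decode speed for a memory-bandwidth-bound system.
# Assumptions (not from NVIDIA): ~4.5 bits per weight at Q4, every active
# weight is read once per token, and KV-cache traffic is ignored.

BANDWIDTH_GB_S = 300.0    # announced RTX Spark memory bandwidth
Q4_BITS_PER_WEIGHT = 4.5  # assumed average for a Q4-class quant


def active_weight_gb(active_params_b: float, bits_per_weight: float = Q4_BITS_PER_WEIGHT) -> float:
    """Gigabytes of weights read per generated token."""
    return active_params_b * bits_per_weight / 8


def decode_ceiling_tok_s(active_params_b: float, bandwidth_gb_s: float = BANDWIDTH_GB_S) -> float:
    """Optimistic tokens/s ceiling: bandwidth divided by bytes read per token."""
    return bandwidth_gb_s / active_weight_gb(active_params_b)


if __name__ == "__main__":
    for name, active_b in [("Qwen 3.6 27B (dense)", 27.0), ("Qwen 3.6 35B-A3B (MoE)", 3.0)]:
        print(f"{name}: ~{active_weight_gb(active_b):.1f} GB read per token, "
              f"ceiling ~{decode_ceiling_tok_s(active_b):.0f} tok/s")
```

Under these assumptions the ceilings come out near 20 tok/s for the dense 27B and about 178 tok/s for the 3B-active MoE; real benchmarks should land below both.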

Our picks for this build

Sourced from the State of Local AI snapshot — the model + quant + backend we'd actually deploy on this hardware today, with the recipe in the setup guide below.

Best dense

Qwen 3.6 27B (dense)

27B dense · Apache 2.0 🤗

Apr 22 2026. Dense 27B that hits 77.2% on SWE-Bench Verified, beating much larger MoEs on coding. Vision-capable, with 262K native context. The best coder for a single 24 GB card right now.

≥20 GB at Q4
  • HLE 24.0%
  • TB2 59.3%
  • SWE-Pro 53.5%
  • SWE-Ver 77.2%

Coding: The new local-coding king under 200B on r/LocalLLaMA. It matches Claude Opus 4.5 on TB2 per Qwen's launch claims and beats Qwen3.5-397B-A17B on every coding eval. Daily-driver pick for Cline at Q4_K_M on a single RTX Pro 6000 or M3 Ultra. Confirmed running at ~160 tok/s with MTP (multi-token prediction) on an RTX 6000, per the dzombak.com vLLM recipe.

Agent: Genuinely useful in Open-Claude / Claude Code routing; community reports describe 30-minute-plus sessions completing without derailing. Still trails the closed frontier on the very longest loops. Its agent rating is capped at 3 under the site rule for sub-200B models (TB2 59.3% is below the 65% threshold).
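The agent-rating cap described above can be read as a simple rule: a model under 200B parameters whose TB2 score is below 65% cannot be rated above 3 for agents. A minimal sketch, assuming that is the whole rule and that "sub-200B" refers to total parameters (the site may apply further conditions); `cap_agent_rating` is a hypothetical helper, not part of any published tooling.

```python
# Hypothetical encoding of the agent-rating cap described on this page.
# Assumptions: the cap applies only when BOTH conditions hold, and
# "sub-200B" is measured on total (not active) parameters.

PARAM_LIMIT_B = 200.0  # "sub-200B"
TB2_THRESHOLD = 65.0   # TB2 score (%) needed to escape the cap
CAPPED_MAX = 3         # highest agent rating allowed under the cap


def cap_agent_rating(editorial_rating: int, total_params_b: float, tb2_pct: float) -> int:
    """Return the 0-5 agent rating after applying the cap rule."""
    if total_params_b < PARAM_LIMIT_B and tb2_pct < TB2_THRESHOLD:
        return min(editorial_rating, CAPPED_MAX)
    return editorial_rating


if __name__ == "__main__":
    # Qwen 3.6 27B: 27B params, TB2 59.3%, so any raw rating above 3 is capped.
    print(cap_agent_rating(4, 27.0, 59.3))
```

Ratings already at or below 3 pass through unchanged; the rule only clips, it never raises a score.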

Best MoE that fits

Qwen 3.6 35B-A3B (MoE)

35B total · 3B active · Apache 2.0 🤗

Apr 2026 release. A 35B MoE with 3B active parameters that beats Gemma 4 31B on agentic coding and matches Sonnet on most vision tasks. Native 262K context (extensible to 1M), ~18 GB at Q4. The new local-coding king under 200B.

≥22 GB at Q4
  • HLE 21.4%
  • TB2 51.5%
  • SWE-Pro 49.5%
  • SWE-Ver 73.4%

Coding: r/LocalLLaMA's pick for fast local coding on a 24 GB card at Q4_K_M; with only 3B active it's snappy. Handles vibe coding "perfectly fine" in OpenCode/Claude Code per multiple weekly megathreads. Simon Willison's pelican test (April 2026), "Qwen3.6-35B-A3B on my laptop drew me a better pelican than Claude Opus 4.7", is still resonating in the community.

Agent: Solid in 5-15 tool-hop loops in Cline. Long-horizon (60+ min) Open-Claude sessions still lose the thread, since 3B active parameters cap its planning. Note: Qwen's self-reported TB2 is 51.5% versus 23-24% in community runs; the gap is harness-driven (Terminus-2 vs the little-coder agent).

Dense runner-up

Mistral Medium 3.5 128B

128B dense · Modified MIT 🤗

Apr 30 2026. A Western 128B dense model with vision and 256K context. 77.6% on SWE-Bench Verified; the first credible mid-tier open-weight model from Mistral in months.

≥80 GB at Q4
  • SWE-Ver 77.6%

Coding: Launched Apr 30 2026 with a built-in coding agent that opens PRs. Early r/LocalLLaMA reports treat this 128B dense model (vision, 256K context) as a credible Cline driver, though it trails Qwen 3.6 27B on real refactors.

Agent: Mistral's agent SDK is adequate; in Open-Claude the model handles ~20-minute sessions reliably. Its long-horizon ceiling is still unclear pending community evals.
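Each pick's stated Q4 minimum can be checked against the 119 GB usable figure. The sketch below computes the headroom left after loading each model and, for comparison, a naive weights-only estimate (total parameters × bits per weight ÷ 8) assuming ~4.5 bits per weight. The stated minimums run higher, presumably because they include runtime and context overhead. For the MoE, memory scales with total (35B) rather than active parameters, since all experts must stay resident.

```python
# Fit check for the three picks against the RTX Spark's usable memory.
# Stated minimums come from this page; the 4.5 bits/weight figure is an assumption.

USABLE_GB = 119.0
Q4_BITS_PER_WEIGHT = 4.5

PICKS = [
    # (name, total params in billions, stated Q4 minimum in GB)
    ("Qwen 3.6 27B", 27.0, 20.0),
    ("Qwen 3.6 35B-A3B", 35.0, 22.0),
    ("Mistral Medium 3.5 128B", 128.0, 80.0),
]


def weights_only_gb(total_params_b: float, bits: float = Q4_BITS_PER_WEIGHT) -> float:
    """Naive weight footprint with every parameter stored at `bits` bits."""
    return total_params_b * bits / 8


def headroom_gb(stated_min_gb: float, usable_gb: float = USABLE_GB) -> float:
    """Memory left for KV cache and longer contexts (negative means it does not fit)."""
    return usable_gb - stated_min_gb


if __name__ == "__main__":
    for name, params_b, min_gb in PICKS:
        print(f"{name}: ~{weights_only_gb(params_b):.0f} GB weights, "
              f"needs >= {min_gb:.0f} GB, headroom {headroom_gb(min_gb):.0f} GB")
```

Even the 128B dense runner-up leaves roughly 39 GB of headroom by the stated minimum, which is what makes long contexts plausible on this box.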

Every open-weights model that fits, ranked by composite score

Composite blends benchmark averages (60%) with editorial 0-5 ratings (40%). Closed-frontier models are mixed in as reference rows, highlighted in amber.
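As a rough illustration of the 60/40 blend, the sketch below averages whichever benchmark scores a model has, rescales the mean editorial rating from 0-5 to 0-100, and weights the two 0.6 and 0.4. The rescaling, the skipping of missing benchmarks, and the `composite` name are assumptions; the page's displayed scores (e.g. 4920) use a scale not described here, so this will not reproduce them.

```python
# Hypothetical sketch of a 60/40 composite; not the site's actual formula.
from statistics import mean
from typing import Optional, Sequence

BENCH_WEIGHT = 0.6
EDITORIAL_WEIGHT = 0.4


def composite(benchmarks_pct: Sequence[Optional[float]], editorial_0_5: Sequence[float]) -> float:
    """Blend the benchmark average (0-100) with editorial ratings rescaled to 0-100."""
    scored = [b for b in benchmarks_pct if b is not None]  # skip missing benchmarks
    if not scored or not editorial_0_5:
        raise ValueError("need at least one benchmark and one editorial rating")
    bench_avg = mean(scored)
    editorial_avg = mean(editorial_0_5) * 20  # 0-5 scale -> 0-100 scale
    return BENCH_WEIGHT * bench_avg + EDITORIAL_WEIGHT * editorial_avg


if __name__ == "__main__":
    # HLE, TB2, SWE-Pro, SWE-Ver from the Qwen 3.6 27B card, plus one missing benchmark.
    # The two editorial ratings are made-up placeholders.
    print(round(composite([24.0, 59.3, 53.5, 77.2, None], [4, 3]), 1))
```

Skipping missing benchmarks rather than counting them as zero keeps sparsely benchmarked models from being penalized, at the cost of averaging over different benchmark sets.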

Model · tg/s · pp · TTFT @ 100K · HLE · TB2 · SWE-Pro · SWE-Ver · Aider · LCB · GPQA · MMLU-Pro · Score
Qwen 3 Next 80B-A3B (MoE) · 80B total · 3B active · MoE 🤗
78.4% · 77.2% · 82.7% · score 4920
77.6% · score 4689
GLM-4.5-Air 106B (MoE) · 106B total · 12B active · MoE 🤗
57.6% · score 4488
DiffusionGemma 26B-A4B · 26B total · 4B active · diffusion MoE 🤗
69.1% · 73.2% · 77.6% · score 4358
Mistral Small 4 119B-A6B (MoE) · 119B total · 6B active · MoE 🤗
71.2% · score 4301
24.0% · 59.3% · 53.5% · 77.2% · 83.9% · 87.8% · 86.2% · score 4280
Qwen 3 32B · 32B · dense 🤗
65.7% · 65.5% · score 4278
57.5% · 65.2% · 84.0% · score 4171
Phi-4 14B · 14B · dense 🤗
56.1% · 70.4% · score 4160
Qwen 3.6 35B-A3B (MoE) · 35B total · 3B active · MoE 🤗
21.4% · 51.5% · 49.5% · 73.4% · 80.4% · 86.0% · 85.2% · score 4084
Qwen 3.5 122B-A10B (MoE) · 122B total · 10B active · MoE 🤗
25.3% · 49.4% · 72.0% · 78.9% · 86.6% · 86.7% · score 4021
18.3% · 31.0% · 60.5% · 81.2% · 79.2% · 83.7% · score 3839
19.5% · 42.9% · 35.7% · 52.0% · 80.0% · 84.3% · 85.2% · score 3697
GPT-OSS 120B · 120B total · 5B active · MoE 🤗
18.5% · 18.7% · 16.2% · 62.4% · 87.8% · 80.9% · 90.0% · score 3573
45.3% · 66.0% · score 3361
Gemma 3 27B · 27B · dense 🤗
42.4% · 67.5% · score 3321
Llama 4 Scout 109B-A17B (MoE) · 109B total · 17B active · MoE 🤗
32.8% · 57.2% · 74.3% · score 3315
32.6% · 72.2% · score 3174
Gemma 4 26B-A4B (MoE) · 26B total · 4B active · MoE 🤗
8.7% · 34.2% · 13.8% · 17.4% · 77.1% · 82.3% · 82.6% · score 3060
Qwen 3 Coder 30B-A3B (MoE) · 30B total · 3B active · MoE 🤗
50.3% · score 3042
Llama 3.3 70B · 70B · dense 🤗
28.8% · 50.5% · 68.9% · score 2990
Llama 3.1 8B · 8B · dense 🤗
34.6% · 49.0% · score 2527
Qwen 3 8B · 8B · dense 🤗
2.8% · 47.0% · 65.5% · score 2325
Gemini 3.1 Pro · Google DeepMind · closed · 125 t/s · 2.1 min · 44.7% · 80.2% · 54.2% · 80.6% · 91.7% · 94.3% · 91.0%
ChatGPT 5.5 · OpenAI · closed · 61 t/s · 1.6 min · 52.2% · 82.0% · 58.6% · 88.7% · 88.0% · 93.6%
Claude Sonnet 4.6 · Anthropic · closed · 45 t/s · 3.0 s · 49.0% · 53.4% · 79.6% · 89.9%
Claude Opus 4.8 · Anthropic · closed · 58 t/s · 2.9 min · 57.9% · 69.2% · 88.6% · 93.6%

Last updated 2026-06-13