
Claude Sonnet 5 vs GLM 5.2: the closed benchmark king vs the open model you can actually run

Anthropic's Claude Sonnet 5 launched June 30 at near-Opus quality, with introductory pricing of $2/$10 per million input/output tokens. Open-weights GLM 5.2 lands within about a point on coding and within three on reasoning for roughly half the API cost, and unlike Sonnet 5 you can download and run it yourself. Here is the benchmark-by-benchmark comparison.

Anthropic shipped Claude Sonnet 5 today, and the pitch is simple: near-Opus intelligence at the old Sonnet price. The catch is the one Anthropic never changes. You cannot download it, you cannot run it on your own machine, and you pay per token for as long as you use it. Two weeks ago Z.ai shipped GLM 5.2, which lands within about three points of Sonnet 5 on coding and reasoning, costs roughly half as much through an API, and ships as MIT-licensed open weights that you can pull from Hugging Face and run yourself.

So the real question for anyone deciding where to point their work is not “which model scores higher.” It is “do I rent the closed one forever, or run the open one that nearly matches it.” Here is the comparison, with the numbers checked against both vendors’ published results.

The benchmarks, side by side

All figures are each vendor’s published numbers as of June 30, 2026. SWE-bench Pro and Terminal-Bench are agentic coding, Humanity’s Last Exam is hard reasoning, OSWorld-Verified is computer use, and GDPval-AA v2 is multi-turn knowledge work scored over roughly 31 turns per task.

| Benchmark | Claude Sonnet 5 | GLM 5.2 |
| --- | --- | --- |
| SWE-bench Pro (agentic coding) | 63.2% | 62.1% |
| Terminal-Bench 2.1 | 80.4% | 81.0% |
| Humanity’s Last Exam (no tools) | 43.2% | 40.5% |
| Humanity’s Last Exam (with tools) | 57.4% | 54.7% |
| OSWorld-Verified (computer use) | 81.2% | not reported |
| GDPval-AA v2 (knowledge work) | 1,618 | 1,524 |

The honest read: this is close. Sonnet 5 takes hard reasoning, computer use, and knowledge work. GLM 5.2 edges Terminal-Bench and sits about one point behind on SWE-bench Pro. Z.ai did not publish an OSWorld-Verified number, so we leave it blank rather than guess; GLM’s headline suite is coding, agentic, and reasoning, not desktop computer use.
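To make "close" concrete, the per-benchmark gaps can be computed directly from the table. This is a minimal sketch: the scores are copied from the table above, OSWorld-Verified is left out because no GLM 5.2 figure was published, and the `gaps` helper is an illustrative name, not anything from either vendor.

```python
# Scores copied from the comparison table: (Claude Sonnet 5, GLM 5.2).
# OSWorld-Verified is omitted because Z.ai did not report a number.
SCORES = {
    "SWE-bench Pro": (63.2, 62.1),
    "Terminal-Bench 2.1": (80.4, 81.0),
    "HLE (no tools)": (43.2, 40.5),
    "HLE (with tools)": (57.4, 54.7),
    "GDPval-AA v2": (1618, 1524),
}


def gaps(scores):
    """Return Sonnet 5 minus GLM 5.2 per benchmark, rounded to one decimal."""
    return {name: round(sonnet - glm, 1) for name, (sonnet, glm) in scores.items()}


if __name__ == "__main__":
    for name, gap in gaps(SCORES).items():
        leader = "Sonnet 5" if gap > 0 else "GLM 5.2"
        print(f"{name}: {leader} leads by {abs(gap)}")
```

A positive gap means Sonnet 5 leads. The first four rows are percentage points, while the GDPval-AA v2 gap is in that benchmark's own score units, so the two should not be compared on one scale.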

For scale, GLM 5.2’s GDPval-AA v2 of 1,524 leads every open-weights model and edges GPT-5.5 (1,514). Sonnet 5’s 1,618 on the same test actually nudges past Anthropic’s own top-tier Opus 4.8 (1,615), which is the whole story of this release: Sonnet 5 closes most of the gap to Opus at a fraction of the price.

What Sonnet 5 actually costs

Sonnet 5 is claude-sonnet-5 on the API. Introductory pricing is $2 per million input tokens and $10 per million output through August 31, 2026, then it reverts to the standard $3 and $15. It has a 1M-token context window by default and 128K max output, with adaptive thinking on by default.

One detail that does not show up on a price sheet but hits your bill: Sonnet 5 ships a new tokenizer that produces about 30% more tokens for the same text than Sonnet 4.6 did. Per-token pricing is unchanged, but the same prompt and the same answer now count as more tokens, so an equivalent request costs more than the headline numbers suggest. Budget for it.
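A rough way to budget for the tokenizer change is to scale the list price by the inflation factor. The sketch below assumes the roughly 30% figure applies uniformly to input and output text, which is a simplification; real inflation will vary with the content. `effective_price` is an illustrative helper name.

```python
# "About 30% more tokens" from the article, treated here as a uniform factor.
# This is an assumption: actual inflation depends on the text being tokenized.
TOKEN_INFLATION = 0.30


def effective_price(list_price_per_mtok, inflation=TOKEN_INFLATION):
    """Price in USD for text that used to count as one million Sonnet 4.6 tokens."""
    return round(list_price_per_mtok * (1 + inflation), 2)


if __name__ == "__main__":
    for label, (inp, out) in {"intro": (2.00, 10.00), "standard": (3.00, 15.00)}.items():
        print(f"{label}: input ${effective_price(inp)}, output ${effective_price(out)}")
```

Under that assumption, the intro rate of $2/$10 behaves like $2.60/$13.00 for the same text, and the standard $3/$15 behaves like $3.90/$19.50.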

What GLM 5.2 costs, and why “cost” is the wrong frame

Through Z.ai’s API, GLM 5.2 is $1.40 per million input tokens and $4.40 per million output, with cached input at $0.26. Summing the input and output rates (the cost of one million tokens each way) gives $5.80 for GLM 5.2 versus $12 for Sonnet 5 at intro pricing or $18 at standard. Call it half to a third, before you even account for Sonnet 5’s heavier token counts.
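The comparison above is each price pair applied to one million input and one million output tokens. A small sketch makes it easy to rerun for a different input/output mix; the prices come from the article, while `cost_usd` and the `PRICES` table are illustrative names, and cached-input pricing and tokenizer differences are ignored.

```python
# USD per million tokens as quoted in the article: (input, output).
PRICES = {
    "GLM 5.2": (1.40, 4.40),
    "Sonnet 5 (intro)": (2.00, 10.00),
    "Sonnet 5 (standard)": (3.00, 15.00),
}


def cost_usd(input_tokens, output_tokens, prices):
    """API cost in USD for a given token count, ignoring cached-input discounts."""
    in_price, out_price = prices
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


if __name__ == "__main__":
    glm = cost_usd(1_000_000, 1_000_000, PRICES["GLM 5.2"])
    for name, prices in PRICES.items():
        total = cost_usd(1_000_000, 1_000_000, prices)
        print(f"{name}: ${total:.2f} (GLM 5.2 is {glm / total:.0%} of this)")
```

Most real workloads are input-heavy, so the ratio for a given job will differ from the one-million-each-way figure; passing your own token counts shows by how much.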

But the API price is the least interesting number, because GLM 5.2 is open weights. It is a 744B-parameter mixture-of-experts model with about 40B active per token (the published “744B-A40B” config), MIT-licensed, available in BF16 and FP8 on Hugging Face. Run it on your own hardware and the per-token bill goes to zero. You pay for the machine once. That is the entire premise of self-hosting, and it is why an open model that merely ties a closed one is a different proposition than a closed model that wins by a point.
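The memory the weights alone require follows from the parameter count and the bytes per parameter: BF16 stores two bytes per parameter and FP8 stores one. The sketch below covers weights only and deliberately ignores KV cache, activations, and runtime overhead, so treat the results as a floor; `weights_gb` is an illustrative helper.

```python
BYTES_PER_PARAM = {"BF16": 2, "FP8": 1}


def weights_gb(params_billion, bytes_per_param):
    """Approximate weight storage in decimal GB (1 GB = 1e9 bytes).

    Weights only: KV cache, activations, and runtime overhead are excluded.
    """
    return params_billion * bytes_per_param


if __name__ == "__main__":
    for fmt, nbytes in BYTES_PER_PARAM.items():
        print(f"744B parameters in {fmt}: about {weights_gb(744, nbytes)} GB")
```

All 744B parameters have to be resident even though only about 40B are active per token, because which experts fire changes from token to token.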

Running GLM 5.2 yourself

This is the part the benchmark tables leave out. GLM 5.2 is a capacity-class model, the same tier as DeepSeek V4-Pro, not something that drops onto a single 24GB gaming card. At 744B total parameters it needs real memory: a multi-GPU rig or a large unified-memory box to hold the weights and a useful context. The MoE design (only ~40B active per token) is what makes usable throughput practical on bandwidth-limited hardware rather than requiring datacenter cards.
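Why the active-parameter count matters can be seen with a common rule of thumb: when generation is memory-bandwidth bound, each new token requires reading roughly the active weights once, so bandwidth divided by bytes read per token gives a rough ceiling on speed. The sketch below assumes FP8 weights and a purely hypothetical 256 GB/s machine. It is not a spec for any hardware named in this article, and real throughput will be lower once KV cache reads and other overheads are counted.

```python
def decode_tokens_per_sec(bandwidth_gb_s, params_read_billion, bytes_per_param):
    """Rough upper bound on decode speed when memory bandwidth is the limit.

    Assumes each generated token reads `params_read_billion` parameters once
    and ignores KV cache traffic, compute limits, and batching.
    """
    gb_read_per_token = params_read_billion * bytes_per_param
    return bandwidth_gb_s / gb_read_per_token


if __name__ == "__main__":
    bandwidth = 256.0  # hypothetical GB/s, chosen only for illustration
    moe = decode_tokens_per_sec(bandwidth, 40, 1)     # ~40B active, FP8
    dense = decode_tokens_per_sec(bandwidth, 744, 1)  # if all 744B were read
    print(f"MoE ceiling: {moe:.1f} tok/s; dense-equivalent ceiling: {dense:.2f} tok/s")
```

Whatever the bandwidth figure, the ratio between the two cases is 744/40, or about 18.6x, which is the gap the MoE design buys under this simplified model.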

If you are weighing whether self-hosting GLM 5.2 beats renting Sonnet 5 for your workload, the math comes down to your hardware and your token volume. The picker on this site matches an open model like GLM 5.2 against the exact memory and bandwidth of a given machine and shows the expected speed, so you can see which boxes run it well before you buy one, and the capacity-tier hardware pages lay out the GB10, Strix Halo, and multi-GPU options that have the memory for a model this size.
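A back-of-the-envelope version of that math is a break-even calculation: divide the price of the machine by what the same token volume would cost per month through an API. In the sketch below the $10,000 hardware price and the 100M input / 20M output monthly volume are hypothetical placeholders, not figures from this article, and power, maintenance, and whether the box can actually sustain that volume are all ignored.

```python
def breakeven_months(hardware_usd, monthly_input_mtok, monthly_output_mtok,
                     in_price, out_price):
    """Months until a one-time hardware purchase equals cumulative API spend.

    Token volumes are in millions per month; prices are USD per million tokens.
    Ignores electricity, maintenance, and hardware throughput limits.
    """
    monthly_api_usd = monthly_input_mtok * in_price + monthly_output_mtok * out_price
    if monthly_api_usd <= 0:
        raise ValueError("monthly API spend must be positive")
    return hardware_usd / monthly_api_usd


if __name__ == "__main__":
    hardware = 10_000  # hypothetical machine price in USD
    vs_sonnet = breakeven_months(hardware, 100, 20, 3.00, 15.00)  # standard pricing
    vs_glm_api = breakeven_months(hardware, 100, 20, 1.40, 4.40)
    print(f"vs Sonnet 5 standard: {vs_sonnet:.1f} months")
    print(f"vs GLM 5.2 API: {vs_glm_api:.1f} months")
```

The point of the sketch is the shape of the trade-off: the higher your monthly token volume, the faster owned hardware pays for itself, and low-volume users may never reach break-even.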

The open-weights argument, from the closed side

There is a reason this comparison exists at all, and it runs straight through Anthropic’s own co-founder and CEO. Dario Amodei has been the loudest voice against releasing frontier models openly. In June 2026 he stated plainly that “Claude models will not be released as open weights,” arguing that “the potential for misuse at frontier capability levels, particularly for bioweapons synthesis, autonomous cyberattacks, and large-scale influence operations, creates risks that cannot be mitigated after weights are public.” His June 2026 policy essay went further, calling for FAA-style government-mandated safety evaluations before any frontier model, open or closed, can be deployed.

That is a real position held in good faith, and it is also the exact dividing line in this comparison. Sonnet 5 stays closed on principle. GLM 5.2 ships its weights to anyone, and the result is an open model about a point behind Anthropic’s newest closed one on agentic coding. Whether you read that as proof the gap has effectively closed or as the risk Amodei is warning about, the practical fact for a buyer is the same: the open model is the one you can actually own.

Bottom line

If you want the highest scores on hard reasoning, computer use, and knowledge work, and you are fine renting, Sonnet 5 is the pick and it is a strong one at this price. If you want a model that nearly matches it on coding and reasoning, costs less per token, and can run on hardware you control with no per-token bill, GLM 5.2 is the open answer, and it is the one worth pricing a machine around. For most teams the deciding factor is not a benchmark gap of a point or three. It is whether you would rather pay forever or buy the box once.
