GLM-5 (Zhipu AI)
Language Model
Freemium

For two weeks, Silicon Valley's best researchers couldn't identify a mystery model called "Pony Alpha" destroying benchmarks on OpenRouter — then China revealed it was theirs, built on chips America said couldn't do it.

GLM-5: The Mystery "Pony Alpha" Model That Fooled Silicon Valley

GLM-5 — Fast Facts (March 2026):

  • Released: February 11, 2026 — after two weeks secretly live on OpenRouter under the codename "Pony Alpha," confusing researchers who couldn't identify which lab built it
  • Architecture: 744 billion total parameters, Mixture of Experts with 256 experts, 8 activated per token (5.9% sparsity), 44 billion active parameters per inference
  • The hardware story: Every parameter trained on 100,000 Huawei Ascend 910B chips — zero Nvidia, zero AMD, zero American silicon anywhere in the training stack
  • Hallucinations: 34% hallucination rate — down from 90% on predecessor GLM-4.7. Claude Sonnet 4.5 sits around 42%, GPT-5.2 around 48% on the same evaluation
  • Price: $0.80/M input tokens, $2.56/M output tokens on OpenRouter — about six times cheaper than Claude Opus 4.6
  • Open source: MIT license — full commercial use, fine-tuning, modification permitted. Weights on Hugging Face and ModelScope
  • Market reaction: Zhipu stock surged 28.7% on the Hong Kong Stock Exchange the day of announcement

The US chip export controls were supposed to keep China a generation behind. Ship no H100s, no A100s, no cutting-edge American silicon — and frontier AI stays a Western monopoly. That was the theory. Zhipu AI just shipped a 744-billion-parameter model trained on 100,000 Huawei Ascend 910B chips — zero Nvidia, zero AMD — and it scores within single digits of GPT-5.2 on the benchmarks that matter.

That's not the most interesting part. The most interesting part is how the world found out. In early February 2026, a mysterious model called "Pony Alpha" appeared on OpenRouter, generating significant speculation in the AI community. GitHub pull requests started referencing it. Benchmark submissions appeared. Researchers tested it and found performance that placed it firmly in the frontier tier — competitive with Claude Opus 4.5 on coding, outperforming GPT-5.2 on Humanity's Last Exam. Nobody could identify the lab. The architecture hints didn't match any known Western model. The prompt style felt different. For two weeks, the AI research community passed Pony Alpha around like an unsolved puzzle.

Then on February 11, Zhipu AI dropped GLM-5 officially — and Pony Alpha had a name. Zhipu's stock surged 28.7% on the Hong Kong exchange within hours, with Bloomberg reporting the rally peaking at 34%. This is everything you need to know about GLM-5: what it is, why the hardware story matters more than the benchmarks, what "Pony Alpha" actually revealed about the state of Chinese AI, and whether you should be using it right now.

What Is GLM-5? (The Technical Story Without the Jargon)

GLM-5 is Zhipu AI's flagship foundation model, engineered to advance agentic engineering for open-weight systems. Zhipu AI — also known as Z.ai — was founded in 2019 as a spin-off from Tsinghua University, China's equivalent of MIT. The company completed a Hong Kong IPO on January 8, 2026, raising approximately HKD 4.35 billion (USD $558 million) — then spent the following five weeks shipping GLM-5 on that fresh capital.

GLM-5 isn't just about raw size. Its engineering is focused on agentic intelligence: the ability to autonomously break down high-level objectives into subtasks and execute them with minimal human intervention. The model features a native "Agent Mode" which lets it transform raw prompts or source materials directly into professional office documents — ready-to-use .docx, .pdf, and .xlsx files. Zhipu describes this as a shift from "vibe coding" to "agentic engineering," where the AI acts more as a partner than a passive tool.

The "Pony Alpha" Mystery: What Actually Happened

The two-week Pony Alpha episode is a window into how the frontier AI race is actually playing out — and it's not the clean Western-vs-China story most coverage implies.

When Pony Alpha surfaced on OpenRouter in early February, GitHub PRs and benchmark submissions hinted at a stealth release from a major lab, and the model was quickly positioned as a "GPT-5 killer" with competitive performance against frontier models. The researchers testing it had no Chinese language bias in their evaluations — they were running English coding benchmarks, English reasoning tasks, English creative writing prompts. The scores were real. The mystery was real. The fact that two weeks passed without anyone correctly identifying it as a Chinese model says something significant: when you strip away the branding, GLM-5 doesn't feel like a "Chinese AI model" in the way that phrase has historically implied limitations. It feels like a frontier model. Because it is one.

One researcher's assessment after the unmasking: "The fact this is the first open-weight model that I've successfully run a job that took over an hour — that's a milestone." Another: "The hallucination rate is insane; it's much more willing to say 'I don't know' than lie to you." A third, posted the day of the official announcement: "GLM-5 is the new open weights leader. It scores 50 on the Intelligence Index — a significant closing of the gap."

The Real Story: 100,000 Huawei Chips and Why It Matters

The benchmark numbers are impressive. The hardware numbers are geopolitically significant. Zhipu coordinated 100,000 Huawei Ascend 910B processors running the MindSpore framework to train GLM-5 — an unprecedented scale for non-Nvidia hardware. No Nvidia H100s, no AMD MI300X chips, no American silicon anywhere in the training stack.

Making 100,000 Ascend chips work together reliably enough to complete a training run of 28.5 trillion tokens required Zhipu to develop custom optimization techniques, including dynamic graph multi-level pipelined deployment and high-performance fusion operators built specifically for Ascend's architecture. This is the part the export control architects didn't model: withhold the chips, and you accelerate the development of alternatives. The Huawei Ascend ecosystem now has a 744-billion-parameter proof-of-concept that it works at frontier scale.

Zhipu also confirmed compatibility with processors from Moore Threads, Cambricon, Kunlun, MetaX, Enflame, and Hygon — all Chinese chipmakers. GLM-5 wasn't just trained to produce a frontier model. It was trained to prove the entire Chinese domestic compute stack can produce a frontier model. Those are different goals that happen to have the same output.

The honest hardware caveat:

The Ascend 910B doesn't match the H100 in raw FLOPs. The inference speed gap — approximately 17 tokens/sec vs 25+ tokens/sec on Nvidia — reflects the current hardware differential. The energy cost remains a real constraint: the massive domestic clusters consume significantly more power than equivalent Nvidia-based systems, and Zhipu has acknowledged that breakthroughs in power management remain necessary. GLM-5 proved Huawei hardware can train at frontier scale. It didn't prove it does so efficiently. That's the next problem to solve.

The Architecture: Why 744B Parameters Doesn't Mean What You Think

GLM-5 uses a Mixture-of-Experts design with 256 total experts and 8 active per token, yielding a 5.9% sparsity rate. Only 44 billion parameters fire on any given inference pass, keeping compute costs manageable despite the model's raw scale. The context window extends to 200,000 tokens with a maximum output of 131,000 tokens.

GLM-5 integrates Multi-head Latent Attention (MLA), which reduces memory overhead by 33% compared to standard multi-head attention, and DeepSeek Sparse Attention (DSA), which enables efficient long-context processing up to 200K tokens without dense attention's computational cost. Both techniques were pioneered by DeepSeek — Zhipu adopted them openly, consistent with the collaborative open-source approach that characterizes Chinese AI research more broadly.
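The MoE arithmetic above is worth spelling out. A back-of-envelope sketch in Python — the toy top-k gate at the end is purely illustrative of how an MoE router picks experts, not GLM-5's actual routing code:

```python
import random

# GLM-5's stated Mixture-of-Experts figures (from the article).
TOTAL_PARAMS_B = 744   # total parameters, in billions
ACTIVE_PARAMS_B = 44   # parameters used per inference pass, in billions
EXPERTS_TOTAL = 256
EXPERTS_ACTIVE = 8

# The "5.9% sparsity" figure is active parameters over total parameters...
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B   # 44/744 ≈ 0.059

# ...which differs from the fraction of experts fired per token.
expert_fraction = EXPERTS_ACTIVE / EXPERTS_TOTAL     # 8/256 ≈ 0.031

# Toy top-k gate: score every expert, keep the 8 highest for this token.
random.seed(0)
scores = [random.random() for _ in range(EXPERTS_TOTAL)]
top8 = sorted(range(EXPERTS_TOTAL), key=lambda i: scores[i], reverse=True)[:EXPERTS_ACTIVE]
```

Note the two ratios differ because shared components (embeddings, attention, any always-on experts) count toward active parameters on every token.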

Slime: The Reinforcement Learning Framework That Killed Hallucinations

To manage the immense training demands of such a massive model, Zhipu developed a novel asynchronous reinforcement learning infrastructure called "Slime." This system sidesteps the usual "long-tail" bottlenecks of traditional reinforcement learning by allowing training trajectories to be generated independently, dramatically accelerating the iteration cycle. By integrating system-level optimizations like Active Partial Rollouts (APRIL), Slime enables the model to handle complex, multi-step reasoning tasks.

The hallucination improvement from Slime is the most remarkable single number in the GLM-5 story. GLM-5 reports a 34% hallucination rate, down from 90% on its predecessor GLM-4.7. For comparison, Claude Sonnet 4.5 sits around 42% and GPT-5.2 around 48% on the same evaluation. Zhipu has open-sourced the Slime framework on GitHub (THUDM/slime) — meaning every AI lab in the world can now study and adopt the technique that produced this improvement.

One safety researcher's response was less celebratory: "After hours of reading GLM-5 traces: an incredibly effective model, but far less situationally aware. Achieves goals via aggressive tactics but doesn't reason about its situation or leverage experience. This is scary. This is how you get a paperclip maximizer," Lukas shared on X. The hallucination fix comes with a tradeoff: GLM-5 is highly goal-directed in a way that can feel unsettling in long-running agentic tasks. It completes objectives efficiently, without the hedging and self-reflection that makes Claude or GPT-5 feel "safer" in extended autonomous operation.

GLM-5 Benchmarks: The Full Picture

Benchmark | GLM-5 | GPT-5.2 | Claude Opus 4.5 | What It Tests
SWE-bench Verified | 77.8% | 74.9% | 77.0% | Real GitHub issue resolution
Humanity's Last Exam (w/ tools) | 50.4% | ~44% | ~46% | Hardest multi-domain reasoning benchmark
BrowseComp (web research) | 75.9 | n/a | n/a | #1 open-source on information retrieval
GPQA Diamond (science) | 68.2% | ~88% | ~82% | PhD-level biology, physics, chemistry
AA Omniscience Index (hallucinations) | -1 (best) | Higher | Higher | 34% hallucination rate vs 42–48% for Western models
Terminal-Bench 2.0 | 56.2% | n/a | 65.4% (Opus 4.6) | CLI, bash, agentic terminal tasks — 9-point gap vs Opus 4.6
Intelligence Index (Artificial Analysis) | 50 | n/a | n/a | "Significant closing of the gap" per Artificial Analysis
Inference speed | ~17 tok/s | 25+ tok/s | 25+ tok/s | Huawei hardware gap vs Nvidia

GLM-5 achieves approximately 95% of closed-model performance at approximately 15% of the cost. That ratio — not any individual benchmark — is the practical case for GLM-5. It doesn't need to win every test. It needs to be good enough at enough things that the price difference makes it the rational choice for cost-sensitive workloads.

GLM-5 API Pricing vs. Every Competitor

Model | Input /MTok | Output /MTok | Open Source | Context
GLM-5 | $0.80 | $2.56 | ✅ MIT | 200K
DeepSeek V3.2 | $0.25 | $0.38 | ✅ MIT | 163.8K
Gemini 3.1 Pro | $2.00 | $12.00 | No | 1M
GPT-5.3 Codex | $3.00 | $15.00 | No | 400K
Claude Opus 4.6 | $5.00 | $25.00 | No | 200K
Llama 4 Scout | $0.08 | $0.30 | ✅ Llama license | 327.7K

GLM-5 remains significantly cheaper than Claude Opus 4.6, which costs $5 per million input tokens and $25 per million output tokens. Batch API cuts costs by an additional 50% — large-scale data processing tasks run at $0.40/M input and $1.28/M output. For teams processing millions of tokens per day, that difference is not marginal. It's the difference between a viable product and an unviable one.
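At these rates, a monthly bill is simple arithmetic. A minimal cost estimator using the OpenRouter list prices and the 50% batch discount quoted above:

```python
def cost_usd(input_tokens, output_tokens, batch=False):
    """Estimate GLM-5 API cost from the article's list prices."""
    rate_in, rate_out = 0.80, 2.56          # $/M tokens, standard API
    if batch:                               # Batch API: 50% off both rates
        rate_in, rate_out = rate_in / 2, rate_out / 2
    return input_tokens / 1e6 * rate_in + output_tokens / 1e6 * rate_out

# A workload of 10M input and 2M output tokens:
standard = cost_usd(10_000_000, 2_000_000)              # $13.12
batched = cost_usd(10_000_000, 2_000_000, batch=True)   # $6.56
```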

The Price Increase Controversy:

The same day GLM-5 launched, Zhipu raised prices for its GLM Coding Plan by 30%, posting: "GLM Coding Plan has seen strong growth in users and usage. To sustain service quality, we've been investing heavily in compute and model optimization. To reflect these rising costs, we're adjusting GLM Coding Plan pricing." The promotional $3 entry point that made GLM-4.7 accessible is gone. First-purchase discounts were removed, while quarterly and annual discounts were added. Prices increased for Lite and Max subscription plans. Existing subscribers kept their current pricing. "Compute is very tight. Even before the GLM-5 launch, we were pushing every chip to its limit just to serve inference," the company posted on X. GLM-5 launched on the same day it became more expensive — a tension worth noting if you're evaluating it purely on the "cheap frontier AI" narrative.

Is GLM-5 Open Source? How to Run It Locally

The model weights are available under MIT license on Hugging Face and ModelScope. MIT is the most permissive open-source license available — commercial use, modification, redistribution, and fine-tuning all permitted without royalties or attribution requirements beyond preserving the license notice. For organizations with data sovereignty requirements, this means GLM-5 can be deployed entirely within your own infrastructure.

Hardware Requirements for Local Deployment:

Deploying GLM-5 requires 1,490GB of memory — roughly double GLM-4.7's footprint. That means datacenter infrastructure for full-precision deployment. For most organizations, local deployment means running quantized versions:

  • Full precision (BF16): ~1,490GB VRAM — requires a multi-node GPU cluster; not consumer hardware
  • Q4 quantization (community): Estimated ~320–400GB — possible on high-end multi-GPU server; beyond single-consumer-card reach
  • Q3/Q2 quantization: Research-grade quality loss; may fit on smaller setups but benchmark results will vary significantly
  • Recommended for most teams: Use the API at $0.80/M input until hardware costs drop further; self-hosted deployment is for organizations with existing datacenter infrastructure or specific compliance requirements

GLM-5 is not the "runs on dual RTX 4090s" open-source model that DeepSeek V4 is expected to be. The 744B parameter scale means self-hosting is a serious infrastructure commitment. The MIT license enables it — the hardware requirement gates it.
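The BF16 and Q4 figures above fall straight out of bytes-per-parameter arithmetic. (Real quantized builds keep some layers at higher precision, which is why community estimates span a range rather than landing on one number.)

```python
PARAMS = 744e9  # 744B total parameters

def footprint_gb(bits_per_param):
    """Raw weight storage only — ignores activations and KV-cache overhead."""
    return PARAMS * bits_per_param / 8 / 1e9

bf16 = footprint_gb(16)  # 1488 GB — the ~1,490GB full-precision figure
q4 = footprint_gb(4)     # 372 GB — inside the 320–400GB community estimate
```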

Is GLM-5 Safe? The China Data Question

The same two-track answer that applies to DeepSeek applies here. The consumer app (chat.z.ai) is hosted on servers in China, subject to the 2017 Chinese National Intelligence Law. Don't use it for sensitive business queries, proprietary code, or confidential documents. The open-weight model is a completely different risk profile — if you run GLM-5 on your own infrastructure under MIT license, there are no Chinese servers involved, no data transmission, no jurisdiction concerns. The bans and restrictions that have targeted DeepSeek and similar tools target the consumer apps and APIs, not the weights themselves. Self-hosted GLM-5 is legally and practically equivalent to self-hosted Llama 4.

How to Access GLM-5

Consumer (Free — chat.z.ai):

  1. Go to chat.z.ai — no account required for basic usage; GLM-5 powers the free tier
  2. Sign up for an account to access Agent Mode, file uploads, and document generation
  3. In Agent Mode: paste a prompt or source material and instruct GLM-5 to produce a .docx, .pdf, or .xlsx output directly

API (Developers):

  1. Create account at bigmodel.cn (Zhipu's API platform) or access via OpenRouter
  2. API is OpenAI-compatible — swap base URL to https://open.bigmodel.cn/api/paas/v4/
  3. Model string: glm-5
  4. Also available via WaveSpeed API for high-throughput workloads
  5. Batch API available at 50% cost reduction for large-scale processing
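Because the endpoint is OpenAI-compatible, any OpenAI-style client works after swapping the base URL. A minimal standard-library sketch, under assumptions worth verifying against Zhipu's docs: the chat path appended to the base URL, and the exact payload shape, follow the OpenAI convention rather than being confirmed by this article.

```python
import json
import urllib.request

# Base URL from the steps above; "chat/completions" path is an assumption
# based on OpenAI-compatible convention.
BASE_URL = "https://open.bigmodel.cn/api/paas/v4/chat/completions"

def build_request(prompt, model="glm-5"):
    """Build an OpenAI-style chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def call_glm5(prompt, api_key):
    """Send one chat request and return the assistant's reply text."""
    req = urllib.request.Request(
        BASE_URL,
        data=json.dumps(build_request(prompt)).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

The same payload works unchanged against OpenRouter by swapping `BASE_URL` and the model string to OpenRouter's conventions.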

Open-Source Weights:

  1. Download weights from huggingface.co/THUDM or modelscope.cn/THUDM
  2. MIT license — commercial use fully permitted
  3. MindSpore and PyTorch backends supported
  4. Compatible with Huawei Ascend, Moore Threads, Cambricon, Kunlun, MetaX, Enflame, Hygon, and Nvidia hardware

GLM-5 vs. DeepSeek V4: The Comparison Everyone Is Making

Factor | GLM-5 (Zhipu AI) | DeepSeek V4 (expected)
Status | ✅ Released Feb 11, 2026 | Still unreleased as of March 7, 2026
Parameters | 744B total / 44B active | ~1T total / ~60B active (estimated)
Context window | 200K | 1M (confirmed)
API input price | $0.80/M | ~$0.25/M (estimated)
Hardware | 100% Huawei Ascend (proven) | Likely Nvidia (reports of hardware issues on Huawei chips)
Hallucination rate | 34% — best of any frontier model | Unknown (not yet released)
Local hardware req. | ~1,490GB — datacenter only | Dual RTX 4090 (consumer viable)
Open source license | MIT (most permissive) | Apache 2.0 (expected)

Frequently Asked Questions

What Is GLM-5?

GLM-5 is the fifth-generation large language model developed by Zhipu AI, featuring 744 billion total parameters in a Mixture of Experts architecture with 44 billion active parameters. It is designed for advanced reasoning, coding, creative writing, and agentic intelligence. It is the first Chinese open-source model to match Western frontier systems across multiple major benchmarks simultaneously — and the first frontier model trained entirely on Huawei Ascend chips with zero Nvidia dependency.

What Was "Pony Alpha"?

In early February 2026, a mysterious model called "Pony Alpha" appeared on OpenRouter, with GitHub PRs and benchmarks suggesting it may be a stealth GLM-5 release. Researchers tested it for two weeks without identifying it as a Chinese model — it performed at frontier level on English benchmarks and didn't match any known Western model's architecture signature. On February 11, Zhipu officially launched GLM-5 and Pony Alpha had a name.

Is GLM-5 Better Than GPT-5?

On SWE-bench Verified, GLM-5 edges ahead of both Claude Opus 4.5 (77.8% vs 77.0%) and GPT-5.2 (74.9%). On Humanity's Last Exam — a test designed to remain difficult for frontier models — GLM-5 leads the field outright. It does not lead on every benchmark: GPQA Diamond and Terminal-Bench 2.0 show clear gaps versus Western frontier models. The more practical answer: GLM-5 achieves approximately 95% of closed-model performance at approximately 15% of the cost. Whether that ratio makes it "better" depends entirely on your workload and budget.

Is GLM-5 Free?

Visit chat.z.ai — no account required for basic usage. GLM-5 powers the free tier. API access requires billing at $0.80/M input tokens. Open-source weights are free to download from Hugging Face under MIT license — there are no runtime costs if you self-host, only infrastructure costs.

Is GLM-5 Open Source?

Yes — the model weights are available under MIT license on Hugging Face and ModelScope. MIT license permits full commercial use, fine-tuning, and modification without royalties. This is the most permissive open-source license available. Zhipu AI has open-sourced the Slime reinforcement learning framework on GitHub (THUDM/slime) as well.

Is GLM-5 Safe to Use?

Two separate answers: the consumer app (chat.z.ai) stores data on servers in China, subject to Chinese data law. Don't use it for proprietary, sensitive, or confidential content. The open-weight model self-hosted on your own infrastructure has no Chinese server involvement — the data risk profile is identical to Llama 4 or any other open-weight model. One additional concern flagged by safety researchers: GLM-5 is described as goal-directed in a way that can feel unsettling in long agentic tasks — "achieves goals via aggressive tactics but doesn't reason about its situation or leverage experience." For autonomous multi-step agentic work, monitor outputs more carefully than you would with Claude or GPT-5.

Who Made GLM-5?

Zhipu AI (Z.ai) is a leading Chinese AI company that spun out of Tsinghua University in 2019. In January 2026, Zhipu AI completed a Hong Kong IPO raising approximately HKD 4.35 billion (USD $558 million). CEO Zhang Peng leads the company; the company's stated strategy is "an open strategy to advance science and technology, fostering industry-academia collaboration while focusing on continuously enhancing the capabilities of our strongest foundational model." Zhipu is not the same company as DeepSeek — both are Chinese AI labs building frontier open-source models, but they are separate organizations with different architectures, backers, and research focuses.
