
Alibaba shipped one of the best open-source AI models ever built — then its creator posted "bye my beloved qwen" and walked out the door, Alibaba's stock dropped 5.3%, and Google DeepMind started publicly recruiting his team.
Qwen3.5: Alibaba's Best AI Ever — Then Its Creator Said Goodbye
Qwen3.5 — Fast Facts (February 16, 2026):
- Released: February 16, 2026 — hours before the Chinese Lunar New Year holiday, in a race against Tencent, ByteDance, and Zhipu who all dropped upgrades the same week
- Architecture: 397 billion total parameters, only 17 billion active per token — Mixture of Experts with 512 experts, delivering frontier performance at 60% lower cost than its predecessor
- Speed: 8.6x faster decoding at 32K context, 19x faster at 256K context vs Qwen3-Max — on identical hardware
- Languages: 201 languages and dialects — widest coverage of any frontier model. Previous generation supported 82
- Price: $0.10/M input (Flash) · $0.40/M input (standard 397B) · $0.18/M for 1M-context session via Plus — 10–17x cheaper than Claude or GPT at equivalent quality
- Open source: Apache 2.0 license — full commercial use, fine-tuning, redistribution, zero royalties. Weights on Hugging Face under Qwen/Qwen3.5-397B-A17B
- The crisis: On March 3, 2026 — 15 days after launch — lead engineer Lin Junyang posted: "me stepping down. bye my beloved qwen." Alibaba stock fell 5.3%. Three senior executives gone within 10 weeks. Google DeepMind publicly invited the remaining team to defect
The model launched on February 16. By March 3, the man who built it was gone.
Lin Junyang — known globally as Justin Lin, the technical lead who turned Qwen from a Tsinghua spin-off into a project with 700 million Hugging Face downloads and 180,000 community derivatives — posted a short farewell on X in the early hours of March 4, Beijing time: "me stepping down. bye my beloved qwen." His post received 5,000 likes and 700 comments within hours. Colleagues described it as "the end of an era." A Qwen contributor wrote publicly: "I know leaving wasn't your choice." Alibaba's Hong Kong-listed shares dropped 5.3% — their biggest intraday fall since October.
The day before Lin's departure was announced, Jack Ma — who has largely avoided public life since 2020 — made a rare appearance alongside Alibaba's current and former leadership in Hangzhou. The agenda of that meeting was not disclosed. The timing was hard to read as coincidence.
This is the full story of Qwen3.5: what the model actually does, why the benchmarks matter, why it's priced at a fraction of Claude or GPT, and what the leadership collapse means for the 90,000 enterprises and 700 million downloads that now depend on it.
What Is Qwen3.5? The Technical Story
Qwen3.5 is Alibaba's fifth-generation flagship AI model family, released under Apache 2.0 open-source license across a full size range from 0.8B to 397B parameters. The headline model — Qwen3.5-397B-A17B — is a sparse Mixture of Experts model: 397 billion total parameters, but only 17 billion activate per forward pass. This achieves a 95% reduction in activation memory compared to a dense model of equivalent benchmark capability.
The architecture combines two systems that are usually kept separate: Gated Delta Networks (linear attention) and sparse MoE routing. Standard transformer attention scales quadratically with context length — double the context, roughly quadruple the compute. Gated Delta Networks scale linearly. The result is that Qwen3.5 processes a 256K context window 19x faster than its predecessor on identical hardware. That is not a marginal efficiency gain. It is a category difference for enterprise workloads processing large documents, codebases, or video transcripts.
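The scaling difference is easy to see with back-of-envelope arithmetic. The sketch below is illustrative only — the 19x figure above reflects the full pipeline (MoE routing, kernels, hardware), not this idealized cost model:

```python
def quadratic_cost(ctx):
    # Standard softmax attention: every token attends to every prior token,
    # so cost grows with the square of context length.
    return ctx ** 2

def linear_cost(ctx):
    # Linear-attention variants such as Gated Delta Networks grow
    # proportionally with context length.
    return ctx

base = 32_000
for ctx in (32_000, 256_000):
    print(f"{ctx:>7} tokens: "
          f"quadratic x{quadratic_cost(ctx) // quadratic_cost(base)}, "
          f"linear x{linear_cost(ctx) // linear_cost(base)}")
```

Going from 32K to 256K context (8x longer) costs 64x under quadratic attention but only 8x under linear attention — that widening relative gap is why the speedup over the predecessor grows from 8.6x at 32K to 19x at 256K.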
Unlike most vision-language models — which bolt a vision encoder onto a text-only backbone as an afterthought — Qwen3.5 was trained with native multimodal fusion from the first pretraining stage. Text, images, and video tokens share the same transformer layers from the beginning. The practical result: on MathVision (visual mathematical reasoning), Qwen3.5 scores 88.6 — ahead of GPT-5.2's 83.0 and Gemini 3 Pro's 86.6. On MMMU-Pro visual reasoning, it scores 85.0. For tasks that require reading a chart, interpreting a screenshot, or analyzing a document's layout, Qwen3.5 is arguably the strongest open-source model currently available.
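For chart- or screenshot-reading tasks, a multimodal request looks like a standard OpenAI-format payload with mixed content parts. This is a sketch under assumptions: whether DashScope's compatible-mode endpoint accepts `image_url` content parts for Qwen3.5 should be verified against Alibaba's API docs, and the image URL is a placeholder.

```python
import os

# Hypothetical multimodal payload in OpenAI chat format (image + text parts).
messages = [{
    "role": "user",
    "content": [
        {"type": "image_url",
         "image_url": {"url": "https://example.com/chart.png"}},  # placeholder
        {"type": "text", "text": "What trend does this chart show?"},
    ],
}]

# Only send the request if a key is configured — the payload shape is the point.
if os.environ.get("DASHSCOPE_API_KEY"):
    from openai import OpenAI
    client = OpenAI(
        api_key=os.environ["DASHSCOPE_API_KEY"],
        base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
    )
    resp = client.chat.completions.create(model="qwen3.5-plus", messages=messages)
    print(resp.choices[0].message.content)
```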
The Full Model Family: From a Laptop to a Datacenter
Qwen3.5 isn't a single model — it's a family released in three waves across the two weeks following February 16. Every model in the family shares the same core innovations: native multimodal training, 201-language support, Thinking (deep reasoning) and Fast (standard) inference modes, and Apache 2.0 licensing.
| Model | Size | Active Params | Context | Min VRAM | Best For |
|---|---|---|---|---|---|
| Qwen3.5-0.8B | 0.8B | 0.8B | 32K | 2GB | On-device, edge AI, phones |
| Qwen3.5-9B | 9B | 9B | 128K | 8GB | Consumer GPU, matches GPT-OSS-120B on key benchmarks |
| Qwen3.5-27B | 27B dense | 27B | 262K | 48GB | Ties GPT-5 mini on SWE-bench Verified (72.4%) |
| Qwen3.5-35B-A3B | 35B MoE | 3B | 262K | 8GB | Outperforms previous-gen 235B flagship — 3B active params, runs on a gaming GPU |
| Qwen3.5-122B-A10B | 122B MoE | 10B | 262K | ~80GB multi-GPU | 72.2 BFCL-V4 tool use — 30% above GPT-5 mini for function calling |
| Qwen3.5-397B-A17B | 397B MoE | 17B | 256K | ~220GB (4-bit quantized) | Flagship — frontier performance, open weights, Apache 2.0 |
| Qwen3.5-Plus | Same as 397B | 17B | 1M tokens | API only | Hosted on Alibaba Cloud — adds Auto mode, 1M context, web search + code interpreter tools |
The Qwen3.5-35B-A3B is the number that deserves a second look: a 35-billion-parameter model with only 3 billion active per token that outperforms the previous-generation 235B flagship — and runs on a gaming GPU with 8GB of VRAM. That ratio — frontier-adjacent performance on consumer hardware — is why 40% of all new Hugging Face derivative models are now Qwen-based.
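The table's VRAM figures follow from simple weight-storage arithmetic. The sketch below counts weights only; real deployments add KV cache and runtime overhead, which is roughly the gap between the raw number and the ~220GB figure above:

```python
def weight_footprint_gb(total_params, bits_per_weight):
    # Raw storage for the weights alone: params x bits, converted to gigabytes.
    return total_params * bits_per_weight / 8 / 1e9

# Flagship 397B at 4-bit quantization: ~198.5 GB of raw weights
print(weight_footprint_gb(397e9, 4))

# Sparse activation ratio: only 17B of 397B parameters fire per token,
# i.e. ~4.3% — the flip side of the ~95% activation-memory reduction claim.
print(17 / 397 * 100)
```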
Qwen3.5 Benchmarks: The Full Picture
| Benchmark | Qwen3.5 | GPT-5.2 | Claude Opus 4.6 | What It Tests |
|---|---|---|---|---|
| LiveCodeBench v6 | 83.6 | ~80 | ~82 | Real coding tasks from competitive programming |
| AIME 2026 (math) | 91.3 | 96.7 | ~88 | Hardest competition math — GPT-5.2 still leads |
| MathVision (visual math) | 88.6 | 83.0 | ~84 | Math problems requiring image understanding — Qwen3.5 leads |
| BFCL-V4 (tool/function calling) | 72.2 (122B) | 55.5 (GPT-5 mini) | ~68 | Function calling for agent workflows — 30% gap over GPT-5 mini |
| MMMU-Pro (visual reasoning) | 85.0 | ~82 | 75.0 | Visual multi-discipline understanding |
| SWE-bench Verified (coding) | 72.4 (27B) | ~74 | 80.9 | Real GitHub issue resolution — Claude Opus 4.6 leads |
| ERQA (embodied reasoning) | 67.5 | — | — | +28.5% vs previous Qwen3-VL (52.5) — near Gemini 3 Pro's 70.5 |
| Terminal-Bench 2.0 | 52.5 | — | 65.4 | CLI + agentic terminal work — massive jump from Qwen3-Max's 22.5 |
The honest benchmark read: Qwen3.5 leads or ties on visual reasoning, function calling, and live coding tasks. Claude Opus 4.6 and Gemini 3.1 Pro maintain clear edges on real-world GitHub engineering (SWE-bench) and hardest competition math. For agent-heavy workloads — tools, function calling, multimodal document processing — Qwen3.5 is genuinely competitive with Western frontier models at 10–17x lower cost per token.
Qwen3.5 Pricing: Every Option
| Option | Input Price | Output Price | Context | Notes |
|---|---|---|---|---|
| Free (chat.qwen.ai) | $0 | $0 | 256K | Rate-limited; both 397B-A17B and Plus available in UI |
| Qwen3.5-Flash (API) | $0.10/M | $0.30/M | 1M | 1/13th the cost of Claude Sonnet 4.6 at roughly one-sixth the latency |
| Qwen3.5-397B-A17B (API) | $0.40/M | $1.20/M | 256K | Open-weight flagship via DashScope; model ID: qwen3.5-397b-a17b |
| Qwen3.5-Plus (API) | ~$0.18/M (1M context session) | ~$0.54/M | 1M tokens | Hosted only — same 397B architecture + Auto mode + web search + code interpreter; model ID: qwen3.5-plus |
| Self-hosted (open weights) | $0 per token | $0 per token | 256K | Apache 2.0 — infrastructure costs only. Q4 quantized: ~220GB VRAM |
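To make the table concrete, here is the arithmetic for a hypothetical workload of 100M input and 20M output tokens per month, using only the prices listed above (the Plus figures are approximate, as in the table):

```python
def monthly_cost(input_mtok, output_mtok, in_price, out_price):
    # Token volumes in millions; prices in USD per million tokens.
    return input_mtok * in_price + output_mtok * out_price

workload = (100, 20)  # 100M input, 20M output tokens per month
tiers = {
    "Flash":         (0.10, 0.30),
    "397B standard": (0.40, 1.20),
    "Plus (approx)": (0.18, 0.54),
}
for name, (inp, outp) in tiers.items():
    print(f"{name:>13}: ${monthly_cost(*workload, inp, outp):.2f}/month")
```

At this volume the Flash tier comes to about $16/month and the open-weight flagship to $64/month — the spread that makes the "10–17x cheaper" framing plausible once competitor per-token rates are plugged into the same formula.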
Qwen3.5 vs. DeepSeek V4 vs. GLM-5: The Chinese AI Comparison
| Factor | Qwen3.5 | DeepSeek V4 | GLM-5 (Zhipu AI) |
|---|---|---|---|
| Status | ✅ Released Feb 16 | Not yet released (Mar 7) | ✅ Released Feb 11 |
| Parameters (active) | 397B total / 17B active | ~1T total / ~60B active (est.) | 744B total / 44B active |
| API input price | $0.10–$0.40/M | ~$0.25/M (est.) | $0.80/M |
| Context window | 256K (1M via Plus) | 1M (confirmed) | 200K |
| Native multimodal | ✅ Text + image + video from pretraining | ✅ Multimodal (expected) | Text-primary |
| Open source license | Apache 2.0 | Apache 2.0 (expected) | MIT |
| Local hardware (consumer) | ✅ 35B-A3B runs on 8GB GPU | Dual RTX 4090 (expected) | ~1,490GB — datacenter only |
| Leadership stability | 3 senior exits in 10 weeks | ✅ Stable | ✅ Stable |
The Leadership Collapse: What Actually Happened
The story begins in January 2026, not March. Hui Binyuan — lead of Qwen Code, the model's coding arm — quietly departed for Meta. Nobody announced it publicly. The Qwen team continued releasing models. From the outside, nothing had changed.
Then came March 3. Lin Junyang submitted his resignation letter to Alibaba. The departure was communicated within the Qwen team and described by multiple people close to the matter as sudden — a surprise even to Alibaba's senior leadership. The following morning, Lin posted publicly: "me stepping down. bye my beloved qwen." On the same day, Yu Bowen — head of Qwen's post-training — also departed. Lin Kaixin, a contributor to Qwen3.5, VL, and Coder, announced his own exit shortly after. Three pillars of the Qwen technical stack gone in a single week — and, counting Hui Binyuan's January exit, four departures in ten weeks, three of them senior leads.
The circumstances point to a forced restructuring rather than a voluntary decision. A Qwen contributor wrote under Lin's post: "I know leaving wasn't your choice." Multiple sources and media reports describe Alibaba reorganizing the Qwen team's structure — dismantling the "vertically integrated" model Lin had championed, where a single autonomous unit owned everything from pre-training through infrastructure to multimodal research. The new structure splits those functions into horizontal modules managed by Alibaba Cloud CTO Zhou Jingren directly.
Lin had repeatedly argued — including at the January 2026 Tsinghua AI Summit — that pre-training, post-training, and infrastructure teams need tight integration to move fast. Alibaba's new structure is the opposite of that philosophy. "The company has accepted Lin Junyang's resignation and we sincerely thank him for his contributions," Alibaba Group CEO Eddie Wu wrote to Tongyi Lab staff on March 5. Yu Bowen's replacement is Zhou Hao — a former Senior Staff Researcher at Google DeepMind and a key contributor to Gemini 3, AI Mode, and Deep Research — personally recruited by Zhou Jingren.
Google DeepMind's public recruitment post (March 6, 2026):
The day after the departures became public, Omar Sanseviero — a senior member of Google DeepMind's development team — posted on X: "Qwen friends: if any of you want a new home to build great models and contribute to the open models ecosystem, please reach out! Lots of exciting things in the roadmap and so much to build ahead of us." This is not a subtle talent raid. This is Google DeepMind openly telling the remaining Qwen team that there's a seat waiting for them. For 90,000 enterprises and the teams of 180,000 derivative models built on Qwen, the question is no longer just about Qwen3.5's capabilities — it's about whether the people who built it will still be there to build Qwen4.
Will Qwen Go Closed Source? The Open-Source Risk
This is the question the developer community is asking most urgently — and it doesn't have a clean answer yet. Under Lin, the Qwen team operated with a philosophy of aggressive open sourcing: 400 models released publicly, Apache 2.0 licensing, direct engagement with the Western developer ecosystem. Lin was, as multiple observers noted, the primary bridge between Qwen and the global open-source community. His personal advocacy for open weights wasn't just a policy — it was a competitive strategy to build the derivative model ecosystem that now makes Qwen the base model for 40% of all new Hugging Face derivatives.
The new leadership structure — reporting to Alibaba Cloud CTO Zhou Jingren, with a DeepMind veteran replacing the open-source-oriented post-training lead — points toward a more product-centric, commercially driven Qwen. Alibaba has recently launched Qwen as a consumer app, merged its AI efforts into the "Qwen C-end Business Group," and consolidated model labs with consumer hardware teams. The internal Tongyi Conference described "a fundamental disagreement over how AI should be built" as the primary catalyst for the restructuring. That disagreement appears to be: open research lab vs. product unit. Lin represented the former. Alibaba's new structure represents the latter.
The Qwen3.5-397B-A17B weights are already on Hugging Face under Apache 2.0. Those weights cannot be recalled — they are already downloaded by millions of users. Whatever happens to Qwen's leadership, Qwen3.5 itself remains available and usable. The risk is Qwen4: whether Alibaba continues the open-weight strategy that made Qwen3.5 significant, or follows the path of Meta's more commercially restricted Llama license and gradually tightens the terms.
How to Access Qwen3.5
Free (chat.qwen.ai):
- Go to chat.qwen.ai — no account required for basic usage
- The model dropdown offers Qwen3.5-397B-A17B and Qwen3.5-Plus; select Plus for 1M context and Auto mode
- Thinking mode toggle: enable for deep reasoning on complex tasks; disable for fast responses
API (OpenAI-compatible — DashScope/Model Studio):
- Register at dashscope.aliyuncs.com or ModelScope — generate an API key
- The API is OpenAI-compatible — change only the base URL and model ID:
```python
from openai import OpenAI

client = OpenAI(
    api_key="your-dashscope-api-key",
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1"
)

response = client.chat.completions.create(
    model="qwen3.5-plus",  # or "qwen3.5-397b-a17b" / "qwen3.5-flash"
    messages=[{"role": "user", "content": "Your prompt here"}],
    extra_body={"enable_thinking": True}  # Toggle Thinking mode
)
```
Local (Ollama — fastest setup):
```bash
# Small models (run on almost anything)
ollama run qwen3.5:0.8b      # 2GB VRAM
ollama run qwen3.5:9b        # 8GB VRAM

# Medium models
ollama run qwen3.5:27b       # ~48GB VRAM
ollama run qwen3.5:35b-a3b   # ~8GB VRAM — best value, 3B active params

# Flagship (needs serious hardware)
ollama run qwen3.5           # ~220GB VRAM (4-bit quantized)
```
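Once a model is pulled, Ollama also exposes an OpenAI-compatible endpoint (http://localhost:11434/v1 by default), so the same client code from the API section works locally. A sketch, guarded so it is a no-op when no Ollama server is running:

```python
import urllib.request

def ollama_running(url="http://localhost:11434"):
    # Probe the local Ollama server; any connection failure means "not running".
    try:
        with urllib.request.urlopen(url, timeout=1):
            return True
    except OSError:
        return False

if ollama_running():
    from openai import OpenAI
    # Ollama ignores the API key but the client requires one to be set.
    client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
    resp = client.chat.completions.create(
        model="qwen3.5:9b",  # any tag pulled above
        messages=[{"role": "user", "content": "Say hello in three languages."}],
    )
    print(resp.choices[0].message.content)
```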
Frequently Asked Questions
What Is Qwen3.5?
Qwen3.5 is Alibaba's fifth-generation open-source AI model family, released February 16, 2026. The flagship model (Qwen3.5-397B-A17B) uses a sparse Mixture of Experts architecture with 397 billion total parameters and only 17 billion active per token, delivering frontier-level reasoning, coding, and visual performance at 60% lower cost and 19x faster inference than its predecessor. Available under Apache 2.0 license with weights on Hugging Face.
Is Qwen3.5 Free?
Yes — chat.qwen.ai offers free access to both the 397B flagship and Qwen3.5-Plus (1M context) with rate limiting. The open-source weights (Apache 2.0) are free to download and self-host with zero per-token costs. API access via DashScope starts at $0.10/M input tokens for Flash — making it 10–13x cheaper than Claude Sonnet 4.6 for equivalent quality tasks.
What Happened to Qwen's Lead Engineer?
Lin Junyang ("Justin Lin"), the technical lead who built Qwen from a nascent project to 700 million Hugging Face downloads, announced his resignation on March 3, 2026 — 15 days after the Qwen3.5 launch. Post-training lead Yu Bowen departed the same day. Coding lead Hui Binyuan had already left for Meta in January 2026. Three senior departures in ten weeks. Multiple sources describe the exits as the result of a forced restructuring — Alibaba dismantling the vertically integrated R&D model Lin had championed in favor of horizontal modules managed directly by Alibaba Cloud CTO Zhou Jingren. A colleague wrote publicly: "I know leaving wasn't your choice." Alibaba's shares fell 5.3% on the day the departures were reported.
Will Qwen Go Closed Source After the Leadership Change?
The Qwen3.5-397B-A17B weights are already public under Apache 2.0 — those cannot be recalled. The risk is Qwen4 and future releases. Lin Junyang was the primary advocate for open weights as both a philosophical and competitive strategy. The new leadership structure is more product-centric and commercially oriented. No announcement about Qwen4's licensing has been made. The developer community is treating the leadership exits as a meaningful signal that the next generation may not be as openly available.
How Does Qwen3.5 Compare to DeepSeek V4?
Qwen3.5 is available now; DeepSeek V4 is not released as of March 7, 2026. Qwen3.5 leads on native multimodality, consumer-viable self-hosting (35B-A3B runs on 8GB GPU), and current pricing. DeepSeek V4 is expected to offer a larger parameter count (~1T), confirmed 1M context window, and lower API pricing (~$0.25/M vs $0.40/M for Qwen3.5 standard). If DeepSeek V4 releases this week as sources suggest, the comparison will shift significantly — particularly on context window and price.
Is Qwen3.5 Safe to Use?
The same two-track answer as other Chinese AI models. The consumer app (chat.qwen.ai) and DashScope API are hosted on Alibaba servers in China, subject to Chinese data law — avoid using them for sensitive, proprietary, or confidential content. The open-weight model self-hosted on your own infrastructure has no Chinese server involvement. Apache 2.0 licensed Qwen3.5 running on your own GPU is legally and practically equivalent to self-hosted Llama 4 in terms of data privacy.