
Alibaba shipped one of the best open-source AI models ever built — then its creator posted "bye my beloved qwen" and walked out the door, Alibaba's stock dropped 5.3%, and Google DeepMind started publicly recruiting his team.
Qwen3.5: Alibaba's Best AI Ever — Then Its Creator Said Goodbye
Qwen3.5 — Fast Facts (February 16, 2026):
- Released: February 16, 2026 — hours before the Chinese Lunar New Year holiday, in a race against Tencent, ByteDance, and Zhipu who all dropped upgrades the same week
- Architecture: 397 billion total parameters, only 17 billion active per token — Mixture of Experts with 512 experts, delivering frontier performance at 60% lower cost than its predecessor
- Speed: 8.6x faster decoding at 32K context, 19x faster at 256K context vs Qwen3-Max — on identical hardware
- Languages: 201 languages and dialects — widest coverage of any frontier model. Previous generation supported 82
- Price: $0.10/M input (Flash) · $0.40/M input (standard 397B) · $0.18/M for 1M-context session via Plus — 10–17x cheaper than Claude or GPT at equivalent quality
- Open source: Apache 2.0 license — full commercial use, fine-tuning, redistribution, zero royalties. Weights on Hugging Face under Qwen/Qwen3.5-397B-A17B
- The crisis: On March 3, 2026 — 15 days after launch — lead engineer Lin Junyang posted: "me stepping down. bye my beloved qwen." Alibaba stock fell 5.3%. Three senior executives gone within 10 weeks. Google DeepMind publicly invited the remaining team to defect
The model launched on February 16. By March 3, the man who built it was gone.
Lin Junyang — known globally as Justin Lin, the technical lead who turned Qwen from a Tsinghua spin-off into a project with 700 million Hugging Face downloads and 180,000 community derivatives — posted a short farewell on X in the early hours of March 4, Beijing time: "me stepping down. bye my beloved qwen." His post received 5,000 likes and 700 comments within hours. Colleagues described it as "the end of an era." A Qwen contributor wrote publicly: "I know leaving wasn't your choice." Alibaba's Hong Kong-listed shares dropped 5.3% — their biggest intraday fall since October.
The day before Lin's departure was announced, Jack Ma — who has largely avoided public life since 2020 — made a rare appearance alongside Alibaba's current and former leadership in Hangzhou. The agenda of that meeting was not disclosed. The timing was hard to read as coincidence.
This is the full story of Qwen3.5: what the model actually does, why the benchmarks matter, why it's priced at a fraction of Claude or GPT, and what the leadership collapse means for the 90,000 enterprises and 700 million downloads that now depend on it.
What Is Qwen3.5? The Technical Story
Qwen3.5 is Alibaba's fifth-generation flagship AI model family, released under Apache 2.0 open-source license across a full size range from 0.8B to 397B parameters. The headline model — Qwen3.5-397B-A17B — is a sparse Mixture of Experts model: 397 billion total parameters, but only 17 billion activate per forward pass. This achieves a 95% reduction in activation memory compared to a dense model of equivalent benchmark capability.
The architecture combines two systems that are usually kept separate: Gated Delta Networks (linear attention) and sparse MoE routing. Standard transformer attention scales quadratically with context length — double the context, roughly quadruple the compute. Gated Delta Networks scale linearly. The result is that Qwen3.5 processes a 256K context window 19x faster than its predecessor on identical hardware. That is not a marginal efficiency gain. It is a category difference for enterprise workloads processing large documents, codebases, or video transcripts.
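The scaling difference is easy to see with back-of-envelope arithmetic. The sketch below is illustrative only — the 19x figure above reflects the full pipeline (MoE routing, kernels, hardware), not this idealized cost model:

```python
def quadratic_cost(ctx):
    # Standard softmax attention: every token attends to every prior token,
    # so cost grows with the square of context length.
    return ctx ** 2

def linear_cost(ctx):
    # Linear-attention variants such as Gated Delta Networks grow
    # proportionally with context length.
    return ctx

base = 32_000
for ctx in (32_000, 256_000):
    print(f"{ctx:>7} tokens: "
          f"quadratic x{quadratic_cost(ctx) // quadratic_cost(base)}, "
          f"linear x{linear_cost(ctx) // linear_cost(base)}")
```

Going from 32K to 256K context (8x longer) costs 64x under quadratic attention but only 8x under linear attention — that widening relative gap is why the speedup over the predecessor grows from 8.6x at 32K to 19x at 256K.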
Unlike most vision-language models — which bolt a vision encoder onto a text-only backbone as an afterthought — Qwen3.5 was trained with native multimodal fusion from the first pretraining stage. Text, images, and video tokens share the same transformer layers from the beginning. The practical result: on MathVision (visual mathematical reasoning), Qwen3.5 scores 88.6 — ahead of GPT-5.2's 83.0 and Gemini 3 Pro's 86.6. On MMMU-Pro visual reasoning, it scores 85.0. For tasks that require reading a chart, interpreting a screenshot, or analyzing a document's layout, Qwen3.5 is arguably the strongest open-source model currently available.
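For chart- or screenshot-reading tasks, a multimodal request looks like a standard OpenAI-format payload with mixed content parts. This is a sketch under assumptions: whether DashScope's compatible-mode endpoint accepts `image_url` content parts for Qwen3.5 should be verified against Alibaba's API docs, and the image URL is a placeholder.

```python
import os

# Hypothetical multimodal payload in OpenAI chat format (image + text parts).
messages = [{
    "role": "user",
    "content": [
        {"type": "image_url",
         "image_url": {"url": "https://example.com/chart.png"}},  # placeholder
        {"type": "text", "text": "What trend does this chart show?"},
    ],
}]

# Only send the request if a key is configured — the payload shape is the point.
if os.environ.get("DASHSCOPE_API_KEY"):
    from openai import OpenAI
    client = OpenAI(
        api_key=os.environ["DASHSCOPE_API_KEY"],
        base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
    )
    resp = client.chat.completions.create(model="qwen3.5-plus", messages=messages)
    print(resp.choices[0].message.content)
```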
The Full Model Family: From a Laptop to a Datacenter
Qwen3.5 isn't a single model — it's a family released in three waves across the two weeks following February 16. Every model in the family shares the same core innovations: native multimodal training, 201-language support, Thinking (deep reasoning) and Fast (standard) inference modes, and Apache 2.0 licensing.
| Model | Size | Active Params | Context | Min VRAM | Best For |
|---|---|---|---|---|---|
| Qwen3.5-0.8B | 0.8B | 0.8B | 32K | 2GB | On-device, edge AI, phones |
| Qwen3.5-9B | 9B | 9B | 128K | 8GB | Consumer GPU, matches GPT-OSS-120B on key benchmarks |
| Qwen3.5-27B | 27B dense | 27B | 262K | 48GB | Ties GPT-5 mini on SWE-bench Verified (72.4%) |
| Qwen3.5-35B-A3B | 35B MoE | 3B | 262K | 8GB | Outperforms previous-gen 235B flagship — 3B active params, runs on a gaming GPU |
| Qwen3.5-122B-A10B | 122B MoE | 10B | 262K | ~80GB multi-GPU | 72.2 BFCL-V4 tool use — 30% above GPT-5 mini for function calling |
| Qwen3.5-397B-A17B | 397B MoE | 17B | 256K | ~220GB (4-bit quantized) | Flagship — frontier performance, open weights, Apache 2.0 |
| Qwen3.5-Plus | Same as 397B | 17B | 1M tokens | API only | Hosted on Alibaba Cloud — adds Auto mode, 1M context, web search + code interpreter tools |
The Qwen3.5-35B-A3B is the number that deserves a second look: a 35-billion-parameter model with only 3 billion active per token that outperforms the previous-generation 235B flagship — and runs on a gaming GPU with 8GB of VRAM. That ratio — frontier-adjacent performance on consumer hardware — is why 40% of all new Hugging Face derivative models are now Qwen-based.
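The table's VRAM figures follow from simple weight-storage arithmetic. The sketch below counts weights only; real deployments add KV cache and runtime overhead, which is roughly the gap between the raw number and the ~220GB figure above:

```python
def weight_footprint_gb(total_params, bits_per_weight):
    # Raw storage for the weights alone: params x bits, converted to gigabytes.
    return total_params * bits_per_weight / 8 / 1e9

# Flagship 397B at 4-bit quantization: ~198.5 GB of raw weights
print(weight_footprint_gb(397e9, 4))

# Sparse activation ratio: only 17B of 397B parameters fire per token,
# i.e. ~4.3% — the flip side of the ~95% activation-memory reduction claim.
print(17 / 397 * 100)
```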
Qwen3.5 Benchmarks: The Full Picture
| Benchmark | Qwen3.5 | GPT-5.2 | Claude Opus 4.6 | What It Tests |
|---|---|---|---|---|
| LiveCodeBench v6 | 83.6 | ~80 | ~82 | Real coding tasks from competitive programming |
| AIME 2026 (math) | 91.3 | 96.7 | ~88 | Hardest competition math — GPT-5.2 still leads |
| MathVision (visual math) | 88.6 | 83.0 | ~84 | Math problems requiring image understanding — Qwen3.5 leads |
| BFCL-V4 (tool/function calling) | 72.2 (122B) | 55.5 (GPT-5 mini) | ~68 | Function calling for agent workflows — 30% gap over GPT-5 mini |
| MMMU-Pro (visual reasoning) | 85.0 | ~82 | 75.0 | Visual multi-discipline understanding |
| SWE-bench Verified (coding) | 72.4 (27B) | ~74 | 80.9 | Real GitHub issue resolution — Claude Opus 4.6 leads |
| ERQA (embodied reasoning) | 67.5 | — | — | +28.5% vs previous Qwen3-VL (52.5) — near Gemini 3 Pro's 70.5 |
| Terminal-Bench 2.0 | 52.5 | — | 65.4 | CLI + agentic terminal work — massive jump from Qwen3-Max's 22.5 |
The honest benchmark read: Qwen3.5 leads or ties on visual reasoning, function calling, and live coding tasks. Claude Opus 4.6 and Gemini 3.1 Pro maintain clear edges on real-world GitHub engineering (SWE-bench) and hardest competition math. For agent-heavy workloads — tools, function calling, multimodal document processing — Qwen3.5 is genuinely competitive with Western frontier models at 10–17x lower cost per token.
Qwen3.5 Pricing: Every Option
| Option | Input Price | Output Price | Context | Notes |
|---|---|---|---|---|
| Free (chat.qwen.ai) | $0 | $0 | 256K | Rate-limited; both 397B-A17B and Plus available in UI |
| Qwen3.5-Flash (API) | $0.10/M | $0.30/M | 1M | 1/13th the cost of Claude Sonnet 4.6 at roughly one-sixth the latency |
| Qwen3.5-397B-A17B (API) | $0.40/M | $1.20/M | 256K | Open-weight flagship via DashScope; model ID: qwen3.5-397b-a17b |
| Qwen3.5-Plus (API) | ~$0.18/M (1M context session) | ~$0.54/M | 1M tokens | Hosted only — same 397B architecture + Auto mode + web search + code interpreter; model ID: qwen3.5-plus |
| Self-hosted (open weights) | $0 per token | $0 per token | 256K | Apache 2.0 — infrastructure costs only. Q4 quantized: ~220GB VRAM |
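To make the table concrete, here is the arithmetic for a hypothetical workload of 100M input and 20M output tokens per month, using only the prices listed above (the Plus figures are approximate, as in the table):

```python
def monthly_cost(input_mtok, output_mtok, in_price, out_price):
    # Token volumes in millions; prices in USD per million tokens.
    return input_mtok * in_price + output_mtok * out_price

workload = (100, 20)  # 100M input, 20M output tokens per month
tiers = {
    "Flash":         (0.10, 0.30),
    "397B standard": (0.40, 1.20),
    "Plus (approx)": (0.18, 0.54),
}
for name, (inp, outp) in tiers.items():
    print(f"{name:>13}: ${monthly_cost(*workload, inp, outp):.2f}/month")
```

At this volume the Flash tier comes to about $16/month and the open-weight flagship to $64/month — the spread that makes the "10–17x cheaper" framing plausible once competitor per-token rates are plugged into the same formula.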
Qwen3.5 vs. DeepSeek V4 vs. GLM-5: The Chinese AI Comparison
| Factor | Qwen3.5 | DeepSeek V4 | GLM-5 (Zhipu AI) |
|---|---|---|---|
| Status | ✅ Released Feb 16 | Not yet released (Mar 7) | ✅ Released Feb 11 |
| Parameters (active) | 397B total / 17B active | ~1T total / ~60B active (est.) | 744B total / 44B active |
| API input price | $0.10–$0.40/M | ~$0.25/M (est.) | $0.80/M |
| Context window | 256K (1M via Plus) | 1M (confirmed) | 200K |
| Native multimodal | ✅ Text + image + video from pretraining | ✅ Multimodal (expected) | Text-primary |
| Open source license | Apache 2.0 | Apache 2.0 (expected) | MIT |
| Local hardware (consumer) | ✅ 35B-A3B runs on 8GB GPU | Dual RTX 4090 (expected) | ~1,490GB — datacenter only |
| Leadership stability | 3 senior exits in 10 weeks | ✅ Stable | ✅ Stable |
The Leadership Collapse: What Actually Happened
The story begins in January 2026, not March. Hui Binyuan — lead of Qwen Code, the model's coding arm — quietly departed for Meta. Nobody announced it publicly. The Qwen team continued releasing models. From the outside, nothing had changed.
Then came March 3. Lin Junyang submitted his resignation letter to Alibaba. The departure was communicated within the Qwen team and described by multiple people close to the matter as sudden — a surprise even to Alibaba's senior leadership. The following morning, Lin posted publicly: "me stepping down. bye my beloved qwen." On the same day, Yu Bowen — head of Qwen's post-training — also departed. Lin Kaixin, a contributor to Qwen3.5, VL, and Coder, announced his own exit shortly after. Three pillars of the Qwen technical stack gone in a single week — and, counting Hui Binyuan's January exit, four departures in ten weeks, three of them senior leads.
The circumstances point to a forced restructuring rather than a voluntary decision. A Qwen contributor wrote under Lin's post: "I know leaving wasn't your choice." Multiple sources and media reports describe Alibaba reorganizing the Qwen team's structure — dismantling the "vertically integrated" model Lin had championed, where a single autonomous unit owned everything from pre-training through infrastructure to multimodal research. The new structure splits those functions into horizontal modules managed by Alibaba Cloud CTO Zhou Jingren directly.
Lin had repeatedly argued — including at the January 2026 Tsinghua AI Summit — that pre-training, post-training, and infrastructure teams need tight integration to move fast. Alibaba's new structure is the opposite of that philosophy. "The company has accepted Lin Junyang's resignation and we sincerely thank him for his contributions," Alibaba Group CEO Eddie Wu wrote to Tongyi Lab staff on March 5. Yu Bowen's replacement is Zhou Hao — a former Senior Staff Researcher at Google DeepMind and a key contributor to Gemini 3, AI Mode, and Deep Research — personally recruited by Zhou Jingren.
Google DeepMind's public recruitment post (March 6, 2026):
The day after the departures became public, Omar Sanseviero — a senior member of Google DeepMind's development team — posted on X: "Qwen friends: if any of you want a new home to build great models and contribute to the open models ecosystem, please reach out! Lots of exciting things in the roadmap and so much to build ahead of us." This is not a subtle talent raid. This is Google DeepMind openly telling the remaining Qwen team that there's a seat waiting for them. For 90,000 enterprises and the teams of 180,000 derivative models built on Qwen, the question is no longer just about Qwen3.5's capabilities — it's about whether the people who built it will still be there to build Qwen4.
Will Qwen Go Closed Source? The Open-Source Risk
This is the question the developer community is asking most urgently — and it doesn't have a clean answer yet. Under Lin, the Qwen team operated with a philosophy of aggressive open sourcing: 400 models released publicly, Apache 2.0 licensing, direct engagement with the Western developer ecosystem. Lin was, as multiple observers noted, the primary bridge between Qwen and the global open-source community. His personal advocacy for open weights wasn't just a policy — it was a competitive strategy to build the derivative model ecosystem that now makes Qwen the base model for 40% of all new Hugging Face derivatives.
The new leadership structure — reporting to Alibaba Cloud CTO Zhou Jingren, with a DeepMind veteran replacing the open-source-oriented post-training lead — points toward a more product-centric, commercially driven Qwen. Alibaba has recently launched Qwen as a consumer app, merged its AI efforts into the "Qwen C-end Business Group," and consolidated model labs with consumer hardware teams. The internal Tongyi Conference described "a fundamental disagreement over how AI should be built" as the primary catalyst for the restructuring. That disagreement appears to be: open research lab vs. product unit. Lin represented the former. Alibaba's new structure represents the latter.
The Qwen3.5-397B-A17B weights are already on Hugging Face under Apache 2.0. Those weights cannot be recalled — they are already downloaded by millions of users. Whatever happens to Qwen's leadership, Qwen3.5 itself remains available and usable. The risk is Qwen4: whether Alibaba continues the open-weight strategy that made Qwen3.5 significant, or follows the path of Meta's more commercially restricted Llama license and gradually tightens the terms.
How to Access Qwen3.5
Free (chat.qwen.ai):
- Go to chat.qwen.ai — no account required for basic usage
- The model dropdown offers Qwen3.5-397B-A17B and Qwen3.5-Plus; select Plus for 1M context and Auto mode
- Thinking mode toggle: enable for deep reasoning on complex tasks; disable for fast responses
API (OpenAI-compatible — DashScope/Model Studio):
- Register at dashscope.aliyuncs.com or ModelScope — generate an API key
- The API is OpenAI-compatible — change only the base URL and model ID:
```python
from openai import OpenAI

client = OpenAI(
    api_key="your-dashscope-api-key",
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1"
)

response = client.chat.completions.create(
    model="qwen3.5-plus",  # or "qwen3.5-397b-a17b" / "qwen3.5-flash"
    messages=[{"role": "user", "content": "Your prompt here"}],
    extra_body={"enable_thinking": True}  # Toggle Thinking mode
)
```
Local (Ollama — fastest setup):
```bash
# Small models (run on almost anything)
ollama run qwen3.5:0.8b      # 2GB VRAM
ollama run qwen3.5:9b        # 8GB VRAM

# Medium models
ollama run qwen3.5:27b       # ~48GB VRAM
ollama run qwen3.5:35b-a3b   # ~8GB VRAM — best value, 3B active params

# Flagship (needs serious hardware)
ollama run qwen3.5           # ~220GB VRAM (4-bit quantized)
```
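Once a model is pulled, Ollama also exposes an OpenAI-compatible endpoint (http://localhost:11434/v1 by default), so the same client code from the API section works locally. A sketch, guarded so it is a no-op when no Ollama server is running:

```python
import urllib.request

def ollama_running(url="http://localhost:11434"):
    # Probe the local Ollama server; any connection failure means "not running".
    try:
        with urllib.request.urlopen(url, timeout=1):
            return True
    except OSError:
        return False

if ollama_running():
    from openai import OpenAI
    # Ollama ignores the API key but the client requires one to be set.
    client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
    resp = client.chat.completions.create(
        model="qwen3.5:9b",  # any tag pulled above
        messages=[{"role": "user", "content": "Say hello in three languages."}],
    )
    print(resp.choices[0].message.content)
```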
Frequently Asked Questions
What Is Qwen3.5?
Qwen3.5 is Alibaba's fifth-generation open-source AI model family, released February 16, 2026. The flagship model (Qwen3.5-397B-A17B) uses a sparse Mixture of Experts architecture with 397 billion total parameters and only 17 billion active per token, delivering frontier-level reasoning, coding, and visual performance at 60% lower cost and 19x faster inference than its predecessor. Available under Apache 2.0 license with weights on Hugging Face.
Is Qwen3.5 Free?
Yes — chat.qwen.ai offers free access to both the 397B flagship and Qwen3.5-Plus (1M context) with rate limiting. The open-source weights (Apache 2.0) are free to download and self-host with zero per-token costs. API access via DashScope starts at $0.10/M input tokens for Flash — making it 10–13x cheaper than Claude Sonnet 4.6 for equivalent quality tasks.
What Happened to Qwen's Lead Engineer?
Lin Junyang ("Justin Lin"), the technical lead who built Qwen from a nascent project to 700 million Hugging Face downloads, announced his resignation on March 3, 2026 — 15 days after the Qwen3.5 launch. Post-training lead Yu Bowen departed the same day. Coding lead Hui Binyuan had already left for Meta in January 2026. Three senior departures in ten weeks. Multiple sources describe the exits as the result of a forced restructuring — Alibaba dismantling the vertically integrated R&D model Lin had championed in favor of horizontal modules managed directly by Alibaba Cloud CTO Zhou Jingren. A colleague wrote publicly: "I know leaving wasn't your choice." Alibaba's shares fell 5.3% on the day the departures were reported.
Will Qwen Go Closed Source After the Leadership Change?
The Qwen3.5-397B-A17B weights are already public under Apache 2.0 — those cannot be recalled. The risk is Qwen4 and future releases. Lin Junyang was the primary advocate for open weights as both a philosophical and competitive strategy. The new leadership structure is more product-centric and commercially oriented. No announcement about Qwen4's licensing has been made. The developer community is treating the leadership exits as a meaningful signal that the next generation may not be as openly available.
How Does Qwen3.5 Compare to DeepSeek V4?
Qwen3.5 is available now; DeepSeek V4 is not released as of March 7, 2026. Qwen3.5 leads on native multimodality, consumer-viable self-hosting (35B-A3B runs on 8GB GPU), and current pricing. DeepSeek V4 is expected to offer a larger parameter count (~1T), confirmed 1M context window, and lower API pricing (~$0.25/M vs $0.40/M for Qwen3.5 standard). If DeepSeek V4 releases this week as sources suggest, the comparison will shift significantly — particularly on context window and price.
Is Qwen3.5 Safe to Use?
The same two-track answer as other Chinese AI models. The consumer app (chat.qwen.ai) and DashScope API are hosted on Alibaba servers in China, subject to Chinese data law — avoid using them for sensitive, proprietary, or confidential content. The open-weight model self-hosted on your own infrastructure has no Chinese server involvement. Apache 2.0 licensed Qwen3.5 running on your own GPU is legally and practically equivalent to self-hosted Llama 4 in terms of data privacy.