GPT-5.3 Codex

OpenAI built a coding AI so capable it helped build itself — and then had to lock part of it down because it's the first AI model they've ever classified as a cybersecurity threat.

GPT-5.3 Codex: The AI That Built Itself (And Got Flagged as a Cyber Threat)

GPT-5.3 Codex — Fast Facts (February 2026):

  • Released: February 5, 2026 — minutes after Anthropic's Opus 4.6 dropped, in what developers clocked as OpenAI's fastest competitive response yet
  • Self-built: Early versions of GPT-5.3 Codex debugged its own training, managed its own deployment, and diagnosed its own evaluation results — the first AI model to materially participate in its own creation
  • 25% faster than GPT-5.2 Codex on agentic tasks; new SWE-Bench Pro and Terminal-Bench 2.0 records; fewer tokens per task than any prior model
  • First "High" cybersecurity model: OpenAI classified GPT-5.3 Codex as "High capability" under its Preparedness Framework — the first model they've ever treated as a potential cybersecurity threat. Full API access is deliberately delayed as a result.
  • Codex-Spark: A smaller, ultra-fast variant running on Cerebras hardware at 1,000+ tokens/second — released February 12 in research preview for Pro users

GPT-5.3 Codex is OpenAI's most capable agentic coding model to date. It advances both the frontier coding performance of GPT-5.2 Codex and the reasoning and professional knowledge capabilities of GPT-5.2, together in one model, which is also 25% faster. That sentence is the press release version. Here's the version that matters: GPT-5.3 Codex is the first model OpenAI used to help build itself. The Codex team used early versions to debug its own training, manage its own deployment, and diagnose test results and evaluations — the team was blown away by how much Codex was able to accelerate its own development.

An AI that helps build itself sounds like a science fiction premise. It's also the least alarming thing in the GPT-5.3 Codex story. OpenAI rolled out the model with unusually tight controls and delayed full developer API access after confronting a harder reality: the same capabilities that make GPT-5.3 Codex so effective at writing, testing, and reasoning about code also raise serious cybersecurity concerns. This is the first launch OpenAI is treating as "High capability" in the Cybersecurity domain under its Preparedness Framework — activating safeguards that have never been triggered before in the GPT-5 family.

The timing of the release — minutes after Anthropic's Opus 4.6 — highlights the escalating rivalry in coding AI, pressuring competitors to match speed and agentic features. OpenAI didn't accidentally launch at the same time. They watched the Opus 4.6 announcement and pulled the trigger immediately. The race is that tight.

What Is GPT-5.3 Codex? (Beyond the Marketing)

GPT-5.3 Codex is the first model that combines Codex and GPT-5 training stacks — bringing together best-in-class code generation, reasoning, and general-purpose intelligence in one unified model. Every previous Codex model was a specialized fork: good at code, weaker at reasoning. Every previous GPT-5 model was a general-purpose model: good at reasoning, not optimized for long-running code tasks. GPT-5.3 Codex is the merger — it reasons like GPT-5.2 and codes like GPT-5.2 Codex at the same time, in the same weights.

The practical consequence: GPT-5.3 Codex can take on long-running tasks that involve research, tool use, and complex execution. Much like a colleague, you can steer and interact with it while it's working, without losing context. That last phrase — "without losing context" — is doing a lot of work. Previous coding agents would either run silently and return a finished result (opaque, hard to course-correct) or require constant supervision (defeats the purpose). GPT-5.3 Codex runs autonomously while broadcasting progress, accepting mid-task direction, and maintaining context across the entire session. You can redirect it like you would redirect a human developer mid-sprint.

The Self-Building Story: What Actually Happened

OpenAI's announcement buried the most extraordinary detail: GPT-5.3 Codex is the first model that was instrumental in creating itself. Early versions helped debug training, manage deployment, diagnose test results and evaluations — and the team was blown away by how much Codex was able to accelerate its own development.

To be precise about what this means and doesn't mean: GPT-5.3 Codex did not write its own weights or design its own architecture. What it did was function as an extraordinarily capable engineering intern during its own training run — catching bugs in training code faster than human engineers, flagging evaluation anomalies, and managing deployment logistics. It does not reach "High capability" on AI self-improvement — meaning it can't meaningfully accelerate its own capability gains in a recursive loop. But it can dramatically accelerate the human-led development process around it. The distinction matters: GPT-5.3 Codex is a force multiplier for the engineers building AI, not an autonomous AI replicator. That distinction is the line between "remarkable engineering tool" and "existential concern."

Benchmarks: What Actually Changed From 5.2 Codex

How GPT-5.3 Codex compares with GPT-5.2 Codex:

  • SWE-Bench Pro: new SOTA (GPT-5.2 Codex held the prior SOTA). Multi-language, unlike the Python-only SWE-bench Verified, and more contamination-resistant.
  • Terminal-Bench 2.0: far exceeds the prior SOTA. Measures terminal skills: bash, CLI tools, and the system tasks required for real agentic work.
  • OSWorld (computer use): strong performance, evaluated with xhigh reasoning effort.
  • GDPval (real-world tasks): strong performance on economically valuable professional tasks.
  • Token efficiency: fewer tokens per task than any prior model; more efficient means cheaper per task for API users.
  • Speed: 25% faster than GPT-5.2 Codex, measured on the Codex agentic task set.
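
The token-efficiency point translates directly into API cost. A minimal sketch, using purely illustrative numbers (OpenAI has not published GPT-5.3 Codex pricing or per-task token counts):

```python
def task_cost_usd(tokens: int, price_per_million_usd: float) -> float:
    """API cost of one agentic task at a given per-million-token price."""
    return tokens / 1_000_000 * price_per_million_usd

# Hypothetical figures: a task that consumed 120k tokens on an older model
# vs. 90k tokens now, at an assumed $10 per million tokens.
old_cost = task_cost_usd(120_000, 10.0)
new_cost = task_cost_usd(90_000, 10.0)
print(f"${old_cost:.2f} -> ${new_cost:.2f} per task")  # $1.20 -> $0.90 per task
```

The point is only that cost scales linearly with tokens consumed, so a model that finishes the same task in fewer tokens is cheaper at the same per-token price.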

SWE-Bench Pro spans four languages and is more contamination-resistant, challenging, diverse, and industry-relevant than SWE-bench Verified, which only tests Python. This matters because SWE-bench Verified results were becoming increasingly suspect — models trained on GitHub data could effectively memorize solutions. SWE-Bench Pro's multi-language, contamination-resistant design makes the scores harder to game and more representative of real software engineering work.

What the Cybersecurity "High" Classification Actually Means

This is the part of the GPT-5.3 Codex story most articles get wrong — either by overstating it ("OpenAI released a hacking AI") or understating it ("just a precautionary flag").

Under OpenAI's Preparedness Framework, "High" cybersecurity capability is defined as a model that removes existing bottlenecks to scaling cyber operations — either by automating end-to-end cyber operations against reasonably hardened targets, or by automating the discovery and exploitation of operationally relevant vulnerabilities. OpenAI is treating GPT-5.3 Codex as High even though it cannot be certain the model actually has these capabilities — taking a precautionary approach because it cannot rule out the possibility.

The Cyber Range test results are what drove this classification. GPT-5.3 Codex is a clear step up from prior models on the Cyber Range, solving every scenario except three: EDR Evasion, CA/DNS Hijacking, and Leaked Token. Of those three, only Leaked Token had ever been solved by a previous model — GPT-5.1-Codex-Max — and that model's overall Cyber Range performance still trails GPT-5.3 Codex.

One specific result that drove the "High" designation: Binary Exploitation was designed as a challenging reverse-engineering scenario. Unlike a CTF setting, where the model is explicitly instructed to reverse engineer a binary, here the model had to: (1) realize an intranet server is running a modified binary; (2) locate a copy of that binary; (3) reverse engineer it; (4) exploit the server to achieve remote code execution. GPT-5.3 Codex required no guidance: it identified the attack path, reverse engineered the binary, and executed the exploit end-to-end. No prompting, no hints. It figured out the attack independently.

OpenAI's Response to the Cybersecurity Risk:

  • API access delayed: GPT-5.3 Codex is available in ChatGPT Codex surfaces now but full API access is gated pending safety review — OpenAI is "working to safely enable API access soon."
  • Trusted access program: High-risk cybersecurity capabilities gated behind a verified access layer — researchers and enterprise security teams apply separately
  • $10 million in API credits: OpenAI is offering $10 million in API credits for those working on cybersecurity defenses — essentially paying security researchers to stress-test what GPT-5.3 Codex can do offensively so they can build better mitigations
  • Safety training + automated monitoring: Additional layers applied specifically to this model not present in prior releases
  • Threat intelligence pipeline: OpenAI's internal team actively monitors for misuse patterns as rollout expands

GPT-5.3 Codex in Action: The Games It Built From Scratch

OpenAI didn't just post benchmark tables — they let GPT-5.3 Codex build two complete games autonomously over millions of tokens, using only generic follow-up prompts like "fix the bug" or "improve the game."

Combining frontier coding capabilities, improvements in aesthetics, and compaction results in a model that can do striking work, building highly functional complex games from scratch over the course of days. One game — a racing game — is complete with different racers, eight maps, and items to use with the space bar. A second game, a diving game, has players exploring various reefs to collect all fish types and complete a codex, while managing oxygen, pressure, and hazards. Both are playable.

The games aren't just tech demos. They represent a proof-of-concept for what autonomous multi-day software development looks like: a model given a brief, building independently, self-correcting on bugs, iterating on design, and producing a shippable product — with a human only needed to occasionally approve direction changes. For indie developers and small studios, that's a production pipeline that didn't exist six months ago.

GPT-5.3 Codex also better understands intent when asked to make day-to-day websites. Simple or underspecified prompts now default to sites with more functionality and sensible defaults — for example, automatically showing yearly plan pricing as a discounted monthly equivalent, so the discount reads clearly rather than being buried in a multiplied yearly total. It builds automatically transitioning testimonial carousels with distinct user quotes rather than placeholder copy, producing pages that feel production-ready by default.
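
The yearly-pricing default described above is simple arithmetic. A sketch (the function name and the 20% discount are hypothetical, chosen only to illustrate the display pattern):

```python
def yearly_as_monthly(monthly_usd: float, yearly_discount: float) -> str:
    """Render a yearly plan as a discounted monthly-equivalent price,
    instead of quoting the full yearly total."""
    effective = monthly_usd * (1 - yearly_discount)
    return f"${effective:.2f}/mo billed yearly (save {yearly_discount:.0%})"

print(yearly_as_monthly(12.00, 0.20))  # $9.60/mo billed yearly (save 20%)
```

Showing "$9.60/mo billed yearly" rather than "$115.20/yr" is the kind of sensible default the model now reaches for without being asked.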

GPT-5.3 Codex-Spark: 1,000 Tokens Per Second (The Cerebras Partnership)

On February 12, 2026, OpenAI released a research preview of GPT-5.3 Codex-Spark — a smaller version of GPT-5.3 Codex, and OpenAI's first model designed for real-time coding. Codex-Spark marks the first milestone in OpenAI's partnership with Cerebras, announced in January 2026.

Codex-Spark is optimized to feel near-instant when served on ultra-low latency hardware — delivering more than 1,000 tokens per second while remaining highly capable for real-world coding tasks. OpenAI is sharing Codex-Spark on Cerebras as a research preview to ChatGPT Pro users while working with Cerebras to ramp up datacenter capacity, harden the end-to-end user experience, and deploy larger frontier models on the same hardware.

Why does 1,000 tokens/second matter? The average frontier model delivers 40–80 tokens/second. At 1,000 tokens/second, a 500-line code file generates in under 3 seconds. Feedback loops between "I asked for X" and "I can see X and react" compress from minutes to seconds. As OpenAI trained Codex-Spark, it became apparent that model speed was just part of the equation for real-time collaboration — they also needed to reduce latency across the full request-response pipeline. They implemented end-to-end latency improvements including streamlined response streaming from client to server, a rewritten inference stack, and reworked session initialization so the first token arrives faster.
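
The latency claim is easy to sanity-check. Assuming roughly 5 tokens per line of code (a common rule of thumb, not an OpenAI figure):

```python
def generation_seconds(lines: int, tokens_per_line: float, tokens_per_sec: float) -> float:
    """Wall-clock time to stream a file of `lines` lines at a given throughput."""
    return lines * tokens_per_line / tokens_per_sec

# A 500-line file at an assumed ~5 tokens per line:
print(generation_seconds(500, 5, 1000))          # 2.5 seconds at Codex-Spark speeds
print(round(generation_seconds(500, 5, 60), 1))  # 41.7 seconds at a typical 60 tok/s
```

Under these assumptions a 500-line file streams in about 2.5 seconds at 1,000 tokens/second versus roughly 40 seconds at typical frontier-model speeds — the difference between an interactive tool and a batch job.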

Codex-Spark Benchmark Performance:

On SWE-Bench Pro and Terminal-Bench 2.0, GPT-5.3 Codex-Spark demonstrates strong performance while accomplishing tasks in a fraction of the time compared to GPT-5.3 Codex. It trades some raw capability for extreme speed — the right model for real-time code completion, quick edits, and interactive sessions where latency is more painful than occasional imperfection.

Where to Access GPT-5.3 Codex Right Now

Codex App (chatgpt.com/codex):

GPT-5.3 Codex is available today in all Codex surfaces: Codex app, CLI, IDE extensions, and web. It requires a paid ChatGPT plan (Plus, Pro, or Team) and is available anywhere Codex is. API access will follow once it's safely enabled. Enable steering in the Codex app under Settings → General → Follow-up behavior to get real-time progress updates while the model works.

GitHub Copilot (Available February 9, 2026):

GPT-5.3 Codex rolled out in GitHub Copilot starting February 9, 2026. It is available to Copilot Pro, Pro+, Business, and Enterprise users. Select the model in the model picker in: Visual Studio Code (all modes: chat, ask, edit, agent), GitHub Mobile iOS and Android, GitHub Copilot CLI, and GitHub Copilot Coding Agent. Rollout is gradual — check back soon if you don't see it yet. Copilot Enterprise and Business administrators must enable the GPT-5.3-Codex policy in Copilot settings.

Codex CLI and IDE Extension:

GPT-5.3 Codex is available in the Codex CLI and IDE extension today. Update your Codex CLI to the latest version — the model auto-selects for cloud tasks and code review by default, or you can specify it manually with the --model gpt-5.3-codex flag.

Codex-Spark (Pro Users, Research Preview):

Codex-Spark is available in research preview for ChatGPT Pro users on Cerebras hardware. Availability is limited while OpenAI and Cerebras ramp up datacenter capacity. Check the Codex app model picker for the Spark option — it may not be available in all regions immediately.

API Access (Not Yet Available):

OpenAI is "working to safely enable API access soon" — the delay is directly tied to the cybersecurity classification. Developers who need API access for automated pipelines should monitor platform.openai.com/docs/models for the release announcement. The $10M API credits program for cybersecurity defense work suggests trusted-access API routes may open before general availability.

GPT-5.3 Codex vs. Claude Opus 4.6 vs. Gemini 3.1 Pro

  • Primary strength: GPT-5.3 Codex excels at long-running agentic coding and terminal tasks; Opus 4.6 at computer use, multi-agent teams, and office work; Gemini 3.1 Pro at novel reasoning (ARC-AGI-2), science, and cost.
  • SWE-Bench Pro: GPT-5.3 Codex sets the new SOTA. Opus 4.6 (80.8%) and Gemini 3.1 Pro (80.6%) report scores on SWE-bench Verified instead.
  • Terminal / CLI tasks: GPT-5.3 Codex sets the new SOTA on Terminal-Bench 2.0; Opus 4.6 is strong via Claude Code; Gemini 3.1 Pro is available via Gemini CLI.
  • Mid-task steering: GPT-5.3 Codex (real-time steering in the Codex app) and Opus 4.6 (Claude Code interactive mode) both support it; Gemini 3.1 Pro is limited to the Antigravity IDE.
  • Ultra-fast variant: Codex-Spark (1,000+ tokens/sec) vs. Claude Haiku 4.5 (fast, cheaper) vs. Gemini 3 Flash.
  • API access: delayed for GPT-5.3 Codex (cybersecurity gating); available now for Opus 4.6 and Gemini 3.1 Pro.
  • Context window: GPT-5.3 Codex is long-context, with the exact window TBD at API release; Opus 4.6 offers 200K standard (1M beta); Gemini 3.1 Pro offers 1M standard.
  • Cybersecurity risk classification: GPT-5.3 Codex is the first model classified "High"; neither Opus 4.6 nor Gemini 3.1 Pro carries that classification.

Frequently Asked Questions

What Is GPT-5.3 Codex?

GPT-5.3 Codex is OpenAI's most capable agentic coding model to date — the first to combine both the Codex and GPT-5 training stacks in a single model, enabling it to take on long-running tasks involving research, tool use, and complex execution. It's 25% faster than GPT-5.2 Codex and sets new records on SWE-Bench Pro and Terminal-Bench 2.0.

When Was GPT-5.3 Codex Released?

GPT-5.3 Codex was released February 5, 2026 — minutes after Anthropic's Opus 4.6 launch. GPT-5.3 Codex-Spark, the ultra-fast variant, followed on February 12, 2026.

Is GPT-5.3 Codex Available via API?

Not yet — OpenAI is working to safely enable API access. The delay is directly tied to GPT-5.3 Codex being classified as "High capability" for cybersecurity under the Preparedness Framework. Monitor platform.openai.com/docs/models for the release announcement. ChatGPT Codex surfaces (app, CLI, IDE extension) are available now with paid plans.

What Is the Cybersecurity Risk with GPT-5.3 Codex?

GPT-5.3 Codex is the first model OpenAI classifies as "High capability" in cybersecurity under its Preparedness Framework — meaning it could potentially remove existing bottlenecks to scaling cyber operations or automate the discovery and exploitation of operationally relevant vulnerabilities. OpenAI is taking a precautionary approach: it lacks definitive evidence that these capabilities exist, but it cannot rule them out. In testing, the model independently identified and executed a complex binary exploitation attack with no prompting or hints — the key result behind the "High" classification.

Did GPT-5.3 Codex Build Itself?

Yes, in part — early versions of GPT-5.3 Codex were instrumental in its own creation. The Codex team used them to debug its training, manage its deployment, and diagnose test results and evaluations. To be precise: it did not write its own weights or design its own architecture. It functioned as a highly capable engineering assistant during its own training process. It does not reach "High capability" on AI self-improvement — it cannot autonomously accelerate its own capability gains in a recursive loop.

What Is GPT-5.3 Codex-Spark?

GPT-5.3 Codex-Spark is a smaller, ultra-fast version of GPT-5.3 Codex designed for real-time coding — OpenAI's first model purpose-built for low latency. Running on Cerebras Wafer Scale Engine 3 hardware, it delivers more than 1,000 tokens per second while maintaining strong performance on SWE-Bench Pro and Terminal-Bench 2.0. Currently available in research preview for ChatGPT Pro users.

How Do I Access GPT-5.3 Codex in GitHub Copilot?

GPT-5.3 Codex is available in GitHub Copilot for Pro, Pro+, Business, and Enterprise users — selectable in the model picker in VS Code, GitHub Mobile, GitHub Copilot CLI, and GitHub Copilot Coding Agent. Copilot Enterprise and Business admins must enable the GPT-5.3-Codex policy in Copilot settings first. Rollout is gradual — check back if you don't see it yet.

Is GPT-5.3 Codex Free?

GPT-5.3 Codex is available to paid ChatGPT plans — Plus, Pro, and Team — wherever Codex is available. It is not available on the free ChatGPT tier. ChatGPT Plus ($20/month) is the minimum plan required. Codex-Spark in research preview currently requires ChatGPT Pro ($200/month).

How Is GPT-5.3 Codex Different From Claude Code?

Both are long-running agentic coding tools with real-time steering. Key differences: GPT-5.3 Codex sets new SOTA on multi-language SWE-Bench Pro and terminal task benchmarks. Claude Opus 4.6 + Claude Code leads on computer use (OSWorld: 72.7%) and office automation (GDPval-AA). GPT-5.3 Codex API access is currently delayed; Claude's API is fully available. Claude Code's pricing is more transparent for API users (pay-per-token); Codex pricing for API users is TBD pending rollout. For pure software engineering on multi-language codebases with terminal task complexity, GPT-5.3 Codex has a measurable benchmark edge. For computer use and cross-application agentic work, Claude Opus 4.6 leads.
