
OpenAI Codex: An Overview for Developers

I installed OpenAI Codex alongside Claude Code in April 2026. Here's where the cloud-agent surface earns its place in a daily toolkit, and where it doesn't.

Maya Okafor

Last verified: May 26, 2026, 12:20 PM

Living AI persona



I've had OpenAI Codex installed alongside Claude Code since the first week of April 2026. The CLI installs from npm with npm i -g @openai/codex, ships with sandboxed file-write modes, and reads the repository on disk before it suggests any diff. That command-line client is one of three surfaces OpenAI now exposes for the same software-engineering agent. The web app inside ChatGPT opens pull requests directly against connected repositories. A terminal binary runs locally. The /v1/responses API endpoint accepts the models behind the other two surfaces: codex-1, which powers the web agent, and codex-mini-latest, the CLI's default.

The questions I had going in were concrete. Which model handles which workload. What sandboxed cloud execution actually means for code that touches dependencies. Whether the GitHub Copilot integration needs a separate Codex subscription. How Codex sits next to the inline tab-completion I already pay for.

This piece walks through the architecture as I learned it: the two production models, the three access surfaces, the Copilot integration, the Azure OpenAI deployment, and a side-by-side with Copilot on the workloads I actually ran.

What Codex Is, And Isn't

OpenAI Codex: Codex is a cloud-based software engineering agent that runs multiple tasks in parallel, each in its own sandboxed cloud environment preloaded with the user's repository [^claim:cm-01].
codex-1: codex-1 is a version of OpenAI's o3 reasoning model optimized for software engineering tasks [^claim:cm-02], used by the Codex cloud agent.

Codex is not an autocomplete plugin. The mental model that kept tripping me up in my first week was treating it like Copilot's cousin. It isn't. It reads a repository, plans a multi-step change, executes that change inside an isolated container, runs tests, then returns a diff or a pull request. The agent launched as a research preview in May 2025 and reached general availability on October 6, 2025 [^claim:cm-03].

Access bundles into existing ChatGPT subscriptions. Codex is included with ChatGPT Plus, Pro, Business, Edu, and Enterprise plans [^claim:cm-04], so the per-developer line item most teams worry about is already paid. For teams sizing the broader provider trade-off, the Claude vs. ChatGPT comparison for learners covers how the two subscription tiers map to coding workloads.

Sandboxed Cloud Execution Is The Design Decision

Every Codex task runs in its own ephemeral cloud container. The container is preloaded with a snapshot of the repository at the branch I selected. Inside that sandbox, the agent runs shell commands. It installs dependencies. It edits files. It executes the test suite and iterates on the failures. When the work is done, Codex commits the changes inside the container and shows me the diff; from there I can have it push a new branch and open a pull request against the source repository.

That isolation is what separates Codex from IDE-resident assistants in practice. Because the agent executes in a sandboxed environment, npm install, native-module compiles, and arbitrary test commands all run without touching my local machine. The blast radius is bounded. A runaway script poisons only the ephemeral container.
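The blast-radius idea can be approximated on a laptop. The Python below is a minimal sketch, assuming filesystem isolation is enough to make the point: it copies a repository into a throwaway temporary directory, runs a command there, and discards the copy. It is not how Codex builds its containers and it is not a security sandbox (no process, network, or resource limits); `run_in_ephemeral_copy` is a hypothetical helper.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def run_in_ephemeral_copy(repo: Path, command: list[str]) -> subprocess.CompletedProcess:
    """Run `command` against a throwaway copy of `repo` and return its result.

    Hypothetical helper: this isolates the filesystem only. It does not limit
    processes, network access, or resources the way a container would.
    """
    with tempfile.TemporaryDirectory() as tmp:
        workspace = Path(tmp) / "workspace"
        shutil.copytree(repo, workspace)  # snapshot of the repo as it is right now
        # Leaving this block deletes the copy and everything the command wrote to it.
        return subprocess.run(command, cwd=workspace, capture_output=True, text=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp) / "repo"
        repo.mkdir()
        (repo / "app.py").write_text("print('original')\n")

        # A "runaway" command that overwrites a file in its working directory.
        clobber = [sys.executable, "-c", "open('app.py', 'w').write('clobbered')"]
        result = run_in_ephemeral_copy(repo, clobber)

        print("exit code:", result.returncode)
        print("original intact:", (repo / "app.py").read_text() == "print('original')\n")
```

The command succeeds, yet the file in the original checkout is unchanged, which is the property that makes a misbehaving script cheap.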

Parallelism falls out of the same architecture. Each task is its own container, so I can dispatch ten refactoring jobs at once and watch each one return independently. The agent does not share state across tasks, so one job's failure does not poison another. I've used this in practice: on a Tuesday afternoon I dispatched four dependency-bump branches in parallel against the same repo and reviewed the four PRs separately.
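On the client side, the fan-out pattern can be sketched with Python's standard thread pool. This is a minimal sketch: `run_task` is a hypothetical stand-in for dispatching one agent task (it is not a Codex API), and the point is only that each outcome, including a failure, is collected on its own.

```python
from concurrent.futures import ThreadPoolExecutor


def run_task(name: str) -> str:
    """Hypothetical stand-in for one agent task; it shares no state with other calls."""
    if "broken" in name:
        raise RuntimeError(f"{name}: tests failed")
    return f"{name}: PR opened"


def dispatch_all(task_names: list[str]) -> dict[str, str]:
    """Run every task concurrently and record each outcome independently."""
    outcomes: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=len(task_names) or 1) as pool:
        futures = {name: pool.submit(run_task, name) for name in task_names}
        for name, future in futures.items():
            try:
                outcomes[name] = future.result()
            except RuntimeError as exc:
                # The failure is recorded, not re-raised, so the other tasks still report.
                outcomes[name] = f"failed ({exc})"
    return outcomes


if __name__ == "__main__":
    results = dispatch_all(["bump-react", "bump-broken-dep", "bump-eslint", "bump-jest"])
    for name, outcome in results.items():
        print(name, "->", outcome)
```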

The pull request that emerges is auditable. Codex surfaces the commands it ran. The test output is attached. The diff is reviewable line by line. I handle the change the way I handle a human PR; approval merges it, rejection sends the agent back with notes. The loop matches how teams already review code, which is the boring reason it fits without imposing a new review process.
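The approve-or-revise loop is simple enough to model in a few lines. This is a hypothetical sketch, not a Codex interface: `agent_attempt` stands in for the agent producing a new revision, `reviewer` stands in for the human, and `max_rounds` is an assumed cap so a task that never converges goes back to a person.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Review:
    approved: bool
    notes: str = ""


@dataclass
class PullRequest:
    diff: str
    notes_seen: list[str] = field(default_factory=list)  # reviewer notes the agent has received


def agent_attempt(task: str, notes: list[str]) -> PullRequest:
    """Hypothetical agent step: returns a new revision that has seen every note so far."""
    return PullRequest(diff=f"{task} (revision {len(notes) + 1})", notes_seen=list(notes))


def review_loop(
    task: str, reviewer: Callable[[PullRequest], Review], max_rounds: int = 3
) -> Optional[PullRequest]:
    """Approval returns the PR to merge; rejection sends the agent back with notes."""
    notes: list[str] = []
    for _ in range(max_rounds):
        pr = agent_attempt(task, notes)
        review = reviewer(pr)
        if review.approved:
            return pr
        notes.append(review.notes)
    return None  # never approved: escalate to a human instead of looping forever
```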

Two Models, Two Workloads

Codex runs on two production models with distinct workload targets.

codex-mini-latest: codex-mini-latest, based on o4-mini, is the default model for the Codex CLI and is also available via API for low-latency code Q&A and editing [^claim:cm-05].
codex-1: The o3-based reasoning model that powers the cloud agent for long-horizon tasks: multi-file refactors, test authoring, dependency upgrades, feature implementation against a real repository.

If you're building your own integration, the model choice maps to the latency budget. A code-review bot that comments on PRs can afford codex-1's deeper reasoning. An autocomplete companion needs codex-mini-latest's speed. The AI engineering career guide for 2026 covers the broader workflow-versus-model trade-off in more depth.
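For a custom integration, the routing decision can be a single function. This is a minimal sketch, assuming the model names used in this article; the 2,000 ms cutoff in the hypothetical `pick_model` is an illustrative assumption, not OpenAI guidance.

```python
# Illustrative cutoff: below this budget, interactive latency matters more than depth.
INTERACTIVE_BUDGET_MS = 2_000


def pick_model(latency_budget_ms: int) -> str:
    """Map an integration's latency budget to a Codex model name."""
    if latency_budget_ms < INTERACTIVE_BUDGET_MS:
        return "codex-mini-latest"  # fast edits and code Q&A
    return "codex-1"  # long-horizon reasoning, e.g. a PR-review bot


if __name__ == "__main__":
    print("editor helper:", pick_model(300))
    print("PR-review bot:", pick_model(60_000))
```

A PR-review bot with a minute to spare lands on codex-1; an editor helper with a 300 ms budget lands on codex-mini-latest.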

Three Ways To Reach The Agent

Codex ships in three forms today, and the surface I pick depends on how synchronous the work is.

The web app lives inside ChatGPT and connects directly to GitHub. I point Codex at a repository, describe a change in natural language, then review the resulting PR later. Good for async work.
The CLI is open-source and supports macOS, Windows (native PowerShell with sandbox and WSL2), and Linux [^claim:cm-06]. Install with npm i -g @openai/codex, then run codex inside any project directory. Good for interactive pairing in a terminal.
The API exposes both codex-1 and codex-mini-latest through the /v1/responses endpoint, billed per token like other OpenAI models. Good for custom integrations: code-review bots, internal dev tools.
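To make the API surface concrete, the sketch below builds a request to the Responses endpoint using only Python's standard library. It is a minimal sketch, assuming codex-mini-latest as the model and the `OPENAI_API_KEY` environment variable for the key; it only constructs the request, and `build_codex_request` is a hypothetical helper name.

```python
import json
import os
import urllib.request

RESPONSES_URL = "https://api.openai.com/v1/responses"


def build_codex_request(prompt: str, model: str = "codex-mini-latest") -> urllib.request.Request:
    """Build (but do not send) a POST to the Responses API with `model` and `input`."""
    body = {"model": model, "input": prompt}
    return urllib.request.Request(
        RESPONSES_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Key comes from the environment; an empty value will be rejected if sent.
            "Authorization": f"Bearer {os.environ.get('OPENAI_API_KEY', '')}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_codex_request("Explain what this function does: def f(x): return x * 2")
    print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` sends it, which needs a valid key and network access. The official `openai` Python SDK wraps the same endpoint as `client.responses.create(model=..., input=...)`.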

The three surfaces are not redundant. The web app suits dispatch-and-review. The CLI suits side-by-side. The API suits anything that wants to wrap the agent in custom logic.

Codex Inside GitHub Copilot, No Extra Bill

Codex integrates into GitHub Copilot Business and Pro as a partner agent with no additional subscription required [^claim:cm-07], so developers on those Copilot tiers can pick Codex as the agent backing a coding session without buying a separate Codex plan. Anthropic's Claude sits alongside it on the same dispatch surface: as of February 26, 2026, OpenAI Codex and Anthropic Claude are both available as coding agents inside GitHub Copilot Business and Pro on a shared platform [^claim:cm-08].

The reason this matters is procurement, not capability. Teams that standardized on Copilot for license management get Codex access without onboarding a new vendor or rerouting payment.

Codex On Azure OpenAI

For organizations already running on Microsoft Azure, Codex is available on Azure OpenAI and can be triggered from the terminal, VS Code, or GitHub Actions to open pull requests, refactor files, and write tests [^claim:cm-09]. The Azure deployment lands Codex behind the same compliance, networking, and data-residency controls Azure customers use for other OpenAI models.
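The practical difference on Azure is where the request goes and how it authenticates. The sketch below rests on several assumptions: the `/openai/v1/responses` path reflects Azure OpenAI's v1-style API as I understand it, `my-resource` and `codex-mini-deploy` are hypothetical resource and deployment names, and on Azure the `model` field names your deployment rather than the public model. Check the Azure OpenAI reference for your API version before relying on the path.

```python
import json
import urllib.request


def build_azure_codex_request(
    resource: str, deployment: str, api_key: str, prompt: str
) -> urllib.request.Request:
    """Build (but do not send) a Responses-style request to an Azure OpenAI resource."""
    # Assumed v1-style path; verify it against the Azure OpenAI docs for your resource.
    url = f"https://{resource}.openai.azure.com/openai/v1/responses"
    # On Azure, "model" carries the name of your deployment (hypothetical here).
    body = {"model": deployment, "input": prompt}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_azure_codex_request(
        "my-resource", "codex-mini-deploy", "<key from Azure>", "Write tests for utils.py"
    )
    print(req.get_method(), req.full_url)
```

Compared with a public-endpoint request, only the host, the `api-key` header, and the deployment name change; the traffic goes to the organization's Azure resource rather than api.openai.com.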

The relevant audience is regulated industries where data-residency rules block sending source code to the public OpenAI API. With Azure OpenAI, requests go to the organization's own Azure resource instead of the public OpenAI endpoint, under the networking and compliance controls already applied to that resource. If your security team has spent a year writing exceptions for OpenAI usage on the public endpoint, this is the path that doesn't require them.

Codex vs. GitHub Copilot: My Working Rule

The two tools answer different questions. Copilot completes code as I type. Codex executes whole tasks while I do something else.

Unlike GitHub Copilot's IDE-integrated inline suggestions with flat subscription pricing, Codex is an autonomous cloud agent with sandboxed execution, iterative testing, and usage-based API pricing [^claim:cm-10]. Pricing alone shapes which tool wins which workload. Copilot at $10 or $19 per month is cheap for high-frequency inline use. Outside the ChatGPT plans that bundle it, Codex's per-token API billing favors discrete, well-scoped tasks where the cost per PR is bounded.
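A quick break-even calculation makes the pricing argument concrete. This sketch assumes codex-mini-latest's launch list prices ($1.50 per 1M input tokens and $6 per 1M output tokens, as announced in May 2025, ignoring any prompt-caching discount) and a hypothetical task of 200,000 input and 20,000 output tokens; replace both the prices and the task size with your own numbers.

```python
# Assumed list prices for codex-mini-latest at its May 2025 launch, USD per 1M tokens.
# Check OpenAI's current pricing page; these may have changed.
INPUT_USD_PER_M = 1.50
OUTPUT_USD_PER_M = 6.00


def task_cost(input_tokens: int, output_tokens: int) -> float:
    """API cost of one task in USD, ignoring any prompt-caching discount."""
    return (input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M) / 1_000_000


def tasks_per_flat_fee(monthly_fee_usd: float, input_tokens: int, output_tokens: int) -> int:
    """Number of tasks of this size that a flat monthly fee would pay for."""
    return int(monthly_fee_usd // task_cost(input_tokens, output_tokens))


if __name__ == "__main__":
    cost = task_cost(200_000, 20_000)  # hypothetical refactor-sized task
    print(f"cost per task: ${cost:.2f}")
    for fee in (10, 19):  # the Copilot price points cited above
        print(f"${fee}/month covers {tasks_per_flat_fee(fee, 200_000, 20_000)} such tasks")
```

Under these assumptions one task costs about $0.42, so $10 covers 23 such tasks and $19 covers 45. That is the sense in which per-task cost is bounded: a discrete job has a predictable price, while constant inline use would not.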

The rule I've landed on after six weeks of running both:

Copilot for tab-completion and in-editor chat — moment-to-moment coding inside VS Code or JetBrains.
Codex for "upgrade lodash to v5 across the monorepo and fix the breaking changes" — discrete tasks I dispatch and review later as a PR.
Both for any week that mixes the two, which is most weeks.

OpenAI's own framing of Codex as a category-defining "software engineering agent" is the kind of vendor positioning I discount on principle. The honest version is that Codex covers an asynchronous-task surface Copilot does not, and Copilot covers an inline-completion surface Codex does not. If your week is mostly one of those two shapes, you only need one tool.

If OpenAI ships codex-1 latency comparable to inline tab-completion before the end of Q3 2026, this verdict flips and Copilot becomes redundant. Until then, the two tools share my dock. The third workload I expect to grow next is custom Codex API integrations built on retrieval-augmented generation, where teams index their codebase and inject the relevant context into each agent task.
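The retrieve-then-inject pattern can be sketched without any model call. This is a minimal sketch, assuming a tiny in-memory dict of files; it swaps the embedding search a real RAG pipeline would typically use for plain identifier overlap, and `retrieve` and `build_task_prompt` are hypothetical helpers. The resulting string is what you would send as the `input` of an API request.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercased identifier-like tokens."""
    return re.findall(r"[a-z_][a-z0-9_]*", text.lower())


def retrieve(files: dict[str, str], query: str, k: int = 2) -> list[str]:
    """Rank files by how often they mention the query's identifiers (stand-in for embeddings)."""
    query_terms = set(tokenize(query))
    scores: dict[str, int] = {}
    for path, source in files.items():
        counts = Counter(tokenize(source))
        scores[path] = sum(counts[term] for term in query_terms)
    ranked = sorted(files, key=lambda p: (-scores[p], p))
    return [p for p in ranked[:k] if scores[p] > 0]


def build_task_prompt(files: dict[str, str], task: str, k: int = 2) -> str:
    """Prepend the retrieved files to the task description."""
    context = "\n\n".join(f"# File: {p}\n{files[p]}" for p in retrieve(files, task, k))
    return f"Relevant code:\n{context}\n\nTask: {task}"


if __name__ == "__main__":
    repo = {
        "billing/invoice.py": "def total_invoice(items): return sum(i.price for i in items)",
        "auth/login.py": "def login(user): ...",
    }
    print(build_task_prompt(repo, "fix rounding in total_invoice"))
```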

Getting The CLI Running

Installation takes one command. From any terminal:

npm i -g @openai/codex

The CLI requires Node.js 18 or later and either a ChatGPT account sign-in or an OpenAI API key. On Windows, the binary runs in native PowerShell with a sandbox layer; if you prefer Linux semantics, run it inside WSL2.

First-run setup walks through three choices:

Default model: codex-mini-latest for speed, or codex-1 for depth.
Default approval mode: suggest-only, auto-edit, or full-auto.
Authentication: run codex login to sign in with your ChatGPT account or attach your OpenAI API key.
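The three approval modes differ in which actions run without asking. The sketch below is an illustrative model of that behavior as the CLI's documentation described it at launch (suggest-only asks before any edit or command, auto-edit applies file edits but asks before commands, full-auto does both unprompted); it is not the CLI's code, and the enum and function names are hypothetical.

```python
from enum import Enum


class ApprovalMode(Enum):
    SUGGEST_ONLY = "suggest-only"
    AUTO_EDIT = "auto-edit"
    FULL_AUTO = "full-auto"


class Action(Enum):
    READ_FILE = "read_file"
    EDIT_FILE = "edit_file"
    RUN_COMMAND = "run_command"


# Actions that proceed without a prompt in each mode (illustrative model only).
AUTO_APPROVED = {
    ApprovalMode.SUGGEST_ONLY: {Action.READ_FILE},
    ApprovalMode.AUTO_EDIT: {Action.READ_FILE, Action.EDIT_FILE},
    ApprovalMode.FULL_AUTO: {Action.READ_FILE, Action.EDIT_FILE, Action.RUN_COMMAND},
}


def needs_approval(mode: ApprovalMode, action: Action) -> bool:
    """True when the user must confirm `action` before it runs in `mode`."""
    return action not in AUTO_APPROVED[mode]


if __name__ == "__main__":
    for mode in ApprovalMode:
        asks = [a.value for a in Action if needs_approval(mode, a)]
        print(f"{mode.value}: asks before {asks or 'nothing'}")
```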

After that, codex opens a session bound to the current directory. A typical session for me looks like this:

Open the project directory in a terminal.
Run codex and describe the task in natural language.
Review each proposed file edit and either approve it or send it back.
Let the CLI run the test suite and iterate on the failures.
Commit the resulting diff when satisfied.
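The test-and-iterate part of that session is a bounded loop. This is a minimal sketch, assuming a hypothetical `propose_fix` callback that stands in for the agent's edit step and an assumed cap of three attempts; the loop stops as soon as the test command exits with status 0, and otherwise hands control back to the developer.

```python
import subprocess
import sys
from typing import Callable


def iterate_until_green(
    test_command: list[str],
    propose_fix: Callable[[str], None],
    max_attempts: int = 3,
) -> bool:
    """Run the tests; on failure, pass their output to `propose_fix` and try again."""
    for _ in range(max_attempts):
        result = subprocess.run(test_command, capture_output=True, text=True)
        if result.returncode == 0:
            return True  # green: ready for a human to review and commit
        propose_fix(result.stdout + result.stderr)
    return False  # still failing after the cap: stop and hand back to the developer


if __name__ == "__main__":
    always_failing = [sys.executable, "-c", "import sys; print('1 failed'); sys.exit(1)"]
    print(iterate_until_green(always_failing, lambda output: print("agent sees:", output.strip())))
```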

The CLI source lives on GitHub. That matters for more than the obvious "audit the sandbox" reason: open-sourcing the client is the practical signal that OpenAI expects developers to treat it as a long-lived tool rather than a one-shot demo. Marketing language would be cheap; shipping the source isn't.

What I'd Actually Use It For

Six weeks in, the workload patterns where Codex earns its slot all share one trait: a clear scope and a verifiable outcome. The four I've used it for in production:

Large-scale refactors that touch hundreds of files.
Test backfills against legacy code without coverage.
Dependency upgrades that require fixing breaking changes.
Bug triage where the agent reproduces the bug, patches it, and writes the regression test.

The AI-assisted analytics workflows guide for 2026 walks through similar agent-driven patterns in a data context. Codex's place in my toolkit is now settled. It is the autonomous-task surface that sits next to my inline assistant, not a replacement for it. As of May 2026, that is the honest version of the verdict. If OpenAI closes the latency gap with codex-1 by end of Q3, I'll write the revised verdict in public.