OpenAI Codex CLI: Terminal Coding Agent Setup Guide 2026
Complete guide to OpenAI Codex CLI — setup, safety modes, sandboxing, and how it compares to Claude Code in 2026.
The year AI coding agents left the browser and moved permanently into the terminal started with one open-source release: OpenAI Codex CLI. Since OpenAI published the repository in April 2025, it has accumulated over 67,000 GitHub stars and undergone a full rewrite from TypeScript to Rust — a signal that this is not a weekend side project, but a serious infrastructure commitment.
Codex CLI does something no IDE plugin can: it reads your entire repository as context, makes edits across multiple files simultaneously, executes shell commands, and asks for your approval at every boundary you configure. It runs locally on your machine. It sandboxes every command it runs. And as of early 2026, it works with a growing ecosystem of multi-agent patterns, MCP integrations, and enterprise proxy configurations.
This guide covers everything you need to go from zero to productive: installation, safety mode selection, the sandboxing architecture, multi-agent workflows, and an honest comparison with Claude Code so you can choose the right tool for your workflow.
Why Codex CLI Matters in 2026
The shift from "AI autocomplete" to "AI agent" is now the defining change in developer tooling. Tools that passively suggest the next line are being replaced by agents that can understand intent, plan a multi-file refactor, run tests, read the output, and iterate — all from a single terminal prompt.
Codex CLI sits at the center of this shift for OpenAI's ecosystem. It is the open-source, locally-executable companion to Codex Cloud (accessible via chatgpt.com/codex), and it is architecturally different from browser-based AI coding tools:
- Local execution: No code leaves your machine unless you explicitly allow network access
- Full repository context: The agent reads your project directory structure, not just the open file
- Sandboxed commands: Shell execution is isolated at the OS level using macOS Seatbelt or Linux Landlock
- Configurable trust levels: Three distinct approval policies let you tune automation vs. control
For developers who live in the terminal — DevOps engineers, backend developers, CLI tool authors — Codex CLI is the first AI coding agent that feels native to that environment rather than bolted onto it.
Installation and Initial Setup
Codex CLI requires Node.js 22 or later. Installation takes a single command.
Via npm (recommended for most developers):
npm install -g @openai/codex
Via Homebrew (macOS):
brew install --cask codex
Via binary download:
Download the platform-specific binary from the official GitHub releases page for use in environments where npm is unavailable.
Signing In
After installation, launch Codex CLI from your project directory:
cd /path/to/your/project
codex
On first launch, the interactive setup prompts you to sign in with your ChatGPT account. Codex is included at no additional cost for ChatGPT Plus, Pro, Business, Edu, and Enterprise plan subscribers. The $20/month Plus plan covers the base tier; Pro and higher unlock higher usage limits.
Once authenticated, Codex drops into a full-screen terminal UI (TUI) — a conversational interface where you type prompts and watch the agent read files, propose changes, and (with your approval) execute commands.
Your First Prompt
With your project open, try something concrete:
codex "Explain the architecture of this codebase and identify any obvious code smells"
Codex will scan the directory tree, read key files, and return an analysis grounded in your actual code — not a generic template answer.
Understanding Safety Modes
This is the most important configuration decision you will make, because it determines when Codex acts autonomously and when it waits for your approval.
Codex CLI has two independent control layers that work together:
- Sandbox mode — what the agent can technically do (file access, network access)
- Approval policy — when it must ask before acting
The combination of these two layers produces three practical operating presets:
Suggest Mode (Default — Safest)
codex # launches in suggest mode by default
In suggest mode, Codex proposes every action — file edits, shell commands, everything — and waits for your explicit approval before executing. Think of it as pair programming where the AI drafts, you decide.
Best for: Learning the tool, working on critical production code, or situations where you want full visibility into every change.
Auto-Edit Mode
codex --approval-policy on-failure
Codex automatically applies file edits but pauses before executing shell commands. This balances speed with safety: code changes happen immediately, but anything that affects your environment (running tests, installing packages, calling external services) still requires approval.
Best for: Iterating on code quickly while maintaining control over side effects.
Full-Auto Mode
codex --full-auto
In full-auto, Codex reads files, makes edits, and runs commands within your working directory without asking for approval. It still asks before editing files outside the workspace or accessing the network.
Under the hood, --full-auto sets sandbox_mode = "workspace-write" and approval_policy = "on-request". The sandbox enforces the boundaries at the OS level — so even in full-auto, the agent cannot silently exfiltrate your code or install system-level packages.
Best for: Batch processing tasks, running Codex overnight on a defined task, automated CI pipelines.
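If you want these settings persisted rather than passed per session, the same pair that `--full-auto` applies can be pinned in Codex's `config.toml` (assuming the `~/.codex/config.toml` location; adjust if your install keeps it elsewhere):

```toml
# ~/.codex/config.toml — mirrors what --full-auto sets for a single session
sandbox_mode = "workspace-write"
approval_policy = "on-request"
```

Command-line flags still override the file, so you can drop back to suggest mode for a one-off session without editing config.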
Configuring via AGENTS.md
Codex reads project-level configuration from AGENTS.md at the repository root — the open standard also supported by Cursor, Aider, and other tools. This is where you define persistent behavior for your project:
# AGENTS.md
## Permissions
- Working directory: read and write allowed
- Network: ask before any external requests
- Shell: auto-approve test runners (pytest, jest, cargo test)
- Shell: ask before package installs
## Project Context
This is a Python FastAPI service. The main entrypoint is `app/main.py`.
Run tests with: `pytest tests/ -v`
The Sandboxing Architecture
Codex CLI's sandboxing is one of its most technically interesting features, and it is the reason you can run full-auto mode with meaningful confidence.
macOS: Apple Seatbelt
On macOS, every command Codex executes passes through Apple's Seatbelt framework via /usr/bin/sandbox-exec. Seatbelt policies control:
- Filesystem access: which paths the process can read and write
- Network access: whether the process can open network connections
- Process spawning: what child processes the sandboxed command can create
Codex constructs a sandbox profile dynamically based on your configuration. You can test how a specific command would behave inside the sandbox:
codex debug seatbelt -- python3 my_script.py
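To get a feel for the policy language Seatbelt uses, here is a minimal standalone profile of the kind `sandbox-exec` consumes. This is an illustrative sketch of the deny-by-default pattern, not the profile Codex actually generates:

```
(version 1)
(deny default)
; allow reads and writes only inside the project directory
(allow file-read* file-write* (subpath "/path/to/your/project"))
; no outbound network connections
(deny network*)
```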
Linux: Landlock + seccomp
On Linux, Codex uses a combination of Landlock (filesystem access control) and seccomp (system call filtering). A standalone helper process called codex-linux-sandbox provides defense-in-depth isolation, with bubblewrap (bwrap) compiled into the build for additional process isolation.
To debug the Linux sandbox:
codex debug landlock -- python3 my_script.py
The practical implication: if Codex runs malicious or broken code that tries to read your SSH keys or call home to a remote server, the sandbox blocks it at the OS level before any damage occurs.
Multi-Agent Workflows
One of Codex CLI's more advanced features is the ability to run multiple agents on the same repository simultaneously using isolated git worktrees. This turns sequential, single-agent workflows into parallelized task execution.
# Launch a subagent on a specific branch
codex --worktree feature/auth "Refactor the authentication module to use JWT"
# Meanwhile, in another terminal
codex --worktree feature/tests "Write integration tests for the payment service"
Each subagent works in its own worktree, so their file changes do not conflict. When done, you review the diffs from each worktree and merge what you want.
This pattern is particularly effective for:
- Running a test-writing agent while a feature-writing agent works in parallel
- Exploring multiple implementation approaches without branching conflicts
- Automated code review: one agent writes, another reviews before you commit
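If your Codex version lacks a worktree flag, or you simply want to see what the isolation amounts to, the same layout can be built with plain `git worktree` (the directory and branch names here are illustrative):

```shell
# From inside the repository: create two isolated checkouts,
# each on its own new branch, as sibling directories
git worktree add ../myproject-auth -b feature/auth
git worktree add ../myproject-tests -b feature/tests

# Run one agent per checkout from separate terminals, then inspect:
git worktree list
```

Each worktree shares the same object database but has its own working files and index, which is exactly why parallel agents cannot clobber each other's edits.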
MCP Integration
Codex CLI supports the Model Context Protocol (MCP), giving it access to external tools and data sources beyond the local filesystem. You configure MCP servers in your project's config.toml:
[mcp_servers.github]
command = "npx"
args = ["-y", "@modelcontextprotocol/server-github"]
[mcp_servers.postgres]
command = "npx"
args = ["-y", "@modelcontextprotocol/server-postgres", "postgresql://localhost/mydb"]
With these configured, Codex can query your database schema, read GitHub issues, and incorporate that context into its code changes — without you manually copying and pasting information into the prompt.
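Servers that need credentials typically read them from environment variables. Assuming your Codex version supports an `env` table for MCP server entries (check your version's config reference), the GitHub server's token could be supplied like this:

```toml
[mcp_servers.github]
command = "npx"
args = ["-y", "@modelcontextprotocol/server-github"]
# GITHUB_PERSONAL_ACCESS_TOKEN is the variable this server reads;
# the value shown is a placeholder, not a real token
env = { "GITHUB_PERSONAL_ACCESS_TOKEN" = "ghp_your_token_here" }
```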
Codex CLI vs Claude Code
Both tools are terminal-native AI coding agents under heavy, active development. The differences are architectural rather than qualitative.
| Feature | OpenAI Codex CLI | Claude Code |
|---|---|---|
| License | Apache 2.0 | Proprietary (source available) |
| Primary language | Rust (95.6%) | TypeScript |
| Config standard | AGENTS.md | CLAUDE.md |
| Sandbox | macOS Seatbelt + Linux Landlock | macOS Seatbelt |
| Multi-agent | Native worktree subagents | Native parallel sessions |
| MCP support | Yes | Yes |
| Model | GPT-5.3-Codex (default) | Claude Sonnet / Opus |
| Included plan | ChatGPT Plus ($20/mo) | Claude Pro ($20/mo) |
| Terminal-Bench 2.0 | 77.3% | 65.4% |
| Code reasoning depth | Fast, directive | Methodical, questions assumptions |
The benchmark numbers are one data point, not a verdict. Terminal-Bench 2.0 tests terminal-native tasks: scripting, file manipulation, command construction. Codex CLI's 77.3% vs Claude Code's 65.4% reflects genuine architectural optimization for that use case.
For complex multi-file reasoning, refactoring that requires deep semantic understanding, or workflows where you want an agent that verifies its own assumptions before acting, Claude Code's deliberate approach often produces better outcomes despite the benchmark gap.
Common Mistakes to Avoid
Running full-auto on unfamiliar codebases. Full-auto is powerful, but Codex can and will make changes you did not intend when it misunderstands the codebase structure. Start with suggest mode on any new project. Graduate to auto-edit once you've seen how the agent handles your specific stack.
Skipping AGENTS.md. Without project-level configuration, Codex uses generic defaults. Defining your test runner, forbidden paths, and project context in AGENTS.md meaningfully improves output quality and safety.
Using Codex for tasks that require external knowledge. Codex reads your repository; it does not have real-time internet access by default. If you need it to check a library's latest API, either configure a web search MCP server or paste the relevant documentation into your prompt.
Not reviewing diffs before committing. Even in suggest mode, review the full diff before accepting. The agent is optimizing for your stated goal, which may not perfectly match your unstated constraints.
Ignoring the sandbox debug commands. If Codex is failing to execute a command you expect to work, codex debug seatbelt and codex debug landlock are your first debugging tools. Many sandbox failures are configuration issues, not bugs.
FAQ
Q: Is OpenAI Codex CLI free to use?
Codex CLI is free to download and open source (Apache 2.0). To use it beyond read-only mode, you need a ChatGPT account on a paid plan. ChatGPT Plus ($20/month) is the entry tier. Usage limits vary by plan — Plus users get 30–150 agent interactions per 5-hour window depending on model and task complexity.
Q: Does Codex CLI send my code to OpenAI's servers?
Yes, prompts and code context are sent to OpenAI's API to generate responses, similar to using the ChatGPT web interface. The sandboxing features control what commands run locally — they do not affect data transmission to the API. If data privacy is a concern, review OpenAI's enterprise data handling agreements, or consider self-hosted models via the open-codex fork that supports Ollama and other local providers.
Q: How is Codex CLI different from GitHub Copilot?
Copilot is an IDE autocomplete layer — it suggests code inline as you type. Codex CLI is an autonomous agent — you give it a task and it reasons, plans, edits files, and runs commands to accomplish the goal. They solve different problems: Copilot accelerates individual keystrokes; Codex CLI replaces entire task-execution workflows.
Q: Can I use Codex CLI with models other than GPT?
Yes. From within an interactive session, use /model to switch between available models including GPT-5.4, GPT-5.3-Codex, and others on your plan. The open-source community fork open-codex supports arbitrary providers including Ollama, Anthropic, and Mistral — though enterprise features such as custom CA certificate support remain specific to the official OpenAI CLI.
Q: How does Codex CLI handle secrets in my repository?
Codex CLI reads files in your working directory as context. It does not have a built-in .env or .gitignore filter — if a file is readable, it may be included in context. Best practice: add a [context] exclusion section in your AGENTS.md to prevent sensitive files from being read, and keep secrets in environment variables rather than files.
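As a complement to config-level exclusions, a quick pre-flight audit with standard Unix tools shows which secret-looking files are sitting in the working tree before you start a session (the filename patterns below are just common examples; extend them for your stack):

```shell
# List files that commonly hold secrets, skipping the .git directory
find . -path ./.git -prune -o \
  \( -name ".env*" -o -name "*.pem" -o -name "id_rsa*" \) -print
```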
Key Takeaways
- Install with `npm install -g @openai/codex` and sign in with your ChatGPT Plus/Pro account
- Start in suggest mode, graduate to auto-edit, use full-auto only for well-defined batch tasks
- macOS Seatbelt + Linux Landlock provide OS-level sandboxing — full-auto is safer than it sounds
- AGENTS.md is your project-level configuration — define it for every project to improve results
- Multi-agent worktrees enable parallel task execution without branching conflicts
- MCP support extends Codex beyond your local filesystem to databases, APIs, and external data
- Codex CLI leads on terminal-native benchmarks (77.3% Terminal-Bench 2.0); Claude Code leads on semantic reasoning depth
Codex CLI is the best terminal-native AI coding agent available for developers already in the OpenAI ecosystem. Its Rust-based architecture, OS-level sandboxing, and native multi-agent support make it a genuine infrastructure tool — not a chatbot wrapper. If you're evaluating it against Claude Code, the honest answer is they're optimized for different strengths: pick Codex if you need raw terminal automation speed, pick Claude Code if you need deliberate multi-file reasoning. Most serious teams end up using both.