OpenAI Says Agents Are Transforming Work as Codex Moves Beyond Coding

OpenAI's new economic research paper argues that agentic AI is shifting work from short chatbot exchanges to delegated, long-running tasks. Here is what the Codex data shows, what the charts mean, and what businesses should do before reorganizing workflows around agents.

Last checked: June 25, 2026. This article is based on OpenAI's June 25 post, "How agents are transforming work," and the linked OpenAI Economic Research paper, "The Shift to Agentic AI: Evidence from Codex." The findings are OpenAI-reported and come from aggregated Codex and ChatGPT usage analysis, including model-estimated task horizons. Treat the results as important early evidence, not as an independent causal productivity study.

Quick answer

OpenAI published new economic research on June 25, 2026, arguing that AI agents are changing the unit of knowledge work. Instead of asking a chatbot for one answer, users increasingly delegate longer tasks to agents that can use tools, inspect environments, modify artifacts, run work in parallel, and keep going for minutes or hours.

The evidence comes from Codex, OpenAI's agentic coding and work platform. OpenAI says the shift is most advanced inside OpenAI itself, where Codex has largely displaced ChatGPT for work-related AI output. External organizations are moving in the same direction, but more slowly. Individual users remain much earlier in the adoption curve.

The headline numbers:

Metric	OpenAI-reported result
OpenAI workers active on Codex in the prior 28 days	97.9%
Organizational users active on Codex in the prior 28 days	17.3%
Individual users active on Codex in the prior 28 days	0.7%
Share of OpenAI output tokens from Codex	99.8%
Share of organizational-user output tokens from Codex	63.3%
Share of individual-user output tokens from Codex	16.5%
Individual users making at least one request estimated above 30 minutes of human work by May 2026	80.6%
Individual users making at least one request estimated above 1 hour of human work by May 2026	70.2%
Individual users making at least one request estimated above 8 hours of human work by May 2026	25.6%
Non-developer Codex user growth since August 2025	137x among individuals, 189x among organizations, 12x inside OpenAI

The practical takeaway: agents are not just better chatbots. They are becoming a workflow layer. But the evidence also shows a major gap between frontier usage inside OpenAI and ordinary external adoption. Businesses should not copy OpenAI's internal pattern blindly. They need access controls, review gates, training, workflow redesign, and clear accountability before letting agents run high-impact work.

What OpenAI published

OpenAI's post summarizes a new paper by Drew Johnston, David Holtz, Alex Martin Richmond, Christopher Ong, Prasanna Tambe, and Aaron Chatterji. The paper analyzes Codex usage across three groups:

Individual users on personal plans.
Organizational users on Business and Enterprise plans.
OpenAI workers.

The researchers compare Codex with ChatGPT to study a broader shift from conversational AI toward agentic AI. In the paper's framing, ChatGPT is treated mainly as a conversational interface, while Codex is treated as an agentic interface. The authors note that this distinction is imperfect because ChatGPT can also use tools and some Codex turns are conversational, but it is still useful for studying how delegated work differs from chat.

OpenAI says the data was analyzed through an automated, privacy-protecting pipeline using aggregated and anonymized insights. The paper says researchers did not read underlying user messages directly.

The company is making a bigger claim than "Codex is growing." It is arguing that agentic tools change how work is organized:

Users delegate tasks instead of only asking questions.
The relevant work unit becomes a long-running task, not a single prompt.
Heavy users coordinate multiple agents in parallel.
Non-developers start doing adjacent technical work.
Verification, review, and workflow design become more important than prompt writing alone.

Why this matters

Most business AI adoption still looks like chat: summarize this, draft that, answer this question, help me write an email, explain this document. OpenAI's research points toward a different operating model.

In an agentic workflow, a user can hand over a task such as:

Refactor this internal tool and open a pull request.
Convert this dataset and generate a report.
Investigate why a build failed.
Draft a policy update based on these files.
Create a dashboard from existing spreadsheets.
Turn a repeated manual process into a script.
Compare product feedback and produce a launch brief.

That is not just "AI assistance." It is delegated production. If the agent can use tools, run commands, create artifacts, and continue in the background, the organization has to manage the agent almost like a junior worker with software access.

That changes the management problem. The hard question is no longer only "does the model answer correctly?" It becomes:

Which tasks should be delegated?
Who reviews the output?
What systems can the agent access?
What is logged?
What can the agent change without approval?
How does the team prevent bad automation from scaling quickly?
How does the business measure value after human review time is included?

Codex adoption: OpenAI is far ahead of external users

The first chart shows the share of users active on either ChatGPT or Codex during the prior 28 days who used Codex at least once. The gap is stark.

OpenAI workers are near saturation: 97.9% of active OpenAI workers used Codex. Among organizational users, the figure was 17.3%. Among individual users, it was only 0.7%.

That means OpenAI's internal adoption should be read as a frontier case, not the average market state. OpenAI workers are unusually familiar with the models, and they also have internal incentives, training, broad product access, and workflows that sit close to the systems being built.

For external organizations, 17.3% still matters. It suggests Codex is becoming meaningful inside businesses, even if the adoption base is far smaller than ChatGPT's.

Output tokens show a deeper shift

The second chart is more important than the first because it measures intensity of use rather than headcount. It shows Codex's share of output tokens across Codex and ChatGPT.

OpenAI chart showing Codex share of output tokens across OpenAI workers, organizational users, and individual users

The contrast is clear:

Inside OpenAI, Codex accounts for 99.8% of output tokens.
Among organizational users, Codex accounts for 63.3% of output tokens.
Among individual users, Codex accounts for 16.5% of output tokens.

The difference between active-user share and output-token share suggests that people who adopt Codex use it heavily. Individual adoption is tiny by active-user share, but the individual users who do use Codex generate a much larger share of output through it.
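To make that gap concrete, here is a back-of-envelope sketch in Python. It uses only the percentages reported above, but the ratio itself is not an OpenAI metric. It rests on two assumptions that are not in the paper: that nearly every active user also uses ChatGPT, and that the user-share and token-share figures cover the same population and period. The function name intensity_ratio is hypothetical.

```python
# Rough estimate: Codex output per Codex user vs. ChatGPT output per active user.
# Assumptions (not from the paper): nearly every active user also uses ChatGPT,
# and both reported shares describe the same population and time window.

REPORTED = {
    # group: (share of active users on Codex, Codex share of output tokens)
    "OpenAI workers": (0.979, 0.998),
    "Organizational users": (0.173, 0.633),
    "Individual users": (0.007, 0.165),
}


def intensity_ratio(user_share: float, token_share: float) -> float:
    """Codex tokens per Codex user divided by ChatGPT tokens per active user."""
    codex_per_user = token_share / user_share
    chatgpt_per_user = 1.0 - token_share  # divided by a user share assumed to be 1.0
    return codex_per_user / chatgpt_per_user


for group, (user_share, token_share) in REPORTED.items():
    print(f"{group}: ~{intensity_ratio(user_share, token_share):.0f}x")
```

Under those assumptions, an individual Codex user generates roughly 28 times as many output tokens through Codex as a typical active user generates through ChatGPT, and an organizational Codex user roughly 10 times as many. Treat these as order-of-magnitude illustrations only.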

For businesses, this is the signal to watch. If a small group of employees starts using agents heavily, the organization may see workflow change before broad employee adoption appears in simple user-count dashboards.

Agents are taking on longer tasks

OpenAI says nearly a quarter of Codex requests are for tasks estimated to take a person more than one hour. For sampled individual users, the shift toward longer tasks accelerated sharply in 2026.

OpenAI chart showing the share of individual Codex users making requests above 30-minute, 1-hour, 4-hour, and 8-hour human-time thresholds

By May 2026, OpenAI says:

Human-time threshold	Share of sampled individual Codex users making at least one request above threshold
More than 30 minutes	80.6%
More than 1 hour	70.2%
More than 4 hours	Not highlighted in the post, but shown in the chart
More than 8 hours	25.6%

The caveat is important. OpenAI says task horizon is estimated with an LLM-as-judge using Codex transcripts, and the post says the threshold analysis is directional rather than exact. It is also based on a random sample of 0.1% of individual-user queries.

Even with those caveats, the direction is meaningful. Users are no longer treating agents only as answer engines. They are giving them larger work packages.

Heavy users run many hours of parallel agent work

The agent-turn-hours chart shows the internal OpenAI pattern. By June 2026, OpenAI says users at the 99th percentile were regularly generating more than 60 hours of Codex agent turns per day, spread across multiple parallel agents.

OpenAI chart showing Codex agent-turn hours by daily active user percentile inside OpenAI

This does not mean one employee is literally working 60 human hours in a day. It means agents are running on that user's behalf, often in parallel. The economic implication is that work volume can decouple from the user's direct attention.
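The arithmetic behind that point is simple enough to sketch. The snippet below is illustrative only: it converts a daily total of agent-turn hours into the average number of agents that would have to run at once. The 24-hour and 8-hour windows are assumptions, since the post does not say how the hours are spread across the day.

```python
def implied_average_agents(agent_turn_hours: float, window_hours: float) -> float:
    """Average number of concurrent agents needed to accumulate
    `agent_turn_hours` of agent time within `window_hours` of clock time."""
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    return agent_turn_hours / window_hours


# 60 agent-turn hours spread over a full day (assumed window).
print(implied_average_agents(60, 24))  # 2.5
# The same 60 hours packed into an 8-hour working day (assumed window).
print(implied_average_agents(60, 8))   # 7.5
```

Either way, the user is supervising several agents at once rather than one long session.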

That creates both upside and risk:

Upside	Risk
More experiments can run in parallel.	Review queues can become the bottleneck.
Repetitive technical tasks can be delegated.	Bad instructions can scale across many outputs.
Employees can test multiple approaches quickly.	Teams may lose track of what agents changed.
Long-running work can continue in the background.	Costs and compute usage can grow quietly.

The lesson is not "run as many agents as possible." The lesson is that orchestration becomes a management skill. The person using agents has to define work, monitor progress, verify outputs, and decide what gets merged, shipped, or discarded.

Codex became the primary AI tool across OpenAI departments

OpenAI says engineering adopted Codex first. By December 2025, the average engineer generated a majority of their output across OpenAI products through Codex. By June 2026, the average engineer generated 99% of output tokens through Codex rather than ChatGPT.

But the more surprising finding is the speed of adoption outside engineering. OpenAI says Legal, Finance, and Recruiting crossed into majority Codex use around April 2026, and that the average lawyer or recruiter now generates more than 85% of output tokens with Codex.

OpenAI chart showing share of work at OpenAI on Codex by department since August 2025

This is a big claim because Codex began as a coding tool. OpenAI's evidence suggests that, in a high-access environment, Codex became a general work execution surface for non-engineering teams.

Examples of the work non-engineering teams may hand to Codex include:

Automation.
Data transformation.
Debugging.
Structured analysis.
Internal tooling.
Document and workflow generation.
Adjacent technical execution that previously required engineering help.

The business implication is not that every lawyer or recruiter becomes a software engineer. It is that agents can lower the cost of crossing task boundaries.

Output volume rose sharply across departments

OpenAI says combined output tokens from Codex and ChatGPT increased sharply inside the company from November 2025 to June 2026.

OpenAI chart showing change in combined output tokens by department since November 2025

The post highlights several department-level changes:

Department	Median combined output-token change by June 2026 vs. November 2025
Research	56x
Customer Support	32x
Engineering	27x
Legal	13x

This is not the same as a measured productivity gain. More output tokens can mean more work, more drafts, more experiments, more automation, or more noise. A company still has to ask whether the extra output becomes accepted work product.
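One way to put the department multipliers on a common scale is to convert them into an implied monthly growth rate. The sketch below assumes smooth compounding over seven months (November 2025 to June 2026). That is an assumption made here for illustration, not something the post claims; actual growth may have been uneven. The function name is hypothetical.

```python
# Reported median multipliers from the table above.
REPORTED_MULTIPLIERS = {
    "Research": 56,
    "Customer Support": 32,
    "Engineering": 27,
    "Legal": 13,
}
MONTHS = 7  # assumption: November 2025 through June 2026, compounding evenly


def implied_monthly_multiplier(total_multiplier: float, months: int) -> float:
    """Constant month-over-month multiplier that compounds to the total."""
    return total_multiplier ** (1.0 / months)


for dept, mult in REPORTED_MULTIPLIERS.items():
    monthly = implied_monthly_multiplier(mult, MONTHS)
    print(f"{dept}: {mult}x overall is about {monthly:.2f}x per month")
```

Under that assumption, Research's 56x works out to a little under 1.8x per month, and Legal's 13x to a bit more than 1.4x per month. Like the token counts themselves, this describes volume, not value.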

Useful follow-up metrics include:

Accepted pull requests or artifacts per week.
Review time per accepted output.
Defect rate after agent-assisted work.
Rework rate.
Time from request to approved artifact.
Cost per completed workflow.
Employee time saved after verification.
Downstream impact on customers, revenue, risk, or compliance.
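Several of these metrics can be computed from a simple review log. The following is a minimal sketch, assuming a hypothetical record per delegated task. The AgentTask fields and the summarize function are illustrative names, not part of any OpenAI tooling.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AgentTask:
    """One delegated task and what happened to its output (hypothetical schema)."""
    accepted: bool          # did a human reviewer approve the artifact?
    review_minutes: float   # human time spent reviewing it
    reworked: bool          # did an accepted artifact need fixes later?
    cost_usd: float         # usage cost attributed to the task


def summarize(tasks: List[AgentTask]) -> Dict[str, float]:
    """Value metrics that count review effort, not just raw output."""
    accepted = [t for t in tasks if t.accepted]
    n_accepted = len(accepted)
    total_review = sum(t.review_minutes for t in tasks)
    total_cost = sum(t.cost_usd for t in tasks)
    no_accepted = float("inf")  # nothing accepted: per-accepted costs are unbounded
    return {
        "acceptance_rate": n_accepted / len(tasks) if tasks else 0.0,
        "review_minutes_per_accepted": total_review / n_accepted if n_accepted else no_accepted,
        "rework_rate": sum(t.reworked for t in accepted) / n_accepted if n_accepted else 0.0,
        "cost_per_accepted": total_cost / n_accepted if n_accepted else no_accepted,
    }


sample = [
    AgentTask(accepted=True, review_minutes=10, reworked=False, cost_usd=2.0),
    AgentTask(accepted=True, review_minutes=20, reworked=True, cost_usd=3.0),
    AgentTask(accepted=False, review_minutes=15, reworked=False, cost_usd=1.0),
    AgentTask(accepted=False, review_minutes=15, reworked=False, cost_usd=2.0),
]
print(summarize(sample))
```

Note that review time and cost for rejected outputs are charged against the accepted ones. That is a deliberate choice in this sketch: rejected work still consumes reviewer attention and budget.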

Non-developer adoption is growing faster than developer adoption

OpenAI separates Codex users into developer and non-developer personas. The non-developer persona covers personal and general knowledge-worker use cases, while the developer persona covers software development activities such as writing, reviewing, or refactoring code.

The non-developer growth chart is one of the most important charts in the report.

OpenAI chart showing non-developer Codex user growth across individuals, organizations, and OpenAI workers

Since August 2025, OpenAI says non-developer active users rose:

137x among individual users.
189x among organizational users.
12x inside OpenAI.

Developer adoption also grew, but less dramatically:

OpenAI chart showing developer Codex user growth across individuals, organizations, and OpenAI workers

This matters because many companies still treat coding agents as tools only for engineering departments. OpenAI's evidence suggests the broader opportunity may be knowledge workers who can now delegate technical or semi-technical tasks.

For example:

Department	Agentic work that may become more common
Legal	Contract comparison, policy drafting, clause extraction, matter tracking, structured analysis.
Finance	Spreadsheet transformation, reporting automation, variance analysis, reconciliation support.
Recruiting	Candidate pipeline cleanup, interview packet generation, workflow automation.
Operations	Process documentation, internal tools, reporting, data cleanup.
Marketing	Campaign analysis, landing-page tests, asset workflows, content operations.
Support	Case clustering, runbook drafting, escalation analysis, internal tooling.

The worker does not have to become a full developer. But they do need enough domain judgment to verify the agent's work.

Codex is expanding what non-technical workers can do

OpenAI's occupation-versus-work heat map shows that Codex usage does not stay neatly inside job descriptions. Engineering and coding remain dominant for engineering and research, but business functions also use Codex for technical work.

OpenAI says more than one-fourth of work done with Codex by workers in business functions was engineering or coding.

This is the real organizational shift. Agents can reduce the cost of moving from "I need an engineer to do this" to "I can delegate a first version, test it, and ask an engineer to review only the risky parts."

That can improve speed, but it can also create governance problems. If non-technical teams generate scripts, automations, dashboards, and workflow tools, organizations need rules for:

Where those artifacts live.
Who owns them.
Whether they can touch production data.
Who reviews them.
How they are tested.
How failures are reported.
How security and privacy controls apply.

The frontier worker may become less defined by job title and more defined by the ability to decompose work, delegate to agents, and verify outcomes.

What the paper adds beyond the blog post

The linked paper adds several details that matter for readers evaluating the results.

First, it says active Codex users grew more than fivefold in the first half of 2026.

Second, it says more than 10% of users manage three or more concurrent Codex agents at some point each week.

Third, it says 26.6% of users use skills, meaning reusable instructions or capabilities for complex workflows.

Fourth, it emphasizes that OpenAI is not representative of the typical organization. OpenAI has lower internal adoption friction, high model familiarity, broad access, strong internal knowledge sharing, and work that is close to the AI systems being developed.

That last point is essential. If a normal company expects OpenAI-like adoption without changing access, training, process, data quality, review, and incentives, it will likely be disappointed.

What is confirmed

Here is the cleanest reading of what OpenAI confirmed on June 25, 2026:

Question	Current answer
Did OpenAI publish new research on agents and work?	Yes. The post and paper were published June 25, 2026.
What product is the evidence based on?	Codex, compared with ChatGPT usage.
What populations were analyzed?	Individual users, organizational users, and OpenAI workers.
Did OpenAI say Codex is now primary inside OpenAI?	Yes, by output-token share across departments.
Did non-developer usage grow?	Yes, OpenAI reports rapid non-developer growth across individuals, organizations, and OpenAI.
Are the results independently verified?	Not in the materials reviewed. They are OpenAI-reported research results.
Is the task-duration analysis exact?	No. OpenAI says thresholds are model-estimated and should be treated as directional.
Does this prove company-wide productivity gains?	No. It shows adoption and work-pattern changes, not a causal productivity estimate.

What businesses should do next

Companies do not need to wait for perfect data to prepare for agentic work. But they should avoid unmanaged rollout.

1. Pick workflows, not departments

Do not start with "give every employee an agent." Start with workflows where the output is easy to review:

Internal tools.
Non-production data transformations.
Documentation updates.
Test generation.
Research briefs.
Support-case clustering.
Backlog cleanup.
Reporting automation.

2. Define approval gates

Agents should not merge, deploy, email customers, delete data, change permissions, or modify financial records without clear human approval.
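A gate like this can be expressed as a small policy check. The sketch below is one possible shape, not a feature of Codex: the action names, the two sets, and the is_allowed function are all hypothetical and would need to be mapped onto whatever your agent tooling actually exposes.

```python
from typing import Optional

# Hypothetical action names for illustration.
REQUIRES_APPROVAL = {
    "merge", "deploy", "email_customer", "delete_data",
    "change_permissions", "modify_financial_records",
}
AUTO_ALLOWED = {"draft_memo", "create_test_data", "suggest_patch"}


def is_allowed(action: str, approved_by: Optional[str] = None) -> bool:
    """Return True if the agent may proceed with `action`.

    Gated actions need a named human approver. Unknown actions are
    denied by default so that new capabilities start out gated."""
    if action in AUTO_ALLOWED:
        return True
    if action in REQUIRES_APPROVAL:
        return bool(approved_by)
    return False


print(is_allowed("draft_memo"))                  # True
print(is_allowed("deploy"))                      # False
print(is_allowed("deploy", approved_by="lead"))  # True
print(is_allowed("rotate_keys"))                 # False (not listed anywhere)
```

The design choice worth copying is the default: anything not explicitly classified is blocked until someone decides which list it belongs on.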

3. Separate low-risk and high-risk work

Low-risk work can move faster. High-risk work needs stronger controls.

Lower-risk agent work	Higher-risk agent work
Drafting a memo	Sending a legal notice
Creating test data	Editing production data
Suggesting a code patch	Deploying the patch
Summarizing support cases	Issuing refunds
Creating a dashboard draft	Changing financial reporting logic

4. Measure accepted output, not raw output

OpenAI's output-token charts are useful for seeing adoption, but companies need value metrics:

Accepted artifacts.
Defects found after review.
Human review minutes.
Cycle time.
Cost per completed task.
User satisfaction.
Operational risk.

5. Train employees to supervise agents

The valuable skill is not only prompting. It is delegation:

Breaking work into testable pieces.
Giving agents the right context.
Setting constraints.
Running parallel experiments.
Inspecting diffs and logs.
Rejecting weak outputs.
Knowing when to escalate to a specialist.

Risks leaders should not ignore

Agentic work creates new failure modes.

Verification bottlenecks

If agents produce work faster than humans can review it, the bottleneck moves from creation to verification. That can make teams feel busier without producing better outcomes.
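A quick way to see the bottleneck is to model the review queue. This is a deliberately simple sketch with made-up daily rates; the function name and the numbers are assumptions for illustration, not figures from OpenAI's research.

```python
def backlog_after(days: int, produced_per_day: float,
                  reviewed_per_day: float, start: float = 0.0) -> float:
    """Unreviewed agent outputs left after `days`, given steady daily rates."""
    backlog = start
    for _ in range(days):
        # The queue cannot go negative: reviewers idle once it is empty.
        backlog = max(0.0, backlog + produced_per_day - reviewed_per_day)
    return backlog


# Agents produce 40 outputs a day, reviewers clear 25: the queue grows by 15 a day.
print(backlog_after(10, produced_per_day=40, reviewed_per_day=25))  # 150.0
# Reviewers keep up: the queue stays empty.
print(backlog_after(10, produced_per_day=20, reviewed_per_day=25))  # 0.0
```

If the first case looks familiar, the fix is usually to throttle delegation or add review capacity, not to generate more output.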

Shadow automation

Non-technical teams may create scripts, dashboards, and workflows that solve local problems but lack ownership, testing, or security review.

Access creep

Agents often need access to files, repos, databases, ticketing systems, and communication tools. Without least-privilege controls, agent access can become broader than intended.
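A periodic least-privilege audit can be as simple as comparing what an agent has been granted with what its workflow needs. The sketch below assumes permissions can be represented as plain strings. The scope names are hypothetical and do not correspond to any specific product's permission model.

```python
from typing import Set


def excess_scopes(granted: Set[str], required: Set[str]) -> Set[str]:
    """Scopes the agent holds but the workflow does not need."""
    return granted - required


def missing_scopes(granted: Set[str], required: Set[str]) -> Set[str]:
    """Scopes the workflow needs but the agent lacks."""
    return required - granted


# Hypothetical scope names for a documentation-update workflow.
granted = {"repo:read", "repo:write", "tickets:read", "db:prod:write"}
required = {"repo:read", "repo:write", "tickets:read"}

print(sorted(excess_scopes(granted, required)))   # ['db:prod:write']
print(sorted(missing_scopes(granted, required)))  # []
```

Anything in the excess set is a candidate for revocation before the agent runs unattended.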

Misleading productivity signals

More tokens, more files, and more drafts do not automatically mean more business value. Leaders should track accepted work and downstream outcomes.

Skill polarization

Workers who know how to delegate and verify agent work may become much more productive. Workers who only use chat for simple answers may see smaller gains.

Bottom line

OpenAI's new Codex research is one of the clearest public windows into how agentic AI can change work when adoption barriers are low. The data shows a move from short chatbot exchanges toward delegated, long-running, parallel agent workflows.

The most important finding is not that Codex is popular inside OpenAI. It is that agent use spreads beyond engineers once the tool becomes capable enough and accessible enough. Legal, finance, recruiting, support, operations, and other teams can start delegating work that previously required technical execution.

But this is not a simple productivity victory lap. The findings are OpenAI-reported, task horizons are model-estimated, and OpenAI is an unusually favorable adoption environment. For most organizations, the hard work is still ahead: redesigning workflows, training employees, controlling access, measuring accepted outputs, and building review systems around delegated AI labor.

FAQ

What did OpenAI announce?

OpenAI published a post and economic research paper arguing that Codex usage shows a shift from conversational AI toward agentic AI, where users delegate longer tasks to agents that can use tools and operate over time.

Is Codex only for developers?

No. Codex began as a coding tool, but OpenAI says non-developer usage grew rapidly and that non-technical departments inside OpenAI now use Codex heavily for work.

What is the biggest adoption gap?

OpenAI workers are far ahead of external users. OpenAI says 97.9% of active OpenAI workers used Codex in the prior 28 days, compared with 17.3% of organizational users and 0.7% of individual users.

Does this prove agents make workers more productive?

Not by itself. The research shows adoption patterns, task delegation, output growth, and workflow changes. It does not independently prove causal productivity gains across companies.

What should businesses do first?

Start with reviewable workflows, define access controls, require approval for high-impact actions, and measure accepted work rather than raw AI output.

Media credits

All charts in this article are based on OpenAI's Codex research visuals.