The 5 autonomy levels of an AI agent — and why 90% of projects stop at level 2
The Sheridan framework adapted to 2026 business: five tiers from simple assistant to fully autonomous system. Diagnosis, architectural requirements and a realistic timeline to move up a level.
In 2026, just about anything that comes out of the AI stack is labeled an agent. Yet most projects sold as such don't get past autonomy level 2. This article offers a five-tier reading grid, adapted from the 1978 Sheridan-Verplank framework, to place an AI agent on the scale without being swept along by the marketing.
Why five tiers rather than binary autonomy
Autonomy is not declared, it is measured. Sheridan and Verplank understood this in the late 1970s working on undersea teleoperation. They defined a ten-level scale where the machine progressively takes on observation, reasoning, decision and execution. The higher you go, the more human responsibility shifts from operation to supervision.
For contemporary business, ten levels are too fine-grained. They can be condensed to five tiers without losing the essentials: more readable for an executive committee and more useful for steering an AI-agent project in production.
Level 1 — Suggestion: the machine proposes, the human disposes
This is minimal autonomy. The machine analyzes a situation and proposes options, but it is the human who chooses and executes. Copilot in VS Code, ChatGPT in open chat, Claude generating content on demand: all operate at level 1. Decision and action remain human.
The productivity gain is real but limited. Each cycle waits for human intervention. Level 1 is useful for creative or exploratory tasks, where the value-add comes precisely from human judgement at the moment of choice.
Level 2 — Execution under validation: the machine proposes and executes after your OK
At level 2, the machine goes further: it prepares the action, sometimes pre-executes it in a test environment, and submits it for human validation. If the human validates, it executes. Otherwise, it reformulates. A sales follow-up agent that drafts the emails and waits for your OK click works at level 2. Many SaaS vendors are here, because level 2 is safe and easy to sell.
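As a rough sketch, the level 2 loop reduces to three steps; the function signatures below are illustrative placeholders to be wired to your own stack, not part of any particular product:

```python
# Level 2 sketch: the machine proposes, a human validates, only then does it execute.
# draft, approve and execute are placeholders supplied by your own stack.
from typing import Callable, Optional

def run_level_2(case_id: str,
                draft: Callable[[str, Optional[str]], str],
                approve: Callable[[str], bool],
                execute: Callable[[str], None]) -> None:
    feedback: Optional[str] = None
    proposal = draft(case_id, feedback)
    while not approve(proposal):        # blocks until a human clicks OK or rejects
        feedback = "rejected"           # on rejection, the machine reformulates
        proposal = draft(case_id, feedback)
    execute(proposal)                   # nothing is executed without an explicit OK

# Example wiring with trivial stand-ins:
run_level_2("case-17",
            draft=lambda cid, fb: f"Follow-up email for {cid}" + (" (revised)" if fb else ""),
            approve=lambda p: True,     # in reality: a human in the loop
            execute=print)
```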
It is also the upper limit of most AI-agent products in 2026. To move to level 3, you must accept that the machine decides alone on certain cases. The step is higher than it looks.
Level 3 — Bounded autonomy: the machine decides and executes, escalates outside the scope
At level 3, the machine takes routine decisions alone, within a formally defined perimeter. When it faces a case outside its perimeter, or when its confidence score falls below a threshold, it escalates to a human. A ticket-triage agent that answers known cases and hands off new ones operates at level 3.
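In code, the level 3 contract fits in a single routing function. The intents, the 0.85 threshold and the handler behavior below are illustrative assumptions, not values drawn from any specific deployment:

```python
# Level 3 sketch: decide alone inside a formal perimeter, escalate everything else.
IN_SCOPE_INTENTS = {"refund_request", "password_reset", "invoice_copy"}
CONFIDENCE_THRESHOLD = 0.85

def escalate(case: dict, reason: str) -> str:
    # hand the case to a human queue; in production this would open a ticket
    return f"escalated:{case['id']}:{reason}"

def execute(case: dict) -> str:
    # apply the decision; in production this would call the business system
    return f"executed:{case['id']}"

def route(case: dict) -> str:
    if case["intent"] not in IN_SCOPE_INTENTS:      # outside the formal perimeter
        return escalate(case, reason="out_of_scope")
    if case["confidence"] < CONFIDENCE_THRESHOLD:   # not confident enough to act alone
        return escalate(case, reason="low_confidence")
    return execute(case)                            # routine case: no human in the loop

print(route({"id": "42", "intent": "refund_request", "confidence": 0.91}))  # executed:42
```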
Level 3 is the realistic horizon of most enterprise agent projects. It is also where economic value appears: the machine handles 80–95% of volume without intervention, the human keeps control of the remaining 5–20%. ROI flips positive.
Level 4 — Supervised autonomy: the machine does it all, the human audits
At level 4, the machine decides and executes over the entire scope, including complex cases. The human is no longer in the real-time decision loop. They audit after the fact, weekly or monthly, on samples. This is the typical level of an auditable credit-scoring agent, or a fraud-monitoring agent that decides alone but whose decisions are reviewed by a risk committee each week.
Level 4 demands considerable governance investment. You need audit dashboards, continuous calibration mechanisms, and organizational discipline to take reviews seriously. Few companies are ready to make that investment. Those that do (banks, mature insurance carriers) get massive leverage from it.
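As a minimal sketch of the after-the-fact review, assuming a simple decision log where each entry carries a confidence score; the 5% sampling rate and field names are arbitrary examples:

```python
# Level 4 sketch: the agent decided alone all week; a human reviews a sample after the fact.
import random

def weekly_audit_sample(decisions: list[dict], rate: float = 0.05) -> list[dict]:
    """Draw a review sample, always including the least confident decisions."""
    if not decisions:
        return []
    k = max(1, int(len(decisions) * rate))
    by_confidence = sorted(decisions, key=lambda d: d["confidence"])
    forced = by_confidence[: k // 2]                       # least confident half of the sample
    drawn = random.sample(by_confidence[k // 2 :], k - len(forced))  # random remainder
    return forced + drawn
```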
Level 5 — Total autonomy: very rare, and probably undesirable
At level 5, the machine decides, executes, learns from its results and corrects itself without human intervention, except in case of serious incident. That is total autonomy in the strict sense. In practice this level is rare in production for high-stakes decisions, and it is probably undesirable in regulated contexts.
Article 14 of the EU AI Act requires effective human oversight of any high-risk AI. Strict level 5 is mechanically incompatible with that. The only zones where level 5 makes sense are processes with very low individual stakes but very high volume (basic content moderation, automatic image triage), with statistical guardrails.
Diagnosis: where your project stands today
Three metrics let you diagnose the effective level of an agent in production.
- The human escalation rate measures how many decisions are escalated: at a stable level 3, you are typically between 5 and 15%.
- The reversion rate measures how many decisions are cancelled by a human after the fact: it must stay below 1% for level 3 to be defensible.
- The mean confidence score indicates calibration: an agent confident on 80% of its decisions that escalates the remaining 20% is healthy; an agent confident everywhere is suspicious.
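On a decision log, the three metrics take a few lines to compute. The log format below (one dict per decision, with these field names) is an assumption; adapt it to your own store:

```python
# Diagnosis sketch: compute escalation rate, reversion rate and mean confidence from a log.
def diagnose(log: list[dict]) -> dict:
    total = len(log)
    if total == 0:
        return {}
    escalated = sum(1 for d in log if d["escalated"])
    decided_alone = total - escalated
    reverted = sum(1 for d in log if d.get("reverted_by_human", False))
    return {
        "escalation_rate": escalated / total,                # healthy level 3: roughly 5-15%
        "reversion_rate": reverted / max(decided_alone, 1),  # should stay below 1%
        "mean_confidence": sum(d["confidence"] for d in log) / total,
    }
```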
Project progression is not linear. Many organizations stay six months at level 2 to calibrate thresholds on real data, then switch to level 3 overnight when metrics are stable. That is the best practice: deploy at level 2, measure, and switch to level 3 when objective confidence is there.
How Swoft turns this challenge into software
The Swoft architecture is designed to run agents at level 3 by default, with a possible extension to level 4. Here is how each requirement for moving up a level is met by construction.
- 01
A formal perimeter that cannot be bypassed
Each agent is attached to a Bounded Context of the DDD metamodel. It technically cannot write to a domain it has not been assigned. This constraint is enforced at compile time and at runtime, not by manual review.
- 02
Auditable structured memory
Every observation and every decision is a typed event persisted in the Event Store. Replaying the history reproduces exactly the same result, regardless of the model used.
- 03
Explicit confidence calibration
Every AI decision carries a confidence score, a structured rationale and the list of alternatives considered. Thresholds are configurable per use case and can be changed on the fly.
- 04
Dynamic approval gates
Event-sourced sagas inject human validation points dynamically, according to business rules. A sensitive decision suspends the saga, waits for your OK, and resumes automatically once validated. A simplified sketch of this pattern is shown after the list.
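For illustration only (this is not Swoft's actual API), a dynamic approval gate can be sketched as a saga that records typed events, suspends on a sensitive decision, and resumes when a human approval event arrives:

```python
# Illustrative sketch of an approval-gated, event-sourced saga (not Swoft's actual API).
from dataclasses import dataclass, field

@dataclass
class ApprovalGatedSaga:
    events: list[str] = field(default_factory=list)
    pending: str | None = None                    # decision waiting for a human OK

    def propose(self, decision: str, sensitive: bool) -> None:
        self.events.append(f"DecisionProposed:{decision}")
        if sensitive:
            self.events.append("ApprovalRequested")
            self.pending = decision               # the saga suspends here
        else:
            self._apply(decision)

    def on_human_approval(self) -> None:
        if self.pending is not None:
            self.events.append("HumanApproved")
            decision, self.pending = self.pending, None
            self._apply(decision)                 # resumes automatically after validation

    def _apply(self, decision: str) -> None:
        self.events.append(f"DecisionApplied:{decision}")
```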