SaaS

Is the term "AI agent" overused in 2026? A critical audit

ChatGPT wrappers, no-code platforms, code assistants, orchestration frameworks: everything labels itself as an AI agent. Confronting the academic definitions with real market products.

Kevin Gibaud, co-founder of Swoft
[Image: comparison table of AI agent products]

In 2024-2026, the tech market flipped. Everything became an agent. SaaS vendors add an agent to their product. No-code platforms rebrand as agentic platforms. Startups raise on slides where the word appears once per bullet. And users no longer know how to distinguish what really deserves the label from what merely wears it opportunistically. This article offers a methodical audit: take the main product categories that claim to be AI agents, and confront them with the four properties laid down by Wooldridge and Jennings in 1995.

Recap of the four criteria

For a system to deserve the name agent in the strong sense, it must combine four properties: autonomy (it decides without direct human intervention), reactivity (it perceives and reacts to its environment), pro-activeness (it takes initiative to reach its goal), social ability (it interacts with other agents). If a single one of these is missing, the system may be very useful, but it is not an agent in the academic sense.
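These four checks can be written down as a tiny checklist. The sketch below is purely illustrative: the `WooldridgeAudit` class is hypothetical, not from any library, and simply encodes the "all four or nothing" rule that the rest of the audit applies.

```python
from dataclasses import dataclass

@dataclass
class WooldridgeAudit:
    """Hypothetical checklist for the four Wooldridge & Jennings (1995) properties."""
    autonomy: bool        # decides without direct human intervention
    reactivity: bool      # perceives and reacts to its environment
    proactiveness: bool   # takes initiative to reach its goal
    social_ability: bool  # interacts with other agents

    def is_agent(self) -> bool:
        # All four properties are required: a single missing one disqualifies.
        return all([self.autonomy, self.reactivity,
                    self.proactiveness, self.social_ability])

# Audited strictly, a typical LLM wrapper (Category 1) fails every check.
wrapper = WooldridgeAudit(autonomy=False, reactivity=False,
                          proactiveness=False, social_ability=False)
print(wrapper.is_agent())  # False
```

The same object can be filled in for each category below, which makes the verdicts mechanical rather than rhetorical.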

Category 1: LLM wrappers (custom GPTs, long prompts)

This is the broadest and most naive category. A custom GPT, a long system prompt, a personalized Claude assistant: all follow the same pattern. The user types a query, and the LLM answers following the instructions in its system prompt. It may call a few tools (web search, code execution) to enrich its answer.

Let's check. Autonomy: no, every turn requires a new user query. Reactivity: limited, the system only reacts to what the user sends, not to changes in an external environment. Pro-activeness: absent, it never takes initiative. Social ability: limited to the single user querying it. Verdict: these are conversational assistants, not agents.

Category 2: workflow platforms (Make, n8n, Zapier)

With the rise of AI, these platforms introduced AI nodes and now position themselves on the agent market. A typical workflow: a trigger (new email), an AI node analyzing the content, a decision node, action nodes (create a record, send a Slack message). The marketing pitch is quick to promise agents that automate your tasks.

Let's check. Autonomy: partial, the workflow runs without a human once triggered, but the autonomy scope is narrow (the workflow branches). Reactivity: yes, reacts to triggers. Pro-activeness: no, never self-triggers outside a defined trigger. Social ability: very limited, no communication between workflow instances. Verdict: these are sophisticated automations with an AI layer. The term agent is used for marketing.

Category 3: code assistants (Cursor, Cline, Aider, Claude Code)

These are the most advanced widely available products on the market. Cursor in composer mode, Cline, Aider in autopilot, Claude Code: all can receive a high-level instruction ("implement this feature"), explore the code, modify multiple files, run tests, iterate on errors, and deliver a usable result.

Let's check. Autonomy: yes during the session, the code assistant takes technical decisions without asking at each step. Reactivity: yes, it reacts to compilation errors and test failures. Pro-activeness: partly yes, it can decide to add a dependency, refactor, without the human explicitly asking. Social ability: low, no structured communication with other agents.

Verdict: yes, these are agents in the Wooldridge sense. But single-task agents, with no persistence, no organization. The session ends, the agent disappears with it. And above all: their autonomy has no architectural bound. They can in theory modify any file, run any command. That is precisely what makes them powerful, and risky in production.
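The session-scoped loop just described can be sketched in a few lines. Everything here (`run_session` and the simulated test harness) is a hypothetical illustration of the perceive-decide-act cycle, not any vendor's actual implementation.

```python
# Hypothetical inner loop of a code assistant: act on the codebase, observe
# test results, and iterate until the goal is met or the step budget runs out.
def run_session(goal, run_tests, propose_edit, apply_edit, max_steps=10):
    for step in range(max_steps):
        failures = run_tests()                    # reactivity: perceive the environment
        if not failures:
            return "done"                         # goal reached; the session (and agent) ends
        edit = propose_edit(goal, failures)       # autonomy: decide the next change unprompted
        apply_edit(edit)                          # act on the codebase
    return "budget exhausted"                     # no persistence beyond the session

# Simulated environment: two failing tests, each edit fixes one.
state = {"bugs": 2}
result = run_session(
    goal="implement the feature",
    run_tests=lambda: ["fail"] * state["bugs"],
    propose_edit=lambda goal, failures: "patch",
    apply_edit=lambda edit: state.update(bugs=state["bugs"] - 1),
)
print(result)  # done
```

Note what the sketch makes visible: nothing in the loop bounds *which* edits are allowed, which is exactly the "no architectural bound" risk mentioned above.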

Category 4: orchestration frameworks (LangChain, CrewAI, AutoGen, LangGraph)

These frameworks propose to build multi-agent systems by assembling LLMs that talk to each other, hand off tasks, vote, debate. CrewAI introduces the notion of a crew (a manager, workers). AutoGen offers conversations between agents with optional human-in-the-loop. LangGraph models everything as a state graph.

Let's check. Autonomy: yes, each agent in the system makes decisions. Reactivity: yes. Pro-activeness: yes. Social ability: yes, that's the central point. Verdict: these are agents in the Wooldridge sense. But — and it's a big but — they are not multi-agent systems in Ferber's sense, because they lack the organizational dimension: no explicit roles or norms, and no structural answer to the question of who decides in case of conflict.
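Stripped of any specific framework's API, the social-ability layer these frameworks provide boils down to message passing between agents. The deliberately minimal sketch below (hypothetical, not CrewAI's or AutoGen's actual classes) shows what is present (communication) and what is missing (any organizational rule for arbitrating a conflict).

```python
# Two "agents" that can talk to each other: social ability is satisfied,
# but nothing in the structure says who wins if they disagree -- that
# missing arbitration rule is the Ferber-style organizational layer.
class SimpleAgent:
    def __init__(self, name):
        self.name = name
        self.inbox = []

    def send(self, other, message):
        other.inbox.append((self.name, message))

planner = SimpleAgent("planner")
worker = SimpleAgent("worker")
planner.send(worker, "implement the feature")
worker.send(planner, "done, please review")
print(worker.inbox)  # [('planner', 'implement the feature')]
```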

Category 5: commercial platforms (Salesforce Agentforce, Microsoft Copilot Agents)

Salesforce Agentforce, launched in 2024, offers preconfigured agents for sales tasks: lead qualification, meeting scheduling, customer support. Microsoft followed with Copilot Agents integrated in Office and Dynamics. These platforms promise off-the-shelf agents, configurable without code, deployable in a few clicks.

Let's check. Autonomy: yes within a narrow scope, the agent can send an email, create an opportunity, schedule a call. Reactivity: yes. Pro-activeness: limited, the agent generally waits for triggers (a new incoming lead, a customer request). Social ability: low, little structured inter-agent interaction. Verdict: technically Wooldridge agents, but at very low depth. The marketing promise often outruns the functional reality.

The simple test that settles it

Beyond these academic criteria, there is a pragmatic test that captures everything: "Can the agent refuse to answer a human in order to finish its current task?" This question captures the four Wooldridge properties in one. An autonomous agent has an assigned task. A reactive agent perceives that a human is querying it. A pro-active agent decides based on its goal. A socially competent agent negotiates its answer with the human ("I'll finish this first, I'll come back to you").
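As a thought experiment, the refusal test can be made concrete. `BusyAgent` below is hypothetical; the interesting part is the branch where a pending task takes precedence over the human request, which is exactly what most products in the five categories lack.

```python
# Hypothetical agent applying the pragmatic test: can it refuse to answer
# a human in order to finish its current task?
class BusyAgent:
    def __init__(self):
        self.current_task = None

    def assign(self, task):
        self.current_task = task

    def handle_human(self, request):
        if self.current_task is not None:
            # Negotiate rather than obey: pro-activeness plus social ability.
            return f"I'll finish '{self.current_task}' first, then handle '{request}'."
        return f"Handling '{request}' now."

agent = BusyAgent()
agent.assign("reconcile invoices")
print(agent.handle_human("draft an email"))
# I'll finish 'reconcile invoices' first, then handle 'draft an email'.
```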

Apply this test to the five categories. LLM wrapper: no, it always answers. Workflow platform: no, doesn't carry a persistent task. Code assistant: sometimes, it can refuse a user change if it breaks a test. Orchestration framework: technically yes, but the human remains the arbitration layer. Commercial platform: no, the agent always prioritizes the human. Conclusion: barely 5% of products called AI agents in 2026 pass this test.

Conclusion: overused, yes, but not unusable

The term AI agent is overused. An overwhelming share of products that claim it are not, in the academic sense. That doesn't mean these products are bad — many are excellent for their use case. But they are not agents. They are assistants, automations, wrappers, orchestrators.

What to do? Two options. Either accept that the word has lost its meaning and use it as a synonym for "AI that does something," which is commercially convenient but intellectually empty. Or systematically state which level of autonomy you're talking about, and keep Wooldridge's rigour for the cases where it really matters — that is, in critical production, on consequential decisions, in regulated sectors.

It is the second path that we recommend. And it is also why we published a mapping of the five autonomy degrees possible for an AI agent, which lets you precisely position a product on the spectrum from LLM wrapper to full Wooldridge system.

Topics covered

  • AI agent
  • Wooldridge
  • Ferber
  • Critical audit
  • LangChain
  • CrewAI
  • AutoGen
  • LangGraph
  • Cursor
  • Salesforce Agentforce
  • Rigorous definition

Explore further in the glossary

Tech translation

How Swoft turns this challenge into software

How Swoft applies the Wooldridge test to its own agents, and why it refuses to call components agents when they are not.

  1. No LLM wrapper called an agent

     Swoft components that perform LLM inference without satisfying the four Wooldridge properties are never named agents. They are called LLM services, classifiers, or extractors.

  2. 13 named agents, no more

     Our platform exposes 13 persona agents, each attached to a bounded context, with its own scope, tools, and escalation rules. The other AI components, more modest in autonomy, are not called agents.

  3. The Ferber test applied

     For each of the 13 agents, the question "who decides in case of conflict?" has a structural answer: the Conway routing table and the confidence thresholds. That is what makes the system genuinely multi-agent in Ferber's sense.

  4. Structural refusal is possible

     A Swoft agent can refuse a human command if it violates a constraint of its bounded context. Not out of politeness, but by architecture. That is precisely what the "can it refuse?" test demands.
