
Agentic Engineering: The Guide After Vibe Coding (With Real Numbers)

Karpathy says vibe coding is over. Here's what agentic engineering actually means in practice, with the cost numbers and architecture choices that prove it.

Kevin Gibaud, Co-founder, Product & Design at Swoft

In May 2026, at Sequoia's AI Ascent conference, Andrej Karpathy drew a line. Vibe coding — the practice of describing what you want to an LLM and shipping whatever comes out — was the warmup act. The main event is something harder, more disciplined, and far more consequential: agentic engineering.

That distinction matters because the tooling industry has blurred it aggressively. Every IDE plugin and every no-code platform now claims to be "agentic." Very few of them are. Understanding the difference is what separates teams that ship production software from teams that accumulate prototypes.

What Karpathy Actually Said

Karpathy's bearblog post summarizing the Sequoia talk is precise enough to quote directly. He described a personal inflection point: in November 2025 he was writing roughly 80% of his own code. By December, that ratio had inverted — he was delegating 80% to agents. The shift happened not because the tools became smarter overnight, but because they crossed a reliability threshold where the cognitive overhead of supervising an agent became lower than the overhead of writing the code yourself.

The goal is to capture the leverage of agents without any compromise on the quality of the software. Vibe coding does the opposite: it maximizes leverage by sacrificing quality.

Andrej Karpathy, Sequoia AI Ascent 2026 (paraphrased from bearblog summary)

The premise underneath that threshold is that agents remain fallible. They hallucinate. They propose architecturally unsound shortcuts. They match email addresses from two different identity sources and call it a fix. An experienced engineer sees the boundary violation immediately. A vibe coder ships it. That gap, between recognizing a correct solution and recognizing a plausible-looking wrong one, is where agentic engineering lives.

Vibe Coding vs. Agentic Engineering: A Clean Distinction

Martin Fowler put it plainly in his September 2025 piece "To vibe or not to vibe": vibe coding is writing code where you pay no attention to the code at all. Agentic engineering is the other end of the scale — professional software engineers using coding agents to amplify their existing expertise, not replace their judgment.

  • Vibe coding: describe intent, accept output, ship. Review is optional. Works for throwaway scripts and personal tools.
  • Agentic engineering: design specs first, supervise agent plans, inspect every diff, write evaluation loops, manage permissions, isolate worktrees, preserve architecture.
  • Vibe coding democratizes software creation. Agentic engineering professionalizes AI-assisted development.
  • The output of vibe coding is a prototype. The output of agentic engineering is a system.

IBM's definition aligns: agentic engineering treats AI-generated code as production-grade software, which means it must be tested, governed, and maintainable by people who did not write it. That is a completely different contract than "it runs on my machine."

Why This Matters Now: The Cost Proof

The theoretical argument is clean. The practical argument is harder to dismiss. Traditional custom software development for a complex application costs north of €85,000. Using Claude Code alone with a disciplined developer, that same scope runs above €20,000. With agentic engineering — meaning agents orchestrated under a structured architecture, not just a developer chatting with an IDE — we deliver the equivalent for €2,900.

That is not a rounding error. It is a structural difference in how the work is organized. A simple application that takes seven days with a traditional team, or three hours with Claude Code alone, takes one hour under agentic engineering. The acceleration does not come from the LLM being smarter. It comes from every layer of the system doing only what it is suited for: domain logic encoded symbolically, agent coordination handled structurally, business rules verified rather than guessed.

The Seven Characteristics of Agentic-Engineered Software

Calling something "agentic" does not make it so. Here are the seven properties that distinguish software built under agentic engineering from software assembled through prompt-and-paste:

  1. Spec-first design. Agents receive detailed specifications, not vague instructions. The spec is the primary artifact; the code is its derivative.
  2. Bounded contexts with explicit contracts. Domain boundaries are defined before agents touch a line of code. An agent cannot accidentally couple two contexts because the contract between them is enforced structurally.
  3. Event-sourced state. Instead of mutable shared state that agents can corrupt unpredictably, the system records intent as immutable events. Any agent action is a command producing events, not a direct database write.
  4. Evaluation loops, not manual review. Every agent output passes through automated evaluation: type checks, domain invariant validation, integration tests, and behavioral assertions.
  5. Permission scoping and worktree isolation. Agents operate with the minimum permissions needed for a task, in isolated working contexts. A rogue suggestion cannot propagate across the codebase.
  6. Architectural taste as a non-negotiable constraint. The engineer's job is not to review code line by line but to define what correct architecture looks like and reject anything that violates it — before agents generate it.
  7. Observability baked in from the start. Every agent decision that affects state is logged, attributed, and replayable. Debugging a system built by agents requires the same rigor as debugging a distributed system.
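Characteristic 5, permission scoping with worktree isolation, can be made concrete with a small sketch. This is a hypothetical application-level illustration (real agent runtimes enforce scoping at the sandbox or filesystem level); the workspace root and subpaths are invented for the example:

```python
from pathlib import Path

class ScopeViolation(Exception):
    """Raised when an agent action falls outside its granted scope."""

class ScopedWorkspace:
    """Grants an agent write access to an allowlisted subtree only."""

    def __init__(self, root: str, allowed: set[str]):
        self.root = Path(root).resolve()
        self.allowed = allowed  # relative subpaths the agent may touch

    def check(self, target: str) -> Path:
        path = (self.root / target).resolve()
        # Reject path traversal out of the workspace root.
        if not path.is_relative_to(self.root):
            raise ScopeViolation(f"outside workspace: {target}")
        rel = path.relative_to(self.root)
        # Reject anything outside the explicitly granted contexts.
        if not any(rel.is_relative_to(a) for a in self.allowed):
            raise ScopeViolation(f"not in granted scope: {target}")
        return path

ws = ScopedWorkspace("/tmp/project", allowed={"src/billing"})
ws.check("src/billing/invoice.py")    # permitted
try:
    ws.check("src/identity/user.py")  # a different bounded context
except ScopeViolation as e:
    print("blocked:", e)
```

The point of the sketch is the failure mode it prevents: a rogue suggestion touching the identity context from a billing task fails loudly at the boundary instead of propagating across the codebase.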

The Role of Domain-Driven Design

Domain-Driven Design was never primarily about code organization. It was about making the complexity of a problem legible enough that software could model it faithfully. That goal becomes more urgent, not less, when agents are writing the code.

An agent given a poorly bounded domain will produce software that works in demos and fails in edge cases. It will mix concerns, duplicate logic, and create implicit dependencies that only surface under load. DDD gives agentic engineers the vocabulary to prevent this: ubiquitous language, bounded contexts, aggregates, and domain events are not decorative — they are the constraints that keep agent output inside the problem's actual shape.

Event sourcing compounds this benefit. When every state change is an immutable event tied to a domain command, agents cannot silently corrupt state. A failed agent action is a failed command, visible in the event log, replayable in a test. This is the architectural property that makes AI-assisted development safe at scale rather than chaotic.
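A minimal sketch makes the mechanism concrete. The aggregate and invariant below are invented for illustration (a toy account whose balance may never go negative); the shape is generic event sourcing, not any specific framework:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class Event:
    kind: str        # "credited" | "debited"
    amount: int
    at: str          # ISO timestamp of the command that produced it

class Account:
    """Toy aggregate with one domain invariant: balance never goes negative."""

    def __init__(self) -> None:
        self.log: list[Event] = []   # append-only; never mutated in place

    def balance(self) -> int:
        # Current state is derived by replaying the log, not stored mutably.
        return sum(e.amount if e.kind == "credited" else -e.amount
                   for e in self.log)

    def handle(self, command: str, amount: int) -> Event:
        # A command either produces an event or fails loudly; there is
        # no silent partial write for an agent to corrupt.
        if command == "withdraw" and self.balance() - amount < 0:
            raise ValueError("rejected: balance would go negative")
        kind = "credited" if command == "deposit" else "debited"
        event = Event(kind, amount, datetime.now(timezone.utc).isoformat())
        self.log.append(event)
        return event

acct = Account()
acct.handle("deposit", 100)
acct.handle("withdraw", 30)
print(acct.balance())   # 70
```

A rejected withdrawal leaves no trace in state and a clear trace in the error path, which is exactly the property that makes a failed agent action visible and replayable rather than a silent corruption.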

Neurosymbolic Engineering: The Academic Foundation

In May 2025, Antonio Mastropaolo and Denys Poshyvanyk published a position paper at the Foundations of Software Engineering (FSE 2025) titled "A Path Less Traveled: Reimagining Software Engineering Automation via a Neurosymbolic Paradigm." Its core argument is worth understanding for anyone building production systems with AI assistance.

Large code models achieve impressive benchmark results, but they do so through statistical pattern matching at massive scale. The result is a system with high average performance and unpredictable failure modes. Mastropaolo and Poshyvanyk propose combining neural learning — the LLM generating code — with symbolic reasoning — explicit rules, invariants, and structural constraints — and deliberately introducing controlled randomness to model the non-determinism of real-world software environments.

The neurosymbolic paradigm aims to create systems that are more adaptable, transparent, and closely aligned with the evolving demands of modern software development practices — not by replacing the engineer's judgment, but by encoding it structurally.

Mastropaolo & Poshyvanyk, A Path Less Traveled, FSE Companion 2025 (paraphrased)

In practical terms, this is what event sourcing and DDD deliver inside an agentic workflow: the symbolic layer. Domain invariants are rules the system must enforce regardless of what an agent suggests. The ubiquitous language is a controlled vocabulary that constrains how agents interpret requirements. The event log is the transparent record that makes agent behavior auditable. The "neural" layer — the LLM — operates inside a symbolic harness that catches its failure modes before they reach production.
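The symbolic harness can be sketched in a few lines. The order-domain invariants below are hypothetical examples; the structure is the point: explicit, human-written rules that gate whatever the neural layer proposes.

```python
from typing import Callable

# Symbolic layer: explicit domain rules, written by the engineer.
# The rules themselves are illustrative, not from any real system.
Invariant = Callable[[dict], bool]

INVARIANTS: dict[str, Invariant] = {
    "total matches line items": lambda o: o["total"] == sum(
        line["price"] * line["qty"] for line in o["lines"]),
    "no negative quantities":   lambda o: all(
        line["qty"] > 0 for line in o["lines"]),
    "currency is declared":     lambda o: o.get("currency") in {"EUR", "USD"},
}

def gate(agent_output: dict) -> list[str]:
    """Neural proposal in, list of violated invariants out.

    An empty list means the output may proceed; otherwise it is rejected
    and fed back to the agent, however plausible it looked.
    """
    return [name for name, rule in INVARIANTS.items()
            if not rule(agent_output)]

proposal = {"currency": "EUR", "total": 30,
            "lines": [{"price": 10, "qty": 2}]}   # total should be 20
print(gate(proposal))  # ['total matches line items']
```

The LLM never gets the final word: a proposal that violates a rule is caught by the symbolic layer before it reaches production, which is the transparency property the paper argues for.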

MCP: The Protocol That Makes Agent Coordination Tractable

One of the practical gaps in most agentic workflows is coordination: how does one agent pass context to another without corrupting it? How does an agent access a database, a codebase, and an external API within a single task without requiring ad hoc scripting for each integration?

The Model Context Protocol (MCP) solves this by providing a standardized contract between an agent and the tools it can use. Instead of bespoke integrations, every tool exposes the same interface: capabilities, permissions, inputs, outputs. An agent can be granted access to a specific MCP server for a specific task and nothing else. This is the technical implementation of the permission scoping principle described above.
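The shape of that contract can be sketched as follows. This is a simplified illustration of the idea, not the actual MCP wire format (the real protocol uses JSON-RPC with JSON Schema tool definitions); the tool names and fields are invented:

```python
from dataclasses import dataclass, field

@dataclass
class ToolContract:
    """What a server advertises per tool: a name, a capability
    description, and a declared input schema."""
    name: str
    description: str
    input_schema: dict   # here just {"required": [...]} for the sketch

@dataclass
class AgentGrant:
    """Permission scoping: the agent sees only the tools it was granted."""
    tools: dict[str, ToolContract] = field(default_factory=dict)

    def call(self, tool: str, args: dict) -> dict:
        if tool not in self.tools:
            raise PermissionError(f"tool not granted: {tool}")
        contract = self.tools[tool]
        missing = [k for k in contract.input_schema.get("required", [])
                   if k not in args]
        if missing:
            raise ValueError(f"missing required inputs: {missing}")
        # Dispatch to the real tool here; stubbed out for the sketch.
        return {"tool": tool, "args": args}

query = ToolContract("query_orders", "Read-only order lookup",
                     {"required": ["customer_id"]})
grant = AgentGrant({"query_orders": query})
grant.call("query_orders", {"customer_id": "c42"})   # permitted
# grant.call("delete_orders", {}) would raise PermissionError
```

Because every tool exposes the same interface, granting and revoking capabilities is data, not code, and an audit of what an agent could do reduces to reading its grant.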

For agentic engineers, MCP changes the unit of work. Instead of thinking in terms of code to write, you think in terms of capabilities to grant, contexts to provide, and boundaries to enforce. That is a significant cognitive shift — and it is the shift that separates practitioners who understand the paradigm from those who are still using AI as an autocomplete tool.

From Vibe Coding to Production: The Four Failure Modes

Teams that start with vibe coding and try to graduate to production software without rearchitecting their process hit the same four walls reliably.

  • State corruption. Without event sourcing or equivalent constraints, agents make plausible writes that violate business invariants. The bug surfaces in production, weeks after the code was written, in a context no one recognizes.
  • Context drift. As a codebase grows, agents lose track of decisions made in earlier sessions. They propose solutions that contradict established domain boundaries. The team spends more time correcting agents than the agents save.
  • Test debt. Vibe-coded projects almost never have meaningful test coverage, because the workflow does not require it. When agents produce evaluation loops as part of the spec-first process, this debt never accumulates.
  • Unmaintainable outputs. Code that works is not the same as code that can be changed six months later by someone who did not write it. Agentic engineering enforces the architectural constraints that make software maintainable across agent sessions, developers, and time.

The Engineer's Role Has Not Shrunk — It Has Changed

A common misreading of the agentic engineering thesis is that the engineer becomes a passive supervisor, occasionally approving or rejecting agent output. Karpathy's actual description is the opposite: the agentic engineer is doing more complex cognitive work than before, just at a higher level of abstraction.

Designing a specification that an agent can execute correctly is harder than writing the code yourself if you do not understand the domain deeply. Crafting evaluation loops that catch domain-level errors requires more than testing instinct — it requires knowing what "correct" looks like for the problem at hand. Managing permissions and worktree isolation requires understanding security boundaries at the system level, not the function level.

What changes is what good engineers spend their time on. Writing boilerplate, scaffolding CRUD endpoints, wiring configuration — these tasks move to agents. Defining domain models, designing evaluation strategies, deciding where boundaries go, recognizing architecturally unsound proposals — these tasks belong to the engineer, and they are harder to do well than the tasks they replace.

Measuring Agentic Engineering: What "Done" Looks Like

Without line counts or time-per-feature to fall back on, teams new to agentic engineering often struggle to measure whether they are doing it well. The indicators that matter are not the ones borrowed from traditional development.

  • Spec fidelity: does the delivered system match the domain model that was specified, including edge cases and invariants?
  • Evaluation coverage: do the evaluation loops catch domain-level errors, not just syntax errors and type mismatches?
  • Drift resistance: can a new agent session resume work on the codebase without violating decisions made in previous sessions?
  • Change cost: does modifying a business rule require touching only the expected parts of the system, or does it cascade unpredictably?
  • Auditability: can any state in the system be explained by tracing back through the event log to the original command?
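The auditability property in the list above reduces to a query over the event log. The log entries below are invented for illustration; the idea is that every state has a traceable cause:

```python
# Hypothetical event records: (timestamp, command_id, entity, kind, payload).
LOG = [
    ("2026-01-10T09:00Z", "cmd-1", "invoice-7", "created",  {"total": 120}),
    ("2026-01-11T14:30Z", "cmd-2", "invoice-7", "discount", {"amount": 20}),
    ("2026-01-12T08:15Z", "cmd-3", "invoice-9", "created",  {"total": 50}),
]

def explain(entity: str) -> list[str]:
    """Answer 'why is this entity in this state?' by listing, in order,
    every command that touched it. No state exists without a cause."""
    return [f"{ts} {cmd}: {kind} {payload}"
            for ts, cmd, ent, kind, payload in LOG if ent == entity]

for line in explain("invoice-7"):
    print(line)
```

An unexplainable state, one with no corresponding commands in the log, is itself a finding: either the log is incomplete or something wrote around the command path.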

These metrics are architectural properties, not sprint metrics. They are the same properties that distinguish good software from legacy systems — which is exactly the point. Agentic engineering does not lower the bar for production software quality. It reaches that bar faster and at lower cost.

The Practical Transition: Where to Start

For teams currently using AI-assisted coding tools as sophisticated autocomplete — writing code with Cursor, Copilot, or Claude — the transition to agentic engineering is less a tooling change than a process change. The tools are already capable. The process is what needs rebuilding.

  1. Start with domain modeling before any code. Write the ubiquitous language, identify bounded contexts, define the commands and events for your core domain. This work cannot be delegated to agents — it is the input they need to be useful.
  2. Replace ad hoc agent sessions with structured task specs. Each task an agent receives should include: context (what already exists), constraints (what must not change), definition of done (what success looks like), and the evaluation that will verify it.
  3. Build evaluation loops before feature loops. The first artifact of any new domain area is the test harness, not the feature. Agents should not ship code that has no evaluation path.
  4. Adopt MCP for every tool integration. Bespoke integrations create bespoke failure modes. Standard interfaces create standard auditability.
  5. Treat architectural review as the primary engineering activity. Code review moves faster when agents write the code; architectural review becomes the bottleneck that matters.
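Step 2's structured task spec can be captured as a plain data structure. The field contents below are illustrative; the four fields are the ones the step prescribes:

```python
from dataclasses import dataclass

@dataclass
class TaskSpec:
    """One agent task: context, constraints, definition of done,
    and the evaluation that verifies it."""
    context: list[str]       # what already exists
    constraints: list[str]   # what must not change
    done: str                # what success looks like
    evaluation: list[str]    # checks that will verify it

    def render(self) -> str:
        """Serialize into the brief handed to the agent session."""
        return "\n".join([
            "CONTEXT: "     + "; ".join(self.context),
            "CONSTRAINTS: " + "; ".join(self.constraints),
            "DONE WHEN: "   + self.done,
            "EVALUATION: "  + "; ".join(self.evaluation),
        ])

spec = TaskSpec(
    context=["billing context exists with an Invoice aggregate"],
    constraints=["do not modify the identity context"],
    done="credit-note endpoint emits a CreditNoteIssued event",
    evaluation=["run the billing test suite"],
)
print(spec.render())
```

Making the spec a first-class artifact (versioned alongside the code) is what lets a later agent session resume work without violating decisions made in earlier ones.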

Vibe Coding Belongs to 2025. Here Is What 2030 Looks Like.

The trajectory from here is not more powerful prompts producing better code. It is increasingly capable agents operating inside increasingly precise constraints — domain models, symbolic invariants, evaluation harnesses, permission scopes. The engineer who thrives in that environment is not the one who writes the least code. It is the one who defines the most precise problem.

Karpathy's framing — that the best engineers will not be the ones who write every line, but the ones who can direct agents without letting quality collapse — is not a prediction about a distant future. It describes what is possible today, for teams willing to do the foundational work that vibe coding skips. The numbers are not theoretical. Complex software for €2,900. A working application in one hour. These are not marketing claims — they are the output of treating agentic engineering as a discipline rather than a feature.

Sources and further reading

  1. Andrej Karpathy — Sequoia AI Ascent 2026 bearblog summary. Primary source for the vibe coding / agentic engineering distinction and the 80% delegation inflection point.
  2. Mastropaolo & Poshyvanyk — A Path Less Traveled: Reimagining Software Engineering Automation via a Neurosymbolic Paradigm (arXiv:2505.02275, FSE Companion 2025). Peer-reviewed foundation for the neurosymbolic approach to SE automation.
  3. Martin Fowler — To vibe or not to vibe (martinfowler.com, September 2025). Definitive practitioner distinction between vibe coding and professional agentic development.
  4. IBM Think — What is Agentic Engineering? Enterprise definition and the distinction between agentic coding and agentic engineering.
  5. Andrej Karpathy — Sequoia AI Ascent 2026 talk (YouTube). Full video of the talk.

Topics covered

  • Agentic engineering
  • Vibe coding
  • Software architecture
  • AI agents
  • Domain-Driven Design
  • Event sourcing
  • Neurosymbolic AI
Tech translation

How Swoft turns this challenge into software

The four pillars of the Swoft approach translate the requirements of agentic engineering directly into concrete capabilities, delivered on a real business scope.

  1. Architecture that starts from the business domain (DDD, spec-first)

     Every system is designed from well-delimited business contexts and domain invariants, formalized before the first commit. Agents operate inside these contexts and technically cannot write beyond their boundaries.

  2. Every change recorded as a timestamped event

     The timestamped event log is the agents' memory mechanism. Every decision is a typed, persisted, replayable event. When an anomaly occurs, the fix is surgical: replay the faulty sequence, correct the event, and the state rebuilds cleanly.

  3. A standard protocol that lets AIs talk to tools (MCP orchestration)

     All external integrations, whether business APIs, databases, or third-party services, go through MCP tools with a formal schema. Every call is observable, testable, and replayable, with no approximate mocking.

  4. Neural learning combined with symbolic reasoning (neurosymbolic evaluation loops)

     Agents are coupled to formal rule engines derived from the domain model. They cannot produce a result that violates an invariant. Automated evaluation suites measure spec fidelity at every change.

Frequently asked questions

What is agentic engineering?
Agentic engineering is a software development approach in which AI agents operate inside a formal framework drawn from software engineering: a domain model, invariants, a timestamped event log, and escalation protocols. In contrast to vibe coding, every agent decision is traceable, auditable, and verifiable against the specifications of the business domain.
What is the difference between vibe coding and agentic engineering?
Vibe coding delegates design decisions to the language model with no explicit representation of the domain. Agentic engineering gives agents a formal framework to operate within. The practical difference shows up in the cost of change: in a vibe-coded system it grows over time; in an agentic-engineered system it stays stable.
What does neurosymbolic software engineering bring?
Neurosymbolic software engineering combines a language model (neural reasoning) with a formal rule engine derived from the domain model (symbolic reasoning). The result is an agent that cannot produce a decision that violates a business invariant. Mastropaolo and Poshyvanyk argued for this approach in 2025 (FSE Companion) as a path to better spec fidelity on structured engineering tasks.
Can a team move from vibe coding to agentic engineering?
Yes, in five iterative steps: map the domain, extract the model from the existing code, introduce the timestamped event log on critical flows, wire MCP into external integrations, and build the evaluation suite. The transition does not require rebuilding everything at once. You can start on a limited business scope and extend progressively.
What is the real cost difference between vibe coding and agentic engineering for production software?
For the same scope of complex software, the observed orders of magnitude are stark: over €85,000 in traditional development, around €20,000 with a disciplined developer using Claude Code alone, and from €2,900 with agentic engineering. On a simple application, the timeline drops from 7 days traditionally to 3 hours with Claude Code alone, and to 1 hour when agents operate inside a structured architecture. The delta comes from the marginal cost of production: the domain-first architecture and the timestamped event log keep technical debt from re-accumulating with each iteration.
