“Agentic” is the word of the year, and like most words of the year, it’s used to mean five different things. For SaaS product teams, the useful framing isn’t “is it agentic or not” — it’s “how much autonomy are we handing the software, and what guardrails does that level require?”
Here’s the maturity ladder we use to think about agentic features, and the honest assessment of where production-safe SaaS actually lives today.

L0 — Manual
The user does every step. The software is a tool: forms, buttons, screens. No AI involved. This is where most software still lives, and for many workflows it’s exactly right.
L1 — Assisted
AI suggests; the user approves each action. “Here’s a draft reply. Send it?” “Here are the 3 line items I’d categorize this way. Confirm?” The user is in the loop on every action. This is the safest place to add AI, and where most successful AI features start.
The win at L1: the AI does the work, the human keeps the judgment. The user feels faster, not replaced. Trust builds because they see every action before it happens.
L2 — Supervised agent
The AI executes a multi-step plan, pausing at checkpoints for human review. “I’ll research these 10 leads, draft outreach for each, and show you before sending.” The agent does a chunk of autonomous work, then surfaces it for approval at a meaningful boundary.
This is where the frontier of production-safe SaaS sits in 2026. The agent is genuinely doing work, but a human reviews at the points that matter (before money moves, before something is sent externally, before data is deleted).
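To make the checkpoint idea concrete, here is a minimal sketch in Python. It assumes a plan is a list of steps, each tagged with a kind, and that some kinds count as consequential boundaries. The names `Step`, `CONSEQUENTIAL`, `run_supervised`, `execute`, and `approve` are hypothetical, invented for illustration; a real product would take its approval from a review UI rather than a callback.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical step kinds treated as consequential boundaries (assumption).
CONSEQUENTIAL = {"send_external", "move_money", "delete_data"}


@dataclass
class Step:
    kind: str         # e.g. "research", "draft", "send_external"
    description: str  # human-readable summary shown at the checkpoint


def run_supervised(
    steps: List[Step],
    execute: Callable[[Step], None],
    approve: Callable[[Step], bool],
) -> List[str]:
    """Run steps in order, pausing for approval before consequential ones."""
    outcomes = []
    for step in steps:
        # Autonomous work proceeds freely; consequential steps need a yes.
        if step.kind in CONSEQUENTIAL and not approve(step):
            outcomes.append(f"skipped: {step.description}")
            continue
        execute(step)
        outcomes.append(f"done: {step.description}")
    return outcomes
```

With a plan of research, draft, and send steps, only the send step would reach `approve`; the research and drafting run without interruption. That is the L2 trade: the agent does the chunk of work, and the human decides at the boundary.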
L3 — Autonomous agent
The AI acts on goals with no per-action human approval; the human audits after the fact. “Keep my calendar optimized” and it just does, rescheduling meetings as needed. “Handle tier-1 support” and it resolves tickets without asking.
L3 is real for narrow, low-stakes, reversible domains. It is NOT yet safe for high-stakes, irreversible actions (moving money, sending legally-binding communications, deleting data) without extraordinary guardrails. The “agent did something terrible autonomously” incidents of the last year all share a root cause: L3 autonomy applied to an L1-appropriate risk level.
The guardrails each level needs
- L1: clear “this was AI-generated” labeling; easy reject/edit.
- L2: checkpoints at every consequential boundary; a full preview of what the agent will do before it does it; the ability to interrupt.
- L3: a complete audit trail of every action; hard limits (spending caps, rate limits, scope restrictions); reversibility wherever possible; anomaly detection that escalates to a human; a kill switch.
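As one possible shape for the L3 hard limits, the sketch below checks every proposed action against a spending cap, a scope allowlist, a crude per-run action limit, and a kill switch before the action is allowed to run. The class and field names (`Guardrails`, `GuardrailViolation`, `spending_cap`, `allowed_scopes`, `max_actions`) are assumptions made for illustration, not an established API, and it leaves out anomaly detection and reversibility entirely.

```python
from dataclasses import dataclass
from typing import Set


class GuardrailViolation(Exception):
    """Raised when a proposed action would breach a hard limit."""


@dataclass
class Guardrails:
    spending_cap: float        # maximum total spend for this run
    allowed_scopes: Set[str]   # action types the agent may perform
    max_actions: int           # crude per-run rate limit
    killed: bool = False       # kill switch state
    spent: float = 0.0
    actions_taken: int = 0

    def kill(self) -> None:
        """Flip the kill switch; every later check fails."""
        self.killed = True

    def check(self, scope: str, cost: float = 0.0) -> None:
        """Raise before the action runs if any limit would be exceeded."""
        if self.killed:
            raise GuardrailViolation("kill switch engaged")
        if scope not in self.allowed_scopes:
            raise GuardrailViolation(f"scope not allowed: {scope}")
        if self.actions_taken >= self.max_actions:
            raise GuardrailViolation("action limit reached")
        if self.spent + cost > self.spending_cap:
            raise GuardrailViolation("spending cap exceeded")
        # Only record the action once every limit has passed.
        self.spent += cost
        self.actions_taken += 1
```

The design choice worth noting is that `check` runs before the action and fails closed: a violation raises rather than logging a warning and continuing, so the caller has to handle it, typically by escalating to a human.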
How to choose the level
Match autonomy to the cost of being wrong:
- Reversible + low-stakes (drafting, categorizing, suggesting) → L2/L3 is fine.
- Irreversible OR high-stakes (payments, external comms, deletions) → keep a human in the loop (L1/L2) until you have years of evidence.
The mistake is choosing the autonomy level by how impressive it sounds, rather than by the blast radius of a mistake.
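The rule above is simple enough to write down as a function. This is a sketch under the assumption that each action type can be labeled with two booleans, reversible and high-stakes; the `Level` enum and `max_autonomy` function are hypothetical names, and the result is a ceiling, not a recommendation to go that high.

```python
from enum import IntEnum


class Level(IntEnum):
    MANUAL = 0
    ASSISTED = 1
    SUPERVISED = 2
    AUTONOMOUS = 3


def max_autonomy(reversible: bool, high_stakes: bool) -> Level:
    """Return the highest autonomy level the rule above permits."""
    # Irreversible OR high-stakes: keep a human in the loop (L1/L2).
    if not reversible or high_stakes:
        return Level.SUPERVISED
    # Reversible AND low-stakes: L2 or L3 is acceptable.
    return Level.AUTONOMOUS


def is_allowed(proposed: Level, reversible: bool, high_stakes: bool) -> bool:
    """Check a proposed autonomy level against the ceiling."""
    return proposed <= max_autonomy(reversible, high_stakes)
```

For example, `is_allowed(Level.AUTONOMOUS, reversible=False, high_stakes=True)` is false for a payment flow, while drafting a reply (reversible, low-stakes) passes at any level.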
Design the audit trail first
Before you ship any agent above L1, build the audit trail. Log every action the agent takes, along with the reasoning, the inputs, and the result, and make the log immutable. When something goes wrong (it will), the audit trail is the difference between “we found and fixed it in an hour” and “we have no idea what it did.”
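One common way to approximate an immutable log in application code is a hash chain: each entry stores the hash of the previous entry, so any later edit breaks the chain and is detectable. The sketch below is a minimal in-memory version using only Python's standard library. The `AuditLog` class and its field names are assumptions for illustration; it omits timestamps and persistence, requires JSON-serializable inputs, and makes tampering evident rather than impossible. Real immutability would also need append-only storage.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


class AuditLog:
    """Append-only, hash-chained record of agent actions (in memory)."""

    def __init__(self) -> None:
        self._entries: List[Dict[str, Any]] = []

    @staticmethod
    def _digest(body: Dict[str, Any]) -> str:
        # Canonical JSON (sorted keys) so the same body always hashes the same.
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def record(self, action: str, reasoning: str,
               inputs: Dict[str, Any], result: str) -> Dict[str, Any]:
        """Append one action, linked to the previous entry by its hash."""
        prev_hash = self._entries[-1]["hash"] if self._entries else GENESIS
        body = {
            "action": action,
            "reasoning": reasoning,
            "inputs": inputs,
            "result": result,
            "prev_hash": prev_hash,
        }
        entry = dict(body, hash=self._digest(body))
        self._entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Return True only if no entry was altered, removed, or reordered."""
        prev = GENESIS
        for entry in self._entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev_hash"] != prev or self._digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True
```

Calling `record` once per agent action and `verify` during an incident review gives the “what did it do, and can we trust this record” answer quickly, which is the point of building the trail first.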
How we approach this
For agentic features we build via AI Software Development, we start at L1, earn trust, and move up the ladder only as the evidence supports it. The audit trail and guardrails go in before the autonomy does — never after.
Takeaways
- Four levels: manual, assisted, supervised agent, autonomous agent.
- Production-safe SaaS mostly lives at L1-L2 today.
- Match autonomy to the cost of being wrong, not to how impressive it sounds.
- Build the audit trail and guardrails before the autonomy.







