Agent Maturity
Understand the AMC-5 maturity levels and certification axes covering performance, security, and functionality.
Agentic Maturity and Certification (AMC)
Purpose. Provide a unified way to see an agent’s autonomy level and trust its readiness:
- Maturity (L1→L5): how agentic it is, how much autonomy and agency.
- Certification (Performance / Security / Functional): how secure, effective, and interoperable it is.
- Technology Readiness Level (TRL1→TRL7): how ready the agent is for market deployment.
Maturity Levels
We adopt a five-level framework for agentic maturity, from Level 1 to Level 5, with L5 being the highest maturity. We call this the AM-5 framework. It draws inspiration from seminal work on agentic autonomy, such as Levels of Autonomy for AI Agents, the Agentic Identity Maturity Model, and the Agentic Enterprise Maturity Model.
Hinge: From L3 onward, agents start to work as teams with greater autonomy.
Agent Maturity Questionnaire
To determine the AM-5 maturity level for agents in the gallery, we apply a questionnaire to each individual agent based on its metadata. The questionnaire is composed into a rubric and executed by a frontier model acting as an LLM-as-a-judge classifier.
Specifically, the questions probe the nature of user interaction, agent autonomy, and the scope of work, so that an agent's capabilities can be placed accurately within the AM-5 framework.
Level 1: User as Operator, Agents as Task Executors
This level is defined by direct command and single-purpose execution. The agent is a simple tool.
- Interaction: Does the user provide an explicit, step-by-step instruction for a single, predefined task (e.g., "summarize this text," "translate this sentence")?
- Autonomy: Does the agent operate without any independent decision-making, strictly following a fixed script or a rigid set of rules?
- Scope: Is the agent's functionality limited to a narrow, singular purpose, unable to chain commands or handle ambiguity?
Level 2: User as Collaborator, Agents as Trustworthy Assistants
This level involves delegating a complete but contained task that may require multiple steps. The agent is a reliable helper.
- Interaction: Can the user delegate a complete, but well-defined, multi-step task (e.g., "research topic X and compare the outputs given by Gemini, Claude, OpenAI and Perplexity") and expect a finished product?
- Autonomy: Does the agent require human supervision or intervention to handle exceptions, confirm intermediate steps, or resolve ambiguities?
- Scope: Can the agent reliably execute a sequence of related actions to fulfill a single, delegated outcome?
Level 3: User as Consultant, Agents as Team Players
This is the hinge level where multiple agents begin working together. The focus is on coordination and workflows.
- Interaction: Does the user provide a goal to a system of agents, which then coordinate amongst themselves to achieve it, with the user acting as a consultant?
- Autonomy & Collaboration: Do multiple agents communicate and pass information between each other to complete a workflow without direct, step-by-step user management?
- Scope & Complexity: Does the system involve specialized agents with distinct roles or "identities" that contribute to a larger, orchestrated process (e.g., agents within a super agent / utility agent hierarchy, each with defined tasks, such as a Market Research Utility Agent reporting to a Sourcing Intelligence Super Agent)?
Level 4: User as Approver, Agents as Goal-Driven Professionals
At this level, agents become proactive planners. The user's primary role shifts from director to approver.
- Interaction: Is the user's main role to approve or reject a multi-step plan that was autonomously generated by the agent(s) in response to a high-level goal?
- Autonomy: Does the agent independently devise its own strategy, create novel sub-tasks, and sequence actions to achieve the user's abstract objective?
- Scope & Adaptability: Can the agent incorporate feedback to dynamically refine its plan before or during execution, demonstrating a professional, goal-driven approach?
Level 5: User as Observer, Agents as Autonomous Workers
This is the highest level of maturity, where agents function like self-sufficient digital employees.
- Interaction: Does the user primarily observe the agent's performance against high-level, long-running objectives, rather than managing individual tasks or plans?
- Autonomy: Does the agent operate with full autonomy, managing its own resources, workflows, and long-term strategy to achieve business-level outcomes?
- Scope & Adaptability: Does the agent proactively optimize its own performance and strategies over time based on outcomes, without requiring explicit human guidance for improvement?
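The questionnaire above can be turned into a level with a small aggregation step. The sketch below is illustrative only and is not the gallery's actual scoring logic: the `LevelAnswers` type, the `classify_maturity` function, and the rule "take the highest level whose three questions are all answered yes" are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LevelAnswers:
    """Yes/no verdicts for one level's three questions (hypothetical shape)."""
    interaction: bool
    autonomy: bool
    scope: bool

    def all_yes(self) -> bool:
        return self.interaction and self.autonomy and self.scope


def classify_maturity(answers: dict[int, LevelAnswers]) -> int:
    """Return the highest level (1-5) whose three questions are all 'yes'.

    Assumption: fall back to level 1 when no level is fully satisfied.
    """
    satisfied = [
        level for level, verdict in answers.items()
        if 1 <= level <= 5 and verdict.all_yes()
    ]
    return max(satisfied, default=1)


# Example verdicts, as a judge model might return them for a multi-agent system.
example = {
    1: LevelAnswers(False, False, False),
    2: LevelAnswers(True, True, True),
    3: LevelAnswers(True, True, True),
    4: LevelAnswers(False, True, False),
    5: LevelAnswers(False, False, False),
}
print(classify_maturity(example))  # 3
```

Other aggregation rules are possible, for example requiring every lower level to be satisfied as well. The source does not say which rule is used.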
Agentic Certification Mapping
Next, we measure L1 to L5 agentic maturity alongside the certification axes. Together they form the AMC-5 framework (Agentic Maturity and Certification), which sets out, for each maturity level, what the performance, security, and functional certification axes are expected to cover.
| Level | Performance Certification | Security Certification | Functionality / Interoperability |
|---|---|---|---|
| L1 Narrow Task | Correctness metrics are prioritized. Human plays a major role in ensuring robustness and reliability (e.g., copilot). | Human-centric interaction, audit, and moderate security required. | Not required. |
| L2 Multi-Task | Correctness metrics are prioritized. Human plays a major role in ensuring robustness and reliability (e.g., copilot). | Human-centric interaction, audit, and moderate security required. | Basic – works with orchestrator / models. |
| L3 Pre-Coordinated | All correctness and robustness metrics are prioritized as agents run autonomously and for more turns. | Semi-autonomous, moderate security and audit required. | Moderate – works with other agents. |
| L4 Dynamic Planning | All correctness and robustness metrics are prioritized as agents run autonomously and for more turns. | Semi-autonomous, moderate security and audit required. | Strong – composable. |
| L5 Fully Autonomous | All correctness and robustness metrics are prioritized as agents run autonomously and for more turns. | Fully autonomous — strong security controls and audit required. | Excellent – well-defined and highly composable. |
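For programmatic use, the mapping table can be held as a simple lookup keyed by maturity level. This is a minimal sketch: the `CertExpectation` type, its field names, and the `expectations_for` helper are hypothetical, and the cell text is paraphrased from the table rather than taken from any official schema.

```python
from typing import NamedTuple


class CertExpectation(NamedTuple):
    """One row of the mapping table (field names are hypothetical)."""
    performance: str
    security: str
    functionality: str


# Shared cell text, paraphrased from the table.
HUMAN_LED = "correctness metrics; human ensures robustness and reliability"
AUTONOMOUS = "all correctness and robustness metrics"
HUMAN_CENTRIC = "human-centric interaction, audit, moderate security"
SEMI_AUTONOMOUS = "semi-autonomous, moderate security and audit"

AMC5_EXPECTATIONS = {
    1: CertExpectation(HUMAN_LED, HUMAN_CENTRIC, "not required"),
    2: CertExpectation(HUMAN_LED, HUMAN_CENTRIC, "basic"),
    3: CertExpectation(AUTONOMOUS, SEMI_AUTONOMOUS, "moderate"),
    4: CertExpectation(AUTONOMOUS, SEMI_AUTONOMOUS, "strong"),
    5: CertExpectation(
        AUTONOMOUS, "fully autonomous, strong security controls and audit", "excellent"
    ),
}


def expectations_for(level: int) -> CertExpectation:
    """Look up the certification expectations for a maturity level (1-5)."""
    if level not in AMC5_EXPECTATIONS:
        raise ValueError(f"maturity level must be 1-5, got {level!r}")
    return AMC5_EXPECTATIONS[level]


print(expectations_for(3).functionality)  # moderate
```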
Technology Readiness Level (TRL)
The Technology Readiness Level (TRL) framework gauges how close an agent is to real-world deployment, focusing on its technical robustness, operational environment, and regulatory readiness. Our 7-tier adaptation tracks the progression from initial concept through to highly regulated production usage.
| Level | Tier | Description | Typical Activities / Deployment Context |
|---|---|---|---|
| Level 1 | Concept & Roadmap Tier | Agent requirements and functionalities conceptualized and validated for business outcomes. | Process re-engineering, functional due diligence. |
| Level 2 | Prototype Tier | Working standalone agent prototypes validated for intended functionality and outcomes with mostly synthetic data. | Simple internal demos, proof-of-concept builds. |
| Level 3 | Trusted Agent Huddle Tier | Working agent prototypes onboarded to AI Refinery and enabled for trusted agent huddle orchestration across systems. | Hackathons, cross-functional demos, proof-of-concepts, citizen / power-user experiments. |
| Level 4 | Pilot Tier | Working agents with full functionality and limited robustness, tuned to perform with production data in a real-life workflow. | COE teams; production rollout to pilot user groups, then scaling to gather feedback and improve. |
| Level 5 | Production Tier - General Usage | Working agents that are deployed and operating in general purpose internal facing scaled production usage. | Startups, less-regulated internal business functions or production pilots. |
| Level 6 | Production Tier - Customer Usage | Working agents with high functionality, performance, tuning, controls for customer facing scaled production usage. | Customer facing business functions or production pilots. |
| Level 7 | Production Tier - Regulatory Usage | Working agents with high functionality, performance, tuning, regulatory level controls for customer facing scaled production usage. | Highly regulated financial, healthcare, federal production systems. |
In essence: TRL reflects market and deployment readiness, complementing Maturity (how autonomous it is) and Certification (how trustworthy it is) to give a full picture of an agent’s lifecycle.
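Because the seven tiers are ordered, they map naturally onto an integer enumeration. The sketch below is an illustration only: the `TRL` member names and the `is_production` helper are hypothetical, and the cut-off at Level 5 simply follows the tier names in the table, where Levels 5 to 7 are the "Production" tiers.

```python
from enum import IntEnum


class TRL(IntEnum):
    """The seven tiers from the table (member names are hypothetical)."""
    CONCEPT_AND_ROADMAP = 1
    PROTOTYPE = 2
    TRUSTED_AGENT_HUDDLE = 3
    PILOT = 4
    PRODUCTION_GENERAL = 5
    PRODUCTION_CUSTOMER = 6
    PRODUCTION_REGULATORY = 7


def is_production(trl: TRL) -> bool:
    """True for the tiers labelled 'Production Tier' in the table (5 to 7)."""
    return trl >= TRL.PRODUCTION_GENERAL


print(TRL(3).name)               # TRUSTED_AGENT_HUDDLE
print(is_production(TRL.PILOT))  # False
```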
AMC-5 Certificate Card
Below is an example of an in-development AMC-5 report / certificate. The agent's maturity is assessed via LLM-as-a-Judge execution of the AM-5 questionnaire. Certification is detailed on the certification page.
| Category | Assessment based on L3 Certification |
|---|---|
| Maturity - Interaction | Yes — The user provides a goal in the investment research task to the "Investment Strategy Advisor". |
| Maturity - Autonomy | Yes — The multiple agents communicate and pass information between each other using a FlowSuperAgent. |
| Maturity - Scope | Yes — The agent has clear, specialized roles defined like "Stock Price Researcher" and "Financial Report Writer" within an agent hierarchy with "Investment Strategy Advisor" and is part of a coordinated process using a FlowSuperAgent. |
| Technology Readiness Level | Level 3 – Trusted Agent Huddle Tier — Built as an example for utilising the FlowSuperAgent within the AI Refinery and deployed within experimentation environments. |
| Certification - Performance | Pass — The agent meets the required standards for tool selection, invocation correctness, reliability, and latency. |
| Certification - Functionality | Excellent — The agent demonstrates high capability in interoperability, composability, documentation, and guardrails. |
| Certification - Security | A (Minimal Risk) — The agent has the highest security grade, assessed on factors like CIA impact and likelihood. |
Agentic Maturity & Certification (AMC-5)
This certifies the agent’s maturity level and the results across Performance, Security, and Functional readiness.
- Technology Readiness: Agent prototypes orchestrated across systems within the AI Refinery and trusted huddles.
- Performance: Tool selection · Invocation correctness · Reliability · Latency
- Security: Impact × Likelihood on CIA · Identity · Audit · Zero-Trust
- Functional: Interop (AIR/MCP/A2A) · Composability · Docs · Guardrails
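The "Impact × Likelihood" factor on the security axis can be read as a conventional risk-matrix product. The sketch below assumes a three-point ordinal scale (low = 1, medium = 2, high = 3); neither that scale nor the `risk_score` function comes from the source. Mapping a score to a letter grade is deliberately left out, because the grade thresholds are not specified here.

```python
# Hypothetical ordinal scale; the source does not define numeric values.
ORDINAL = {"low": 1, "medium": 2, "high": 3}


def risk_score(impact: str, likelihood: str) -> int:
    """Multiply ordinal impact by ordinal likelihood (range 1 to 9)."""
    return ORDINAL[impact.lower()] * ORDINAL[likelihood.lower()]


print(risk_score("medium", "low"))  # 2
print(risk_score("high", "high"))   # 9
```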
Gallery AMC-5 Schema
This is an in-development schema for AMC-5 maturity and certification records in the gallery.
{
  "agent_id": "CAAI/Investment_Strategy_Advisor",
  "maturity_level": 3,
  "cert": {
    "performance": "pass",
    "security_grade": "A",
    "functional": "excellent"
  },
  "telemetry": {
    "tool_selection_acc": 0.94,
    "valid_invocation_rate": 0.98,
    "success_rate": 0.97,
    "latency_p95_ms": 2100
  },
  "security": {
    "impact": "medium",
    "likelihood": "low",
    "controls": ["scoped OAuth", "audit sink"]
  }
}
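A consumer of such records might run a few sanity checks before trusting one. The sketch below is not an official validator: the `validate_amc5` function and its rules (maturity level between 1 and 5, rate fields between 0 and 1, non-negative latency) are assumptions inferred from the example record, since the schema is still in development.

```python
import json

RATE_FIELDS = ("tool_selection_acc", "valid_invocation_rate", "success_rate")


def validate_amc5(record: dict) -> list[str]:
    """Return a list of problems; an empty list means no problem was found."""
    errors = []
    if not isinstance(record.get("agent_id"), str):
        errors.append("agent_id must be a string")
    level = record.get("maturity_level")
    if not (isinstance(level, int) and 1 <= level <= 5):
        errors.append("maturity_level must be an integer from 1 to 5")
    for key in ("performance", "security_grade", "functional"):
        if key not in record.get("cert", {}):
            errors.append(f"cert.{key} is missing")
    telemetry = record.get("telemetry", {})
    for key in RATE_FIELDS:
        value = telemetry.get(key)
        if not isinstance(value, (int, float)) or not 0.0 <= value <= 1.0:
            errors.append(f"telemetry.{key} must be a number in [0, 1]")
    latency = telemetry.get("latency_p95_ms")
    if not isinstance(latency, (int, float)) or latency < 0:
        errors.append("telemetry.latency_p95_ms must be a non-negative number")
    return errors


record = json.loads("""
{
  "agent_id": "CAAI/Investment_Strategy_Advisor",
  "maturity_level": 3,
  "cert": {"performance": "pass", "security_grade": "A", "functional": "excellent"},
  "telemetry": {"tool_selection_acc": 0.94, "valid_invocation_rate": 0.98,
                "success_rate": 0.97, "latency_p95_ms": 2100}
}
""")
print(validate_amc5(record))  # []
```

For stricter checks, the same rules could be expressed as a JSON Schema document once the field set is finalized.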