Agent Maturity

Understand the AMC-5 maturity levels and certification axes covering performance, security, and functionality.

Agentic Maturity and Certification (AMC)

Purpose. Provide a unified way to see an agent’s autonomy level and trust its readiness:

  • Maturity (L1→L5): how agentic it is, meaning how much autonomy and agency it has.
  • Certification (Performance / Security / Functional): how secure, effective, and interoperable it is.
  • Technology Readiness Level (TRL1→TRL7): assessment of how ready the agent is for market deployment.

Maturity Levels

We adopt a framework for agentic maturity that runs from Level 1 to Level 5, with L5 being the highest maturity. We call this the AM-5 framework. It draws inspiration from seminal work on agentic autonomy, such as Levels of Autonomy for AI Agents, the Agentic Identity Maturity Model, and the Agentic Enterprise Maturity Model.

Hinge: From L3 onward, agents start to work as teams with greater autonomy.

Agent Maturity Questionnaire

To determine the AM-5 maturity level for agents in the gallery, we apply a questionnaire to each individual agent based on its metadata. The questions are composed into a rubric, which a frontier model then executes as an LLM-as-a-judge classification.

Specifically, the questions probe the nature of user interaction, agent autonomy, and the scope of work to place an agent's capabilities within the AM-5 framework.
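
As a rough illustration of how questions might be composed into a judge rubric, the sketch below assembles a prompt from agent metadata and a list of questions. The `Question` type, the `build_rubric` function, the prompt wording, and the metadata string are all hypothetical; the gallery's actual rubric format is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    level: int  # AM-5 level the question belongs to (1-5)
    axis: str   # e.g. "Interaction", "Autonomy", "Scope"
    text: str


def build_rubric(questions: list[Question], agent_metadata: str) -> str:
    """Compose a judge prompt: metadata first, then questions ordered by level."""
    lines = [
        "Agent metadata:",
        agent_metadata,
        "",
        "Answer Yes or No to each question, with a one-sentence justification:",
    ]
    for q in sorted(questions, key=lambda q: q.level):
        lines.append(f"[L{q.level} {q.axis}] {q.text}")
    return "\n".join(lines)


# Hypothetical input: two questions, deliberately listed out of order.
questions = [
    Question(3, "Interaction", "Does the user provide a goal to a system of agents?"),
    Question(1, "Interaction", "Does the user provide an explicit, step-by-step instruction?"),
]
rubric = build_rubric(questions, "name: Investment Strategy Advisor")
print(rubric)
```

The resulting string would then be sent to whichever judge model is in use; that call is omitted because it depends on the model provider.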

Level 1: User as Operator, Agents as Task Executors

This level is defined by direct command and single-purpose execution. The agent is a simple tool.

  1. Interaction: Does the user provide an explicit, step-by-step instruction for a single, predefined task (e.g., "summarize this text," "translate this sentence")?
  2. Autonomy: Does the agent operate without any independent decision-making, strictly following a fixed script or a rigid set of rules?
  3. Scope: Is the agent's functionality limited to a narrow, singular purpose, unable to chain commands or handle ambiguity?

Level 2: User as Collaborator, Agents as Trustworthy Assistants

This level involves delegating a complete but contained task that may require multiple steps. The agent is a reliable helper.

  1. Interaction: Can the user delegate a complete, but well-defined, multi-step task (e.g., "research topic X and compare the outputs given by Gemini, Claude, OpenAI and Perplexity") and expect a finished product?
  2. Autonomy: Does the agent require human supervision or intervention to handle exceptions, confirm intermediate steps, or resolve ambiguities?
  3. Scope: Can the agent reliably execute a sequence of related actions to fulfill a single, delegated outcome?

Level 3: User as Consultant, Agents as Team Players

This is the hinge level where multiple agents begin working together. The focus is on coordination and workflows.

  1. Interaction: Does the user provide a goal to a system of agents, which then coordinate amongst themselves to achieve it, with the user acting as a consultant?
  2. Autonomy & Collaboration: Do multiple agents communicate and pass information between each other to complete a workflow without direct, step-by-step user management?
  3. Scope & Complexity: Does the system involve specialized agents with distinct roles or "identities" that contribute to a larger, orchestrated process (e.g., agents within a super-agent and utility-agent hierarchy, each with defined tasks, such as a Market Research Utility Agent reporting to a Sourcing Intelligence Super Agent)?

Level 4: User as Approver, Agents as Goal-Driven Professionals

At this level, agents become proactive planners. The user's primary role shifts from director to approver.

  1. Interaction: Is the user's main role to approve or reject a multi-step plan that was autonomously generated by the agent(s) in response to a high-level goal?
  2. Autonomy: Does the agent independently devise its own strategy, create novel sub-tasks, and sequence actions to achieve the user's abstract objective?
  3. Scope & Adaptability: Can the agent incorporate feedback to dynamically refine its plan before or during execution, demonstrating a professional, goal-driven approach?

Level 5: User as Observer, Agents as Autonomous Workers

This is the highest level of maturity, where agents function like self-sufficient digital employees.

  1. Interaction: Does the user primarily observe the agent's performance against high-level, long-running objectives, rather than managing individual tasks or plans?
  2. Autonomy: Does the agent operate with full autonomy, managing its own resources, workflows, and long-term strategy to achieve business-level outcomes?
  3. Scope & Adaptability: Does the agent proactively optimize its own performance and strategies over time based on outcomes, without requiring explicit human guidance for improvement?
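
The per-level questions above could be turned into a single maturity level in several ways. The sketch below assumes one simple rule: an agent is placed at the highest level whose three questions were all answered Yes. That aggregation rule, the `maturity_level` function, and the lowercase axis keys are assumptions made for illustration, not the gallery's documented procedure.

```python
AXES = ("interaction", "autonomy", "scope")


def maturity_level(answers: dict[int, dict[str, bool]]) -> int:
    """Return the highest level (1-5) whose three questions were all answered Yes.

    Returns 0 when no level is fully satisfied.
    """
    best = 0
    for level in range(1, 6):
        level_answers = answers.get(level, {})
        # A missing answer is treated as No.
        if all(level_answers.get(axis, False) for axis in AXES):
            best = level
    return best


# Hypothetical judge output for one agent.
answers = {
    2: {"interaction": True, "autonomy": True, "scope": True},
    3: {"interaction": True, "autonomy": True, "scope": True},
    4: {"interaction": False, "autonomy": True, "scope": False},
}
level = maturity_level(answers)
print(level)
```

A stricter variant could require every lower level to pass as well; which rule fits best depends on how the judge's answers are meant to be read.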

Agentic Certification Mapping

Next, we measure L1 to L5 agentic maturity alongside the certification axes. Together they form the AMC-5 framework (Agentic Maturity and Certification), which places beside each maturity level a view of how the other certification axes (performance, security, and functional) would apply.

| Level | Performance Certification | Security Certification | Functionality / Interoperability |
|---|---|---|---|
| L1 Narrow Task | Correctness metrics are prioritized. Human plays a major role in ensuring robustness and reliability (e.g., copilot). | Human-centric interaction, audit, and moderate security required. | Not required. |
| L2 Multi-Task | Correctness metrics are prioritized. Human plays a major role in ensuring robustness and reliability (e.g., copilot). | Human-centric interaction, audit, and moderate security required. | Basic – works with orchestrator / models. |
| L3 Pre-Coordinated | All correctness and robustness metrics are prioritized as agents run autonomously and for more turns. | Semi-autonomous, moderate security and audit required. | Moderate – works with other agents. |
| L4 Dynamic Planning | All correctness and robustness metrics are prioritized as agents run autonomously and for more turns. | Semi-autonomous, moderate security and audit required. | Strong – composable. |
| L5 Fully Autonomous | All correctness and robustness metrics are prioritized as agents run autonomously and for more turns. | Fully autonomous: strong security controls and audit required. | Excellent – well-defined and highly composable. |
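
The mapping above can be read as a lookup from maturity level to certification expectations. The sketch below encodes it that way; the `Expectation` type, its field names, and the one-word summaries of each cell are shorthand introduced here, not an official schema.

```python
from typing import NamedTuple


class Expectation(NamedTuple):
    label: str             # short level name from the table
    security: str          # summarized security requirement
    interoperability: str  # summarized functionality / interoperability rating


# One-word summaries of the table cells (hypothetical shorthand).
AMC5 = {
    1: Expectation("Narrow Task", "moderate", "not required"),
    2: Expectation("Multi-Task", "moderate", "basic"),
    3: Expectation("Pre-Coordinated", "moderate", "moderate"),
    4: Expectation("Dynamic Planning", "moderate", "strong"),
    5: Expectation("Fully Autonomous", "strong", "excellent"),
}


def robustness_prioritized(level: int) -> bool:
    """From L3 upward, the table prioritizes robustness as well as correctness."""
    return level >= 3


print(AMC5[3], robustness_prioritized(3))
```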

Technology Readiness Level (TRL)

The Technology Readiness Level (TRL) framework gauges how close an agent is to real-world deployment, focusing on its technical robustness, operational environment, and regulatory readiness. Our 7-tier adaptation tracks the progression from initial concept through highly regulated production usage.

| Level | Tier | Description | Typical Activities / Deployment Context |
|---|---|---|---|
| Level 1 | Concept & Roadmap Tier | Agent requirements and functionalities conceptualized and validated for business outcomes. | Process re-engineering, functional due diligence. |
| Level 2 | Prototype Tier | Working standalone agent prototypes validated for intended functionality and outcomes with mostly synthetic data. | Simple internal demos, proof-of-concept builds. |
| Level 3 | Trusted Agent Huddle Tier | Working agent prototypes onboarded to AI refinery and enabled for trusted agent huddle orchestration across systems. | Hackathons, cross-functional demos, proof-of-concepts, citizen / power-user experiments. |
| Level 4 | Pilot Tier | Working agents with full functionality and limited robustness in a real-life workflow tuned to perform with production data. | COE teams, production rollout for pilot user groups and scale to get feedback and improve. |
| Level 5 | Production Tier - General Usage | Working agents that are deployed and operating in general purpose internal facing scaled production usage. | Startups, less-regulated internal business functions or production pilots. |
| Level 6 | Production Tier - Customer Usage | Working agents with high functionality, performance, tuning, controls for customer facing scaled production usage. | Customer facing business functions or production pilots. |
| Level 7 | Production Tier - Regulatory Usage | Working agents with high functionality, performance, tuning, regulatory level controls for customer facing scaled production usage. | Highly regulated financial, healthcare, federal production systems. |

In essence: TRL reflects market and deployment readiness, complementing Maturity (how autonomous it is) and Certification (how trustworthy it is) to give a full picture of an agent’s lifecycle.
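
One way to work with the seven tiers in code is an ordered enumeration, so that tiers can be compared directly. The sketch below is illustrative only: the `TRL` member names and the `is_production` helper are hypothetical, derived from the tier names in the table rather than from any published API.

```python
from enum import IntEnum


class TRL(IntEnum):
    CONCEPT_AND_ROADMAP = 1
    PROTOTYPE = 2
    TRUSTED_AGENT_HUDDLE = 3
    PILOT = 4
    PRODUCTION_GENERAL = 5
    PRODUCTION_CUSTOMER = 6
    PRODUCTION_REGULATORY = 7


def is_production(trl: TRL) -> bool:
    """Levels 5 to 7 are the tiers the table labels as Production."""
    return trl >= TRL.PRODUCTION_GENERAL


example = TRL(3)
print(example.name, is_production(example))
```

Using `IntEnum` keeps the numeric level available for storage while still allowing ordered comparisons between tiers.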

AMC-5 Certificate Card

Below is an example of an in-development AMC-5 report / certificate. Agent maturity is scored via LLM-as-a-Judge execution of the AM-5 questionnaire. Certification is detailed in the certification page.

| Category | Assessment based on L3 Certification |
|---|---|
| Maturity - Interaction | Yes: The user provides a goal in the investment research task to the "Investment Strategy Advisor". |
| Maturity - Autonomy | Yes: The multiple agents communicate and pass information between each other using a FlowSuperAgent. |
| Maturity - Scope | Yes: The agent has clear, specialized roles defined like "Stock Price Researcher" and "Financial Report Writer" within an agent hierarchy with "Investment Strategy Advisor" and is part of a coordinated process using a FlowSuperAgent. |
| Technology Readiness Level | Level 3 – Trusted Agent Huddle Tier: Built as an example for utilising the FlowSuperAgent within the AI Refinery and deployed within experimentation environments. |
| Certification - Performance | Pass: The agent meets the required standards for tool selection, invocation correctness, reliability, and latency. |
| Certification - Functionality | Excellent: The agent demonstrates high capability in interoperability, composability, documentation, and guardrails. |
| Certification - Security | A (Minimal Risk): The agent has the highest security grade, assessed on factors like CIA impact and likelihood. |

Agentic Maturity & Certification (AMC-5)

This certifies the agent’s maturity level and the results across Performance, Security, and Functional readiness.

  • Agent: CAAI/Investment_Strategy_Advisor
  • Owner / Team: CAAI
  • Certificate ID: AMC-2025-00042
  • Issue Date: 2025-01-15
  • Valid Until: 2026-01-15
  • Maturity Level: L3 (User as Consultant • Agents as Team Players). Goal-driven portfolio orchestration with supervised approvals.
  • Technology Readiness Level: Level 3 (Trusted Agent Huddle Tier). Agent prototypes orchestrated across systems within the AI refinery and trusted huddles.

Certification

  • Performance: PASS
    Tool selection · Invocation correctness · Reliability · Latency
  • Security (Risk Grade): A
    Impact × Likelihood on CIA · Identity · Audit · Zero-Trust
  • Functional: Excellent
    Interop (AIR/MCP/A2A) · Composability · Docs · Guardrails

Legend (Security Risk): A (minimal) · B (managed) · C (moderate) · D (high)

This is an in-development schema for the AMC maturity and certification record that the gallery produces for an agent.

{
  "agent_id": "CAAI/Investment_Strategy_Advisor",
  "maturity_level": 3,
  "cert": {
    "performance": "pass",
    "security_grade": "A",
    "functional": "excellent"
  },
  "telemetry": {
    "tool_selection_acc": 0.94,
    "valid_invocation_rate": 0.98,
    "success_rate": 0.97,
    "latency_p95_ms": 2100
  },
  "security": {
    "impact": "medium",
    "likelihood": "low",
    "controls": ["scoped OAuth", "audit sink"]
  }
}
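
A consumer of this record might want to sanity-check it before use. The sketch below parses the example above and applies a few checks drawn from this page: maturity runs from 1 to 5 and security grades from A to D. The `validate_record` function is hypothetical, and treating every non-latency telemetry field as a rate between 0 and 1 is an assumption, since the schema is still in development.

```python
import json

SECURITY_GRADES = {"A", "B", "C", "D"}  # grades listed in the certificate legend


def validate_record(record: dict) -> list[str]:
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    level = record.get("maturity_level")
    if not (isinstance(level, int) and 1 <= level <= 5):
        problems.append("maturity_level must be an integer from 1 to 5")
    if record.get("cert", {}).get("security_grade") not in SECURITY_GRADES:
        problems.append("cert.security_grade must be one of A, B, C, D")
    for key, value in record.get("telemetry", {}).items():
        # Assumption: every telemetry field except latency is a rate in [0, 1].
        if not key.startswith("latency") and not 0.0 <= value <= 1.0:
            problems.append(f"telemetry.{key} must be between 0 and 1")
    return problems


raw = """
{
  "agent_id": "CAAI/Investment_Strategy_Advisor",
  "maturity_level": 3,
  "cert": {"performance": "pass", "security_grade": "A", "functional": "excellent"},
  "telemetry": {
    "tool_selection_acc": 0.94,
    "valid_invocation_rate": 0.98,
    "success_rate": 0.97,
    "latency_p95_ms": 2100
  },
  "security": {"impact": "medium", "likelihood": "low", "controls": ["scoped OAuth", "audit sink"]}
}
"""
record = json.loads(raw)
problems = validate_record(record)
print(problems)
```

Returning a list of problems rather than raising on the first failure lets a caller report every issue in one pass.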