Understand the AMC-5 maturity levels and certification axes covering performance, security, and functionality.

Agentic Maturity and Certification (AMC)

Purpose. Provide a unified view of an agent's level of autonomy and of how far its readiness can be trusted:

Maturity (L1→L5): how agentic the agent is, i.e., how much autonomy and agency it has.
Certification (Performance / Security / Functional): how secure, effective, and interoperable it is.
Technology Readiness Level (TRL1→TRL7): an assessment of how ready the agent is for market deployment.

Maturity Levels

We adopt a five-level framework for agentic maturity, from Level 1 to Level 5, with L5 the highest maturity. We call this the AM-5 framework; it draws inspiration from seminal work on agentic autonomy such as Levels of Autonomy for AI Agents, the Agentic Identity Maturity Model, and the Agentic Enterprise Maturity Model.

Hinge: From L3 onward, agents begin to work as teams with greater autonomy.

Agent Maturity Questionnaire

To determine the AM-5 maturity level of each agent in the gallery, we apply a questionnaire to the agent's metadata. The questions are composed into a rubric, which a frontier model then executes as an LLM-as-a-judge classifier.

Specifically, the questions probe the nature of user interaction, the agent's autonomy, and the scope of its work, so that the agent's capabilities can be placed accurately within the AM-5 framework.
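The composition step can be sketched in Python. This is a minimal sketch, assuming the questionnaire is stored as a mapping from maturity level to that level's questions; the function name `build_rubric_prompt`, the prompt wording, the `L<level>.Q<n>` question IDs, and the sample metadata are hypothetical, and the call to the frontier model acting as judge is deliberately left out.

```python
import json
from typing import Dict, List


def build_rubric_prompt(agent_metadata: Dict[str, str],
                        questionnaire: Dict[int, List[str]]) -> str:
    """Compose AM-5 questions and agent metadata into one judge prompt.

    Hypothetical template: the wording and answer format are illustrative only.
    """
    lines = [
        "Classify the agent described below on the AM-5 maturity scale.",
        "Agent metadata:",
        json.dumps(agent_metadata, indent=2, sort_keys=True),
        "Answer every question with 'yes' or 'no' plus a one-sentence justification.",
    ]
    for level in sorted(questionnaire):
        lines.append(f"Level {level}:")
        for i, question in enumerate(questionnaire[level], start=1):
            # Stable IDs such as L3.Q2 make the judge's answers easy to parse back.
            lines.append(f"  L{level}.Q{i} {question}")
    return "\n".join(lines)


# Abbreviated questionnaire (one question per level) and hypothetical metadata.
questions = {
    3: ["Do multiple agents pass information between each other to complete a workflow?"],
    1: ["Does the user give an explicit, step-by-step instruction for a single, predefined task?"],
}
metadata = {"agent_id": "CAAI/Investment_Strategy_Advisor", "role": "orchestrator"}
prompt = build_rubric_prompt(metadata, questions)
print(prompt)
```

Levels are emitted in ascending order regardless of how the mapping was built, so the judge always sees the scale from L1 upward.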

Level 1: User as Operator, Agents as Task Executors

This level is defined by direct command and single-purpose execution. The agent is a simple tool.

Interaction: Does the user provide an explicit, step-by-step instruction for a single, predefined task (e.g., "summarize this text," "translate this sentence")?
Autonomy: Does the agent operate without any independent decision-making, strictly following a fixed script or a rigid set of rules?
Scope: Is the agent's functionality limited to a narrow, singular purpose, unable to chain commands or handle ambiguity?

Level 2: User as Collaborator, Agents as Trustworthy Assistants

This level involves delegating a complete but contained task that may require multiple steps. The agent is a reliable helper.

Interaction: Can the user delegate a complete, but well-defined, multi-step task (e.g., "research topic X and compare the outputs given by Gemini, Claude, OpenAI and Perplexity") and expect a finished product?
Autonomy: Does the agent require human supervision or intervention to handle exceptions, confirm intermediate steps, or resolve ambiguities?
Scope: Can the agent reliably execute a sequence of related actions to fulfill a single, delegated outcome?

Level 3: User as Consultant, Agents as Team Players

This is the hinge level where multiple agents begin working together. The focus is on coordination and workflows.

Interaction: Does the user provide a goal to a system of agents, which then coordinate amongst themselves to achieve it, with the user acting as a consultant?
Autonomy & Collaboration: Do multiple agents communicate and pass information between each other to complete a workflow without direct, step-by-step user management?
Scope & Complexity: Does the system involve specialized agents with distinct roles or "identities" that contribute to a larger, orchestrated process (e.g., agents within a super-agent/utility-agent hierarchy, each with defined tasks, such as a Market Research Utility Agent reporting to a Sourcing Intelligence Super Agent)?

Level 4: User as Approver, Agents as Goal-Driven Professionals

At this level, agents become proactive planners. The user's primary role shifts from director to approver.

Interaction: Is the user's main role to approve or reject a multi-step plan that was autonomously generated by the agent(s) in response to a high-level goal?
Autonomy: Does the agent independently devise its own strategy, create novel sub-tasks, and sequence actions to achieve the user's abstract objective?
Scope & Adaptability: Can the agent incorporate feedback to dynamically refine its plan before or during execution, demonstrating a professional, goal-driven approach?

Level 5: User as Observer, Agents as Autonomous Workers

This is the highest level of maturity, where agents function like self-sufficient digital employees.

Interaction: Does the user primarily observe the agent's performance against high-level, long-running objectives, rather than managing individual tasks or plans?
Autonomy: Does the agent operate with full autonomy, managing its own resources, workflows, and long-term strategy to achieve business-level outcomes?
Scope & Adaptability: Does the agent proactively optimize its own performance and strategies over time based on outcomes, without requiring explicit human guidance for improvement?
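Once the judge has answered every question, its answers must be reduced to a single level. The framework does not state an aggregation rule, so the following is a minimal sketch under an assumed rule: the agent is placed at the highest level whose three questions (interaction, autonomy, scope) were all answered "yes". `assign_level` is a hypothetical helper; it returns `None` when no level is fully satisfied, signalling that the agent needs human review.

```python
from typing import Dict, List, Optional


def assign_level(answers: Dict[int, List[bool]]) -> Optional[int]:
    """Return the highest AM-5 level whose questions were all answered "yes".

    Hypothetical aggregation rule; returns None if no level is fully satisfied.
    """
    for level in range(5, 0, -1):  # check L5 first, then work down
        level_answers = answers.get(level, [])
        if level_answers and all(level_answers):
            return level
    return None  # unclassified: route to human review


# Judge answers per level, in (interaction, autonomy, scope) order.
# L3 mirrors the example certificate card; L4 is hypothetical.
answers = {3: [True, True, True], 4: [False, True, False]}
print(assign_level(answers))  # 3
```

Checking from the top down works because the lower-level questions are partly exclusive (an L3 agent would answer "no" to L1's "unable to chain commands"), so requiring every lower level to pass would misclassify capable agents.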

Agentic Certification Mapping

Next, we map the L1 to L5 maturity levels onto the certification axes. Combining the two gives the AMC-5 (Agentic Maturity and Certification) framework, which sets out, for each maturity level, what the other axes of agent certification (performance, security, and functionality) should require.

Level	Performance Certification	Security Certification	Functionality / Interoperability
L1 Narrow Task	Correctness metrics are prioritized. Human plays a major role in ensuring robustness and reliability (e.g., copilot).	Human-centric interaction, audit, and moderate security required.	Not required.
L2 Multi-Task	Correctness metrics are prioritized. Human plays a major role in ensuring robustness and reliability (e.g., copilot).	Human-centric interaction, audit, and moderate security required.	Basic – works with orchestrator / models.
L3 Pre-Coordinated	All correctness and robustness metrics are prioritized as agents run autonomously and for more turns.	Semi-autonomous, moderate security and audit required.	Moderate – works with other agents.
L4 Dynamic Planning	All correctness and robustness metrics are prioritized as agents run autonomously and for more turns.	Semi-autonomous, moderate security and audit required.	Strong – composable.
L5 Fully Autonomous	All correctness and robustness metrics are prioritized as agents run autonomously and for more turns.	Fully autonomous — strong security controls and audit required.	Excellent – well-defined and highly composable.
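The Functionality / Interoperability column can be encoded as an ordered scale and used as a gate. This sketch assumes the grades are strictly ordered from "Not required" to "Excellent" (the table implies, but does not state, this ordering); `Interop`, `REQUIRED_INTEROP`, and `meets_functional_bar` are hypothetical names.

```python
from enum import IntEnum


class Interop(IntEnum):
    """Functionality / interoperability grades, weakest to strongest (assumed order)."""
    NOT_REQUIRED = 0
    BASIC = 1
    MODERATE = 2
    STRONG = 3
    EXCELLENT = 4


# Minimum grade expected at each maturity level, transcribed from the table.
REQUIRED_INTEROP = {
    1: Interop.NOT_REQUIRED,
    2: Interop.BASIC,
    3: Interop.MODERATE,
    4: Interop.STRONG,
    5: Interop.EXCELLENT,
}


def meets_functional_bar(maturity_level: int, achieved: Interop) -> bool:
    """True if the achieved grade is at least the level's requirement."""
    return achieved >= REQUIRED_INTEROP[maturity_level]


print(meets_functional_bar(3, Interop.EXCELLENT))  # True
print(meets_functional_bar(5, Interop.STRONG))     # False
```

Using `IntEnum` lets grades be compared with `>=` directly, which keeps the gate a one-line lookup.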

Technology Readiness Level (TRL)

The Technology Readiness Level (TRL) framework gauges how close an agent is to real-world deployment, focusing on its technical robustness, operational environment, and regulatory readiness. Our 7-tier adaptation tracks the progression from initial concept through production use in highly regulated settings.

Level	Tier	Description	Typical Activities / Deployment Context
Level 1	Concept & Roadmap Tier	Agent requirements and functionalities conceptualized and validated for business outcomes.	Process re-engineering, functional due diligence.
Level 2	Prototype Tier	Working standalone agent prototypes validated for intended functionality and outcomes with mostly synthetic data.	Simple internal demos, proof-of-concept builds.
Level 3	Trusted Agent Huddle Tier	Working agent prototypes onboarded to the AI Refinery and enabled for trusted agent huddle orchestration across systems.	Hackathons, cross-functional demos, proof-of-concepts, citizen / power-user experiments.
Level 4	Pilot Tier	Working agents with full functionality but limited robustness, running in a real-life workflow and tuned to perform on production data.	COE teams; production rollout to pilot user groups, scaling up to gather feedback and improve.
Level 5	Production Tier - General Usage	Working agents deployed and operating in general-purpose, internal-facing, scaled production usage.	Startups, less-regulated internal business functions, or production pilots.
Level 6	Production Tier - Customer Usage	Working agents with high functionality, performance, tuning, and controls for customer-facing, scaled production usage.	Customer-facing business functions or production pilots.
Level 7	Production Tier - Regulatory Usage	Working agents with high functionality, performance, tuning, and regulatory-level controls for customer-facing, scaled production usage.	Highly regulated financial, healthcare, and federal production systems.

In essence: TRL reflects market and deployment readiness, complementing Maturity (how autonomous it is) and Certification (how trustworthy it is) to give a full picture of an agent’s lifecycle.
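The TRL tiers can likewise gate deployment contexts. In this sketch the context keys and the minimum tier for each are assumptions read off the "Typical Activities / Deployment Context" column; `TRL`, `MIN_TRL_FOR_CONTEXT`, and `cleared_for` are hypothetical names.

```python
from enum import IntEnum


class TRL(IntEnum):
    """The seven TRL tiers from the table above."""
    CONCEPT_AND_ROADMAP = 1
    PROTOTYPE = 2
    TRUSTED_AGENT_HUDDLE = 3
    PILOT = 4
    PRODUCTION_GENERAL = 5
    PRODUCTION_CUSTOMER = 6
    PRODUCTION_REGULATORY = 7


# Hypothetical minimum tier required before an agent may run in each context.
MIN_TRL_FOR_CONTEXT = {
    "internal_demo": TRL.PROTOTYPE,
    "pilot_user_group": TRL.PILOT,
    "internal_production": TRL.PRODUCTION_GENERAL,
    "customer_facing": TRL.PRODUCTION_CUSTOMER,
    "regulated": TRL.PRODUCTION_REGULATORY,
}


def cleared_for(trl: TRL, context: str) -> bool:
    """True if the agent's tier meets the context's minimum tier."""
    return trl >= MIN_TRL_FOR_CONTEXT[context]


advisor_trl = TRL.TRUSTED_AGENT_HUDDLE  # the TRL on the example certificate
print(cleared_for(advisor_trl, "internal_demo"))    # True
print(cleared_for(advisor_trl, "customer_facing"))  # False
```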

AMC-5 Certificate Card

Below is an example of an in-development AMC-5 report / certificate. The agent's maturity is scored via LLM-as-a-judge execution of the AM-5 questionnaire. Certification is detailed on the certification page.

Category	Assessment based on L3 Certification
Maturity - Interaction	Yes — The user provides a goal in the investment research task to the "Investment Strategy Advisor".
Maturity - Autonomy	Yes — Multiple agents communicate and pass information between each other using a FlowSuperAgent.
Maturity - Scope	Yes — The agent has clear, specialized roles defined like "Stock Price Researcher" and "Financial Report Writer" within an agent hierarchy with "Investment Strategy Advisor" and is part of a coordinated process using a FlowSuperAgent.
Technology Readiness Level	Level 3 – Trusted Agent Huddle Tier — Built as an example for utilising the FlowSuperAgent within the AI Refinery and deployed within experimentation environments.
Certification - Performance	Pass — The agent meets the required standards for tool selection, invocation correctness, reliability, and latency.
Certification - Functionality	Excellent — The agent demonstrates high capability in interoperability, composability, documentation, and guardrails.
Certification - Security	A (Minimal Risk) — The agent has the highest security grade, assessed on factors like CIA impact and likelihood.

Agentic Maturity & Certification (AMC-5)

This certifies the agent’s maturity level and the results across Performance, Security, and Functional readiness.

Agent

CAAI/Investment_Strategy_Advisor

Owner / Team

CAAI

Certificate ID

AMC-2025-00042

Issue Date

2025-01-15

Valid Until

2026-01-15

Maturity Level

L1 · L2 · L3 · L4 · L5

L3 — User as Consultant • Agents as Team Players

Goal-driven portfolio orchestration with supervised approvals.

Technology Readiness Level

Level 1 · Level 2 · Level 3 · Level 4 · Level 5 · Level 6 · Level 7

Level 3 — Trusted Agent Huddle Tier

Agent prototypes orchestrated across systems within the AI Refinery and trusted huddles.

Certification

Performance

PASS

Tool selection · Invocation correctness · Reliability · Latency

Security (Risk Grade)

A (Minimal Risk)

Impact × Likelihood on CIA · Identity · Audit · Zero-Trust

Functional

Excellent

Interop (AIR/MCP/A2A) · Composability · Docs · Guardrails

Legend — Security Risk: A (minimal) · B (managed) · C (moderate) · D (high)

AMC-5 • Agentic Maturity & Certification
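The card's security grade combines impact and likelihood into a letter from the legend. The exact scoring is not specified here, so the following is a minimal sketch assuming a conventional ordinal risk matrix: impact and likelihood are each ranked low/medium/high, multiplied, and bucketed into A to D. The ranks, thresholds, and the `security_grade` helper are hypothetical; the thresholds were chosen so that medium impact with low likelihood, the combination recorded for the example agent in the gallery schema, maps to grade A as on its certificate.

```python
# Hypothetical ordinal ranks for impact and likelihood.
RANK = {"low": 1, "medium": 2, "high": 3}


def security_grade(impact: str, likelihood: str) -> str:
    """Map impact x likelihood to the legend's A-D grades (assumed thresholds)."""
    score = RANK[impact] * RANK[likelihood]  # possible scores: 1, 2, 3, 4, 6, 9
    if score <= 2:
        return "A"  # minimal risk
    if score <= 4:
        return "B"  # managed risk
    if score <= 6:
        return "C"  # moderate risk
    return "D"      # high risk


print(security_grade("medium", "low"))  # A
print(security_grade("high", "high"))   # D
```

A real assessment would also weigh the identity, audit, and zero-trust controls listed on the card; this sketch covers only the impact × likelihood product.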

Gallery AMC-5 Schema

This is an in-development schema for recording AMC-5 maturity and certification results for agents in the gallery.

{
  "agent_id": "CAAI/Investment_Strategy_Advisor",
  "maturity_level": 3,
  "cert": {
    "performance": "pass",
    "security_grade": "A",
    "functional": "excellent"
  },
  "telemetry": {
    "tool_selection_acc": 0.94,
    "valid_invocation_rate": 0.98,
    "success_rate": 0.97,
    "latency_p95_ms": 2100
  },
  "security": {
    "impact": "medium",
    "likelihood": "low",
    "controls": ["scoped OAuth", "audit sink"]
  }
}
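A consumer of this schema might load and sanity-check a record before rendering a certificate. The sketch below uses only the Python standard library; the dataclass names, the range checks, and especially the performance thresholds in `performance_passes` are assumptions for illustration, not published AMC-5 criteria.

```python
import json
from dataclasses import dataclass


@dataclass
class Telemetry:
    tool_selection_acc: float
    valid_invocation_rate: float
    success_rate: float
    latency_p95_ms: int


@dataclass
class AMCRecord:
    agent_id: str
    maturity_level: int
    security_grade: str
    functional: str
    performance: str
    telemetry: Telemetry


def parse_record(raw: str) -> AMCRecord:
    """Load a gallery AMC-5 record and apply basic range checks."""
    data = json.loads(raw)
    level = data["maturity_level"]
    if not 1 <= level <= 5:
        raise ValueError(f"maturity_level must be 1-5, got {level}")
    cert = data["cert"]
    if cert["security_grade"] not in {"A", "B", "C", "D"}:
        raise ValueError(f"unknown security grade {cert['security_grade']!r}")
    telemetry = Telemetry(**data["telemetry"])
    for name in ("tool_selection_acc", "valid_invocation_rate", "success_rate"):
        if not 0.0 <= getattr(telemetry, name) <= 1.0:
            raise ValueError(f"{name} must be a rate in [0, 1]")
    return AMCRecord(
        agent_id=data["agent_id"],
        maturity_level=level,
        security_grade=cert["security_grade"],
        functional=cert["functional"],
        performance=cert["performance"],
        telemetry=telemetry,
    )


def performance_passes(t: Telemetry) -> bool:
    """Hypothetical pass thresholds; not published AMC-5 criteria."""
    return (t.tool_selection_acc >= 0.90
            and t.valid_invocation_rate >= 0.95
            and t.success_rate >= 0.95
            and t.latency_p95_ms <= 3000)


raw = """{
  "agent_id": "CAAI/Investment_Strategy_Advisor",
  "maturity_level": 3,
  "cert": {"performance": "pass", "security_grade": "A", "functional": "excellent"},
  "telemetry": {"tool_selection_acc": 0.94, "valid_invocation_rate": 0.98,
                "success_rate": 0.97, "latency_p95_ms": 2100},
  "security": {"impact": "medium", "likelihood": "low",
               "controls": ["scoped OAuth", "audit sink"]}
}"""
record = parse_record(raw)
print(record.maturity_level, performance_passes(record.telemetry))  # 3 True
```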
