How the Agent Gallery certifies agents across Functionality, Performance, and Security.

Certifying Agents

Outcome: a clear, explainable certification that scales across thousands of agents.

Each agent in the Gallery is certified on three pillars:

Performance: effectiveness, correctness & latency
Security: risk-based posture, supply chain & agent-specific risks (e.g. prompt injection)
Functional: ecosystem readiness & usability

Dimension	Metric Type	Assessment Output	Frequency
Performance	Quantitative	Pass / Fail	Post-Build / Continuous (latency measured at runtime)
Security	Qualitative Risk Matrix	A, B, C, D	Periodic (currently every 30 days, in line with the ageing policy)
Functional	Ecosystem Readiness	Excellent, Strong, Moderate, Basic	Post-Build
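
As a rough illustration, the assessment outputs in the table can be modelled as a small record type. This is a minimal sketch, not the Gallery's actual schema: the class names, field names, and the `security_recert_due` helper are hypothetical. Only the output values and the 30-day security interval come from the table.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum


class PerformanceResult(Enum):
    PASS = "Pass"
    FAIL = "Fail"


class SecurityGrade(Enum):
    A = "A"
    B = "B"
    C = "C"
    D = "D"


class FunctionalRating(Enum):
    EXCELLENT = "Excellent"
    STRONG = "Strong"
    MODERATE = "Moderate"
    BASIC = "Basic"


# From the table: security is currently re-assessed every 30 days.
SECURITY_RECERT_INTERVAL = timedelta(days=30)


@dataclass
class CertificationRecord:
    """Hypothetical container for one agent's certification results."""
    agent_id: str
    performance: PerformanceResult
    security: SecurityGrade
    functional: FunctionalRating
    security_assessed_on: date

    def security_recert_due(self, today: date) -> bool:
        """Return True once the 30-day security interval has elapsed."""
        return today - self.security_assessed_on >= SECURITY_RECERT_INTERVAL
```

For example, a record whose security assessment ran on 1 January would not be due for re-assessment on 15 January, but would be due on 31 January.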

Certification in the Gallery

The details view of each agent shows the current certification status for the Performance, Functional and Security dimensions. Clicking the See details link on a certification status card opens a dialog with the specific certification details.

Agent certification overview dashboard showing summary tiles and statuses.

The performance card lists the performance tests conducted on this agent. Hover over each icon to see more details.

The Tools work and Latency tests rely on aggregate data from multiple users. As a result, they are not available for some agents in the Gallery until more data has been collected through frequent usage. Read Performance Certification to see how we validate correctness & speed.
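
The "not available until enough data exists" behaviour could be sketched as follows. This is an assumption-heavy illustration: the `MIN_SAMPLES` threshold, the function name, and the choice of the median as the aggregate are all hypothetical and are not taken from the Gallery.

```python
from statistics import median
from typing import Optional, Sequence

# Hypothetical threshold; the Gallery's real data requirement is not documented here.
MIN_SAMPLES = 20


def aggregate_latency_ms(
    samples: Sequence[float], min_samples: int = MIN_SAMPLES
) -> Optional[float]:
    """Return the median latency, or None while too little data has been collected."""
    if len(samples) < min_samples:
        return None  # test result is shown as "not available"
    return median(samples)
```

Returning `None` instead of a provisional number keeps a sparsely used agent from being graded on a handful of runs.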

Performance certification interface highlighting latency and correctness metrics.

The functional card lists the functional tests conducted on this agent. Hover over each icon to see more details. Read Functionality Certification for more details.

Functional certification checklist emphasizing ecosystem readiness.

The security certification card shows the final security score of this agent. Hover over each icon to see more details (e.g. how many critical, high, and medium vulnerabilities were flagged in the container scan). Read Security Certification for details on the grading pipeline.

Security certification risk matrix with graded findings.

Certification Process (at a glance)

Sequence of Certification

MCP agents that are regularly built from open source repositories follow a sequential order: build, certify, publish to the Gallery. For new agents added by Gallery users, the certification process is triggered in parallel so that agents can be onboarded and tested quickly.
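
The difference between the two orderings can be sketched with `asyncio`. This is only one possible reading: it assumes "in parallel" means that a user-added agent is listed while certification runs concurrently. The function names and the event log are hypothetical stand-ins, not the Gallery's implementation.

```python
import asyncio
from typing import List


async def build(agent: str, log: List[str]) -> None:
    log.append(f"build:{agent}")


async def certify(agent: str, log: List[str]) -> None:
    log.append(f"certify:{agent}")


async def publish(agent: str, log: List[str]) -> None:
    log.append(f"publish:{agent}")


async def onboard_mcp_agent(agent: str, log: List[str]) -> None:
    # Sequential: each stage finishes before the next one starts.
    await build(agent, log)
    await certify(agent, log)
    await publish(agent, log)


async def onboard_user_agent(agent: str, log: List[str]) -> None:
    # Assumed reading of "in parallel": listing the agent does not wait
    # for certification; both run concurrently.
    await asyncio.gather(publish(agent, log), certify(agent, log))
```

In the sequential path an agent never appears in the Gallery uncertified. In the concurrent path it can appear before its certification cards are populated.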

Add an Agent
- Upload metadata, source code, and build recipe.
Evaluation Data
- A GenAI pipeline generates task-specific evaluation data from the agent metadata.
Run Tests
- Functionality: static analysis → ground-truth formats & outputs.
- Performance: tool-use correctness, error rates, and latency.
- Security: static/dynamic checks, supply-chain scans & risk scoring.
Aggregate & Grade
- Persist results, compute scores/grades, and publish to the Gallery.
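
The four stages above can be sketched as a small orchestration function. This is a minimal sketch under stated assumptions: the `Agent` and `CertificationRun` types, `generate_eval_data`, and the pluggable test callables are hypothetical. The real evaluation data comes from a GenAI pipeline, which is replaced here by a placeholder.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Agent:
    name: str
    metadata: Dict[str, str]


@dataclass
class CertificationRun:
    agent: Agent
    eval_data: List[str] = field(default_factory=list)
    results: Dict[str, str] = field(default_factory=dict)


# A test takes the agent and its evaluation data and returns an assessment label.
TestFn = Callable[[Agent, List[str]], str]


def generate_eval_data(agent: Agent) -> List[str]:
    # Placeholder for the GenAI pipeline: one dummy task per metadata field.
    return [f"task for {key}" for key in agent.metadata]


def certify(agent: Agent, tests: Dict[str, TestFn]) -> CertificationRun:
    run = CertificationRun(agent=agent)
    # Evaluation Data: derive tasks from the agent metadata.
    run.eval_data = generate_eval_data(agent)
    # Run Tests: one entry per certification dimension.
    for dimension, test in tests.items():
        run.results[dimension] = test(agent, run.eval_data)
    # Aggregate & Grade: the caller persists and publishes the returned run.
    return run
```

Passing the tests in as a dictionary keeps the three dimensions independent, so one can be re-run (for example the periodic security check) without touching the others.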

Next steps

Read Functionality Certification to understand static analysis & ecosystem checks.
Read Performance Certification to see how we validate correctness & speed.
Read Security Certification to learn the risk framework and grading pipeline.
