
The 5 Dimensions of Agent Risk: A Scoring Methodology

A weighted scoring framework for assessing agent risk across data access, autonomy, blast radius, reversibility, and regulatory exposure.

Ritesh Vajariya

When your security team evaluates risk for a new application or vendor, they have established methodologies. NIST's risk management framework. FAIR quantitative analysis. ISO 31000. These frameworks provide structured, repeatable processes for assessing and prioritizing risk.

No equivalent exists for AI agents.

Most organizations are either ignoring agent risk entirely or assessing it ad hoc — a quick conversation between the deploying team and a security engineer, maybe a one-page questionnaire borrowed from the application security review process. The result is inconsistent evaluation, blind spots, and an inability to compare risk across agents or prioritize governance investments.

What follows is a scoring methodology designed specifically for AI agents. It assesses risk across five dimensions that together capture the full surface area of agent-related exposure. Each agent in your inventory is scored across all five dimensions, producing a composite risk score that maps to governance tier requirements.

The Five Dimensions

Dimension 1: Data Access Scope

This dimension measures the breadth and sensitivity of data an agent can access.

An agent that reads public-facing FAQ content operates in a fundamentally different risk category than an agent that queries your customer database, accesses employee records, or reads financial data. The distinction isn't just about volume — it's about sensitivity classification, regulatory implications, and the consequences of unauthorized disclosure.

Scoring criteria:

Low (1-3): Agent accesses only public or non-sensitive internal data. No PII, no financial data, no regulated information. Data classification level is "public" or "internal-general." Example: an agent that summarizes publicly available product documentation.

Medium (4-6): Agent accesses internal sensitive data including business-confidential information, aggregated customer data, or non-PII operational data. Data classification level is "confidential." Example: an agent that queries sales pipeline data to generate forecasts.

High (7-10): Agent accesses regulated data (PII, PHI, financial records), executive communications, intellectual property, or data subject to specific regulatory requirements (GDPR, HIPAA, SOX, CCPA). Data classification level is "restricted" or "highly restricted." Example: an agent that processes customer support tickets containing personal information.

Assessment questions:

  • What data sources does this agent connect to?
  • What is the classification level of the data it accesses?
  • Does the data include PII, PHI, or regulated financial information?
  • Is the data access scoped to the minimum required, or does the agent have broad read access?
  • Could the agent's access be exploited to extract data beyond its intended scope (prompt injection, conversation manipulation)?
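The same Low/Medium/High bands recur across all five dimensions, so it is worth pinning them down precisely. A minimal sketch of a band-labeling helper (the boundaries follow the rubric above; the function name is my own):

```python
def risk_band(score: int) -> str:
    """Map a 1-10 dimension score to the rubric's Low/Medium/High band."""
    if not 1 <= score <= 10:
        raise ValueError("dimension scores must be between 1 and 10")
    if score <= 3:
        return "Low"
    if score <= 6:
        return "Medium"
    return "High"
```

The explicit range check matters in practice: a dimension scored 0 or 11 usually signals a misunderstanding of the rubric, and it is better to surface that than to silently clamp it.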

Dimension 2: Action Scope

This dimension measures what an agent can do — not just what data it reads, but what operations it can perform.

The critical distinction is between read-only agents and agents that can write, modify, delete, send, approve, or execute actions in external systems. Every action an agent can take is an action that could go wrong, and the consequences scale with the significance of the actions permitted.

Scoring criteria:

Low (1-3): Agent is read-only. It can retrieve information and generate responses but cannot modify data, send communications, or trigger workflows. Example: a research assistant that summarizes documents and answers questions.

Medium (4-6): Agent can perform bounded write operations within defined parameters. It can update records, send pre-approved communications, or trigger workflows that require human approval before execution. There are hard limits on transaction amounts, recipient lists, or action frequency. Example: an agent that drafts customer emails for human review before sending.

High (7-10): Agent can perform unbounded or high-consequence actions autonomously. It can execute transactions, send external communications, modify production databases, approve workflows, or take actions with regulatory or legal implications without human intervention. Example: an agent that processes refunds, modifies customer accounts, or approves purchase orders.

Assessment questions:

  • What systems can this agent write to?
  • Can the agent send external communications (email, chat, API calls to third parties)?
  • Can the agent execute financial transactions or approve workflows?
  • Are there hard limits on the magnitude of actions the agent can take?
  • Is there a human-in-the-loop requirement for high-consequence actions?

Dimension 3: Autonomy Level

This dimension measures the degree of human oversight in the agent's operation.

Autonomy exists on a spectrum. At one end, an agent that suggests actions for human approval. At the other, a fully autonomous agent that operates continuously without human review. The appropriate level of autonomy depends on the stakes of the agent's decisions and the organization's tolerance for unsupervised action.

Scoring criteria:

Low (1-3): Human-in-the-loop for all consequential actions. The agent recommends or drafts, but a human reviews and approves before anything is executed. The human has full context and the ability to override. Example: an agent that drafts responses for a customer service representative to review and send.

Medium (4-6): Human-on-the-loop. The agent operates autonomously within defined parameters, but humans monitor outputs and can intervene. There are automated guardrails that escalate edge cases to human review. Periodic human review of agent actions occurs on a defined cadence. Example: an agent that handles routine customer inquiries autonomously but escalates complex or unusual requests.

High (7-10): Fully autonomous operation with minimal or no human oversight. The agent makes decisions and takes actions without real-time human monitoring. Human review, if it occurs, is after the fact and on a sampling basis. Example: an agent that runs 24/7 processing transactions, with human review of a weekly summary report.

Assessment questions:

  • Does a human review agent outputs before they're acted upon?
  • Is there real-time monitoring of agent behavior by a human?
  • What triggers escalation to human review?
  • How frequently are agent actions reviewed after the fact?
  • Could the agent operate for an extended period without anyone noticing a problem?

Dimension 4: Error Impact

This dimension measures the consequences of agent mistakes or failures.

Not all errors are created equal. An agent that occasionally miscategorizes a support ticket creates minor inconvenience. An agent that provides incorrect medical information, makes a biased hiring recommendation, or executes an unauthorized financial transaction creates material harm. The scoring should reflect the worst realistic consequence of agent error, not the average case.

Scoring criteria:

Low (1-3): Errors cause inconvenience or inefficiency but no material harm. Mistakes are easily detected and reversible. No regulatory, legal, or reputational implications. Example: an internal agent that miscategorizes documents, requiring manual correction.

Medium (4-6): Errors affect business operations or customer experience in ways that require active remediation. Mistakes may take time to detect and correct. Could result in moderate financial impact or customer dissatisfaction. Example: an agent that provides incorrect product information to customers, requiring follow-up corrections.

High (7-10): Errors could cause material financial loss, regulatory violation, legal liability, reputational damage, or harm to individuals. Mistakes may be difficult to detect, expensive to remediate, or irreversible. Example: an agent that makes biased hiring recommendations, provides incorrect financial advice, or exposes sensitive customer data.

Assessment questions:

  • What is the worst realistic outcome of this agent making a mistake?
  • How quickly would an error be detected?
  • Is the error reversible, and at what cost?
  • Could an error trigger regulatory investigation or legal liability?
  • Could an error cause direct harm to customers, employees, or third parties?

Dimension 5: Compliance Exposure

This dimension measures the regulatory and legal landscape the agent operates within.

Agents operating in regulated industries or handling regulated data face a higher compliance bar. The scoring reflects not just current regulatory requirements but the trajectory of AI-specific regulation, which is moving quickly toward mandatory governance requirements for autonomous systems.

Scoring criteria:

Low (1-3): Agent operates in an unregulated context with no specific compliance requirements. No sector-specific regulations apply. Data handling does not trigger privacy law obligations. Example: an internal agent that helps employees brainstorm marketing ideas.

Medium (4-6): Agent operates in a context where general regulations apply (GDPR, CCPA, general consumer protection) but no AI-specific requirements are in force. The organization's sector has regulatory guidance but not binding mandates. Example: an agent that processes customer data in a retail context.

High (7-10): Agent operates in a heavily regulated sector (financial services, healthcare, government) or handles data subject to specific regulatory frameworks. AI-specific regulations apply or are imminent. Sector regulators have issued guidance or requirements for AI governance. Example: an agent that processes insurance claims, assists with medical triage, or handles financial transactions.

Assessment questions:

  • What sector-specific regulations apply to this agent's operations?
  • Does the agent handle data covered by GDPR, HIPAA, SOX, CCPA, or other privacy/security regulations?
  • Is the agent's domain subject to AI-specific regulation (EU AI Act high-risk categories)?
  • Has the relevant sector regulator issued AI governance guidance?
  • Would the organization need to explain this agent's governance to a regulator?

The Composite Risk Score

Each dimension is scored 1-10, then weighted to produce a composite:

| Dimension | Weight | Rationale |
|-----------|--------|-----------|
| Data Access Scope | 25% | Data exposure is the most common source of material harm |
| Action Scope | 25% | Actions create irreversible consequences |
| Autonomy Level | 20% | Higher autonomy means less opportunity to catch errors |
| Error Impact | 15% | Reflects consequence severity |
| Compliance Exposure | 15% | Reflects regulatory and legal risk multiplier |

Composite Score = (Data × 0.25) + (Action × 0.25) + (Autonomy × 0.20) + (Error × 0.15) + (Compliance × 0.15)
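As a sketch, the weighted sum can be computed directly from the table (dimension keys are my own shorthand; the weights are the ones defined above). The worked example in the test scores an agent 8 on data access, 6 on action scope, 5 on autonomy, 7 on error impact, and 9 on compliance, which yields 2.0 + 1.5 + 1.0 + 1.05 + 1.35 = 6.9:

```python
# Weights from the table above; keys are illustrative shorthand.
WEIGHTS = {
    "data_access": 0.25,
    "action_scope": 0.25,
    "autonomy": 0.20,
    "error_impact": 0.15,
    "compliance": 0.15,
}

def composite_score(scores: dict[str, float]) -> float:
    """Weighted sum of the five 1-10 dimension scores."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("expected exactly the five dimensions")
    return round(sum(scores[d] * WEIGHTS[d] for d in WEIGHTS), 2)
```

Because the weights sum to 1.0, the composite stays on the same 1-10 scale as the individual dimensions.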

Governance Tier Mapping

The composite score maps to governance tiers that define the minimum controls required.

Tier 1 — Standard (Score 1.0–3.0): Basic agent documentation, standard access controls, periodic review. Suitable for low-risk internal tools. Quarterly review cadence.

Tier 2 — Enhanced (Score 3.1–5.0): Formal agent onboarding review, documented data access scope, action logging, human-on-the-loop for edge cases. Monthly review cadence.

Tier 3 — Elevated (Score 5.1–7.0): Comprehensive agent governance package: detailed policy documentation, continuous output monitoring, human-in-the-loop for high-consequence actions, incident response playbook, formal approval for deployment and changes. Bi-weekly review cadence.

Tier 4 — Critical (Score 7.1–10.0): Maximum governance controls: executive-level approval for deployment, real-time monitoring with automated circuit breakers, mandatory human review of all consequential outputs, comprehensive audit trail, regulatory documentation package, dedicated agent owner with governance training. Weekly review cadence.
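The tier boundaries above can be encoded as a simple lookup; a minimal sketch returning the tier number (1-4) for a composite score:

```python
def governance_tier(composite: float) -> int:
    """Map a composite risk score to its governance tier (1-4)."""
    if not 1.0 <= composite <= 10.0:
        raise ValueError("composite scores range from 1.0 to 10.0")
    if composite <= 3.0:
        return 1  # Standard
    if composite <= 5.0:
        return 2  # Enhanced
    if composite <= 7.0:
        return 3  # Elevated
    return 4      # Critical
```

Note that the bands are half-open at the top: a score of exactly 3.0 lands in Tier 1, and 3.1 in Tier 2, matching the ranges above.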

Putting It Into Practice

Score every agent in your inventory. Build a risk heat map. Prioritize governance investments on your highest-scoring agents. Use the scores to have evidence-based conversations about which agents need more controls, which agents might need to be decommissioned, and where to allocate security and compliance resources.

The methodology is designed to be repeatable and comparable — you can track an agent's risk score over time as its capabilities, data access, or operating context change. And you can use aggregate scores to give leadership and the board a portfolio-level view of agent risk.
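One way to sketch the portfolio-level view is a summary that groups agents by tier and flags the highest-risk agent. This is an illustrative sketch, not part of the methodology itself; the agent names and the exact summary fields are my own:

```python
from statistics import mean

# Tier cutoffs mirror the governance tier mapping above.
TIER_CUTOFFS = [(3.0, 1), (5.0, 2), (7.0, 3), (10.0, 4)]

def portfolio_summary(scores: dict[str, float]) -> dict:
    """Aggregate per-agent composite scores into a leadership-level view."""
    def tier(s: float) -> int:
        return next(t for cutoff, t in TIER_CUTOFFS if s <= cutoff)

    by_tier: dict[int, list[str]] = {1: [], 2: [], 3: [], 4: []}
    # Sort descending by score so each tier lists its riskiest agents first.
    for name, s in sorted(scores.items(), key=lambda kv: -kv[1]):
        by_tier[tier(s)].append(name)
    return {
        "mean_score": round(mean(scores.values()), 2),
        "agents_by_tier": by_tier,
        "highest_risk": max(scores, key=scores.get),
    }
```

Run quarterly, a summary like this gives the board a trend line: tier counts shifting upward is an early signal that agent capabilities are outpacing governance investment.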


The Agent Governance Toolkit includes the complete risk assessment matrix with scoring worksheets, governance tier definitions, and a portfolio risk dashboard template. Get the toolkit at agentguru.co →

Want to see how your agents score? Start with the free 25-point Agent Governance Checklist at agentguru.co.


Ritesh Vajariya is the CEO of AI Guru and founder of AgentGuru. Previously AWS Principal ($700M+ AI revenue), BloombergGPT Architect, and Cerebras Global Strategy Lead. He has trained 35,000+ professionals and built products serving 50,000+ users.
