Your security team has a vendor evaluation process. They know how to assess a SaaS application, a cloud provider, or an infrastructure vendor. They have questionnaires, security scorecards, SOC 2 reports to review, and penetration test results to analyze.
Now your business team wants to deploy an agent platform — Anthropic's computer use, Microsoft's Copilot Studio, Salesforce AgentForce, ServiceNow's AI Agents, or one of the dozens of agent frameworks competing for enterprise adoption. Your procurement process sends the standard security questionnaire. The vendor sends back the standard answers.
And you've learned almost nothing about the actual risk.
Agent platforms are not SaaS applications. They don't just store and process your data — they make decisions with it and take actions on your behalf. The standard vendor security questionnaire was never designed to evaluate these capabilities, and using it creates a dangerous false sense of due diligence.
What follows are 28 questions, organized into seven domains, that your security team should ask any agent vendor or platform before you sign. These aren't hypothetical — they're the questions I ask when conducting vendor evaluations for enterprises deploying agent technology.
Domain 1: Agent Architecture and Isolation
These questions assess how the vendor's architecture prevents one customer's agents from affecting another's, and how your own agents are isolated from one another.
1. How are agent execution environments isolated between tenants? You need to understand whether agents run in shared infrastructure or dedicated environments. Shared execution environments create cross-tenant risk — a vulnerability in one customer's agent configuration could potentially affect others.
2. Can an agent from one workflow access data or tools from another workflow? Within your own tenant, you may run multiple agents for different purposes. Ensure there are hard boundaries preventing an agent deployed for customer service from accessing data or tools provisioned for your finance agents.
3. What happens when an agent encounters an error during execution? Does it fail safely (stop and report), fail open (continue with defaults), or fail unpredictably? The answer reveals how the vendor thinks about safety architecture.
4. How does the platform handle agent-to-agent communication in multi-agent deployments? If agents can communicate with each other, what controls prevent one agent from manipulating or being manipulated by another? Multi-agent architectures introduce privilege escalation risks that single-agent deployments don't have.
Domain 2: Data Handling and Privacy
These questions assess how the vendor handles your data as it flows through agent processing.
5. Where is agent context (conversation history, retrieved data, tool outputs) stored, and for how long? Agent context often contains your most sensitive data — customer information, internal documents, business logic. Understand the full lifecycle of this data.
6. Is agent context used to train or fine-tune the vendor's models? This is the question that should never be ambiguous. Your agent context should never be used for model improvement without explicit, revocable opt-in. Get this in writing in the contract.
7. What data residency options are available for agent processing? If you're subject to GDPR, data sovereignty laws, or industry-specific data localization requirements, you need to know where agent processing occurs — not just where your account data is stored.
8. How does the platform handle sensitive data detected in agent interactions? Does the platform identify and redact PII, credentials, or other sensitive data in agent logs and monitoring? Or is sensitive data persisted in plaintext across the full telemetry pipeline?
Domain 3: Permissions and Access Control
These questions assess how the platform manages what agents can do.
9. What is the most granular level of permission control available for agent actions? Can you restrict an agent to specific API endpoints, specific data fields, and specific action types? Or is permission control limited to coarse-grained roles?
10. Can you enforce action-level rate limiting per agent? Can you set limits like "this agent can process no more than 100 transactions per hour" or "this agent can send no more than 50 emails per day"? Rate limiting is your primary defense against agent errors cascading at machine speed.
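To make the ask concrete, here is a minimal sketch of the kind of per-agent, per-action budget you want the platform to enforce natively. The agent and action names are hypothetical; a real platform would enforce this server-side, not in your client code.

```python
import time
from collections import defaultdict, deque

class ActionRateLimiter:
    """Sliding-window rate limiter keyed by (agent, action) — illustrative sketch."""

    def __init__(self):
        self.limits = {}                    # (agent_id, action) -> (max_calls, window_secs)
        self.history = defaultdict(deque)   # (agent_id, action) -> call timestamps

    def set_limit(self, agent_id, action, max_calls, window_secs):
        self.limits[(agent_id, action)] = (max_calls, window_secs)

    def allow(self, agent_id, action, now=None):
        """Return True and record the call if it fits the budget; False otherwise."""
        now = time.monotonic() if now is None else now
        key = (agent_id, action)
        if key not in self.limits:
            return True                     # no limit configured for this action
        max_calls, window = self.limits[key]
        calls = self.history[key]
        while calls and now - calls[0] >= window:
            calls.popleft()                 # drop calls outside the window
        if len(calls) >= max_calls:
            return False                    # budget exhausted: block the action
        calls.append(now)
        return True
```

When you ask the vendor for a demonstration, the useful probe is what happens at the boundary: does the 101st transaction fail safely, queue, or silently proceed?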
11. How are agent credentials managed, rotated, and revoked? What is the credential lifecycle? Are credentials rotated automatically? Can credentials be revoked instantly? What happens to a running agent when its credentials are revoked?
12. Does the platform support human-in-the-loop approval workflows? Can you configure specific actions to require human approval before execution? Is this configurable per action type, per risk level, or per agent? How is the approval workflow presented to the human reviewer?
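A configurable approval policy might look something like the sketch below: actions are mapped to risk levels, and anything at or above a threshold is routed to a human. The action names and risk tiers are hypothetical examples, not any vendor's actual schema.

```python
from enum import Enum

class Risk(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3

# Hypothetical per-action risk assignments for a customer-service agent
APPROVAL_POLICY = {
    "read_kb_article": Risk.LOW,
    "send_email": Risk.MEDIUM,
    "issue_refund": Risk.HIGH,
}

def requires_approval(action, threshold=Risk.HIGH):
    """Route to a human reviewer when risk meets or exceeds the threshold.

    Unknown actions default to HIGH — unmapped capabilities should never
    execute unreviewed.
    """
    risk = APPROVAL_POLICY.get(action, Risk.HIGH)
    return risk.value >= threshold.value
```

The key probe for the vendor: is the threshold configurable per action type and per agent, and does the default for an unrecognized action fail closed, as above, or fail open?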
Domain 4: Observability and Audit
These questions assess your ability to monitor agent behavior and reconstruct events after the fact.
13. What telemetry does the platform capture for each agent action? You need: input received, reasoning steps (if available), tools invoked, data accessed, output produced, and any errors or guardrails triggered. Each telemetry event should include agent identity, timestamp, and correlation ID.
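As a reference point for evaluating a vendor's log schema, the event described above can be sketched as a data structure. Field names here are illustrative, not a standard; the point is that every action emits one record carrying identity, timestamp, and a correlation ID.

```python
import json
import uuid
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

@dataclass
class AgentActionEvent:
    """One telemetry record per agent action (illustrative schema)."""
    agent_id: str
    action: str
    input_summary: str            # what the agent received
    tools_invoked: list           # which tools it called
    data_accessed: list           # which records/fields it touched
    output_summary: str           # what it produced
    errors: list = field(default_factory=list)
    guardrails_triggered: list = field(default_factory=list)
    correlation_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self):
        """Serialize for export to a SIEM or log pipeline."""
        return json.dumps(asdict(self))
```

If a vendor's events are missing any of these fields — especially the correlation ID that ties a tool call back to the triggering input — reconstructing an incident becomes guesswork.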
14. Can you export agent audit logs to your own SIEM or logging infrastructure? Audit logs that live only in the vendor's dashboard are insufficient for enterprise security operations. You need raw log export to integrate with your existing monitoring, alerting, and compliance infrastructure.
15. How long are audit logs retained, and can you extend retention? Regulatory requirements may mandate multi-year retention of records related to automated decision-making. Understand the vendor's default retention and whether extended retention is available.
16. Does the platform provide anomaly detection for agent behavior? Can the platform detect and alert on unusual agent behavior — deviation from historical patterns, unexpected data access, abnormal action frequency? Or is behavioral monitoring entirely your responsibility?
Domain 5: Safety and Guardrails
These questions assess the platform's built-in protections against agent misbehavior.
17. What built-in guardrails does the platform provide for agent outputs? Are there content filters, factuality checks, or compliance filters that operate on agent outputs before they reach users or external systems? Are these configurable or fixed?
18. Can you define custom guardrails specific to your organization's policies? Generic guardrails aren't sufficient for enterprise deployment. You need the ability to define organization-specific rules — prohibited topics, required disclaimers, data handling constraints — that the platform enforces.
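The simplest form of an organization-specific output guardrail is a set of named rules checked against agent output before it leaves the platform. The rules below are hypothetical placeholders; production guardrails would combine pattern rules with classifier-based checks.

```python
import re

# Hypothetical org-specific rules: rule name -> pattern it prohibits in output
PROHIBITED_PATTERNS = {
    "credit_card": re.compile(r"\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b"),
    "investment_advice": re.compile(r"\bguaranteed return\b", re.IGNORECASE),
}

def check_output(text):
    """Return the names of all org-policy rules the agent output violates."""
    return [name for name, pattern in PROHIBITED_PATTERNS.items()
            if pattern.search(text)]
```

The evaluation question is whether the platform lets you register rules like these yourself, and whether a violation blocks the output, redacts it, or merely logs it.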
19. How does the platform protect against prompt injection? If agents accept input from users or external systems, they're vulnerable to prompt injection attacks. What detection and prevention mechanisms are in place? Have they been independently tested?
20. Does the platform provide a kill switch for immediate agent shutdown? Can you instantly stop an agent from a management console, API call, or automated trigger? How quickly does the kill switch take effect? Are there dependencies that could delay shutdown?
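One way to reason about kill-switch latency: shutdown takes effect only at points where the agent actually checks the flag. A minimal sketch, assuming an agent loop that consults a shared stop signal before each action:

```python
import threading

class KillSwitch:
    """Thread-safe stop signal shared between operators and running agents."""

    def __init__(self):
        self._stop = threading.Event()

    def trigger(self):
        self._stop.set()        # callable from a console, API, or automated alert

    def halted(self):
        return self._stop.is_set()

def run_agent(steps, kill_switch):
    """Execute steps in order, aborting before the next step once halted."""
    completed = []
    for step in steps:
        if kill_switch.halted():
            break               # shutdown takes effect within one step boundary
        completed.append(step)
    return completed
```

This is why "how quickly does it take effect?" matters: an agent mid-way through a long tool call may not see the signal until the call returns, so ask the vendor whether in-flight actions are cancelled or merely not followed by new ones.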
Domain 6: Model Governance
These questions assess how the underlying AI models are managed and updated.
21. Which AI models power the agent, and can you pin to a specific model version? Model updates can change agent behavior in unexpected ways. You need the ability to pin to a specific model version and test updates before they affect production agents.
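In configuration terms, pinning means production agents reference a dated model snapshot while floating aliases are confined to pre-production testing. The model identifiers below are invented for illustration; the guard itself is the point.

```python
# Hypothetical deployment configs: production pins a dated snapshot,
# staging may track a floating alias for pre-release validation.
PRODUCTION_CONFIG = {"model": "vendor-model-2025-01-15"}
STAGING_CONFIG = {"model": "vendor-model-latest"}

def resolve_model(env):
    """Return the model ID for an environment, refusing floating aliases in prod."""
    config = PRODUCTION_CONFIG if env == "production" else STAGING_CONFIG
    model = config["model"]
    if env == "production" and model.endswith("latest"):
        raise ValueError("production agents must pin a dated model snapshot")
    return model
```

If the platform only exposes a floating "latest" model, every vendor-side update is an untested change to your production behavior.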
22. What is the vendor's model update and deprecation policy? How much notice do you get before a model version is deprecated? Is there a migration path? What happens if you're running agents on a deprecated model?
23. Can you test agent behavior against a new model version before it goes live? You need a staging or testing environment where you can validate that a model update doesn't degrade your agent's performance, accuracy, or safety before deploying to production.
24. Does the vendor provide model cards or documentation for the models used in agent processing? Model cards that describe training data, known limitations, performance characteristics, and safety evaluations are essential for risk assessment and regulatory compliance.
Domain 7: Contractual and Liability
These questions assess the legal and commercial terms that govern the relationship.
25. Who is liable when an agent takes an incorrect or harmful action? If your agent, running on the vendor's platform using the vendor's model, causes harm to a customer or violates a regulation, where does liability fall? This should be explicitly addressed in the contract, not left to general terms of service.
26. Does the vendor carry AI-specific liability insurance? General commercial liability may not cover agent-specific failures. Ask whether the vendor has insurance that specifically covers harms arising from AI agent behavior.
27. What are the vendor's commitments regarding data use and model training? Get contractual language — not just a privacy policy that can be updated unilaterally — that specifies how your data will and will not be used. Include the right to audit the vendor's compliance with these commitments.
28. What are the termination and data portability provisions? If you decide to leave the vendor, what happens to your agent configurations, custom guardrails, audit logs, and any fine-tuned models? Can you export your investment, or are you locked in?
How to Use These Questions
Don't send all 28 questions as a written questionnaire and expect useful answers. Vendor questionnaires invite template responses. Instead, use these questions in live evaluation sessions where you can probe, follow up, and ask for demonstrations.
Prioritize based on your deployment context. If you're deploying customer-facing agents, weight the safety and guardrails domain heavily. If you're in a regulated industry, weight observability, audit, and contractual domains. If you're deploying multi-agent architectures, weight the architecture and isolation domain.
Score responses not just on "does the capability exist" but on "is it production-ready, well-documented, and consistent with how the vendor describes it." A vendor that claims granular permission control but can only demonstrate role-based access is telling you something about their maturity.
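Weighting and scoring can be as simple as a per-domain score multiplied by deployment-specific weights. The weights below are an illustrative example for a customer-facing deployment (safety weighted heaviest), not a recommended standard.

```python
# Hypothetical weights for a customer-facing deployment; they sum to 1.0.
DOMAIN_WEIGHTS = {
    "architecture": 0.10,
    "data": 0.15,
    "permissions": 0.15,
    "observability": 0.15,
    "safety": 0.25,
    "model_governance": 0.10,
    "contractual": 0.10,
}

def weighted_score(vendor_scores):
    """Combine per-domain scores (0-5) into a single weighted total."""
    return round(sum(DOMAIN_WEIGHTS[d] * s for d, s in vendor_scores.items()), 2)
```

Re-running the same scores under different weight sets is a quick way to see whether a vendor ranking is robust or an artifact of one deployment assumption.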
And keep the responses on file. These questions will form the basis of your ongoing vendor review, your regulatory documentation, and your incident response preparation.
The Agent Governance Toolkit includes the complete vendor evaluation checklist with all 28 criteria, scoring worksheets, and a comparison matrix for evaluating multiple vendors side by side. Get the toolkit at agentguru.co →
Ritesh Vajariya is the CEO of AI Guru and founder of AgentGuru. Previously AWS Principal ($700M+ AI revenue), BloombergGPT Architect, and Cerebras Global Strategy Lead. He has trained 35,000+ professionals and built products serving 50,000+ users.