OWASP LLM Top 10
AgentCop maps every detection to the OWASP Top 10 for Large Language Model Applications. Here's what each category means and how AgentCop addresses it.
The OWASP Top 10 for LLM Applications is the industry-standard security reference for AI systems. First released in 2023 and updated in 2025, it defines the ten most critical security risks specific to LLM applications. Every AgentCop detection maps to one of these categories.
Coverage Summary
| Code | Name | AgentCop Coverage | Severity |
|---|---|---|---|
| LLM01 | Prompt Injection | Full | HIGH |
| LLM02 | Insecure Output Handling | Full | CRITICAL |
| LLM03 | Training Data Poisoning | Partial | HIGH |
| LLM04 | Model Denial of Service | Full | MEDIUM |
| LLM05 | Supply Chain Vulnerabilities | Advisory | HIGH |
| LLM06 | Sensitive Information Disclosure | Full | HIGH |
| LLM07 | Insecure Plugin Design | Partial | HIGH |
| LLM08 | Excessive Agency | Full | HIGH |
| LLM09 | Overreliance | Advisory | MEDIUM |
| LLM10 | Model Theft | Partial | HIGH |
LLM01 — Prompt Injection
The most exploited LLM vulnerability. An attacker embeds instructions in data your agent processes — and your agent obeys them.
What AgentCop detects: f-string interpolation of external data into prompts, .format() calls with user-controlled variables, and raw string concatenation into prompt templates.

```python
# Vulnerable — user_input flows directly into the prompt
prompt = f"Help with: {user_input}"
result = llm.predict(prompt)

# Safe — use a structured template with sanitized input
from langchain.prompts import PromptTemplate

template = PromptTemplate(input_variables=["question"], template="Help with: {question}")
result = llm.predict(template.format(question=sanitized_input))
```
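The safe example assumes `sanitized_input` has already been produced by some sanitization step. A minimal sketch of what that step might look like — the patterns and length cap below are illustrative assumptions, not AgentCop's actual ruleset:

```python
import re

# Hypothetical phrases that often signal an injection attempt.
# A production sanitizer would use a far larger, maintained ruleset.
SUSPICIOUS_PATTERNS = [
    r"(?i)ignore (all )?previous instructions",
    r"(?i)you are now",
    r"(?i)system prompt",
]

def sanitize_input(text: str, max_len: int = 2000) -> str:
    """Strip instruction-like phrases and cap length before templating."""
    for pattern in SUSPICIOUS_PATTERNS:
        text = re.sub(pattern, "[removed]", text)
    return text[:max_len]
```

Pattern stripping is a mitigation, not a guarantee; treat it as one layer alongside structured templates and output validation.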
LLM02 — Insecure Output Handling
Your agent's output is untrusted. Treating it as trusted code is how you get RCE.
What AgentCop detects: eval(), exec(), and compile() called on LLM-generated strings; unsafe HTML rendering of agent output.

```python
# Vulnerable — eval executes whatever the LLM returns
result = eval(agent.run(task))

# Safe — parse and validate output explicitly
import json

ALLOWED_KEYS = {"answer", "sources"}  # example: whatever keys your schema permits
raw = agent.run(task)
result = json.loads(raw)  # structured parsing
assert set(result.keys()) <= ALLOWED_KEYS  # schema validation
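When the model is asked to return a Python literal rather than JSON, the stdlib's ast.literal_eval is a safer substitute for eval: it parses literals only and never executes code. A small sketch (a general stdlib technique, not an AgentCop feature):

```python
import ast

def parse_llm_literal(raw: str):
    """Parse LLM output as a Python literal; raises instead of executing code."""
    try:
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError) as exc:
        raise ValueError(f"LLM output is not a safe literal: {raw!r}") from exc
```

Function calls, attribute access, and imports all fail to parse as literals, so malicious payloads raise instead of running.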
LLM03 — Training Data Poisoning
Unvalidated data in your training pipeline poisons every model that learns from it — silently, permanently, at scale.
What AgentCop detects: unvalidated writes to vector stores, direct user input flowing into embedding or fine-tune pipelines without sanitization.

```python
# Vulnerable — raw user input written directly to vector store
vectorstore.add_texts([user_input])

# Safe — validate and sanitize before storing
sanitized = sanitize_text(user_input)
if passes_content_policy(sanitized):
    vectorstore.add_texts([sanitized], metadatas=[{"source": "user", "reviewed": True}])
```
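`sanitize_text` and `passes_content_policy` above are placeholders. One minimal interpretation, with invented blocklist terms purely for illustration:

```python
import unicodedata

def sanitize_text(text: str) -> str:
    """Normalize unicode and strip control characters before ingestion."""
    normalized = unicodedata.normalize("NFKC", text)
    return "".join(ch for ch in normalized if ch.isprintable() or ch in "\n\t")

BLOCKED_TERMS = {"ignore previous instructions"}  # illustrative blocklist

def passes_content_policy(text: str) -> bool:
    """Reject empty input and known injection phrases."""
    lowered = text.lower()
    return bool(text.strip()) and not any(term in lowered for term in BLOCKED_TERMS)
```

A real ingestion gate would add provenance checks and human review for anything destined for fine-tuning.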
LLM04 — Model Denial of Service
Infinite loops and unbounded recursion cost money and crash production. In agent systems, a single bad prompt can spiral into thousands of API calls.
What AgentCop detects: while True without a break or return, recursive agent calls without depth limits, unbounded token requests.

```python
# Vulnerable — unbounded loop, no exit condition
while True:
    result = agent.run(task)
    task = refine_task(result)

# Safe — bounded iteration with explicit maximum
MAX_ITERATIONS = 10
for _ in range(MAX_ITERATIONS):
    result = agent.run(task)
    if is_complete(result):
        break
    task = refine_task(result)
```
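An iteration cap bounds the number of calls but not their size; a complementary guard caps cumulative spend across calls. A sketch, where the 4-characters-per-token heuristic and the budget figure are assumptions:

```python
class TokenBudget:
    """Tracks cumulative token usage and refuses work past a hard cap."""

    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.used = 0

    def charge(self, text: str) -> None:
        # Rough heuristic: ~4 characters per token. A real implementation
        # would use the provider's tokenizer.
        self.used += max(1, len(text) // 4)
        if self.used > self.max_tokens:
            raise RuntimeError(f"Token budget exceeded: {self.used}/{self.max_tokens}")
```

Call `budget.charge(prompt)` before each agent call inside the bounded loop, so a runaway refinement cycle fails fast instead of billing indefinitely.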
LLM05 — Supply Chain Vulnerabilities
Importing an untrusted agent library is importing untrusted behavior. You don't get to audit what runs at inference time.
What AgentCop detects: imports from non-official agent tool packages, unverified model weights loaded from arbitrary URLs.
Run pip audit and verify checksums for all agent dependencies. Pin dependency versions in requirements.txt and use a lockfile. Never load model weights from unverified sources.
```bash
# Audit your agent dependencies
pip audit

# Verify the model checksum before loading
sha256sum model.bin
# compare against the published checksum from the model card
```
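The same checksum verification can be done in-process, before the weights are ever loaded. A stdlib sketch; the expected digest would come from the model card:

```python
import hashlib

def verify_checksum(path: str, expected_sha256: str) -> bool:
    """Stream the file through SHA-256 and compare against the published digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large weight files never sit fully in memory.
        for chunk in iter(lambda: fh.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256.lower()
```

Refuse to load the file when this returns False; a silent mismatch is exactly the failure mode a supply-chain attack relies on.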
LLM06 — Sensitive Information Disclosure
API keys in source code are public API keys. Act accordingly — every repo you push is a potential credential leak waiting to be discovered.
What AgentCop detects: hardcoded API keys matching patterns for OpenAI, Anthropic, AWS, and other providers; hardcoded passwords; private key material in source files.

```python
# Vulnerable — hardcoded credential in source
api_key = "sk-proj-abc123XYZ..."
client = openai.OpenAI(api_key=api_key)

# Safe — load from environment at runtime
import os
import openai

api_key = os.getenv("OPENAI_API_KEY")
if not api_key:
    raise EnvironmentError("OPENAI_API_KEY not set")
client = openai.OpenAI(api_key=api_key)
```
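A minimal sketch of the kind of pattern matching behind hardcoded-key detection. The regexes here are illustrative simplifications; real key formats vary, and AgentCop's actual patterns are not shown in this document:

```python
import re

# Illustrative credential patterns — real scanners carry many more.
SECRET_PATTERNS = {
    "openai_key": re.compile(r"sk-[A-Za-z0-9_-]{20,}"),
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
}

def find_secrets(source: str) -> list[tuple[str, str]]:
    """Return (pattern_name, match) pairs for every credential-like string."""
    hits = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in pattern.findall(source):
            hits.append((name, match))
    return hits
```

Running a check like this in pre-commit hooks catches keys before they ever reach a remote repository.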
LLM07 — Insecure Plugin Design
Every tool your agent can call is an attack surface. Minimize it — the least-privilege principle applies to tools, not just users.
What AgentCop detects: tools registered with broad or wildcard permissions, tools that accept arbitrary code as input, tools with no input validation layer.

```python
from langchain.tools import BaseTool

# Vulnerable — tool accepts and executes arbitrary code
class CodeRunnerTool(BaseTool):
    def _run(self, code: str) -> str:
        return eval(code)  # any code the LLM sends gets executed

# Safe — constrained tool with validated inputs only
class CalculatorTool(BaseTool):
    ALLOWED_OPS = {'+', '-', '*', '/'}

    def _run(self, expression: str) -> str:
        if not all(c in '0123456789 .()+-*/' for c in expression):
            raise ValueError("Invalid expression")
        return str(eval(expression, {"__builtins__": {}}))
```
LLM08 — Excessive Agency
Agents do what they're told. If you give them the capability to do anything, they will — including things you didn't intend and can't reverse.
What AgentCop detects: shell tools, file deletion operations, email sending, and broad network access used without approval gates.

```python
# Vulnerable — unrestricted shell access given directly to agent
from langchain.agents import initialize_agent
from langchain.tools import ShellTool

agent = initialize_agent(tools=[ShellTool()], llm=llm)

# Safe — wrap high-risk tools with an approval gate
from agentcop import gate

restricted_policy = gate.Policy(
    allow=["ls", "cat"],
    deny=["rm", "curl", "wget", "nc"],
    require_approval=True
)
safe_shell = gate.wrap(ShellTool(), policy=restricted_policy)
agent = initialize_agent(tools=[safe_shell], llm=llm)
```
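Independent of AgentCop's gate wrapper, the underlying idea is a policy check sitting between the agent and the shell. A minimal sketch in plain Python, with illustrative command lists:

```python
class ApprovalRequired(Exception):
    """Raised when a command needs a human in the loop."""

def check_command(command: str, allow: set[str], deny: set[str]) -> str:
    """Allow-listed commands pass; deny-listed raise; everything else escalates."""
    parts = command.split()
    binary = parts[0] if parts else ""
    if binary in deny:
        raise PermissionError(f"Command blocked by policy: {binary}")
    if binary in allow:
        return command
    raise ApprovalRequired(f"Command needs manual approval: {binary}")
```

The key design choice is the default: anything not explicitly allowed escalates to a human rather than running, which is the least-privilege posture the section describes.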
LLM09 — Overreliance
Production systems that trust LLM output without validation are production accidents waiting to happen. Hallucinations aren't bugs — they're a design characteristic.
What AgentCop detects: patterns where LLM output flows directly to database writes, file operations, or external API calls without an intervening validation step.

```python
# Vulnerable — LLM output written directly to DB without validation
db.execute("INSERT INTO orders VALUES (?)", [agent.run(order_prompt)])

# Safe — validate LLM output before any system write
raw_output = agent.run(order_prompt)
order = OrderSchema.parse_raw(raw_output)  # raises on invalid schema
order.validate_business_rules()  # domain validation
db.execute("INSERT INTO orders VALUES (?)", [order.to_db_row()])
```
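`OrderSchema` above is a placeholder for whatever schema layer you use (pydantic is a common choice). A stdlib-only sketch of the same parse-then-validate pattern, with field names and rules invented for illustration:

```python
import json
from dataclasses import dataclass

@dataclass
class Order:
    item: str
    quantity: int

    @classmethod
    def parse_raw(cls, raw: str) -> "Order":
        """Parse JSON and enforce shape and types; raises on anything unexpected."""
        data = json.loads(raw)
        if set(data) != {"item", "quantity"}:
            raise ValueError(f"Unexpected fields: {sorted(data)}")
        if not isinstance(data["quantity"], int) or data["quantity"] <= 0:
            raise ValueError("quantity must be a positive integer")
        return cls(item=str(data["item"]), quantity=data["quantity"])
```

The point is that the database never sees the raw model output, only a value that survived explicit parsing and domain checks.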
LLM10 — Model Theft
Model weights are IP. Exfiltration via the agent's own API is the novel attack vector — and it's harder to detect than traditional data theft.
What AgentCop detects: patterns that could exfiltrate model weights, training data, or system prompts through bulk extraction or side-channel queries.

```python
# Protect against system prompt extraction
from agentcop.monitor import Monitor

monitor = Monitor(
    rate_limit=100,  # max queries per minute per user
    alert_on_system_prompt_probing=True,
    alert_on_bulk_extraction=True,
    extraction_threshold=50  # alert if >50 similar queries in 1hr
)

@monitor.protect
def run_agent(user_id: str, query: str) -> str:
    return agent.run(query)
```
These aren't hypothetical threats. LLM01 brought down OpenClaw in 2026. LLM02 enabled the LangChain RCE. LLM06 leaked credentials from a thousand GitHub repos. The OWASP list exists because the industry learned these lessons the hard way.