Security

OWASP LLM Top 10

AgentCop maps every detection to the OWASP Top 10 for Large Language Model Applications. Here's what each category means and how AgentCop addresses it.

The OWASP LLM Top 10 is the de facto standard security reference for AI systems. First released in 2023 and updated in 2025, it defines the ten most critical security risks specific to LLM applications. Every AgentCop detection maps to one of these categories.

Coverage Summary

| Code  | Name                               | AgentCop Coverage | Severity |
|-------|------------------------------------|-------------------|----------|
| LLM01 | Prompt Injection                   | Full              | HIGH     |
| LLM02 | Insecure Output Handling           | Full              | CRITICAL |
| LLM03 | Training Data Poisoning            | Partial           | HIGH     |
| LLM04 | Model Denial of Service            | Full              | MEDIUM   |
| LLM05 | Supply Chain Vulnerabilities       | Advisory          | HIGH     |
| LLM06 | Sensitive Information Disclosure   | Full              | HIGH     |
| LLM07 | Insecure Plugin Design             | Partial           | HIGH     |
| LLM08 | Excessive Agency                   | Full              | HIGH     |
| LLM09 | Overreliance                       | Advisory          | MEDIUM   |
| LLM10 | Model Theft                        | Partial           | HIGH     |

LLM01 — Prompt Injection

The most exploited LLM vulnerability. An attacker embeds instructions in data your agent processes — and your agent obeys them.

What AgentCop detects: f-string interpolation of external data into prompts, .format() calls with user-controlled variables, and raw string concatenation into prompt templates.

python
# Vulnerable — user_input flows directly into the prompt
prompt = f"Help with: {user_input}"
result = llm.predict(prompt)

# Safe — use a structured template with sanitized input
from langchain.prompts import PromptTemplate
template = PromptTemplate(input_variables=["question"], template="Help with: {question}")
result = llm.predict(template.format(question=sanitized_input))

LLM02 — Insecure Output Handling

Your agent's output is untrusted. Treating it as trusted code is how you get RCE.

What AgentCop detects: eval(), exec(), and compile() called on LLM-generated strings; unsafe HTML rendering of agent output.

python
# Vulnerable — eval executes whatever the LLM returns
result = eval(agent.run(task))

# Safe — parse and validate output explicitly
import json
raw = agent.run(task)
result = json.loads(raw)                     # structured parsing
assert set(result.keys()) <= ALLOWED_KEYS    # ALLOWED_KEYS: your schema's key whitelist
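
The same rule applies when agent output is rendered as HTML: insecure output handling includes injecting model-generated markup into a page. A minimal sketch, assuming agent output arrives as a plain string (the `render_agent_reply` helper and its wrapper markup are illustrative, not AgentCop API):

```python
# Minimal sketch — escape agent output before embedding it in HTML,
# so model-generated <script> tags render as text instead of executing.
import html

def render_agent_reply(reply: str) -> str:
    # html.escape neutralizes <, >, &, and quote characters
    return f"<div class='agent-reply'>{html.escape(reply)}</div>"
```

For richer output (e.g. agent-generated Markdown), run the rendered HTML through an allowlist sanitizer rather than escaping alone.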

LLM03 — Training Data Poisoning

Unvalidated data in your training pipeline poisons every model that learns from it — silently, permanently, at scale.

What AgentCop detects: Unvalidated writes to vector stores, direct user input flowing into embedding or fine-tune pipelines without sanitization.

python
# Vulnerable — raw user input written directly to vector store
vectorstore.add_texts([user_input])

# Safe — validate and sanitize before storing
sanitized = sanitize_text(user_input)
if passes_content_policy(sanitized):
    vectorstore.add_texts([sanitized], metadata={"source": "user", "reviewed": True})

LLM04 — Model Denial of Service

Infinite loops and unbounded recursion cost money and crash production. In agent systems, a single bad prompt can spiral into thousands of API calls.

What AgentCop detects: while True without a break or return, recursive agent calls without depth limits, unbounded token requests.

python
# Vulnerable — unbounded loop, no exit condition
while True:
    result = agent.run(task)
    task = refine_task(result)

# Safe — bounded iteration with explicit maximum
MAX_ITERATIONS = 10
for _ in range(MAX_ITERATIONS):
    result = agent.run(task)
    if is_complete(result):
        break
    task = refine_task(result)
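
The same bound applies to recursive agent calls. A minimal sketch of a depth-limited recursion wrapper, assuming a `step` callable that returns a result plus either a follow-up task or `None` (the callable shape and `MAX_DEPTH` value are illustrative, not AgentCop API):

```python
MAX_DEPTH = 3  # hypothetical limit; tune per workload

def run_bounded(step, task, depth=0):
    """step(task) -> (result, next_task_or_None). Recurses at most MAX_DEPTH levels."""
    if depth >= MAX_DEPTH:
        raise RuntimeError(f"agent recursion exceeded {MAX_DEPTH} levels")
    result, next_task = step(task)
    if next_task is None:
        return result          # agent finished — no further delegation
    return run_bounded(step, next_task, depth + 1)
```

Failing loudly at the depth limit is deliberate: a silent truncation hides runaway delegation, while an exception surfaces it in monitoring.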

LLM05 — Supply Chain Vulnerabilities

Importing an untrusted agent library is importing untrusted behavior. You don't get to audit what runs at inference time.

What AgentCop detects: Imports from non-official agent tool packages, unverified model weights loaded from arbitrary URLs.

Advisory: Run pip-audit and verify checksums for all agent dependencies. Pin dependency versions in requirements.txt and use a lockfile. Never load model weights from unverified sources.
bash
# Audit your agent dependencies for known vulnerabilities
pip install pip-audit
pip-audit

# Verify model checksum before loading
sha256sum model.bin
# compare against the published checksum from the model card
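
The checksum comparison can also happen in code, so an unverified file never reaches the loader. A minimal sketch (the `load_verified` helper is illustrative, not AgentCop API; the expected digest comes from the model card):

```python
# Minimal sketch — refuse to load model weights whose SHA-256 digest
# doesn't match the value published alongside the model.
import hashlib

def sha256_of(path: str, chunk_size: int = 8192) -> str:
    """Stream the file so large weight files don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def load_verified(path: str, expected_sha256: str) -> bytes:
    actual = sha256_of(path)
    if actual != expected_sha256:
        raise ValueError(f"checksum mismatch for {path}: got {actual}")
    # only read the weights after the digest matches the published value
    with open(path, "rb") as f:
        return f.read()
```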

LLM06 — Sensitive Information Disclosure

API keys in source code are public API keys. Act accordingly — every repo you push is a potential credential leak waiting to be discovered.

What AgentCop detects: Hardcoded API keys matching patterns for OpenAI, Anthropic, AWS, and other providers; hardcoded passwords; private key material in source files.

python
# Vulnerable — hardcoded credential in source
api_key = "sk-proj-abc123XYZ..."
client = openai.OpenAI(api_key=api_key)

# Safe — load from environment at runtime
import os
api_key = os.getenv("OPENAI_API_KEY")
if not api_key:
    raise EnvironmentError("OPENAI_API_KEY not set")
client = openai.OpenAI(api_key=api_key)
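
Detection of hardcoded credentials typically works by pattern matching over source text. A minimal sketch of the idea — the regexes here are simplified illustrations, not AgentCop's actual rule set, and real scanners track the full provider-published key formats:

```python
import re

# Illustrative patterns only — real detectors cover many more providers and formats
SECRET_PATTERNS = {
    "openai": re.compile(r"sk-[A-Za-z0-9_-]{20,}"),
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
}

def scan_source(text: str):
    """Return (pattern_name, truncated_match) pairs for suspected secrets."""
    findings = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(text):
            # truncate the match — never log the full credential
            findings.append((name, match.group()[:8] + "..."))
    return findings
```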

LLM07 — Insecure Plugin Design

Every tool your agent can call is an attack surface. Minimize it — the least-privilege principle applies to tools, not just users.

What AgentCop detects: Tools registered with broad or wildcard permissions, tools that accept arbitrary code as input, tools with no input validation layer.

Principle of Least Privilege: Give each tool the minimum permissions required for its stated purpose. A summarization tool does not need file system access. A search tool does not need network write access.
python
# Vulnerable — tool accepts and executes arbitrary code
class CodeRunnerTool(BaseTool):
    def _run(self, code: str) -> str:
        return eval(code)   # any code the LLM sends gets executed

# Safe — constrained tool with validated inputs only
class CalculatorTool(BaseTool):
    ALLOWED_OPS = {'+', '-', '*', '/'}
    def _run(self, expression: str) -> str:
        if not all(c in '0123456789 .()+-*/' for c in expression):
            raise ValueError("Invalid expression")
        return str(eval(expression, {"__builtins__": {}}))

LLM08 — Excessive Agency

Agents do what they're told. If you give them the capability to do anything, they will — including things you didn't intend and can't reverse.

What AgentCop detects: Shell tools, file deletion operations, email sending, and broad network access used without approval gates.

python
# Vulnerable — unrestricted shell access given directly to agent
from langchain.tools import ShellTool
agent = initialize_agent(tools=[ShellTool()], llm=llm)

# Safe — wrap high-risk tools with an approval gate
from agentcop import gate

restricted_policy = gate.Policy(
    allow=["ls", "cat"],
    deny=["rm", "curl", "wget", "nc"],
    require_approval=True
)
safe_shell = gate.wrap(ShellTool(), policy=restricted_policy)
agent = initialize_agent(tools=[safe_shell], llm=llm)

LLM09 — Overreliance

Production systems that trust LLM output without validation are production accidents waiting to happen. Hallucinations aren't bugs — they're a design characteristic.

What AgentCop detects: Patterns where LLM output flows directly to database writes, file operations, or external API calls without an intervening validation step.

Advisory: Validate all LLM outputs at system boundaries. Never write LLM-generated content to a database or send it to an external API without schema validation, range checks, and a human-readable audit trail.
python
# Vulnerable — LLM output written directly to DB without validation
db.execute("INSERT INTO orders VALUES (?)", [agent.run(order_prompt)])

# Safe — validate LLM output before any system write
raw_output = agent.run(order_prompt)
order = OrderSchema.parse_raw(raw_output)   # raises on invalid schema
order.validate_business_rules()             # domain validation
db.execute("INSERT INTO orders VALUES (?)", [order.to_db_row()])

LLM10 — Model Theft

Model weights are IP. Exfiltration via the agent's own API is the novel attack vector — and it's harder to detect than traditional data theft.

What AgentCop detects: Patterns that could exfiltrate model weights, training data, or system prompts through bulk extraction or side-channel queries.

Advisory: Rate-limit inference endpoints. Monitor for bulk extraction patterns — unusually high query volume, queries that systematically probe the model's decision boundary, or attempts to extract system prompt content.
python
# Protect against system prompt extraction
from agentcop.monitor import Monitor

monitor = Monitor(
    rate_limit=100,          # max queries per minute per user
    alert_on_system_prompt_probing=True,
    alert_on_bulk_extraction=True,
    extraction_threshold=50  # alert if >50 similar queries in 1hr
)

@monitor.protect
def run_agent(user_id: str, query: str) -> str:
    return agent.run(query)

These aren't hypothetical threats. LLM01 brought down OpenClaw in 2026. LLM02 enabled the LangChain RCE. LLM06 leaked credentials from a thousand GitHub repos. The OWASP list exists because the industry learned these lessons the hard way.