Core Concepts

AgentCop's security model spans three layers: static analysis before deploy, behavioral monitoring at runtime, and execution gating before every tool call.

Three-Layer Architecture

No single security control is sufficient against adversarial agents. AgentCop stacks three independent layers, so an attacker who defeats one still faces two more.

┌─────────────────────────────────────────────────┐
│  Layer 1: Scanner (Pre-Deploy)                  │
│  AST analysis · OWASP LLM Top 10 · Trust Score  │
├─────────────────────────────────────────────────┤
│  Layer 2: Monitor (Runtime)                     │
│  Behavioral analysis · Anomaly detection        │
├─────────────────────────────────────────────────┤
│  Layer 3: Gate (Execution)                      │
│  Permission layer · Approval boundaries         │
└─────────────────────────────────────────────────┘

Concept Pages

Each layer of the security model has a dedicated concept page. Start with the Scanner Engine and work down.

Layer 1

Scanner Engine

AST-based static analysis that maps your agent code to the OWASP LLM Top 10.

Layer 2

Runtime Monitor

Behavioral analysis that watches what your agent does, not just what it says.

Layer 3

Execution Gate

The enforcement layer that blocks unauthorized tool calls before they run.

Layer 3

Permission Layer

Declarative rules that define what your agent is and isn't allowed to do.

Layer 3

Sandbox

Isolated execution environment that limits blast radius of agent actions.

Layer 3

Approval Boundaries

Human-in-the-loop checkpoints for high-risk operations.

Signal

Badge System

Embeddable trust scores that signal your agent's security posture.

Why Three Layers?

Each layer catches a different class of threat. None of them is complete on its own.

Layer 1 (Scanner) catches insecure code patterns at write-time — before the agent ships. It finds hardcoded secrets, injection-prone prompt templates, and dangerous function calls. But it cannot see what inputs arrive at runtime.
Layer 2 (Monitor) watches behavior during execution. A prompt injection attack arrives in runtime input, so nothing in the source code reveals it before the agent runs. The monitor detects the behavioral anomalies that static analysis cannot see: unusual tool sequences, unexpected outbound data, and sudden scope changes.
Layer 3 (Gate) enforces permissions at the moment of action. Even if an attacker bypasses Layers 1 and 2, the Gate prevents unauthorized tool calls from completing. It is the last mechanical barrier before consequences occur.
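
To make the Layer 1 description above concrete, here is a minimal sketch of AST-based scanning using Python's standard `ast` module. It is not AgentCop's scanner or API: the function `scan_source`, the `DANGEROUS_CALLS` and `SECRET_HINTS` rule sets, and the sample agent code are all hypothetical, chosen only to show how hardcoded secrets and dangerous function calls can be found without running the code.

```python
import ast

# Hypothetical rule sets for illustration; not AgentCop's actual rules.
DANGEROUS_CALLS = {"eval", "exec"}
SECRET_HINTS = ("key", "secret", "token", "password")


def scan_source(source: str) -> list[str]:
    """Return human-readable findings for a piece of Python source."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        # Flag direct calls to functions that execute arbitrary code.
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in DANGEROUS_CALLS
        ):
            findings.append(f"line {node.lineno}: dangerous call {node.func.id}()")
        # Flag string literals assigned to secret-looking variable names.
        if (
            isinstance(node, ast.Assign)
            and isinstance(node.value, ast.Constant)
            and isinstance(node.value.value, str)
        ):
            for target in node.targets:
                if isinstance(target, ast.Name) and any(
                    hint in target.id.lower() for hint in SECRET_HINTS
                ):
                    findings.append(
                        f"line {node.lineno}: hardcoded secret in {target.id}"
                    )
    return findings


# Sample agent code with two problems a static pass can see.
AGENT_CODE = '''
API_KEY = "sk-test-123"
def run(user_input):
    return eval(user_input)
'''

findings = scan_source(AGENT_CODE)
for finding in findings:
    print(finding)
```

The sketch also shows the limit named above: the scanner can see that `eval` is called, but it has no way to know what `user_input` will contain at runtime.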

An attacker who successfully injects a malicious prompt still faces the Gate's permission rules and the Sandbox's isolation. Compromise at one layer does not automatically cascade to the next, which is what defense in depth means in practice.
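
The scenario above can be sketched as a default-deny check that runs before every tool call. This is an illustration under stated assumptions, not AgentCop's Gate: the `Gate` class, the `call_tool` helper, the decision strings, and the tool names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Gate:
    """Hypothetical execution gate: default-deny with an approval tier."""

    allowed: set = field(default_factory=set)         # tools that run freely
    needs_approval: set = field(default_factory=set)  # tools that need a human

    def check(self, tool: str, approved: bool = False) -> str:
        if tool in self.allowed:
            return "allow"
        if tool in self.needs_approval:
            return "allow" if approved else "pending_approval"
        return "deny"  # anything not listed is blocked


def call_tool(gate, tool, fn, *args, approved=False):
    """Run fn only if the gate allows the named tool; otherwise raise."""
    decision = gate.check(tool, approved)
    if decision != "allow":
        raise PermissionError(f"{tool}: {decision}")
    return fn(*args)


gate = Gate(allowed={"search_docs"}, needs_approval={"send_email"})

# An injected instruction steers the agent toward a tool that is not listed.
try:
    call_tool(gate, "delete_files", lambda path: f"deleted {path}", "/data")
    outcome = "ran"
except PermissionError as exc:
    outcome = str(exc)

print(outcome)
```

The design choice worth noting is the final `return "deny"`: a tool the rules never mention is blocked, so a successful injection cannot reach capabilities that nobody explicitly granted.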

Defense in depth isn't paranoia. It's the only thing that works against adversarial prompts.