Drift Taxonomy

The six types of agent drift Caret detects, with examples and detection strategies.

Overview

Caret detects six distinct types of agent drift. Each has its own detection strategy, severity level, and recommended intervention. Understanding the taxonomy helps you configure thresholds and interventions for your team.

Scope drift

Severity: High

The agent modifies files outside the scope of your prompt. This is the most common drift type, occurring in roughly 40% of drifted sessions.

Example: You ask the agent to "fix auth token refresh in src/auth/". Three tool calls in, it starts editing src/db/schema.ts.

Detection: Caret compares every file the agent touches against the intent record's scope field. Files outside scope accumulate a drift score.
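
The comparison described above can be sketched roughly as follows. This is a minimal illustration, not Caret's implementation: the IntentRecord shape, the prefix-based matching, and the one-point-per-file scoring rule are all assumptions made for the example.

```typescript
// Hypothetical shape: the text only says the intent record has a scope field.
interface IntentRecord {
  scope: string[]; // path prefixes covered by the prompt, e.g. ["src/auth/"]
}

// True when the path sits under at least one scope prefix.
function isInScope(path: string, intent: IntentRecord): boolean {
  return intent.scope.some((prefix) => path.startsWith(prefix));
}

// Assumed scoring rule: one point per distinct out-of-scope file.
function scopeDriftScore(touched: string[], intent: IntentRecord): number {
  const outOfScope = new Set(touched.filter((p) => !isInScope(p, intent)));
  return outOfScope.size;
}

const authIntent: IntentRecord = { scope: ["src/auth/"] };
const scopeScore = scopeDriftScore(
  ["src/auth/token.ts", "src/auth/refresh.ts", "src/db/schema.ts"],
  authIntent,
);
```

In this sketch, only src/db/schema.ts falls outside the scope, so it is the only file that adds to the score.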

Common causes: The agent follows import chains too aggressively, decides a "proper fix" requires upstream changes, or misunderstands the scope of the task.

Deep dive: Scope Drift Detection →

Retry loops

Severity: Medium

The agent edits the same file repeatedly with the same error pattern. It's stuck and burning tokens without making progress.

Example: The agent edits middleware.ts, runs pnpm test, sees a failure, edits middleware.ts again with a similar change, tests again, and fails again, five times in a row.

Detection: Caret tracks file-edit frequency and consecutive failure patterns. Three or more edits to the same file with the same failing test trigger an alert.
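
One way to picture the rule is a counter over consecutive edit events. The sketch below is illustrative only: the EditEvent shape and the detectRetryLoop function are hypothetical names, and only the threshold of three comes from the text.

```typescript
// Hypothetical event shape; Caret's actual data model is not described here.
interface EditEvent {
  file: string;
  failingTest: string | null; // test that failed after this edit, or null if tests passed
}

const RETRY_THRESHOLD = 3; // "three or more edits" from the text

// Returns the file caught in a retry loop, or null if no loop is found.
function detectRetryLoop(events: EditEvent[]): string | null {
  let run = 0;
  let prev: EditEvent | null = null;
  for (const ev of events) {
    if (ev.failingTest === null) {
      run = 0; // a passing run breaks the streak
    } else if (
      prev !== null &&
      ev.file === prev.file &&
      ev.failingTest === prev.failingTest
    ) {
      run += 1; // same file, same failing test as the previous edit
    } else {
      run = 1; // first failure in a new streak
    }
    if (run >= RETRY_THRESHOLD) return ev.file;
    prev = ev;
  }
  return null;
}

const stuck = detectRetryLoop([
  { file: "middleware.ts", failingTest: "auth.test.ts" },
  { file: "middleware.ts", failingTest: "auth.test.ts" },
  { file: "middleware.ts", failingTest: "auth.test.ts" },
]);
```

A passing test run or a switch to a different file resets the streak in this sketch, so only uninterrupted repetition is reported.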

Cost runaway

Severity: Medium

Token usage exceeds a reasonable threshold for the complexity of the task. A "fix typo" prompt shouldn't cost $2.

Example: A simple validation bug fix burns $1.80 in tokens because the agent is reading every file in the project to "understand context."

Detection: Caret estimates expected cost based on the action type and scope size. When actual cost exceeds the expected cost by a configurable multiplier (default: 5x), an alert is triggered.
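
The threshold check might look something like the sketch below. Only the 5x default multiplier comes from the text; the per-file baseline figures, the fallback value, and the function names are invented for illustration and are not Caret's real cost model.

```typescript
const DEFAULT_MULTIPLIER = 5; // default from the text

// Hypothetical baselines: expected USD cost per file in scope, by action type.
const BASE_COST_PER_FILE: Record<string, number> = {
  fix: 0.05,
  refactor: 0.15,
};

// Assumed estimate: baseline per file, multiplied by the number of files in scope.
function expectedCost(actionType: string, filesInScope: number): number {
  const perFile = BASE_COST_PER_FILE[actionType] ?? 0.1; // fallback is an assumption
  return perFile * Math.max(1, filesInScope);
}

// True when actual spend exceeds the expected cost by more than the multiplier.
function isCostRunaway(
  actualCost: number,
  actionType: string,
  filesInScope: number,
  multiplier: number = DEFAULT_MULTIPLIER,
): boolean {
  return actualCost > expectedCost(actionType, filesInScope) * multiplier;
}

// A "fix" over 4 files: expected about $0.20, so the alert threshold is about $1.00.
const runaway = isCostRunaway(1.8, "fix", 4);
```

Under these assumed baselines, the $1.80 session from the example crosses the threshold, while a $0.60 session for the same task would not.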

Destructive actions

Severity: Critical

The agent attempts to run commands or make changes that could cause irreversible damage.

Example: rm -rf src/, DROP TABLE users, git push --force origin main, deleting configuration files.

Detection: Caret matches commands and file operations against a built-in list of dangerous patterns. This is Tier 1 (rule-based) detection: it runs in under 10ms and has zero false negatives for known patterns.

Intervention: Destructive actions are blocked by default. The agent receives a message explaining why the action was blocked.
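
A rule-based check of this kind can be sketched with a few regular expressions. The patterns below are illustrative stand-ins covering three of the examples above; they are not Caret's built-in list, and the Verdict shape and message wording are assumptions.

```typescript
// Illustrative rules only; the real built-in list is not given in the text.
const DANGEROUS_PATTERNS: { name: string; pattern: RegExp }[] = [
  // rm with both the r and f flags, in either order (e.g. -rf, -fr)
  { name: "recursive force delete", pattern: /\brm\s+-(?=\w*r)(?=\w*f)\w+/i },
  { name: "drop table", pattern: /\bdrop\s+table\b/i },
  { name: "force push", pattern: /\bgit\s+push\b.*\s(--force|-f)(\s|$)/i },
];

interface Verdict {
  blocked: boolean;
  message: string | null; // explanation returned to the agent when blocked
}

// Checks a command against each rule and blocks on the first match.
function checkCommand(command: string): Verdict {
  for (const { name, pattern } of DANGEROUS_PATTERNS) {
    if (pattern.test(command)) {
      return {
        blocked: true,
        message: `Blocked: command matches the "${name}" rule.`,
      };
    }
  }
  return { blocked: false, message: null };
}

const verdict = checkCommand("git push --force origin main");
```

Because each rule is a single regular-expression test, a check like this stays cheap enough to run on every command.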

Approach drift

Severity: High

The agent takes an unnecessarily complex approach to a simple task. It rewrites entire modules when a one-line fix would do.

Example: You ask the agent to "fix the date formatting bug." The agent creates a new DateFormatter class, refactors three components to use it, writes tests, and updates the documentation, all for a one-character fix in a format string.

Detection: Caret compares the diff size and number of files modified against the expected complexity for the action type. A "fix" action with a 500-line diff across 10 files is flagged as approach drift.
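
As a rough sketch, the comparison amounts to checking a diff against a per-action budget. The budget numbers, the DiffStats shape, and the function name below are hypothetical; the text does not state Caret's actual thresholds.

```typescript
interface DiffStats {
  linesChanged: number;
  filesModified: number;
}

// Hypothetical budgets per action type, chosen only for illustration.
const COMPLEXITY_BUDGET: Record<string, DiffStats> = {
  fix: { linesChanged: 50, filesModified: 3 },
  feature: { linesChanged: 400, filesModified: 12 },
};

// Flags a diff that exceeds either dimension of the budget for its action type.
function isApproachDrift(actionType: string, diff: DiffStats): boolean {
  const budget = COMPLEXITY_BUDGET[actionType];
  if (budget === undefined) return false; // unknown action types are not judged
  return (
    diff.linesChanged > budget.linesChanged ||
    diff.filesModified > budget.filesModified
  );
}

const bigFix = isApproachDrift("fix", { linesChanged: 500, filesModified: 10 });
```

Here the 500-line, 10-file "fix" from the text is flagged, while a one-line change to a single file stays within budget.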

Hallucination drift

Severity: High

The agent imports packages that don't exist, calls APIs with wrong signatures, or references files that aren't in the project.

Example: The agent runs pnpm add @utils/super-validator, but no such package exists in the npm registry.

Detection: Caret cross-references package install commands against known registries and import statements against existing project files.
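
The package half of this check can be sketched as below. To keep the example self-contained, the registry is represented by an in-memory set passed in by the caller; a real detector would need an actual registry lookup, and the function names here are hypothetical.

```typescript
// Drops a trailing version specifier, keeping scoped names intact.
// "zod@3" -> "zod", "@scope/pkg@1.2.0" -> "@scope/pkg"
function stripVersion(spec: string): string {
  const at = spec.lastIndexOf("@");
  return at > 0 ? spec.slice(0, at) : spec;
}

// Extracts package names from a "pnpm add" command, skipping flags.
function packagesFromInstall(command: string): string[] {
  const parts = command.trim().split(/\s+/);
  if (parts[0] !== "pnpm" || parts[1] !== "add") return [];
  return parts
    .slice(2)
    .filter((p) => !p.startsWith("-"))
    .map(stripVersion);
}

// Returns the packages that are absent from the supplied registry snapshot.
function unknownPackages(command: string, knownPackages: Set<string>): string[] {
  return packagesFromInstall(command).filter((name) => !knownPackages.has(name));
}

// Assumed snapshot of known package names, standing in for a registry query.
const registrySnapshot = new Set(["zod", "date-fns"]);
const suspicious = unknownPackages(
  "pnpm add @utils/super-validator zod",
  registrySnapshot,
);
```

The same set-membership idea would apply to import statements, with the set holding the project's file paths instead of package names.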

Detection tiers

Figure: Drift type comparison chart showing all six drift types with their severity, detection method, and latency.

Tier	Method	Latency	Cost	Drift types
1	Rule-based	<10ms	Free	Destructive actions, retry loops
2	Heuristics	<50ms	Free	Scope drift, cost runaway, approach drift
3	LLM evaluation	1-3s	~$0.01/eval	All types (highest accuracy)

Tiers 1 and 2 are always active. Tier 3 is optional and can be enabled per-detector in caret.config.json.
