AUTONOMESS // AI Adversary Simulation

Prerequisites

Autonomess targets Python 3.11. The runtime is claude-agent-sdk 0.1.58; no other model providers are supported.

Python 3.11 (pinned; tested on 3.12/3.13)
uv 0.11+ for project + virtualenv management
Obsidian 1.12+ with the official command-line interface enabled
Pentest CLIs available on $PATH: nmap, nikto, sqlmap, ffuf, nuclei, gobuster
stdbuf (coreutils on Linux; brew install coreutils on macOS)
notebooklm-py CLI authenticated (research integration)
An ANTHROPIC_API_KEY in .env

First Mission

Drop a scope file in your working directory and launch the architect. The scope contract — allow-list of targets, exclusions, policy bounds, optional notification webhook — is loaded once at boot and enforced on every tool call by the scope guard.

Section	Role	Covers
targets	ALLOW-LIST	CIDR · hostname · URL · wildcards
excluded	CARVE-OUTS	protected hosts inside the range
policy	BOUNDS	destructive ops · DoS · concurrency · runtime
notify	CALLBACKS	webhooks fired on breach + completion

refuse-by-default · enforced at the PreToolUse hook · operator-only · the Architect cannot mutate scope

EXPECTED: The TUI launches with the Architect pane on the left, six sub-agent panes on the right, and a vault tail at the bottom. Every action streams as [agent-name] verb.

CLI Reference

Command	Purpose
`autonomess run`	Launch a mission with the Architect agent.
`autonomess vault doctor`	Validate vault structure, locks, write queue, and Obsidian connectivity.
`autonomess vault init <path>`	Bootstrap a fresh runtime vault (zones, indexes, templates).
`autonomess vault tail`	Stream live vault writes to stdout (NDJSON).
`autonomess scope check <target>`	Test whether a target would be allowed by the current scope file.
`autonomess --version`	Print build + SDK versions.

Common `run` flags

Flag	Meaning
`--scope PATH`	Required. Path to `scope.yaml`.
`--mission TEXT`	Required. Natural-language mission objective.
`--vault PATH`	Override runtime vault path.
`--no-research`	Disable NotebookLM research dispatch.
`--max-agents N`	Override sub-agent capacity (default: 6).
`--dry-run`	Plan only — no tool calls hit the wire.
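
To make the flag semantics concrete, here is a minimal sketch of how the documented `run` flags could be parsed with the standard-library `argparse`. The function name `build_run_parser` is hypothetical, and nothing here is confirmed to match how Autonomess actually parses its command line; only the flag names, the two required flags, and the default of 6 come from the table above.

```python
import argparse


def build_run_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented `run` flags."""
    parser = argparse.ArgumentParser(prog="autonomess run")
    parser.add_argument("--scope", required=True, help="Path to scope.yaml")
    parser.add_argument("--mission", required=True, help="Natural-language mission objective")
    parser.add_argument("--vault", default=None, help="Override runtime vault path")
    parser.add_argument("--no-research", action="store_true", help="Disable NotebookLM research dispatch")
    parser.add_argument("--max-agents", type=int, default=6, help="Sub-agent capacity")
    parser.add_argument("--dry-run", action="store_true", help="Plan only, no tool calls")
    return parser


# Example invocation: a dry run that keeps every optional flag at its default.
args = build_run_parser().parse_args(
    ["--scope", "scope.yaml", "--mission", "Map the lab network", "--dry-run"]
)
```

Omitting `--scope` or `--mission` makes `argparse` exit with a usage error, which matches the "Required" wording in the table.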

scope.yaml

Refuse-by-default. If a target isn't on the allow-list, it cannot be reached, ever — the scope guard intercepts at the PreToolUse hook before any subprocess fires. The contract is the operator's instrument; the Architect cannot mutate it mid-mission.

SECTION 01

targets

The reachable surface. CIDR ranges, hostnames, FQDNs, URL prefixes, wildcards. Nothing outside this set survives the guard.

SECTION 02

excluded

Surgical carve-outs from the allow-list. Production gateways, executive endpoints, regulated assets — exempted explicitly.

SECTION 03

policy

Behavioral envelope. Toggle destructive operations, denial-of-service tooling, max concurrent agents, max runtime. Bounded autonomy is the only autonomy.

SECTION 04

notify

Callback channels. Webhook URLs fired on breach simulation, scope violation attempts, and mission completion. Wire these into your SIEM.
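
The four sections can be pictured as one immutable contract object. The sketch below is an illustration only: the field names inside `Policy` (`allow_destructive`, `allow_dos`, `max_concurrent_agents`, `max_runtime_minutes`), their defaults, and the `load_scope` helper are assumptions, since the actual `scope.yaml` schema is not spelled out here. It takes an already-parsed mapping so that it stays independent of any YAML library.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Policy:
    # Hypothetical field names and defaults; the real schema may differ.
    allow_destructive: bool = False
    allow_dos: bool = False
    max_concurrent_agents: int = 6
    max_runtime_minutes: int = 60


@dataclass(frozen=True)
class ScopeContract:
    targets: tuple[str, ...]
    excluded: tuple[str, ...] = ()
    policy: Policy = field(default_factory=Policy)
    notify: tuple[str, ...] = ()


def load_scope(raw: dict) -> ScopeContract:
    """Build a frozen contract from a parsed mapping; reject an empty allow-list."""
    if not raw.get("targets"):
        raise ValueError("scope must list at least one target")
    return ScopeContract(
        targets=tuple(raw["targets"]),
        excluded=tuple(raw.get("excluded", ())),
        policy=Policy(**raw.get("policy", {})),
        notify=tuple(raw.get("notify", ())),
    )
```

Freezing the dataclasses mirrors the rule that the contract is loaded once and cannot be mutated mid-mission: any attempt to assign to a field raises an error.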

Vault Tools

Memory is layered and promoted explicitly. Sub-agents land raw evidence; the Architect promotes only verified knowledge into the operator-facing zones. Every write passes through a single-writer queue with a file lock — concurrent agents cannot corrupt the vault.

Zone	Writer	Contents
RAW	sub-agents	unfiltered tool output
WORKBENCH	sub-agents	in-flight analysis
WIKI	architect	verified knowledge
OUTPUT	architect	operator deliverables

VAULT WRITER: single-writer · filelock

Tool	Role
doctor	health
init	bootstrap
tail	live stream
promote	zone shift
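
The zone ownership rules lend themselves to a small permission check. The sketch below is hypothetical: the role strings, the `can_write` and `can_promote` helpers, and the rule that a promotion must move forward through the zone order are all assumptions made for illustration, not documented behavior.

```python
# Zone order from least to most refined, as described above.
ZONES = ("raw", "workbench", "wiki", "output")

# Which role may write to which zone (role names are illustrative).
ZONE_WRITERS = {
    "raw": "sub-agent",
    "workbench": "sub-agent",
    "wiki": "architect",
    "output": "architect",
}


def can_write(role: str, zone: str) -> bool:
    """True if the role owns the zone."""
    return ZONE_WRITERS.get(zone) == role


def can_promote(role: str, source: str, destination: str) -> bool:
    """Assumed rule: promotion moves forward and needs write rights on the destination."""
    if source not in ZONES or destination not in ZONES:
        return False
    moves_forward = ZONES.index(destination) > ZONES.index(source)
    return moves_forward and can_write(role, destination)
```

Under these assumptions a sub-agent can land evidence in `raw` or `workbench`, but only the Architect can move anything into `wiki` or `output`.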

Architecture

The system is composed of an Architect agent that orchestrates two dispatch shapes — independent Sub-Agents and collaborative Agent Teams — over a transparent event bus, with all memory persisted to an Obsidian vault behind a single-writer queue.

┌─────────────────────────────────────────────────────────┐
│                     ARCHITECT (Claude)                  │
│         plans · dispatches · arbitrates · reports       │
└──────────┬──────────────────────────┬───────────────────┘
           │ Agent tool               │ Agent tool
           ▼                          ▼
   ┌───────────────┐          ┌─────────────────┐
   │ SUB-AGENTS ≤6 │          │ AGENT TEAMS ≤4  │
   │ parallel,     │          │ collaborative,  │
   │ isolated ctx  │          │ shared via vault│
   └───────┬───────┘          └────────┬────────┘
           │                           │
           ▼                           ▼
   ┌─────────────────────────────────────────────┐
   │ PreToolUse / PostToolUse hooks              │
   │  → ScopeGuard refuse-by-default             │
   │  → StreamEvent → Textual RichLog panes      │
   └─────────────┬───────────────────────────────┘
                 ▼
        ┌─────────────────┐       ┌──────────────┐
        │   VaultWriter   │──────▶│ Obsidian CLI │
        │ (single writer) │       │   1.12+      │
        └─────────────────┘       └──────────────┘

Architect Agent

The Architect is the only agent with the Agent tool. It owns mission decomposition, dispatch decisions, and final reporting. It does not run pentest tools directly — it orchestrates specialists who do.

Model: claude-opus-4.7 — Anthropic's most capable model. Mythos-class adversaries run on frontier reasoning; the Architect must match that weight class to plan against them.
Inputs: mission objective, scope, vault head pointer, prior-loop summary
Outputs: agent dispatches, mission report, vault writes via the librarian

Sub-Agents

Up to six concurrent. Each receives a fresh 200K-token context and a single, narrow objective. Only the final message bubbles back to the Architect — intermediate reasoning lives in the vault and the live stream.
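
The six-agent cap can be sketched as a bounded fan-out. The Tech Stack lists `anyio` with a CapacityLimiter for this; to keep the example dependency-free, the sketch below swaps in the standard-library `asyncio.Semaphore`, which plays the same role. The `dispatch_all` function and its placeholder sub-agent are hypothetical stand-ins, not the project's dispatcher.

```python
import asyncio

MAX_SUB_AGENTS = 6  # matches the documented sub-agent cap


async def dispatch_all(objectives: list[str], limit: int = MAX_SUB_AGENTS) -> tuple[list[str], int]:
    """Run one placeholder sub-agent per objective, never more than `limit` at once.

    Returns the final messages plus the peak number of concurrently running agents.
    """
    limiter = asyncio.Semaphore(limit)
    active = 0
    peak = 0

    async def sub_agent(objective: str) -> str:
        nonlocal active, peak
        async with limiter:
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)  # stand-in for real tool work
            active -= 1
            # Only this final message is returned to the caller.
            return f"done: {objective}"

    results = await asyncio.gather(*(sub_agent(o) for o in objectives))
    return list(results), peak
```

Calling `asyncio.run(dispatch_all([...]))` with ten objectives runs them in two waves: six start immediately and the remaining four wait for a free slot.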

FRONTIER-MODEL PARITY: Sub-agents are routed by task weight: claude-opus-4.7 for adversary simulation and exploit reasoning, claude-sonnet-4.6 for enumeration and reporting, claude-haiku-4.5 for fast hooks like the scope guard. We refuse to ship a defender that thinks slower than the attacker.

Default personas:

recon-agent — surface mapping, port + service discovery
enum-agent — credentialed/uncredentialed enumeration
scan-agent — vulnerability scanning, CVE lookup
research-agent — NotebookLM dispatch, attack-pattern lookup
librarian-agent — vault dedup, promotion, index maintenance

Agent Teams

Teams are the mechanism for depth work — exploitation, drill-down, multi-step attack chains. The SDK has no first-class team primitive, so Autonomess simulates one through the vault: each member reads/writes a shared wiki/attack-chains/chain-X.md note. The librarian enforces dedup.

WHY NOT NESTED DISPATCH: The SDK disables sub-agents spawning sub-agents. Teams therefore coordinate via the vault, not via nested Agent calls.
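
Coordination through a shared note, with dedup on write, might look roughly like the following. This is an in-memory sketch under stated assumptions: the `ChainNote` class, its whitespace-and-case normalization, and the hash-based dedup are illustrative choices, and a real team member would write through the librarian rather than hold the note in memory.

```python
import hashlib


class ChainNote:
    """In-memory stand-in for a shared attack-chain note (hypothetical)."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.entries: list[tuple[str, str]] = []
        self._seen: set[str] = set()

    def append(self, member: str, finding: str) -> bool:
        """Record a finding unless an equivalent one is already present."""
        normalized = " ".join(finding.lower().split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest in self._seen:
            return False  # duplicate: dropped, as the librarian would
        self._seen.add(digest)
        self.entries.append((member, finding.strip()))
        return True

    def render(self) -> str:
        """Render the note body, one attributed line per finding."""
        return "\n".join(f"[{member}] {finding}" for member, finding in self.entries)
```

Two members reporting the same finding with different spacing or casing produce a single entry, so later readers see each step of the chain once.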

Transparency Pipeline

Every tool call passes through PreToolUse and PostToolUse hooks. The hooks emit a structured event tagged with the originating agent_id and its parent dispatch, which the Textual app routes to the correct pane via post_message.

AGENT (opus-4.7 · architect) → tool_use → HOOK (PreToolUse · PostToolUse) → agent_start · tool_result → DASHBOARD (RichLog · per-agent pane)

Event field	Meaning
kind	classified event type
agent_id	routing key
parent_id	dispatch chain
summary	one-liner
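
The event shape and per-agent routing described above can be sketched without the TUI. The four field names come from this section; the `StreamEvent` dataclass layout and the `PaneRouter` class are assumptions, and the router appends to plain lists where the real application would post messages to Textual panes.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StreamEvent:
    kind: str                 # classified event type, e.g. "tool_use"
    agent_id: str             # routing key
    parent_id: Optional[str]  # dispatch chain; None for the Architect
    summary: str              # one-liner shown in the pane


class PaneRouter:
    """Hypothetical router: one list of rendered lines per agent pane."""

    def __init__(self) -> None:
        self.panes: dict[str, list[str]] = defaultdict(list)

    def route(self, event: StreamEvent) -> str:
        """Render the event and append it to the pane keyed by agent_id."""
        line = f"[{event.agent_id}] {event.kind}: {event.summary}"
        self.panes[event.agent_id].append(line)
        return line
```

Keying panes by `agent_id` keeps each agent's stream separate, while `parent_id` remains available for reconstructing which dispatch produced the event.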

Vault System (Karpathy-RAG)

Memory is layered. Information is born in raw/, distilled in workbench/, promoted to wiki/ only when verified, and exported to output/ for the operator.

Zone	Writers	Purpose
`raw/`	Sub-agents	Unfiltered tool output — scan results, banners, payload responses
`workbench/`	Sub-agents	In-flight analysis, hypotheses, intermediate reasoning
`wiki/`	Architect (via librarian)	Verified knowledge — attack chains, host profiles, CVE notes
`output/`	Architect	Mission deliverables — report, executive summary, recommendations

CRITICAL: All writes route through VaultWriter, a single-writer queue holding a filelock. Direct file I/O on vault paths is banned at lint time (ast-grep scan).
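
The single-writer pattern can be sketched with one queue and one consumer task. This is not the project's VaultWriter: the `VaultWriterSketch` class and its method names are hypothetical, the cross-process file lock is omitted, and the demo writes to a temporary directory rather than a real vault.

```python
import asyncio
import tempfile
from pathlib import Path


class VaultWriterSketch:
    """Many producers enqueue writes; exactly one consumer touches the disk."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self._queue: asyncio.Queue = asyncio.Queue()

    async def submit(self, rel_path: str, text: str) -> None:
        await self._queue.put((rel_path, text))

    async def close(self) -> None:
        await self._queue.put(None)  # sentinel: stop after draining earlier items

    async def run(self) -> None:
        while True:
            item = await self._queue.get()
            if item is None:
                return
            rel_path, text = item
            target = self.root / rel_path
            target.parent.mkdir(parents=True, exist_ok=True)
            with target.open("a", encoding="utf-8") as handle:
                handle.write(text + "\n")


async def demo(root: Path) -> str:
    writer = VaultWriterSketch(root)
    consumer = asyncio.create_task(writer.run())
    # Five concurrent producers all target the same note.
    await asyncio.gather(*(writer.submit("raw/scan.md", f"finding {i}") for i in range(5)))
    await writer.close()
    await consumer
    return (root / "raw" / "scan.md").read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    content = asyncio.run(demo(Path(tmp)))
```

Because only the consumer task opens files, concurrent producers cannot interleave partial lines; a file lock would extend the same guarantee across processes.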

Scope Guard

Refuse-by-default. The guard hooks PreToolUse, parses the tool's target argument (nmap's host list, curl's URL, etc.), and blocks if the target is not in scope. The Architect cannot override.

tool_call (nmap · curl · sqlmap) → SCOPE GUARD (PRE-TOOL-USE)

Target	Verdict
vuln-lab.local	ALLOW
corp-prod.example.com	DENY · out of scope
10.10.50.0/24	ALLOW
8.8.8.8	DENY · refuse-by-default
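
The allow/deny decision can be sketched with the standard-library `ipaddress` and `fnmatch` modules. The `check_target` function and its matching rules are assumptions for illustration: the sketch covers only the membership test and does not parse tool arguments such as nmap host lists or curl URLs.

```python
import fnmatch
import ipaddress


def _matches(target: str, pattern: str, overlap: bool = False) -> bool:
    """True if target falls under one pattern (CIDR, single IP, or hostname glob)."""
    try:
        network = ipaddress.ip_network(pattern, strict=False)
    except ValueError:
        # Pattern is not an IP or CIDR: treat it as a case-insensitive hostname glob.
        return fnmatch.fnmatch(target.lower(), pattern.lower())
    try:
        candidate = ipaddress.ip_network(target, strict=False)
    except ValueError:
        return False  # a hostname never matches an IP pattern in this sketch
    if candidate.version != network.version:
        return False
    return candidate.overlaps(network) if overlap else candidate.subnet_of(network)


def check_target(target: str, targets: list[str], excluded: tuple[str, ...] = ()) -> str:
    """Refuse by default: ALLOW only if allow-listed and not touching a carve-out."""
    # Exclusions use overlap so a range containing a protected host is also denied.
    if any(_matches(target, pattern, overlap=True) for pattern in excluded):
        return "DENY"
    if any(_matches(target, pattern) for pattern in targets):
        return "ALLOW"
    return "DENY"
```

With `targets=["vuln-lab.local", "10.10.50.0/24"]`, the four verdicts shown above fall out of this check: the two listed targets are allowed, and both `corp-prod.example.com` and `8.8.8.8` are denied because nothing on the allow-list covers them.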

NotebookLM Research

Any agent can dispatch a research-agent mid-mission. It shells out to notebooklm-py, asks a focused question, and writes the answer to wiki/research/{topic}.md via the librarian. Subsequent agents read the wiki entry instead of re-querying.
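
The read-before-requery behavior amounts to a file-backed cache keyed by topic. In the sketch below, the `research` function, the slug rule, and the injected `ask` callable are hypothetical; `ask` stands in for the notebooklm-py call, whose command-line interface is deliberately not reproduced here. The sketch also writes the note directly for brevity, whereas the pipeline described in this document routes writes through the librarian.

```python
import re
import tempfile
from pathlib import Path
from typing import Callable


def research(topic: str, question: str, wiki_root: Path, ask: Callable[[str], str]) -> str:
    """Return the cached wiki entry for a topic, querying only on a miss."""
    slug = re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")
    note = wiki_root / "research" / f"{slug}.md"
    if note.exists():
        return note.read_text(encoding="utf-8")
    answer = ask(question)
    note.parent.mkdir(parents=True, exist_ok=True)
    note.write_text(answer, encoding="utf-8")
    return answer


# Demo with a fake backend that records how often it is queried.
calls: list[str] = []


def fake_ask(question: str) -> str:
    calls.append(question)
    return f"answer for: {question}"


with tempfile.TemporaryDirectory() as tmp:
    first = research("SMB relay", "How does SMB relay work?", Path(tmp), fake_ask)
    second = research("SMB relay", "How does SMB relay work?", Path(tmp), fake_ask)
```

The second call is served from the note on disk, so the backend is queried once per topic.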

Boot Sequence

Parse CLI args, load .env
Bind structlog contextvars (mission_id, loop)
Load + validate scope.yaml
Spawn VaultWriter background task; acquire vault lock
Run vault doctor — abort on FAIL
Build ClaudeAgentOptions (hooks, MCP servers, agent registry)
Launch Textual TUI; subscribe panes to event bus
Architect receives mission objective; loop begins
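
The fail-fast character of the sequence, in particular aborting when vault doctor fails, can be sketched as an ordered list of checks. The `boot` function and `BootAborted` exception are hypothetical, and each step is reduced to a callable returning True or False.

```python
from typing import Callable


class BootAborted(RuntimeError):
    """Raised when a boot step reports failure."""


def boot(steps: list[tuple[str, Callable[[], bool]]]) -> list[str]:
    """Run steps in order; stop at the first failure and name the step."""
    completed: list[str] = []
    for name, step in steps:
        if not step():
            raise BootAborted(f"boot aborted at: {name}")
        completed.append(name)
    return completed
```

Steps after a failing one never run, so a failed vault doctor check means the TUI is never launched and the Architect never receives the mission.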

Mission Loop

Plan — Architect inspects vault head, drafts next step
Dispatch — Architect calls Agent tool, spawning specialists
Execute — Sub-agents/teams run tools (scope-checked, streamed)
Persist — Findings flow into raw/ and workbench/
Distill — Librarian promotes verified facts to wiki/
Decide — Architect evaluates: continue, pivot, or report
Loop until objective met or budget exhausted
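
The control flow of the loop can be sketched as follows. The `mission_loop` function, its three injected callables, and the use of a loop count as the budget are assumptions; persistence and distillation are folded into `dispatch` here to keep the skeleton short.

```python
from typing import Callable


def mission_loop(
    plan: Callable[[int], str],
    dispatch: Callable[[str], list[str]],
    decide: Callable[[list[str]], str],
    max_loops: int,
) -> tuple[str, int]:
    """Plan, dispatch, and decide until the Architect reports or the budget runs out."""
    for loop in range(1, max_loops + 1):
        step = plan(loop)            # Plan: draft the next step
        findings = dispatch(step)    # Dispatch and execute: specialists return findings
        verdict = decide(findings)   # Decide: "continue", "pivot", or "report"
        if verdict == "report":
            return "objective met", loop
    return "budget exhausted", max_loops
```

Both "continue" and "pivot" lead to another iteration; the difference would show up in what `plan` drafts next, which this skeleton leaves to the caller.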

Tech Stack

Layer	Choice
Runtime	Python 3.11
Agent SDK	`claude-agent-sdk 0.1.58`
Concurrency	`anyio 4.13` · TaskGroup + CapacityLimiter
Project mgmt	`uv` 0.11
TUI	`textual` 8 + `rich` 15
Logging	`structlog` 25 (JSON + Rich sinks)
Lint / type	`ruff` + `mypy --strict`
Tests	`pytest 9` + `pytest-asyncio`
Vault I/O	Official Obsidian 1.12 CLI via `/obs` skill
Research	`notebooklm-py` CLI

Build Phases

Foundation

Repo skeleton · uv · ruff · mypy · CI gates

Vault

VaultWriter · zones · obsidian-cli adapter · lint rule

Transparency

Hooks · StreamEvent · Textual TUI · RichLog routing

Architect

Mission planner · agent registry · MCP servers

Sub-Agents

Recon/enum/scan personas · capacity limiter

Research

notebooklm-py integration · librarian dedup

Tools

nmap/sqlmap/ffuf wrappers · stdbuf streaming

Teams

Shared-vault collab · attack-chain notes

E2E

Full mission against vuln-lab · report generation

Troubleshooting

Symptom	Resolution
`vault doctor` exit 2 — Obsidian not running	Launch Obsidian; ensure CLI is enabled in Settings → CLI.
`vault doctor` exit 3 — lock held	Stale lock. Remove `.vault-lock` only if no autonomess process is running.
Sub-agent stream silent	Confirm `include_partial_messages=True` in `ClaudeAgentOptions`.
Pentest tool buffers output	Wrap with `stdbuf -o0 -e0` (or `gstdbuf` on macOS).
Scope guard blocks valid target	Run `autonomess scope check <target>`; verify CIDR/hostname pattern.
NotebookLM research times out	Re-auth: `notebooklm login`. Idempotent retry will pick up partial imports.

Claude Mythos
doesn't sleep.
Neither does your defense.

The attacker isn't a person
anymore. It's a model.

Always-on enumeration

Adaptive payload synthesis

Memory across breaches

Hire an AI red team. Before someone else's hires you.

One Architect.
An army of specialists.

≤6 parallel · isolated

≤4 collaborative · shared

Single writer · audit trail

The next breach
is already being planned.
