
Agentic CLI Architecture Reference

January 3, 2025

A comprehensive guide to building production-grade agentic command-line interfaces, derived from patterns in state-of-the-art implementations.


Core Architecture

Layered Design

┌─────────────────────────────────────────────────────────────┐
│                    EXTENSION LAYER                          │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐       │
│  │ External │ │  Hooks   │ │  Skills  │ │ Plugins  │       │
│  │  Tools   │ │ (Events) │ │ (Prompts)│ │(Packages)│       │
│  └──────────┘ └──────────┘ └──────────┘ └──────────┘       │
├─────────────────────────────────────────────────────────────┤
│                   DELEGATION LAYER                          │
│         ┌─────────────────────────────────────────┐        │
│         │     Subagents (Parallel Execution)      │        │
│         │   Explorer | Planner | Specialist       │        │
│         └─────────────────────────────────────────┘        │
├─────────────────────────────────────────────────────────────┤
│                      CORE LAYER                             │
│    ┌─────────────────────────────────────────────────────┐ │
│    │           Main Agent Loop                           │ │
│    │   Context Window | Tool Executor | Message History  │ │
│    └─────────────────────────────────────────────────────┘ │
├─────────────────────────────────────────────────────────────┤
│                   FOUNDATION LAYER                          │
│    ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐    │
│    │   LLM    │ │ Session  │ │Permission│ │  Config  │    │
│    │ Provider │ │  Store   │ │  System  │ │  Loader  │    │
│    └──────────┘ └──────────┘ └──────────┘ └──────────┘    │
└─────────────────────────────────────────────────────────────┘

Execution Model: Single-Thread Simplicity

Prefer a single main agent loop over complex multi-agent orchestration.

┌─────────────────────────────────────────────────────────┐
│                    MAIN AGENT LOOP                      │
│                                                         │
│   User Input → LLM → Tool Call → Result → LLM → ...    │
│                          │                              │
│                          ▼                              │
│                   ┌─────────────┐                       │
│                   │  Subagent   │ (max 1 level deep)    │
│                   │  Execution  │                       │
│                   └──────┬──────┘                       │
│                          │                              │
│                   Result added to                       │
│                   main message history                  │
└─────────────────────────────────────────────────────────┘

Key Principles:

  • Flat message list as single source of truth
  • Subagents spawn with isolated context, return summarized results
  • Maximum one level of delegation (no recursive subagent spawning)
  • Results from subagents become tool responses in main thread
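The principles above can be sketched in a few lines (a minimal sketch; `call_llm`, the message-dict shape, and the tool registry are hypothetical stand-ins for a real provider client):

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class AgentLoop:
    """Minimal single-threaded agent loop with one flat message list."""
    call_llm: Callable[[list[dict]], dict]   # hypothetical LLM client
    tools: dict[str, Callable[[dict], str]]  # tool name -> executor
    messages: list[dict] = field(default_factory=list)
    max_turns: int = 20

    def run(self, user_input: str) -> str:
        self.messages.append({"role": "user", "content": user_input})
        for _ in range(self.max_turns):
            reply = self.call_llm(self.messages)
            self.messages.append(reply)
            if "tool_call" not in reply:
                return reply["content"]  # final answer, loop ends
            call = reply["tool_call"]
            result = self.tools[call["name"]](call["input"])
            # Tool results (including subagent summaries) re-enter
            # the same flat history as tool responses.
            self.messages.append({"role": "tool", "content": result})
        raise RuntimeError("max_turns exceeded")
```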

Tool System Design

Tool Hierarchy

Design tools at multiple abstraction levels:

| Level | Characteristics | Examples |
|-------|----------------|----------|
| Low-level | Direct system access, flexible but error-prone | bash, read_file, write_file |
| Mid-level | Specialized, optimized for common operations | grep, glob, edit, multi_edit |
| High-level | Orchestration, deterministic outcomes | spawn_agent, web_fetch, todo_list |

Why multiple levels?

  • Frequent operations deserve dedicated tools (reduces LLM errors)
  • Specialized tools have better prompts and validation
  • High-level tools save tokens and keep agent on track

Tool Categories

1. Filesystem Tools

read:
  description: "Read file contents with line numbers"
  parameters:
    - file_path: string (required)
    - offset: integer (optional, start line)
    - limit: integer (optional, max lines)
  risk_level: low

write:
  description: "Create or overwrite file"
  parameters:
    - file_path: string (required)
    - content: string (required)
  risk_level: high

edit:
  description: "Precise search-and-replace modification"
  parameters:
    - file_path: string (required)
    - old_text: string (required, must be unique)
    - new_text: string (required)
  risk_level: medium

multi_edit:
  description: "Batch edits in single operation"
  parameters:
    - file_path: string (required)
    - edits: array of {old_text, new_text}
  risk_level: medium

glob:
  description: "Find files by pattern"
  parameters:
    - pattern: string (required, e.g., "**/*.py")
    - path: string (optional, search root)
  risk_level: low

list_directory:
  description: "List directory contents"
  parameters:
    - path: string (required)
    - depth: integer (optional, default 1)
  risk_level: low

2. Search Tools

grep:
  description: "Content search using ripgrep"
  parameters:
    - pattern: string (required)
    - path: string (optional)
    - output_mode: enum [content, files_only, count]
    - file_type: string (optional, e.g., "py", "js")
    - context_lines: integer (optional)
  risk_level: low
  notes: "Always prefer dedicated grep over bash grep"

3. Execution Tools

bash:
  description: "Execute shell commands"
  parameters:
    - command: string (required)
    - timeout: integer (optional, ms, max 600000)
    - working_directory: string (optional)
    - run_in_background: boolean (optional)
  risk_level: high
  restrictions:
    - "Prefer dedicated tools over bash equivalents"
    - "Never use: cat, head, tail, grep, find, sed, awk"
    - "Quote paths with spaces"
    - "Use && or ; for command chaining, not newlines"

4. Web Tools

web_search:
  description: "Search the internet"
  parameters:
    - query: string (required)
    - max_results: integer (optional)
  risk_level: low

web_fetch:
  description: "Retrieve web page content"
  parameters:
    - url: string (required)
    - extract_text: boolean (optional)
  risk_level: low

5. Meta Tools

spawn_agent:
  description: "Delegate task to subagent"
  parameters:
    - task: string (required)
    - agent_type: enum [explorer, planner, general]
    - tools: array of strings (optional, tool subset)
  risk_level: low

todo_list:
  description: "Track task progress"
  parameters:
    - action: enum [read, write, update]
    - todos: array of {content, status}
  risk_level: low

Tool Design Best Practices

  1. Explicit over implicit: Require paths, don't assume cwd
  2. Validation in schema: Use JSON Schema for parameter validation
  3. Rich descriptions: Include examples, edge cases, and anti-patterns
  4. Atomic operations: Each tool does one thing well
  5. Idempotent when possible: Same input → same output
  6. Bounded outputs: Truncate long results (e.g., 30k chars)
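Practice 6 (bounded outputs) is straightforward to implement as a head-plus-tail truncation; the 30k-char budget below mirrors the guideline above, and the function name is illustrative:

```python
def bound_output(text: str, max_chars: int = 30_000) -> str:
    """Truncate long tool output, keeping the head and tail so the
    agent still sees both the start and the end of the result."""
    if len(text) <= max_chars:
        return text
    half = max_chars // 2
    omitted = len(text) - 2 * half
    return (text[:half]
            + f"\n... [{omitted} chars truncated] ...\n"
            + text[-half:])
```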

Permission System

Permission Modes

modes:
  strict:
    description: "Confirm every action"
    auto_approve: []

  standard:
    description: "Approve reads, confirm writes"
    auto_approve: [read, glob, grep, list_directory, web_search]

  permissive:
    description: "Auto-approve safe operations"
    auto_approve: [read, glob, grep, list_directory, edit, web_*]

  autonomous:
    description: "No confirmations (dangerous)"
    auto_approve: ["*"]
    requires_flag: "--dangerously-skip-permissions"

Tool Allowlists

{
  "allowedTools": [
    "read",
    "glob",
    "grep",
    "bash(git:*)",
    "bash(npm:*)",
    "bash(pytest:*)"
  ],
  "disallowedTools": [
    "bash(rm -rf:*)",
    "bash(sudo:*)"
  ]
}

Pattern syntax:

  • tool_name - exact match
  • tool_name(prefix:*) - match commands starting with prefix
  • * - match all tools

Subagent Architecture

Built-in Agent Types

explorer:
  model: fast (e.g., Haiku, GPT-4o-mini)
  mode: read-only
  tools: [glob, grep, read, list_directory]
  parameters:
    thoroughness: [quick, medium, thorough]
  use_cases:
    - Codebase exploration
    - File discovery
    - Pattern searching

planner:
  model: capable (e.g., Sonnet, GPT-4o)
  mode: read-only
  tools: [read, glob, grep, bash(safe)]
  use_cases:
    - Implementation planning
    - Architecture decisions
    - Breaking down complex tasks

general:
  model: capable
  mode: full
  tools: all
  use_cases:
    - Complex research + modification
    - Multi-step workflows
    - Delegated implementations

Custom Agent Definition

# .agents/security-reviewer.yaml
name: security-reviewer
description: "Security-focused code reviewer"
model: opus  # or specific model string
permission_mode: plan  # read-only
tools:
  - read
  - grep
  - glob
  - bash(git log:*)
  - bash(git diff:*)
prompt: |
  You are a senior security engineer reviewing code.
  Focus on:
  - OWASP Top 10 vulnerabilities
  - Secrets and hardcoded credentials
  - SQL injection, XSS, CSRF
  - Authentication/authorization flaws

  Report findings with severity levels and remediation steps.

Subagent Invocation

# Programmatic
result = await spawn_agent(
    task="Find all authentication-related files",
    agent_type="explorer",
    thoroughness="thorough"
)

# Natural language (agent decides)
"Use the explorer agent to find all API endpoints"
"Have a subagent analyze the database schema"

Hook System

Event Types

lifecycle:
  - SessionStart      # Agent session begins
  - SessionEnd        # Agent session ends
  - Stop              # Agent completes task

tool_events:
  - PreToolUse        # Before tool execution
  - PostToolUse       # After successful execution
  - PostToolUseFailure # After failed execution
  - PermissionRequest # User permission needed

user_events:
  - UserPromptSubmit  # Before processing user input
  - Notification      # System notifications

Hook Configuration

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "bash",
        "hooks": [
          {
            "type": "command",
            "command": "/path/to/validate-bash.sh"
          }
        ]
      },
      {
        "matcher": "*",
        "hooks": [
          {
            "type": "command",
            "command": "/path/to/log-tool-use.sh"
          }
        ]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "edit",
        "hooks": [
          {
            "type": "command",
            "command": "npm run lint --fix"
          }
        ]
      }
    ]
  }
}

Hook Input/Output

# Hook receives JSON on stdin
{
  "tool_name": "bash",
  "tool_input": {
    "command": "rm -rf /tmp/cache"
  },
  "session_id": "abc123",
  "conversation_id": "xyz789"
}

# Hook returns JSON on stdout
{
  "decision": "deny",  # or "approve", "passthrough"
  "reason": "Dangerous command blocked",
  "modified_input": null,  # optional: modify tool input
  "system_message": null   # optional: inject context
}

Common Hook Patterns

import logging
import subprocess

log = logging.getLogger(__name__)

# Security validation
async def validate_bash(input_data):
    command = input_data["tool_input"].get("command", "")
    dangerous = ["rm -rf /", "sudo", "> /dev/sda"]
    if any(d in command for d in dangerous):
        return {"decision": "deny", "reason": "Dangerous command"}
    return {"decision": "passthrough"}

# Audit logging
async def log_all_tools(input_data):
    log.info(f"Tool: {input_data['tool_name']}, Input: {input_data['tool_input']}")
    return {}

# Auto-formatting after edits
async def post_edit_format(input_data):
    if input_data["tool_name"] == "edit":
        file_path = input_data["tool_input"]["file_path"]
        if file_path.endswith(".py"):
            subprocess.run(["black", file_path])
    return {}

Skills System

Concept

Skills are prompt modules that extend agent capabilities by injecting specialized instructions into context, not by executing code.

┌─────────────────────────────────────────────────────────┐
│                   SKILL ACTIVATION                      │
│                                                         │
│   User Request → Skill Matcher → Inject SKILL.md →     │
│                                                         │
│   → Agent now has specialized instructions/context      │
└─────────────────────────────────────────────────────────┘
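A deliberately naive matcher illustrates the activation step (keyword overlap is a stand-in; real implementations typically let the LLM choose skills or use embedding similarity):

```python
from dataclasses import dataclass

@dataclass
class Skill:
    name: str
    description: str
    body: str  # contents of SKILL.md below the front matter

def activate_skills(user_request: str, skills: list[Skill]) -> str:
    """Inject the body of any skill whose name or description
    keywords appear in the request (keywords of 4+ chars only,
    to avoid matching on stopwords)."""
    injected = []
    for skill in skills:
        keywords = {skill.name, *skill.description.lower().split()}
        if any(k in user_request.lower() for k in keywords if len(k) > 3):
            injected.append(skill.body)
    return "\n\n".join(injected)
```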

Skill Definition

<!-- .skills/code-review/SKILL.md -->

---
name: code-review
description: "Thorough code review with best practices"
tools: Read,Grep,Glob
---

# Code Review Skill

When performing code reviews, follow this methodology:

## 1. Understanding Phase
- Read the changed files completely
- Identify the purpose of changes
- Check for related test files

## 2. Analysis Checklist
- [ ] Error handling present
- [ ] Edge cases considered
- [ ] No hardcoded secrets
- [ ] Logging appropriate
- [ ] Tests cover new code

## 3. Output Format
Provide feedback as:
- 🔴 Critical: Must fix before merge
- 🟡 Suggestion: Consider improving
- 🟢 Praise: Well done

Skill Discovery

project/
├── .skills/
│   ├── code-review/
│   │   └── SKILL.md
│   └── api-design/
│       └── SKILL.md

~/.config/agent/skills/       # Global skills
└── security-audit/
    └── SKILL.md

Plugin Architecture

Plugin Structure

my-plugin/
├── plugin.json           # Manifest (required)
├── commands/             # Slash commands
│   └── deploy.md
├── agents/               # Custom agents
│   └── devops.md
├── skills/               # Skills
│   └── kubernetes/
│       └── SKILL.md
├── hooks.json            # Hook definitions
└── mcp.json              # External tool servers

Plugin Manifest

{
  "name": "devops-toolkit",
  "version": "1.0.0",
  "description": "DevOps automation tools",
  "author": "Your Name",
  "components": {
    "commands": ["commands/*.md"],
    "agents": ["agents/*.md"],
    "skills": ["skills/*/SKILL.md"],
    "hooks": "hooks.json",
    "mcp": "mcp.json"
  }
}

Plugin Loading

# Programmatic
agent = Agent(
    plugins=[
        {"type": "local", "path": "./my-plugin"},
        {"type": "local", "path": "~/.agent/plugins/shared"}
    ]
)

# CLI
agent --plugin-dir ./my-plugin --plugin-dir ~/.agent/plugins/shared

External Tool Protocol (MCP)

Overview

Model Context Protocol enables connecting external services as tools.

┌─────────────┐      ┌─────────────┐      ┌─────────────┐
│   Agent     │ ←──→ │ MCP Client  │ ←──→ │ MCP Server  │
│   Core      │      │             │      │  (External) │
└─────────────┘      └─────────────┘      └─────────────┘
                                                │
                            ┌───────────────────┼───────────────────┐
                            │                   │                   │
                            ▼                   ▼                   ▼
                      ┌──────────┐       ┌──────────┐       ┌──────────┐
                      │ Database │       │   API    │       │  Browser │
                      └──────────┘       └──────────┘       └──────────┘

Transport Types

stdio:
  description: "Spawn process, communicate via stdin/stdout"
  config:
    command: "npx"
    args: ["@mcp/server-filesystem"]
    env:
      ALLOWED_PATHS: "/home/user/projects"

sse:
  description: "Server-Sent Events over HTTP"
  config:
    url: "https://api.example.com/mcp/sse"
    headers:
      Authorization: "Bearer ${API_TOKEN}"

http:
  description: "Standard HTTP requests"
  config:
    url: "https://api.example.com/mcp"
    headers:
      X-API-Key: "${API_KEY}"

MCP Configuration

{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["@mcp/server-filesystem"],
      "env": {
        "ALLOWED_PATHS": "/home/user/projects"
      }
    },
    "database": {
      "type": "sse",
      "url": "http://localhost:3001/mcp",
      "headers": {
        "Authorization": "Bearer ${DB_TOKEN}"
      }
    },
    "jira": {
      "command": "python",
      "args": ["-m", "mcp_jira"],
      "env": {
        "JIRA_URL": "${JIRA_URL}",
        "JIRA_TOKEN": "${JIRA_TOKEN}"
      }
    }
  }
}

Tool Naming Convention

MCP tools follow: mcp__{server_name}__{tool_name}

allowed_tools = [
    "mcp__filesystem__read_file",
    "mcp__database__query",
    "mcp__jira__create_issue"
]

Context Management

The Context Window Problem

┌─────────────────────────────────────────────────────────┐
│                  CONTEXT WINDOW                         │
│                                                         │
│  ┌─────────────────────────────────────────────────┐   │
│  │ System Prompt                            ~2-5k  │   │
│  ├─────────────────────────────────────────────────┤   │
│  │ Tool Definitions                         ~3-8k  │   │
│  ├─────────────────────────────────────────────────┤   │
│  │ Project Context (CLAUDE.md, etc.)        ~1-5k  │   │
│  ├─────────────────────────────────────────────────┤   │
│  │ Conversation History                  VARIABLE  │   │
│  │   - User messages                               │   │
│  │   - Assistant responses                         │   │
│  │   - Tool calls and results                      │   │
│  ├─────────────────────────────────────────────────┤   │
│  │ Current Turn                           VARIABLE │   │
│  └─────────────────────────────────────────────────┘   │
│                                                         │
│  Total limit: 128k-200k tokens (model dependent)       │
└─────────────────────────────────────────────────────────┘

Compaction Strategies

class ContextManager:
    def __init__(self, max_tokens: int = 100000):
        self.max_tokens = max_tokens
        self.compaction_threshold = 0.8  # 80%

    def should_compact(self, current_tokens: int) -> bool:
        return current_tokens > self.max_tokens * self.compaction_threshold

    def compact(self, messages: list) -> list:
        """Reduce context size while preserving critical information."""
        strategies = [
            self.truncate_tool_outputs,    # Limit long outputs
            self.summarize_old_turns,      # Summarize distant history
            self.remove_redundant_reads,   # Remove duplicate file reads
            self.compress_to_summary       # Last resort: full summarization
        ]

        for strategy in strategies:
            messages = strategy(messages)
            if self.count_tokens(messages) < self.max_tokens * 0.6:
                break

        return messages

Practical Techniques

  1. Truncate tool outputs: Limit to 30k chars, show head + tail
  2. Deduplicate file reads: Keep only latest version
  3. Summarize old turns: Compress turns older than N
  4. Subagent isolation: Subagents get fresh context, return summaries
  5. Prompt caching: Cache static portions (system prompt, tools)
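Technique 2 (deduplicating file reads) can be sketched by walking the history backwards and blanking superseded reads; the message shape here is an assumption:

```python
def remove_redundant_reads(messages: list[dict]) -> list[dict]:
    """Keep only the latest tool result for each file read; earlier
    reads of the same file are replaced with a short placeholder."""
    seen: set[str] = set()
    kept = []
    for msg in reversed(messages):
        if msg.get("role") == "tool" and msg.get("tool") == "read":
            key = msg["input"]["file_path"]
            if key in seen:
                # Older read of a file we've already kept: blank it out.
                msg = {**msg, "content": "[superseded by a later read]"}
            seen.add(key)
        kept.append(msg)
    return list(reversed(kept))
```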

CLI Interface Design

Command Structure

# Basic usage
agent                           # Interactive REPL
agent "query"                   # Start with prompt
agent -p "query"                # Non-interactive (print mode)
agent -c                        # Continue last session
agent -r <session-id>           # Resume specific session

# Piping
cat file.log | agent -p "analyze errors"
git diff | agent -p "review changes"

# Configuration
agent --model sonnet            # Model selection
agent --permission-mode strict  # Permission mode
agent --max-turns 10            # Limit iterations
agent --timeout 300             # Global timeout (seconds)

# Extensions
agent --mcp-config ./mcp.json   # Load MCP servers
agent --plugin-dir ./plugins    # Load plugins
agent --agents '{"name": {...}}'  # Define subagents

# System prompt
agent --system-prompt "You are..."        # Replace
agent --append-system-prompt "Also..."    # Append
agent --system-prompt-file ./prompt.txt   # From file

# Output control
agent -p --output-format json   # JSON output
agent -p --output-format stream-json  # Streaming JSON
agent --verbose                 # Detailed logging
agent --debug "api,tools"       # Debug categories

Interactive Commands (Slash Commands)

/help                    Show available commands
/clear                   Reset conversation
/compact                 Compress context manually
/model <name>            Switch model
/permissions             Manage tool permissions
/sessions                List saved sessions
/resume <id>             Resume session
/save                    Save current session
/config                  View/edit configuration
/bug                     Report issue
/quit                    Exit

Custom Slash Commands

<!-- .commands/deploy.md -->
Deploy the current project to $ARGUMENTS environment.

1. Run tests: `npm test`
2. Build: `npm run build`
3. Deploy: `./scripts/deploy.sh $ARGUMENTS`
4. Verify deployment health
5. Report status

If any step fails, stop and report the error.

Usage: /project:deploy staging


Configuration Hierarchy

Precedence (highest to lowest)

1. CLI flags           --model opus
2. Environment vars    AGENT_MODEL=opus
3. Local config        ./.agent/settings.json
4. Project config      ./agent.config.json
5. User config         ~/.config/agent/settings.json
6. System defaults     Built-in values
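This precedence order amounts to a recursive dict merge applied lowest-precedence-first (a sketch; the function name and key names are illustrative):

```python
def resolve_config(*layers: dict) -> dict:
    """Merge config layers, lowest precedence first: later layers
    override earlier ones, and nested dicts merge recursively."""
    result: dict = {}
    for layer in layers:
        for key, value in layer.items():
            if isinstance(value, dict) and isinstance(result.get(key), dict):
                result[key] = resolve_config(result[key], value)
            else:
                result[key] = value
    return result
```

A caller would pass layers in ascending precedence, e.g. `resolve_config(system_defaults, user_cfg, project_cfg, local_cfg, env_cfg, cli_cfg)`.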

Configuration File

{
  "model": "sonnet",
  "fallback_model": "haiku",
  "permission_mode": "standard",
  "max_turns": 50,
  "timeout_ms": 300000,

  "tools": {
    "allowed": ["read", "glob", "grep", "bash(git:*)"],
    "disallowed": ["bash(rm -rf:*)"]
  },

  "context": {
    "max_tokens": 100000,
    "compaction_threshold": 0.8
  },

  "hooks": {
    "PreToolUse": [...]
  },

  "mcp_servers": {
    "filesystem": {...}
  },

  "output": {
    "format": "text",
    "verbose": false,
    "color": true
  }
}

Project Context File

<!-- AGENT.md or CLAUDE.md -->

# Project: My Application

## Overview
This is a Next.js application with a Python backend.

## Architecture
- Frontend: Next.js 14, TypeScript, Tailwind
- Backend: FastAPI, PostgreSQL
- Infrastructure: Docker, Kubernetes

## Conventions
- Use TypeScript strict mode
- Follow PEP 8 for Python
- All functions require docstrings
- Tests required for new features

## Commands
- `npm run dev` - Start frontend
- `uvicorn main:app --reload` - Start backend
- `pytest` - Run tests
- `npm run lint && black .` - Format code

## File Structure
- `src/` - Frontend source
- `api/` - Backend source
- `tests/` - Test files
- `docs/` - Documentation

Session Management

Session Persistence

@dataclass
class Session:
    id: str
    created_at: datetime
    updated_at: datetime
    working_directory: str
    messages: list[Message]
    tool_states: dict  # Persistent tool state
    checkpoints: list[Checkpoint]
    metadata: dict

class SessionStore:
    def save(self, session: Session) -> None:
        """Persist session to disk."""
        path = self.sessions_dir / f"{session.id}.json"
        path.write_text(session.to_json())

    def load(self, session_id: str) -> Session:
        """Load session from disk."""
        path = self.sessions_dir / f"{session_id}.json"
        return Session.from_json(path.read_text())

    def list_recent(self, limit: int = 10) -> list[SessionSummary]:
        """List recent sessions."""
        sessions = sorted(
            self.sessions_dir.glob("*.json"),
            key=lambda p: p.stat().st_mtime,
            reverse=True
        )
        return [self._summarize(s) for s in sessions[:limit]]

Checkpointing

class Checkpoint:
    """Snapshot of agent state at a point in time."""
    id: str
    session_id: str
    timestamp: datetime
    message_index: int
    file_snapshots: dict[str, str]  # path -> content hash
    description: str

def create_checkpoint(session: Session, description: str) -> Checkpoint:
    """Create restorable checkpoint."""
    return Checkpoint(
        id=generate_id(),
        session_id=session.id,
        timestamp=datetime.now(),
        message_index=len(session.messages),
        file_snapshots=snapshot_working_files(session.working_directory),
        description=description
    )

def restore_checkpoint(checkpoint: Checkpoint) -> Session:
    """Restore session to checkpoint state."""
    session = load_session(checkpoint.session_id)
    session.messages = session.messages[:checkpoint.message_index]
    restore_files(checkpoint.file_snapshots)
    return session

SDK Design

Core API

from agent_sdk import Agent, AgentOptions, Message

# Streaming execution
async for message in Agent.query(
    prompt="Fix the bug in auth.py",
    options=AgentOptions(
        model="sonnet",
        allowed_tools=["read", "edit", "bash(pytest:*)"],
        permission_mode="accept_edits",
        system_prompt="You are a Python expert.",
        working_directory="/path/to/project",
        mcp_servers={"db": {...}},
        hooks={"PreToolUse": [validate_hook]}
    )
):
    match message:
        case Message(type="assistant"):
            print(message.content)
        case Message(type="tool_use"):
            print(f"Using {message.tool_name}")
        case Message(type="tool_result"):
            print(f"Result: {message.result[:100]}")
        case Message(type="result", subtype="success"):
            print("Task completed!")

# Single-turn execution
result = await Agent.run(
    prompt="List all Python files",
    options=AgentOptions(allowed_tools=["glob"])
)
print(result.output)

Message Types

@dataclass
class Message:
    type: Literal[
        "system",        # System events (init, error)
        "user",          # User input
        "assistant",     # Agent response
        "tool_use",      # Tool invocation
        "tool_result",   # Tool output
        "result"         # Final result
    ]
    subtype: str | None  # e.g., "success", "error", "init"
    content: Any
    metadata: dict

Custom Tool Definition

from agent_sdk import tool, ToolResult

@tool(
    name="query_database",
    description="Execute SQL query against the database",
    parameters={
        "query": {"type": "string", "description": "SQL query to execute"},
        "database": {"type": "string", "description": "Database name"}
    }
)
async def query_database(query: str, database: str) -> ToolResult:
    try:
        result = await db.execute(query, database)
        return ToolResult.success(result.to_json())
    except DatabaseError as e:
        return ToolResult.error(f"Query failed: {e}")

Error Handling

Error Categories

class AgentError(Exception):
    """Base agent error."""
    pass

class ToolExecutionError(AgentError):
    """Tool failed to execute."""
    tool_name: str
    input: dict
    cause: Exception

class PermissionDeniedError(AgentError):
    """User denied permission."""
    tool_name: str
    input: dict

class ContextOverflowError(AgentError):
    """Context window exceeded."""
    current_tokens: int
    max_tokens: int

class ModelError(AgentError):
    """LLM provider error."""
    provider: str
    status_code: int
    message: str

class TimeoutError(AgentError):
    """Operation timed out."""
    operation: str
    timeout_ms: int

Retry Strategies

class RetryPolicy:
    max_retries: int = 3
    backoff_base: float = 1.0
    backoff_max: float = 30.0
    retryable_errors: set = {
        "rate_limit",
        "overloaded",
        "timeout",
        "connection_error"
    }

    def should_retry(self, error: AgentError, attempt: int) -> bool:
        if attempt >= self.max_retries:
            return False
        return getattr(error, "code", None) in self.retryable_errors

    def get_delay(self, attempt: int) -> float:
        delay = self.backoff_base * (2 ** attempt)
        return min(delay, self.backoff_max)

Observability

Logging

import structlog

logger = structlog.get_logger()

# Tool execution
logger.info("tool_execution",
    tool=tool_name,
    input=tool_input,
    duration_ms=duration,
    success=True
)

# LLM call
logger.info("llm_call",
    model=model_name,
    input_tokens=input_tokens,
    output_tokens=output_tokens,
    duration_ms=duration,
    cached_tokens=cached_tokens
)

# Session events
logger.info("session_event",
    event="created",  # one of: created, resumed, completed, error
    session_id=session_id,
    duration_total_ms=total_duration
)

Metrics

from prometheus_client import Counter, Histogram

tool_calls = Counter(
    "agent_tool_calls_total",
    "Total tool calls",
    ["tool_name", "status"]
)

tool_duration = Histogram(
    "agent_tool_duration_seconds",
    "Tool execution duration",
    ["tool_name"]
)

llm_tokens = Counter(
    "agent_llm_tokens_total",
    "Total LLM tokens",
    ["model", "type"]  # type: input, output, cached
)

session_duration = Histogram(
    "agent_session_duration_seconds",
    "Session duration"
)

Cost Tracking

@dataclass
class UsageTracker:
    input_tokens: int = 0
    output_tokens: int = 0
    cached_tokens: int = 0
    tool_calls: int = 0

    def add_llm_call(self, response: LLMResponse):
        self.input_tokens += response.usage.input_tokens
        self.output_tokens += response.usage.output_tokens
        self.cached_tokens += response.usage.cached_tokens

    def estimate_cost(self, pricing: dict) -> float:
        input_cost = self.input_tokens * pricing["input_per_token"]
        output_cost = self.output_tokens * pricing["output_per_token"]
        cached_cost = self.cached_tokens * pricing["cached_per_token"]
        return input_cost + output_cost + cached_cost

Security Considerations

Sandboxing

# Docker-based isolation
docker_config:
  image: "agent-sandbox:latest"
  network: none  # or restricted
  read_only_root: true
  volumes:
    - source: /project
      target: /workspace
      read_only: false
    - source: /home/user/.agent
      target: /config
      read_only: true
  resource_limits:
    memory: 4g
    cpu: 2
    pids: 100
  security_opts:
    - no-new-privileges
    - seccomp=restricted.json

Input Validation

def validate_tool_input(tool_name: str, input: dict) -> ValidationResult:
    """Validate tool input before execution."""
    schema = get_tool_schema(tool_name)

    # JSON Schema validation (validate() raises on failure)
    try:
        jsonschema.validate(input, schema)
    except jsonschema.ValidationError as e:
        return ValidationResult.invalid(str(e))

    # Path traversal check
    if "path" in input or "file_path" in input:
        path = input.get("path") or input.get("file_path")
        if not is_within_workspace(path):
            return ValidationResult.invalid("Path traversal detected")

    # Command injection check
    if tool_name == "bash":
        if contains_injection(input["command"]):
            return ValidationResult.invalid("Potential command injection")

    return ValidationResult.valid()

Secret Management

# Environment variable expansion (safe)
def expand_env_vars(config: dict) -> dict:
    """Expand ${VAR} patterns in config."""
    def expand(value: str) -> str:
        pattern = r'\$\{([^}]+)\}'
        def replace(match):
            var_name = match.group(1)
            default = None
            if ":-" in var_name:
                var_name, default = var_name.split(":-", 1)
            return os.environ.get(var_name, default or "")
        return re.sub(pattern, replace, value)

    return deep_map(config, expand)

Performance Optimization

Prompt Caching

class PromptCache:
    """Cache static prompt components."""

    def __init__(self):
        self.cache = {}

    def get_cached_prefix(self, components: list[str]) -> CachedPrefix:
        """Get or create cached prefix from components."""
        key = hash(tuple(components))
        if key not in self.cache:
            self.cache[key] = self._create_prefix(components)
        return self.cache[key]

    def build_messages(
        self,
        system_prompt: str,
        tool_definitions: list,
        project_context: str,
        conversation: list[Message]
    ) -> list[Message]:
        """Build messages with cached prefix."""
        # Static components get cached
        prefix = self.get_cached_prefix([
            system_prompt,
            json.dumps(tool_definitions),
            project_context
        ])

        # Dynamic components appended
        return prefix + conversation

Parallel Tool Execution

async def execute_tools_batch(tool_calls: list[ToolCall]) -> list[ToolResult]:
    """Execute independent tool calls in parallel."""

    # Group by dependency
    independent = [t for t in tool_calls if not t.depends_on]
    dependent = [t for t in tool_calls if t.depends_on]

    # Execute independent calls in parallel
    results = await asyncio.gather(*[
        execute_tool(t) for t in independent
    ])

    # Execute dependent calls sequentially
    for tool_call in dependent:
        result = await execute_tool(tool_call)
        results.append(result)

    return results
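The batch executor above runs every dependent call strictly sequentially. When dependents themselves fan out, a wave-based scheduler preserves parallelism within each level of the dependency graph; a self-contained sketch (the `ToolCall` fields and `execute_tool` stub are stand-ins, not the document's real types):

```python
import asyncio
from dataclasses import dataclass, field

@dataclass
class ToolCall:
    id: str
    name: str
    depends_on: list[str] = field(default_factory=list)

async def execute_tool(call: ToolCall) -> str:
    await asyncio.sleep(0)  # stand-in for real tool work
    return f"{call.name}:done"

async def execute_in_waves(tool_calls: list[ToolCall]) -> dict[str, str]:
    """Run calls in waves: each wave holds calls whose dependencies are done."""
    done: dict[str, str] = {}
    pending = list(tool_calls)
    while pending:
        ready = [c for c in pending if all(d in done for d in c.depends_on)]
        if not ready:
            raise ValueError("dependency cycle among tool calls")
        results = await asyncio.gather(*(execute_tool(c) for c in ready))
        done.update({c.id: r for c, r in zip(ready, results)})
        pending = [c for c in pending if c.id not in done]
    return done
```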

Streaming Output

async def stream_response(query: str, options: AgentOptions):
    """Stream agent responses for real-time display."""

    buffer = ""
    async for message in Agent.query(query, options):
        if message.type == "assistant":
            # Stream text token by token
            for token in message.content:
                yield token
                buffer += token

        elif message.type == "tool_use":
            yield f"\n[Using {message.tool_name}...]\n"

        elif message.type == "tool_result":
            if options.verbose:
                yield f"[Result: {message.result[:200]}...]\n"

Testing Strategies

Unit Testing Tools

import pytest
from agent_sdk.testing import MockLLM, MockToolExecutor

@pytest.fixture
def mock_agent():
    return Agent(
        llm=MockLLM(responses=[
            {"tool_use": {"name": "read", "input": {"path": "test.py"}}},
            {"text": "The file contains a function definition."}
        ]),
        tool_executor=MockToolExecutor({
            "read": lambda input: "def hello(): pass"
        })
    )

@pytest.mark.asyncio
async def test_file_analysis(mock_agent):
    result = await mock_agent.run("Analyze test.py")
    assert "function" in result.output
    assert mock_agent.tool_calls == [("read", {"path": "test.py"})]

Integration Testing

@pytest.mark.integration
@pytest.mark.asyncio
async def test_edit_workflow():
    """Test complete edit workflow in isolated environment."""

    with TempWorkspace() as workspace:
        # Setup
        workspace.write("src/main.py", "def old_name(): pass")

        # Execute
        agent = Agent(working_directory=workspace.path)
        await agent.run("Rename old_name to new_name in src/main.py")

        # Verify
        content = workspace.read("src/main.py")
        assert "def new_name():" in content
        assert "old_name" not in content

Snapshot Testing

import json
import os

def test_tool_schema_stability():
    """Ensure tool schemas don't change unexpectedly."""
    tools = get_all_tool_definitions()

    for tool in tools:
        snapshot_path = f"snapshots/tools/{tool.name}.json"
        current = tool.to_json()

        if os.path.exists(snapshot_path):
            with open(snapshot_path) as f:
                expected = json.load(f)
            assert current == expected, f"Tool {tool.name} schema changed"
        else:
            with open(snapshot_path, "w") as f:
                json.dump(current, f)

Deployment Patterns

Docker Container

FROM python:3.11-slim

# Install system dependencies
RUN apt-get update && apt-get install -y \
    git \
    ripgrep \
    && rm -rf /var/lib/apt/lists/*

# Install agent
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Create non-root user
RUN useradd -m -s /bin/bash agent
USER agent
WORKDIR /home/agent

# Copy configuration
COPY --chown=agent:agent config/ /home/agent/.config/agent/

ENTRYPOINT ["agent"]
CMD ["--help"]

CI/CD Integration

# GitHub Actions example
name: AI Code Review

on:
  pull_request:
    types: [opened, synchronize]

jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Get changed files
        id: changes
        run: |
          echo "files=$(git diff --name-only origin/main...HEAD | tr '\n' ' ')" >> $GITHUB_OUTPUT

      - name: Run AI Review
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          agent -p "Review these changed files for bugs and improvements: ${{ steps.changes.outputs.files }}" \
            --output-format json \
            --max-turns 20 \
            > review.json

      - name: Post Review Comment
        uses: actions/github-script@v7
        with:
          script: |
            const review = require('${{ github.workspace }}/review.json');
            github.rest.pulls.createReview({
              ...context.repo,
              pull_number: context.issue.number,
              body: review.output,
              event: 'COMMENT'
            });

Kubernetes Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: agent-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: agent
  template:
    metadata:
      labels:
        app: agent
    spec:
      containers:
        - name: agent
          image: my-agent:latest
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "2Gi"
              cpu: "2000m"
          env:
            - name: ANTHROPIC_API_KEY
              valueFrom:
                secretKeyRef:
                  name: api-keys
                  key: anthropic
          volumeMounts:
            - name: workspace
              mountPath: /workspace
      volumes:
        - name: workspace
          emptyDir: {}

Appendix: Decision Checklist

When to Use Dedicated Tools vs Bash

| Use Dedicated Tool | Use Bash |
|--------------------|----------|
| File reading (read) | Git operations |
| File editing (edit) | Package management (npm, pip) |
| Pattern search (grep) | Running tests |
| File finding (glob) | Build commands |
| Directory listing (ls) | Custom scripts |

When to Spawn Subagents

| Spawn Subagent | Handle in Main Loop |
|----------------|---------------------|
| Complex research tasks | Simple file operations |
| Exploration of large codebases | Direct answers |
| Parallel independent tasks | Sequential dependent tasks |
| Isolated experimental changes | Standard workflows |

Permission Mode Selection

| Mode | Use Case |
|------|----------|
| Strict | Untrusted environments, production systems |
| Standard | Normal development work |
| Permissive | Trusted projects, experienced users |
| Autonomous | Automated pipelines, sandboxed environments |


This document provides architectural patterns for building agentic CLI tools. Implementations will vary based on specific requirements, target LLM providers, and use cases.