
Agentic CLI Architecture Reference

January 3, 2025

A comprehensive guide to building production-grade agentic command-line interfaces, derived from patterns in state-of-the-art implementations.


Core Architecture

Layered Design

┌─────────────────────────────────────────────────────────────┐
│                    EXTENSION LAYER                          │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐       │
│  │ External │ │  Hooks   │ │  Skills  │ │ Plugins  │       │
│  │  Tools   │ │ (Events) │ │ (Prompts)│ │(Packages)│       │
│  └──────────┘ └──────────┘ └──────────┘ └──────────┘       │
├─────────────────────────────────────────────────────────────┤
│                   DELEGATION LAYER                          │
│         ┌─────────────────────────────────────────┐        │
│         │     Subagents (Parallel Execution)      │        │
│         │   Explorer | Planner | Specialist       │        │
│         └─────────────────────────────────────────┘        │
├─────────────────────────────────────────────────────────────┤
│                      CORE LAYER                             │
│    ┌─────────────────────────────────────────────────────┐ │
│    │           Main Agent Loop                           │ │
│    │   Context Window | Tool Executor | Message History  │ │
│    └─────────────────────────────────────────────────────┘ │
├─────────────────────────────────────────────────────────────┤
│                   FOUNDATION LAYER                          │
│    ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐    │
│    │   LLM    │ │ Session  │ │Permission│ │  Config  │    │
│    │ Provider │ │  Store   │ │  System  │ │  Loader  │    │
│    └──────────┘ └──────────┘ └──────────┘ └──────────┘    │
└─────────────────────────────────────────────────────────────┘

Execution Model: Single-Thread Simplicity

Prefer a single main agent loop over complex multi-agent orchestration.

┌─────────────────────────────────────────────────────────┐
│                    MAIN AGENT LOOP                      │
│                                                         │
│   User Input → LLM → Tool Call → Result → LLM → ...    │
│                          │                              │
│                          ▼                              │
│                   ┌─────────────┐                       │
│                   │  Subagent   │ (max 1 level deep)    │
│                   │  Execution  │                       │
│                   └──────┬──────┘                       │
│                          │                              │
│                   Result added to                       │
│                   main message history                  │
└─────────────────────────────────────────────────────────┘

Key Principles:

  • Flat message list as single source of truth
  • Subagents spawn with isolated context, return summarized results
  • Maximum one level of delegation (no recursive subagent spawning)
  • Results from subagents become tool responses in main thread
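The principles above can be sketched in a few lines (a minimal sketch; `call_llm`, the message-dict shape, and the tool registry are hypothetical stand-ins for a real provider client):

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class AgentLoop:
    """Minimal single-threaded agent loop with one flat message list."""
    call_llm: Callable[[list[dict]], dict]   # hypothetical LLM client
    tools: dict[str, Callable[[dict], str]]  # tool name -> executor
    messages: list[dict] = field(default_factory=list)
    max_turns: int = 20

    def run(self, user_input: str) -> str:
        self.messages.append({"role": "user", "content": user_input})
        for _ in range(self.max_turns):
            reply = self.call_llm(self.messages)
            self.messages.append(reply)
            if "tool_call" not in reply:
                return reply["content"]  # final answer, loop ends
            call = reply["tool_call"]
            result = self.tools[call["name"]](call["input"])
            # Tool results (including subagent summaries) re-enter
            # the same flat history as tool responses.
            self.messages.append({"role": "tool", "content": result})
        raise RuntimeError("max_turns exceeded")
```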

Tool System Design

Tool Hierarchy

Design tools at multiple abstraction levels:

| Level | Characteristics | Examples |
|-------|----------------|----------|
| Low-level | Direct system access, flexible but error-prone | bash, read_file, write_file |
| Mid-level | Specialized, optimized for common operations | grep, glob, edit, multi_edit |
| High-level | Orchestration, deterministic outcomes | spawn_agent, web_fetch, todo_list |

Why multiple levels?

  • Frequent operations deserve dedicated tools (reduces LLM errors)
  • Specialized tools have better prompts and validation
  • High-level tools save tokens and keep agent on track

Tool Categories

1. Filesystem Tools

read:
  description: "Read file contents with line numbers"
  parameters:
    - file_path: string (required)
    - offset: integer (optional, start line)
    - limit: integer (optional, max lines)
  risk_level: low

write:
  description: "Create or overwrite file"
  parameters:
    - file_path: string (required)
    - content: string (required)
  risk_level: high

edit:
  description: "Precise search-and-replace modification"
  parameters:
    - file_path: string (required)
    - old_text: string (required, must be unique)
    - new_text: string (required)
  risk_level: medium

multi_edit:
  description: "Batch edits in single operation"
  parameters:
    - file_path: string (required)
    - edits: array of {old_text, new_text}
  risk_level: medium

glob:
  description: "Find files by pattern"
  parameters:
    - pattern: string (required, e.g., "**/*.py")
    - path: string (optional, search root)
  risk_level: low

list_directory:
  description: "List directory contents"
  parameters:
    - path: string (required)
    - depth: integer (optional, default 1)
  risk_level: low

2. Search Tools

grep:
  description: "Content search using ripgrep"
  parameters:
    - pattern: string (required)
    - path: string (optional)
    - output_mode: enum [content, files_only, count]
    - file_type: string (optional, e.g., "py", "js")
    - context_lines: integer (optional)
  risk_level: low
  notes: "Always prefer dedicated grep over bash grep"

3. Execution Tools

bash:
  description: "Execute shell commands"
  parameters:
    - command: string (required)
    - timeout: integer (optional, ms, max 600000)
    - working_directory: string (optional)
    - run_in_background: boolean (optional)
  risk_level: high
  restrictions:
    - "Prefer dedicated tools over bash equivalents"
    - "Never use: cat, head, tail, grep, find, sed, awk"
    - "Quote paths with spaces"
    - "Use && or ; for command chaining, not newlines"

4. Web Tools

web_search:
  description: "Search the internet"
  parameters:
    - query: string (required)
    - max_results: integer (optional)
  risk_level: low

web_fetch:
  description: "Retrieve web page content"
  parameters:
    - url: string (required)
    - extract_text: boolean (optional)
  risk_level: low

5. Meta Tools

spawn_agent:
  description: "Delegate task to subagent"
  parameters:
    - task: string (required)
    - agent_type: enum [explorer, planner, general]
    - tools: array of strings (optional, tool subset)
  risk_level: low

todo_list:
  description: "Track task progress"
  parameters:
    - action: enum [read, write, update]
    - todos: array of {content, status}
  risk_level: low

Tool Design Best Practices

  1. Explicit over implicit: Require paths, don't assume cwd
  2. Validation in schema: Use JSON Schema for parameter validation
  3. Rich descriptions: Include examples, edge cases, and anti-patterns
  4. Atomic operations: Each tool does one thing well
  5. Idempotent when possible: Same input → same output
  6. Bounded outputs: Truncate long results (e.g., 30k chars)
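Practice 6 (bounded outputs) is straightforward to implement as a head-plus-tail truncation; the 30k-char budget below mirrors the guideline above, and the function name is illustrative:

```python
def bound_output(text: str, max_chars: int = 30_000) -> str:
    """Truncate long tool output, keeping the head and tail so the
    agent still sees both the start and the end of the result."""
    if len(text) <= max_chars:
        return text
    half = max_chars // 2
    omitted = len(text) - 2 * half
    return (text[:half]
            + f"\n... [{omitted} chars truncated] ...\n"
            + text[-half:])
```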

Permission System

Permission Modes

modes:
  strict:
    description: "Confirm every action"
    auto_approve: []

  standard:
    description: "Approve reads, confirm writes"
    auto_approve: [read, glob, grep, list_directory, web_search]

  permissive:
    description: "Auto-approve safe operations"
    auto_approve: [read, glob, grep, list_directory, edit, web_*]

  autonomous:
    description: "No confirmations (dangerous)"
    auto_approve: ["*"]
    requires_flag: "--dangerously-skip-permissions"

Tool Allowlists

{
  "allowedTools": [
    "read",
    "glob",
    "grep",
    "bash(git:*)",
    "bash(npm:*)",
    "bash(pytest:*)"
  ],
  "disallowedTools": [
    "bash(rm -rf:*)",
    "bash(sudo:*)"
  ]
}

Pattern syntax:

  • tool_name - exact match
  • tool_name(prefix:*) - match commands starting with prefix
  • * - match all tools

Subagent Architecture

Built-in Agent Types

explorer:
  model: fast (e.g., Haiku, GPT-4o-mini)
  mode: read-only
  tools: [glob, grep, read, list_directory]
  parameters:
    thoroughness: [quick, medium, thorough]
  use_cases:
    - Codebase exploration
    - File discovery
    - Pattern searching

planner:
  model: capable (e.g., Sonnet, GPT-4o)
  mode: read-only
  tools: [read, glob, grep, bash(safe)]
  use_cases:
    - Implementation planning
    - Architecture decisions
    - Breaking down complex tasks

general:
  model: capable
  mode: full
  tools: all
  use_cases:
    - Complex research + modification
    - Multi-step workflows
    - Delegated implementations

Custom Agent Definition

# .agents/security-reviewer.yaml
name: security-reviewer
description: "Security-focused code reviewer"
model: opus  # or specific model string
permission_mode: plan  # read-only
tools:
  - read
  - grep
  - glob
  - bash(git log:*)
  - bash(git diff:*)
prompt: |
  You are a senior security engineer reviewing code.
  Focus on:
  - OWASP Top 10 vulnerabilities
  - Secrets and hardcoded credentials
  - SQL injection, XSS, CSRF
  - Authentication/authorization flaws

  Report findings with severity levels and remediation steps.

Subagent Invocation

# Programmatic
result = await spawn_agent(
    task="Find all authentication-related files",
    agent_type="explorer",
    thoroughness="thorough"
)

# Natural language (agent decides)
"Use the explorer agent to find all API endpoints"
"Have a subagent analyze the database schema"

Hook System

Event Types

lifecycle:
  - SessionStart      # Agent session begins
  - SessionEnd        # Agent session ends
  - Stop              # Agent completes task

tool_events:
  - PreToolUse        # Before tool execution
  - PostToolUse       # After successful execution
  - PostToolUseFailure # After failed execution
  - PermissionRequest # User permission needed

user_events:
  - UserPromptSubmit  # Before processing user input
  - Notification      # System notifications

Hook Configuration

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "bash",
        "hooks": [
          {
            "type": "command",
            "command": "/path/to/validate-bash.sh"
          }
        ]
      },
      {
        "matcher": "*",
        "hooks": [
          {
            "type": "command",
            "command": "/path/to/log-tool-use.sh"
          }
        ]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "edit",
        "hooks": [
          {
            "type": "command",
            "command": "npm run lint --fix"
          }
        ]
      }
    ]
  }
}

Hook Input/Output

# Hook receives JSON on stdin
{
  "tool_name": "bash",
  "tool_input": {
    "command": "rm -rf /tmp/cache"
  },
  "session_id": "abc123",
  "conversation_id": "xyz789"
}

# Hook returns JSON on stdout
{
  "decision": "deny",  # or "approve", "passthrough"
  "reason": "Dangerous command blocked",
  "modified_input": null,  # optional: modify tool input
  "system_message": null   # optional: inject context
}

Common Hook Patterns

import logging
import subprocess

log = logging.getLogger(__name__)

# Security validation
async def validate_bash(input_data):
    command = input_data["tool_input"].get("command", "")
    dangerous = ["rm -rf /", "sudo", "> /dev/sda"]
    if any(d in command for d in dangerous):
        return {"decision": "deny", "reason": "Dangerous command"}
    return {"decision": "passthrough"}

# Audit logging
async def log_all_tools(input_data):
    log.info(f"Tool: {input_data['tool_name']}, Input: {input_data['tool_input']}")
    return {}

# Auto-formatting after edits
async def post_edit_format(input_data):
    if input_data["tool_name"] == "edit":
        file_path = input_data["tool_input"]["file_path"]
        if file_path.endswith(".py"):
            subprocess.run(["black", file_path])
    return {}

Skills System

Concept

Skills are prompt modules that extend agent capabilities by injecting specialized instructions into context, not by executing code.

┌─────────────────────────────────────────────────────────┐
│                   SKILL ACTIVATION                      │
│                                                         │
│   User Request → Skill Matcher → Inject SKILL.md →     │
│                                                         │
│   → Agent now has specialized instructions/context      │
└─────────────────────────────────────────────────────────┘
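A deliberately naive matcher illustrates the activation step (keyword overlap is a stand-in; real implementations typically let the LLM choose skills or use embedding similarity):

```python
from dataclasses import dataclass

@dataclass
class Skill:
    name: str
    description: str
    body: str  # contents of SKILL.md below the front matter

def activate_skills(user_request: str, skills: list[Skill]) -> str:
    """Inject the body of any skill whose name or description
    keywords appear in the request (keywords of 4+ chars only,
    to avoid matching on stopwords)."""
    injected = []
    for skill in skills:
        keywords = {skill.name, *skill.description.lower().split()}
        if any(k in user_request.lower() for k in keywords if len(k) > 3):
            injected.append(skill.body)
    return "\n\n".join(injected)
```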

Skill Definition

<!-- .skills/code-review/SKILL.md -->

---
name: code-review
description: "Thorough code review with best practices"
tools: Read,Grep,Glob
---

# Code Review Skill

When performing code reviews, follow this methodology:

## 1. Understanding Phase
- Read the changed files completely
- Identify the purpose of changes
- Check for related test files

## 2. Analysis Checklist
- [ ] Error handling present
- [ ] Edge cases considered
- [ ] No hardcoded secrets
- [ ] Logging appropriate
- [ ] Tests cover new code

## 3. Output Format
Provide feedback as:
- 🔴 Critical: Must fix before merge
- 🟡 Suggestion: Consider improving
- 🟢 Praise: Well done

Skill Discovery

project/
├── .skills/
│   ├── code-review/
│   │   └── SKILL.md
│   └── api-design/
│       └── SKILL.md

~/.config/agent/skills/       # Global skills
└── security-audit/
    └── SKILL.md

Plugin Architecture

Plugin Structure

my-plugin/
├── plugin.json           # Manifest (required)
├── commands/             # Slash commands
│   └── deploy.md
├── agents/               # Custom agents
│   └── devops.md
├── skills/               # Skills
│   └── kubernetes/
│       └── SKILL.md
├── hooks.json            # Hook definitions
└── mcp.json              # External tool servers

Plugin Manifest

{
  "name": "devops-toolkit",
  "version": "1.0.0",
  "description": "DevOps automation tools",
  "author": "Your Name",
  "components": {
    "commands": ["commands/*.md"],
    "agents": ["agents/*.md"],
    "skills": ["skills/*/SKILL.md"],
    "hooks": "hooks.json",
    "mcp": "mcp.json"
  }
}

Plugin Loading

# Programmatic
agent = Agent(
    plugins=[
        {"type": "local", "path": "./my-plugin"},
        {"type": "local", "path": "~/.agent/plugins/shared"}
    ]
)

# CLI
agent --plugin-dir ./my-plugin --plugin-dir ~/.agent/plugins/shared

External Tool Protocol (MCP)

Overview

Model Context Protocol enables connecting external services as tools.

┌─────────────┐      ┌─────────────┐      ┌─────────────┐
│   Agent     │ ←──→ │ MCP Client  │ ←──→ │ MCP Server  │
│   Core      │      │             │      │  (External) │
└─────────────┘      └─────────────┘      └─────────────┘
                                                │
                            ┌───────────────────┼───────────────────┐
                            │                   │                   │
                            ▼                   ▼                   ▼
                      ┌──────────┐       ┌──────────┐       ┌──────────┐
                      │ Database │       │   API    │       │  Browser │
                      └──────────┘       └──────────┘       └──────────┘

Transport Types

stdio:
  description: "Spawn process, communicate via stdin/stdout"
  config:
    command: "npx"
    args: ["@mcp/server-filesystem"]
    env:
      ALLOWED_PATHS: "/home/user/projects"

sse:
  description: "Server-Sent Events over HTTP"
  config:
    url: "https://api.example.com/mcp/sse"
    headers:
      Authorization: "Bearer ${API_TOKEN}"

http:
  description: "Standard HTTP requests"
  config:
    url: "https://api.example.com/mcp"
    headers:
      X-API-Key: "${API_KEY}"

MCP Configuration

{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["@mcp/server-filesystem"],
      "env": {
        "ALLOWED_PATHS": "/home/user/projects"
      }
    },
    "database": {
      "type": "sse",
      "url": "http://localhost:3001/mcp",
      "headers": {
        "Authorization": "Bearer ${DB_TOKEN}"
      }
    },
    "jira": {
      "command": "python",
      "args": ["-m", "mcp_jira"],
      "env": {
        "JIRA_URL": "${JIRA_URL}",
        "JIRA_TOKEN": "${JIRA_TOKEN}"
      }
    }
  }
}

Tool Naming Convention

MCP tools follow: mcp__{server_name}__{tool_name}

allowed_tools = [
    "mcp__filesystem__read_file",
    "mcp__database__query",
    "mcp__jira__create_issue"
]

Context Management

The Context Window Problem

┌─────────────────────────────────────────────────────────┐
│                  CONTEXT WINDOW                         │
│                                                         │
│  ┌─────────────────────────────────────────────────┐   │
│  │ System Prompt                            ~2-5k  │   │
│  ├─────────────────────────────────────────────────┤   │
│  │ Tool Definitions                         ~3-8k  │   │
│  ├─────────────────────────────────────────────────┤   │
│  │ Project Context (CLAUDE.md, etc.)        ~1-5k  │   │
│  ├─────────────────────────────────────────────────┤   │
│  │ Conversation History                  VARIABLE  │   │
│  │   - User messages                               │   │
│  │   - Assistant responses                         │   │
│  │   - Tool calls and results                      │   │
│  ├─────────────────────────────────────────────────┤   │
│  │ Current Turn                           VARIABLE │   │
│  └─────────────────────────────────────────────────┘   │
│                                                         │
│  Total limit: 128k-200k tokens (model dependent)       │
└─────────────────────────────────────────────────────────┘

Compaction Strategies

class ContextManager:
    def __init__(self, max_tokens: int = 100000):
        self.max_tokens = max_tokens
        self.compaction_threshold = 0.8  # 80%

    def should_compact(self, current_tokens: int) -> bool:
        return current_tokens > self.max_tokens * self.compaction_threshold

    def compact(self, messages: list) -> list:
        """Reduce context size while preserving critical information."""
        strategies = [
            self.truncate_tool_outputs,    # Limit long outputs
            self.summarize_old_turns,      # Summarize distant history
            self.remove_redundant_reads,   # Remove duplicate file reads
            self.compress_to_summary       # Last resort: full summarization
        ]

        for strategy in strategies:
            messages = strategy(messages)
            if self.count_tokens(messages) < self.max_tokens * 0.6:
                break

        return messages

Practical Techniques

  1. Truncate tool outputs: Limit to 30k chars, show head + tail
  2. Deduplicate file reads: Keep only latest version
  3. Summarize old turns: Compress turns older than N
  4. Subagent isolation: Subagents get fresh context, return summaries
  5. Prompt caching: Cache static portions (system prompt, tools)
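Technique 2 (deduplicating file reads) can be sketched by walking the history backwards and blanking superseded reads; the message shape here is an assumption:

```python
def remove_redundant_reads(messages: list[dict]) -> list[dict]:
    """Keep only the latest tool result for each file read; earlier
    reads of the same file are replaced with a short placeholder."""
    seen: set[str] = set()
    kept = []
    for msg in reversed(messages):
        if msg.get("role") == "tool" and msg.get("tool") == "read":
            key = msg["input"]["file_path"]
            if key in seen:
                # Older read of a file we've already kept: blank it out.
                msg = {**msg, "content": "[superseded by a later read]"}
            seen.add(key)
        kept.append(msg)
    return list(reversed(kept))
```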

CLI Interface Design

Command Structure

# Basic usage
agent                           # Interactive REPL
agent "query"                   # Start with prompt
agent -p "query"                # Non-interactive (print mode)
agent -c                        # Continue last session
agent -r <session-id>           # Resume specific session

# Piping
cat file.log | agent -p "analyze errors"
git diff | agent -p "review changes"

# Configuration
agent --model sonnet            # Model selection
agent --permission-mode strict  # Permission mode
agent --max-turns 10            # Limit iterations
agent --timeout 300             # Global timeout (seconds)

# Extensions
agent --mcp-config ./mcp.json   # Load MCP servers
agent --plugin-dir ./plugins    # Load plugins
agent --agents '{"name": {...}}'  # Define subagents

# System prompt
agent --system-prompt "You are..."        # Replace
agent --append-system-prompt "Also..."    # Append
agent --system-prompt-file ./prompt.txt   # From file

# Output control
agent -p --output-format json   # JSON output
agent -p --output-format stream-json  # Streaming JSON
agent --verbose                 # Detailed logging
agent --debug "api,tools"       # Debug categories

Interactive Commands (Slash Commands)

/help                    Show available commands
/clear                   Reset conversation
/compact                 Compress context manually
/model <name>            Switch model
/permissions             Manage tool permissions
/sessions                List saved sessions
/resume <id>             Resume session
/save                    Save current session
/config                  View/edit configuration
/bug                     Report issue
/quit                    Exit

Custom Slash Commands

<!-- .commands/deploy.md -->
Deploy the current project to $ARGUMENTS environment.

1. Run tests: `npm test`
2. Build: `npm run build`
3. Deploy: `./scripts/deploy.sh $ARGUMENTS`
4. Verify deployment health
5. Report status

If any step fails, stop and report the error.

Usage: /project:deploy staging


Configuration Hierarchy

Precedence (highest to lowest)

1. CLI flags           --model opus
2. Environment vars    AGENT_MODEL=opus
3. Local config        ./.agent/settings.json
4. Project config      ./agent.config.json
5. User config         ~/.config/agent/settings.json
6. System defaults     Built-in values
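This precedence order amounts to a recursive dict merge applied lowest-precedence-first (a sketch; the function name and key names are illustrative):

```python
def resolve_config(*layers: dict) -> dict:
    """Merge config layers, lowest precedence first: later layers
    override earlier ones, and nested dicts merge recursively."""
    result: dict = {}
    for layer in layers:
        for key, value in layer.items():
            if isinstance(value, dict) and isinstance(result.get(key), dict):
                result[key] = resolve_config(result[key], value)
            else:
                result[key] = value
    return result
```

A caller would pass layers in ascending precedence, e.g. `resolve_config(system_defaults, user_cfg, project_cfg, local_cfg, env_cfg, cli_cfg)`.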

Configuration File

{
  "model": "sonnet",
  "fallback_model": "haiku",
  "permission_mode": "standard",
  "max_turns": 50,
  "timeout_ms": 300000,

  "tools": {
    "allowed": ["read", "glob", "grep", "bash(git:*)"],
    "disallowed": ["bash(rm -rf:*)"]
  },

  "context": {
    "max_tokens": 100000,
    "compaction_threshold": 0.8
  },

  "hooks": {
    "PreToolUse": [...]
  },

  "mcp_servers": {
    "filesystem": {...}
  },

  "output": {
    "format": "text",
    "verbose": false,
    "color": true
  }
}

Project Context File

<!-- AGENT.md or CLAUDE.md -->

# Project: My Application

## Overview
This is a Next.js application with a Python backend.

## Architecture
- Frontend: Next.js 14, TypeScript, Tailwind
- Backend: FastAPI, PostgreSQL
- Infrastructure: Docker, Kubernetes

## Conventions
- Use TypeScript strict mode
- Follow PEP 8 for Python
- All functions require docstrings
- Tests required for new features

## Commands
- `npm run dev` - Start frontend
- `uvicorn main:app --reload` - Start backend
- `pytest` - Run tests
- `npm run lint && black .` - Format code

## File Structure
- `src/` - Frontend source
- `api/` - Backend source
- `tests/` - Test files
- `docs/` - Documentation

Session Management

Session Persistence

@dataclass
class Session:
    id: str
    created_at: datetime
    updated_at: datetime
    working_directory: str
    messages: list[Message]
    tool_states: dict  # Persistent tool state
    checkpoints: list[Checkpoint]
    metadata: dict

class SessionStore:
    def save(self, session: Session) -> None:
        """Persist session to disk."""
        path = self.sessions_dir / f"{session.id}.json"
        path.write_text(session.to_json())

    def load(self, session_id: str) -> Session:
        """Load session from disk."""
        path = self.sessions_dir / f"{session_id}.json"
        return Session.from_json(path.read_text())

    def list_recent(self, limit: int = 10) -> list[SessionSummary]:
        """List recent sessions."""
        sessions = sorted(
            self.sessions_dir.glob("*.json"),
            key=lambda p: p.stat().st_mtime,
            reverse=True
        )
        return [self._summarize(s) for s in sessions[:limit]]

Checkpointing

class Checkpoint:
    """Snapshot of agent state at a point in time."""
    id: str
    session_id: str
    timestamp: datetime
    message_index: int
    file_snapshots: dict[str, str]  # path -> content hash
    description: str

def create_checkpoint(session: Session, description: str) -> Checkpoint:
    """Create restorable checkpoint."""
    return Checkpoint(
        id=generate_id(),
        session_id=session.id,
        timestamp=datetime.now(),
        message_index=len(session.messages),
        file_snapshots=snapshot_working_files(session.working_directory),
        description=description
    )

def restore_checkpoint(checkpoint: Checkpoint) -> Session:
    """Restore session to checkpoint state."""
    session = load_session(checkpoint.session_id)
    session.messages = session.messages[:checkpoint.message_index]
    restore_files(checkpoint.file_snapshots)
    return session

SDK Design

Core API

from agent_sdk import Agent, AgentOptions, Message

# Streaming execution
async for message in Agent.query(
    prompt="Fix the bug in auth.py",
    options=AgentOptions(
        model="sonnet",
        allowed_tools=["read", "edit", "bash(pytest:*)"],
        permission_mode="accept_edits",
        system_prompt="You are a Python expert.",
        working_directory="/path/to/project",
        mcp_servers={"db": {...}},
        hooks={"PreToolUse": [validate_hook]}
    )
):
    match message:
        case Message(type="assistant"):
            print(message.content)
        case Message(type="tool_use"):
            print(f"Using {message.tool_name}")
        case Message(type="tool_result"):
            print(f"Result: {message.result[:100]}")
        case Message(type="result", subtype="success"):
            print("Task completed!")

# Single-turn execution
result = await Agent.run(
    prompt="List all Python files",
    options=AgentOptions(allowed_tools=["glob"])
)
print(result.output)

Message Types

@dataclass
class Message:
    type: Literal[
        "system",        # System events (init, error)
        "user",          # User input
        "assistant",     # Agent response
        "tool_use",      # Tool invocation
        "tool_result",   # Tool output
        "result"         # Final result
    ]
    subtype: str | None  # e.g., "success", "error", "init"
    content: Any
    metadata: dict

Custom Tool Definition

from agent_sdk import tool, ToolResult

@tool(
    name="query_database",
    description="Execute SQL query against the database",
    parameters={
        "query": {"type": "string", "description": "SQL query to execute"},
        "database": {"type": "string", "description": "Database name"}
    }
)
async def query_database(query: str, database: str) -> ToolResult:
    try:
        result = await db.execute(query, database)
        return ToolResult.success(result.to_json())
    except DatabaseError as e:
        return ToolResult.error(f"Query failed: {e}")

Error Handling

Error Categories

class AgentError(Exception):
    """Base agent error."""
    pass

class ToolExecutionError(AgentError):
    """Tool failed to execute."""
    tool_name: str
    input: dict
    cause: Exception

class PermissionDeniedError(AgentError):
    """User denied permission."""
    tool_name: str
    input: dict

class ContextOverflowError(AgentError):
    """Context window exceeded."""
    current_tokens: int
    max_tokens: int

class ModelError(AgentError):
    """LLM provider error."""
    provider: str
    status_code: int
    message: str

class TimeoutError(AgentError):
    """Operation timed out."""
    operation: str
    timeout_ms: int

Retry Strategies

class RetryPolicy:
    max_retries: int = 3
    backoff_base: float = 1.0
    backoff_max: float = 30.0
    retryable_errors: set = {
        "rate_limit",
        "overloaded",
        "timeout",
        "connection_error"
    }

    def should_retry(self, error: AgentError, attempt: int) -> bool:
        if attempt >= self.max_retries:
            return False
        return getattr(error, "code", None) in self.retryable_errors

    def get_delay(self, attempt: int) -> float:
        delay = self.backoff_base * (2 ** attempt)
        return min(delay, self.backoff_max)

Observability

Logging

import structlog

logger = structlog.get_logger()

# Tool execution
logger.info("tool_execution",
    tool=tool_name,
    input=tool_input,
    duration_ms=duration,
    success=True
)

# LLM call
logger.info("llm_call",
    model=model_name,
    input_tokens=input_tokens,
    output_tokens=output_tokens,
    duration_ms=duration,
    cached_tokens=cached_tokens
)

# Session events
logger.info("session_event",
    event="created",  # one of: created, resumed, completed, error
    session_id=session_id,
    duration_total_ms=total_duration
)

Metrics

from prometheus_client import Counter, Histogram

tool_calls = Counter(
    "agent_tool_calls_total",
    "Total tool calls",
    ["tool_name", "status"]
)

tool_duration = Histogram(
    "agent_tool_duration_seconds",
    "Tool execution duration",
    ["tool_name"]
)

llm_tokens = Counter(
    "agent_llm_tokens_total",
    "Total LLM tokens",
    ["model", "type"]  # type: input, output, cached
)

session_duration = Histogram(
    "agent_session_duration_seconds",
    "Session duration"
)

Cost Tracking

@dataclass
class UsageTracker:
    input_tokens: int = 0
    output_tokens: int = 0
    cached_tokens: int = 0
    tool_calls: int = 0

    def add_llm_call(self, response: LLMResponse):
        self.input_tokens += response.usage.input_tokens
        self.output_tokens += response.usage.output_tokens
        self.cached_tokens += response.usage.cached_tokens

    def estimate_cost(self, pricing: dict) -> float:
        input_cost = self.input_tokens * pricing["input_per_token"]
        output_cost = self.output_tokens * pricing["output_per_token"]
        cached_cost = self.cached_tokens * pricing["cached_per_token"]
        return input_cost + output_cost + cached_cost

Security Considerations

Sandboxing

# Docker-based isolation
docker_config:
  image: "agent-sandbox:latest"
  network: none  # or restricted
  read_only_root: true
  volumes:
    - source: /project
      target: /workspace
      read_only: false
    - source: /home/user/.agent
      target: /config
      read_only: true
  resource_limits:
    memory: 4g
    cpu: 2
    pids: 100
  security_opts:
    - no-new-privileges
    - seccomp=restricted.json

Input Validation

def validate_tool_input(tool_name: str, input: dict) -> ValidationResult:
    """Validate tool input before execution."""
    schema = get_tool_schema(tool_name)

    # JSON Schema validation (validate() raises on failure)
    try:
        jsonschema.validate(input, schema)
    except jsonschema.ValidationError as e:
        return ValidationResult.invalid(str(e))

    # Path traversal check
    if "path" in input or "file_path" in input:
        path = input.get("path") or input.get("file_path")
        if not is_within_workspace(path):
            return ValidationResult.invalid("Path traversal detected")

    # Command injection check
    if tool_name == "bash":
        if contains_injection(input["command"]):
            return ValidationResult.invalid("Potential command injection")

    return ValidationResult.valid()

Secret Management

# Environment variable expansion (safe)
def expand_env_vars(config: dict) -> dict:
    """Expand ${VAR} patterns in config."""
    def expand(value: str) -> str:
        pattern = r'\$\{([^}]+)\}'
        def replace(match):
            var_name = match.group(1)
            default = None
            if ":-" in var_name:
                var_name, default = var_name.split(":-", 1)
            return os.environ.get(var_name, default or "")
        return re.sub(pattern, replace, value)

    return deep_map(config, expand)

Performance Optimization

Prompt Caching

class PromptCache:
    """Cache static prompt components."""

    def __init__(self):
        self.cache = {}

    def get_cached_prefix(self, components: list[str]) -> CachedPrefix:
        """Get or create cached prefix from components."""
        key = hash(tuple(components))
        if key not in self.cache:
            self.cache[key] = self._create_prefix(components)
        return self.cache[key]

    def build_messages(
        self,
        system_prompt: str,
        tool_definitions: list,
        project_context: str,
        conversation: list[Message]
    ) -> list[Message]:
        """Build messages with cached prefix."""
        # Static components get cached
        prefix = self.get_cached_prefix([
            system_prompt,
            json.dumps(tool_definitions),
            project_context
        ])

        # Dynamic components appended
        return prefix + conversation

Parallel Tool Execution

async def execute_tools_batch(tool_calls: list[ToolCall]) -> list[ToolResult]:
    """Execute independent tool calls in parallel."""

    # Group by dependency
    independent = [t for t in tool_calls if not t.depends_on]
    dependent = [t for t in tool_calls if t.depends_on]

    # Execute independent calls in parallel
    results = await asyncio.gather(*[
        execute_tool(t) for t in independent
    ])

    # Execute dependent calls sequentially
    for tool_call in dependent:
        result = await execute_tool(tool_call)
        results.append(result)

    return results
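The batch executor above runs every dependent call strictly sequentially. When dependents themselves fan out, a wave-based scheduler preserves parallelism within each level of the dependency graph; a self-contained sketch (the `ToolCall` fields and `execute_tool` stub are stand-ins, not the document's real types):

```python
import asyncio
from dataclasses import dataclass, field

@dataclass
class ToolCall:
    id: str
    name: str
    depends_on: list[str] = field(default_factory=list)

async def execute_tool(call: ToolCall) -> str:
    await asyncio.sleep(0)  # stand-in for real tool work
    return f"{call.name}:done"

async def execute_in_waves(tool_calls: list[ToolCall]) -> dict[str, str]:
    """Run calls in waves: each wave holds calls whose dependencies are done."""
    done: dict[str, str] = {}
    pending = list(tool_calls)
    while pending:
        ready = [c for c in pending if all(d in done for d in c.depends_on)]
        if not ready:
            raise ValueError("dependency cycle among tool calls")
        results = await asyncio.gather(*(execute_tool(c) for c in ready))
        done.update({c.id: r for c, r in zip(ready, results)})
        pending = [c for c in pending if c.id not in done]
    return done
```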

Streaming Output

async def stream_response(query: str, options: AgentOptions):
    """Stream agent responses for real-time display."""

    buffer = ""
    async for message in Agent.query(query, options):
        if message.type == "assistant":
            # Stream text token by token
            for token in message.content:
                yield token
                buffer += token

        elif message.type == "tool_use":
            yield f"\n[Using {message.tool_name}...]\n"

        elif message.type == "tool_result":
            if options.verbose:
                yield f"[Result: {message.result[:200]}...]\n"

Testing Strategies

Unit Testing Tools

import pytest
from agent_sdk.testing import MockLLM, MockToolExecutor

@pytest.fixture
def mock_agent():
    return Agent(
        llm=MockLLM(responses=[
            {"tool_use": {"name": "read", "input": {"path": "test.py"}}},
            {"text": "The file contains a function definition."}
        ]),
        tool_executor=MockToolExecutor({
            "read": lambda input: "def hello(): pass"
        })
    )

@pytest.mark.asyncio
async def test_file_analysis(mock_agent):
    result = await mock_agent.run("Analyze test.py")
    assert "function" in result.output
    assert mock_agent.tool_calls == [("read", {"path": "test.py"})]

Integration Testing

@pytest.mark.integration
@pytest.mark.asyncio
async def test_edit_workflow():
    """Test complete edit workflow in isolated environment."""

    with TempWorkspace() as workspace:
        # Setup
        workspace.write("src/main.py", "def old_name(): pass")

        # Execute
        agent = Agent(working_directory=workspace.path)
        await agent.run("Rename old_name to new_name in src/main.py")

        # Verify
        content = workspace.read("src/main.py")
        assert "def new_name():" in content
        assert "old_name" not in content

Snapshot Testing

import json
import os

def test_tool_schema_stability():
    """Ensure tool schemas don't change unexpectedly."""
    tools = get_all_tool_definitions()

    for tool in tools:
        snapshot_path = f"snapshots/tools/{tool.name}.json"
        current = tool.to_json()

        if os.path.exists(snapshot_path):
            with open(snapshot_path) as f:
                expected = json.load(f)
            assert current == expected, f"Tool {tool.name} schema changed"
        else:
            with open(snapshot_path, "w") as f:
                json.dump(current, f)

Deployment Patterns

Docker Container

FROM python:3.11-slim

# Install system dependencies
RUN apt-get update && apt-get install -y \
    git \
    ripgrep \
    && rm -rf /var/lib/apt/lists/*

# Install agent
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Create non-root user
RUN useradd -m -s /bin/bash agent
USER agent
WORKDIR /home/agent

# Copy configuration
COPY --chown=agent:agent config/ /home/agent/.config/agent/

ENTRYPOINT ["agent"]
CMD ["--help"]

CI/CD Integration

# GitHub Actions example
name: AI Code Review

on:
  pull_request:
    types: [opened, synchronize]

jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Get changed files
        id: changes
        run: |
          echo "files=$(git diff --name-only origin/main...HEAD | tr '\n' ' ')" >> $GITHUB_OUTPUT

      - name: Run AI Review
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          agent -p "Review these changed files for bugs and improvements: ${{ steps.changes.outputs.files }}" \
            --output-format json \
            --max-turns 20 \
            > review.json

      - name: Post Review Comment
        uses: actions/github-script@v7
        with:
          script: |
            const review = require('${{ github.workspace }}/review.json');
            github.rest.pulls.createReview({
              ...context.repo,
              pull_number: context.issue.number,
              body: review.output,
              event: 'COMMENT'
            });

Kubernetes Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: agent-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: agent
  template:
    metadata:
      labels:
        app: agent
    spec:
      containers:
        - name: agent
          image: my-agent:latest
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "2Gi"
              cpu: "2000m"
          env:
            - name: ANTHROPIC_API_KEY
              valueFrom:
                secretKeyRef:
                  name: api-keys
                  key: anthropic
          volumeMounts:
            - name: workspace
              mountPath: /workspace
      volumes:
        - name: workspace
          emptyDir: {}

Appendix: Decision Checklist

When to Use Dedicated Tools vs Bash

| Use Dedicated Tool | Use Bash |
|--------------------|----------|
| File reading (read) | Git operations |
| File editing (edit) | Package management (npm, pip) |
| Pattern search (grep) | Running tests |
| File finding (glob) | Build commands |
| Directory listing (ls) | Custom scripts |

When to Spawn Subagents

| Spawn Subagent | Handle in Main Loop |
|----------------|---------------------|
| Complex research tasks | Simple file operations |
| Exploration of large codebases | Direct answers |
| Parallel independent tasks | Sequential dependent tasks |
| Isolated experimental changes | Standard workflows |

Permission Mode Selection

| Mode | Use Case |
|------|----------|
| Strict | Untrusted environments, production systems |
| Standard | Normal development work |
| Permissive | Trusted projects, experienced users |
| Autonomous | Automated pipelines, sandboxed environments |


This document provides architectural patterns for building agentic CLI tools. Implementations will vary based on specific requirements, target LLM providers, and use cases.