
# Architecture

This document describes tai's current runtime architecture, module responsibilities, and data flow.

## High-Level Flow

1. User runs `tai run` with issue text and target host settings.
1. CLI validates input and opens a shared SSH session.
1. Probe and collection run against a read-only command plan.
1. Collection output is converted into diagnostic chunks.
1. Optional RAG retrieval selects top-k chunks per question.
1. Optional runbook retrieval selects top-k runbook chunks from ChromaDB.
1. The prompt builder composes the system and user messages.
1. The AI completion call returns the analysis.
1. Guardrails check the response for quality signals and emit warnings.
1. Optional session logger writes JSONL events.
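
The flow above can be sketched as a chain of function calls. This is a minimal, hypothetical sketch: the `run_pipeline` function, the `Chunk` shape, and the prompt strings are illustrative placeholders, not tai's actual API. SSH collection, retrieval, and the AI call are passed in as stubs, and runbook retrieval and guardrails are omitted for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Chunk:
    # Hypothetical diagnostic chunk: the command that produced it plus its output.
    command: str
    text: str


def run_pipeline(
    issue: str,
    collect: Callable[[str], dict[str, str]],
    retrieve: Optional[Callable[[str, list[Chunk]], list[Chunk]]],
    complete: Callable[[str, str], str],
    log: Optional[Callable[[dict], None]] = None,
) -> str:
    """Hypothetical orchestration mirroring the high-level flow steps."""
    if not issue.strip():
        raise ValueError("issue text is required")  # input validation
    outputs = collect(issue)  # probe + read-only collection: {command: output}
    chunks = [Chunk(cmd, out) for cmd, out in outputs.items()]
    selected = retrieve(issue, chunks) if retrieve else chunks  # optional RAG
    context = "\n\n".join(f"$ {c.command}\n{c.text}" for c in selected)
    system = "You are a read-only Linux troubleshooting assistant."
    user = f"Issue: {issue}\n\nEvidence:\n{context}"
    analysis = complete(system, user)  # AI completion
    if log is not None:
        log({"event": "analysis", "chunks": len(selected)})  # optional session log
    return analysis
```

Passing each stage as a callable keeps the sketch testable without SSH or network access, which loosely matches how the test suite mocks external dependencies.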

## Module Layout

- `src/tai/cli.py`
  - Command definitions (`run`, `runbooks sync/list/add`)
  - Orchestration across SSH, collection, RAG, prompts, AI, and logging
- `src/tai/input_parser.py`
  - User input validation and request normalization
- `src/tai/models.py`
  - Core dataclasses (`TroubleshootRequest`)
- `src/tai/ssh_client.py`
  - SSH invocation
  - Read-only command policy validation
  - Probe and command execution helpers
- `src/tai/plan.py`
  - Issue keyword/service extraction
  - Command plan generation
  - Service/subsystem presence probes (unit files, binaries)
- `src/tai/collectors.py`
  - Executes command plans and builds `CollectionReport`
- `src/tai/rag_retriever.py`
  - Command-output chunking
  - Embedding wrapper structures
  - Similarity retrieval and scoring
- `src/tai/runbook_store.py`
  - Persistent ChromaDB runbook indexing and querying
- `src/tai/chroma_telemetry.py`
  - No-op telemetry adapter for Chroma local usage
- `src/tai/prompt_builder.py`
  - Prompt assembly for full-context and retrieved-context paths
- `src/tai/ai_client.py`
  - OpenAI-compatible completions and embeddings client
- `src/tai/ai_guardrails.py`
  - Lightweight response guardrails and warnings
- `src/tai/session_log.py`
  - Optional JSONL event logging
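
The module list describes `session_log.py` only as optional JSONL event logging. The following is a minimal sketch of JSONL appending in general, assuming one JSON object per line with a UTC timestamp; the `log_event` name and the `ts`/`event` field names are hypothetical.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Optional


def log_event(log_file: Optional[Path], event: str, **fields: Any) -> None:
    """Append one JSON object per line; do nothing when logging is disabled."""
    if log_file is None:
        return  # no --log-file given: logging is optional
    record = {"ts": datetime.now(timezone.utc).isoformat(), "event": event, **fields}
    # Append mode keeps earlier events; each record is a single newline-terminated line.
    with log_file.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
```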

## Data Stores

- Runbook store (Tier 2): local ChromaDB path or remote ChromaDB HTTP endpoint (`--runbooks`, `runbooks --store`)
- Run history store (Tier 3): SQLite file/URL or PostgreSQL DSN (`--history-db`)
- Session logs: optional JSONL file configured by `--log-file`

External database credentials can be supplied through CLI options or a dotenv file (`--env-file`); they are resolved without executing any downloaded runbook content.
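
A minimal sketch of credential resolution, under stated assumptions: the dotenv file is parsed as plain `KEY=VALUE` text (never sourced or executed), and an explicit CLI value takes precedence over the file. The `parse_dotenv` and `resolve_secret` names and this precedence order are hypothetical; tai's actual option names and ordering are not documented here.

```python
from pathlib import Path
from typing import Optional


def parse_dotenv(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines as data only; nothing is evaluated or executed."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()  # tolerate shell-style prefixes
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]  # drop matching surrounding quotes
        values[key] = value
    return values


def resolve_secret(cli_value: Optional[str], env_file: Optional[Path], key: str) -> Optional[str]:
    """Prefer an explicit CLI option, then fall back to the dotenv file."""
    if cli_value:
        return cli_value
    if env_file is not None and env_file.is_file():
        return parse_dotenv(env_file.read_text(encoding="utf-8")).get(key)
    return None
```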

## Runbook Source Ingestion

`tai runbooks sync --path` and `tai runbooks add` support runbook/playbook source retrieval from:

- local filesystem paths
- SSH URIs (`ssh://...`) via read-only remote fetch (`find`, `cat`)
- HTTP/HTTPS URLs (single `.md` file or webroot index with `.md` links)

Remote source content is materialized into temporary local files, embedded, and then indexed into the target ChromaDB store.
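
The three source kinds above can be told apart by URI scheme. The following sketch classifies a source argument and extracts `.md` links from a webroot index page using only the standard library. The `classify_source` and `MarkdownLinkParser` names are hypothetical, and the actual fetching (SSH `find`/`cat`, HTTP GET) is omitted.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


def classify_source(source: str) -> str:
    """Return 'ssh', 'http', or 'local' based on the URI scheme."""
    scheme = urlparse(source).scheme.lower()
    if scheme == "ssh":
        return "ssh"
    if scheme in ("http", "https"):
        return "http"
    return "local"  # plain filesystem path (no recognised scheme)


class MarkdownLinkParser(HTMLParser):
    """Collect absolute, de-duplicated URLs of <a href="...md"> links."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and urlparse(href).path.lower().endswith(".md"):
            url = urljoin(self.base_url, href)  # resolve relative links
            if url not in self.links:
                self.links.append(url)
```

A URL whose path already ends in `.md` would be fetched directly; any other HTTP(S) URL would be treated as an index page and fed to the parser.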

## Retrieval Layers

- Tier 1 (implemented): in-memory semantic retrieval over diagnostic chunks
- Tier 2 (implemented): persistent semantic retrieval over runbook corpus
- Tier 3 (core implemented): persistent retrieval over prior sessions (dedicated UX commands are pending)
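
Tier 1 retrieval runs in memory. The following sketches the general technique of fixed-size line chunking plus cosine-similarity top-k, assuming embeddings are already plain float vectors returned by the embeddings client. The chunk size, function names, and scoring choice are assumptions, not tai's documented parameters.

```python
import math
from typing import Sequence


def chunk_lines(command: str, output: str, max_lines: int = 20) -> list[str]:
    """Split command output into chunks of at most max_lines, each tagged with its command."""
    lines = output.splitlines()
    return [
        f"$ {command}\n" + "\n".join(lines[i:i + max_lines])
        for i in range(0, len(lines), max_lines)
    ]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; zero-length vectors score 0 instead of dividing by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Sequence[float], vectors: Sequence[Sequence[float]], k: int) -> list[tuple[int, float]]:
    """Return (index, score) pairs for the k chunks most similar to the query."""
    scored = [(i, cosine(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Prefixing each chunk with its command keeps the model able to attribute evidence even when only a slice of the output is retrieved.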

## Safety Boundaries

The read-only policy is enforced before every remote command is executed.

- Allowed command families are explicitly enumerated.
- Shell composition operators are blocked.
- Commands that fail to execute are recorded and surfaced to the model as non-evidence.
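
A minimal sketch of an allowlist check of the kind described above. The allowed command families, the per-command subcommand restriction, and the blocked operator set are illustrative assumptions, not the policy lists actually defined in `ssh_client.py`; `shlex.split` is used here to isolate the command name.

```python
import shlex

# Hypothetical allowlist of read-only command families.
ALLOWED_COMMANDS = frozenset({"cat", "df", "free", "journalctl", "ls", "ps", "ss", "systemctl", "uptime"})
# Commands whose first argument must also be a read-only subcommand.
READ_ONLY_SUBCOMMANDS = {
    "systemctl": frozenset({"status", "is-active", "is-enabled", "list-units", "show"}),
}
# Shell composition, substitution, and redirection operators.
BLOCKED_TOKENS = (";", "&", "|", ">", "<", "`", "$(", "\n")


def validate_read_only(command: str) -> None:
    """Raise ValueError unless the command is a single allowlisted invocation."""
    for token in BLOCKED_TOKENS:
        if token in command:
            raise ValueError(f"blocked shell operator {token!r} in {command!r}")
    try:
        argv = shlex.split(command)
    except ValueError as exc:  # e.g. unbalanced quotes
        raise ValueError(f"unparseable command {command!r}") from exc
    if not argv or argv[0] not in ALLOWED_COMMANDS:
        raise ValueError(f"command family not allowed: {command!r}")
    allowed_subs = READ_ONLY_SUBCOMMANDS.get(argv[0])
    if allowed_subs is not None and (len(argv) < 2 or argv[1] not in allowed_subs):
        raise ValueError(f"subcommand not read-only: {command!r}")
```

Checking subcommands matters because a family such as `systemctl` mixes read-only actions (`status`) with mutating ones (`restart`).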

## Failure and Fallback Behavior

- If RAG indexing fails, analysis falls back to full-context prompts.
- If the runbook store is unavailable, analysis proceeds without runbook context.
- If the AI call fails, the CLI exits with a non-zero status and displays an error.
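
The first two rules degrade gracefully, while the third is a hard failure. A sketch of that control flow, with hypothetical callables standing in for the RAG index, the runbook store, and the AI client; the function names, the context strings, and exit code 1 are assumptions.

```python
import sys
from typing import Callable, Optional


def build_context(
    full_context: str,
    rag_select: Callable[[], str],
    runbook_query: Optional[Callable[[], str]],
) -> tuple[str, list[str]]:
    """Return (prompt context, warnings), degrading instead of failing."""
    warnings: list[str] = []
    try:
        context = rag_select()  # retrieved-context path
    except Exception as exc:
        warnings.append(f"RAG indexing failed, using full context: {exc}")
        context = full_context  # full-context fallback
    if runbook_query is not None:
        try:
            context += "\n\nRunbooks:\n" + runbook_query()
        except Exception as exc:
            warnings.append(f"runbook store unavailable: {exc}")  # continue without runbooks
    return context, warnings


def analyze_or_exit(complete: Callable[[str], str], context: str) -> str:
    """An AI failure is fatal: print an error and exit with a non-zero status."""
    try:
        return complete(context)
    except Exception as exc:
        print(f"error: AI request failed: {exc}", file=sys.stderr)
        sys.exit(1)
```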

## Test Coverage Highlights

- Planner behavior and service detection
- Prompt formatting and guardrail-sensitive messaging
- CLI command behavior and interactive loop controls
- Runbook store parsing/index/query behavior (with mocked Chroma)
- SSH policy validation and command execution contract