# Architecture This document describes tai's current runtime architecture, module responsibilities, and data flow. ## High-Level Flow 1. User runs `tai run` with issue text and target host settings. 1. CLI validates input and opens a shared SSH session. 1. Probe and collection run against a read-only command plan. 1. Collection output is converted into diagnostic chunks. 1. Optional RAG retrieval selects top-k chunks per question. 1. Optional runbook retrieval selects top-k runbook chunks from ChromaDB. 1. Prompt builder composes system + user message. 1. AI completion returns analysis. 1. Guardrails validate response quality signals. 1. Optional session logger writes JSONL events. ## Module Layout - `src/tai/cli.py` - Command definitions (`run`, `runbooks sync/list/add`) - Orchestration across SSH, collection, RAG, prompts, AI, and logging - `src/tai/input_parser.py` - User input validation and request normalization - `src/tai/models.py` - Core dataclasses (`TroubleshootRequest`) - `src/tai/ssh_client.py` - SSH invocation - Read-only command policy validation - Probe and command execution helpers - `src/tai/plan.py` - Issue keyword/service extraction - Command plan generation - Service/subsystem presence probes (unit files, binaries) - `src/tai/collectors.py` - Executes command plans and builds `CollectionReport` - `src/tai/rag_retriever.py` - Command-output chunking - Embedding wrapper structures - Similarity retrieval and scoring - `src/tai/runbook_store.py` - Persistent ChromaDB runbook indexing and querying - `src/tai/chroma_telemetry.py` - No-op telemetry adapter for Chroma local usage - `src/tai/prompt_builder.py` - Prompt assembly for full-context and retrieved-context paths - `src/tai/ai_client.py` - OpenAI-compatible completions and embeddings client - `src/tai/ai_guardrails.py` - Lightweight response guardrails and warnings - `src/tai/session_log.py` - Optional JSONL event logging ## Data Stores - Runbook store (Tier 2): local ChromaDB path, default `~/.tai/runbooks` - Session logs: optional JSONL file configured by `--log-file` ## Retrieval Layers - Tier 1 (implemented): in-memory semantic retrieval over diagnostic chunks - Tier 2 (implemented): persistent semantic retrieval over runbook corpus - Tier 3 (implemented core): persistent retrieval over prior sessions (dedicated UX commands pending) ## Safety Boundaries Read-only policy is enforced before each remote command execution. - Allowed command families are explicitly enumerated. - Shell composition operators are blocked. - Commands that fail execution are recorded and surfaced to the model as non-evidence. ## Failure and Fallback Behavior - If RAG indexing fails, analysis falls back to full-context prompts. - If runbook store is unavailable, analysis proceeds without runbook context. - If AI call fails, CLI exits with non-zero status and displays an error. ## Test Coverage Highlights - Planner behavior and service detection - Prompt formatting and guardrail-sensitive messaging - CLI command behavior and interactive loop controls - Runbook store parsing/index/query behavior (with mocked Chroma) - SSH policy validation and command execution contract