feat: complete RAG runbook workflow and release docs

2026-05-06 04:48:41 +02:00
parent 450de24d28
commit 57f4c0efaa
26 changed files with 2510 additions and 137 deletions
--- a/docs/ARCHITECTURE.md
+++ b/docs/ARCHITECTURE.md
@@ -0,0 +1,85 @@
+# Architecture
+
+This document describes tai's current runtime architecture, module responsibilities, and data flow.
+
+## High-Level Flow
+
+1. User runs `tai run` with issue text and target host settings.
+1. CLI validates input and opens a shared SSH session.
+1. Probe and collection run against a read-only command plan.
+1. Collection output is converted into diagnostic chunks.
+1. Optional RAG retrieval selects top-k chunks per question.
+1. Optional runbook retrieval selects top-k runbook chunks from ChromaDB.
+1. Prompt builder composes system + user message.
+1. AI completion returns analysis.
+1. Guardrails validate response quality signals.
+1. Optional session logger writes JSONL events.
+
+## Module Layout
+
+- `src/tai/cli.py`
+  - Command definitions (`run`, `runbooks sync/list/add`)
+  - Orchestration across SSH, collection, RAG, prompts, AI, and logging
+- `src/tai/input_parser.py`
+  - User input validation and request normalization
+- `src/tai/models.py`
+  - Core dataclasses (`TroubleshootRequest`)
+- `src/tai/ssh_client.py`
+  - SSH invocation
+  - Read-only command policy validation
+  - Probe and command execution helpers
+- `src/tai/plan.py`
+  - Issue keyword/service extraction
+  - Command plan generation
+  - Service/subsystem presence probes (unit files, binaries)
+- `src/tai/collectors.py`
+  - Executes command plans and builds `CollectionReport`
+- `src/tai/rag_retriever.py`
+  - Command-output chunking
+  - Embedding wrapper structures
+  - Similarity retrieval and scoring
+- `src/tai/runbook_store.py`
+  - Persistent ChromaDB runbook indexing and querying
+- `src/tai/chroma_telemetry.py`
+  - No-op telemetry adapter for Chroma local usage
+- `src/tai/prompt_builder.py`
+  - Prompt assembly for full-context and retrieved-context paths
+- `src/tai/ai_client.py`
+  - OpenAI-compatible completions and embeddings client
+- `src/tai/ai_guardrails.py`
+  - Lightweight response guardrails and warnings
+- `src/tai/session_log.py`
+  - Optional JSONL event logging
+
+## Data Stores
+
+- Runbook store (Tier 2): local ChromaDB path, default `~/.tai/runbooks`
+- Session logs: optional JSONL file configured by `--log-file`
+
+## Retrieval Layers
+
+- Tier 1 (implemented): in-memory semantic retrieval over diagnostic chunks
+- Tier 2 (implemented): persistent semantic retrieval over runbook corpus
+- Tier 3 (pending): persistent retrieval over prior sessions
+
+## Safety Boundaries
+
+Read-only policy is enforced before each remote command execution.
+
+- Allowed command families are explicitly enumerated.
+- Shell composition operators are blocked.
+- Commands that fail execution are recorded and surfaced to the model as non-evidence.
+
+## Failure and Fallback Behavior
+
+- If RAG indexing fails, analysis falls back to full-context prompts.
+- If runbook store is unavailable, analysis proceeds without runbook context.
+- If AI call fails, CLI exits with non-zero status and displays an error.
+
+## Test Coverage Highlights
+
+- Planner behavior and service detection
+- Prompt formatting and guardrail-sensitive messaging
+- CLI command behavior and interactive loop controls
+- Runbook store parsing/index/query behavior (with mocked Chroma)
+- SSH policy validation and command execution contract