feat: complete RAG runbook workflow and release docs
Some checks failed
CI / test (push) Failing after 15s
Some checks failed
CI / test (push) Failing after 15s
This commit is contained in:
85
docs/ARCHITECTURE.md
Normal file
85
docs/ARCHITECTURE.md
Normal file
@@ -0,0 +1,85 @@
|
||||
# Architecture
|
||||
|
||||
This document describes tai's current runtime architecture, module responsibilities, and data flow.
|
||||
|
||||
## High-Level Flow
|
||||
|
||||
1. User runs `tai run` with issue text and target host settings.
|
||||
1. CLI validates input and opens a shared SSH session.
|
||||
1. Probe and collection run against a read-only command plan.
|
||||
1. Collection output is converted into diagnostic chunks.
|
||||
1. Optional RAG retrieval selects top-k chunks per question.
|
||||
1. Optional runbook retrieval selects top-k runbook chunks from ChromaDB.
|
||||
1. Prompt builder composes system + user message.
|
||||
1. AI completion returns analysis.
|
||||
1. Guardrails validate response quality signals.
|
||||
1. Optional session logger writes JSONL events.
|
||||
|
||||
## Module Layout
|
||||
|
||||
- `src/tai/cli.py`
|
||||
- Command definitions (`run`, `runbooks sync/list/add`)
|
||||
- Orchestration across SSH, collection, RAG, prompts, AI, and logging
|
||||
- `src/tai/input_parser.py`
|
||||
- User input validation and request normalization
|
||||
- `src/tai/models.py`
|
||||
- Core dataclasses (`TroubleshootRequest`)
|
||||
- `src/tai/ssh_client.py`
|
||||
- SSH invocation
|
||||
- Read-only command policy validation
|
||||
- Probe and command execution helpers
|
||||
- `src/tai/plan.py`
|
||||
- Issue keyword/service extraction
|
||||
- Command plan generation
|
||||
- Service/subsystem presence probes (unit files, binaries)
|
||||
- `src/tai/collectors.py`
|
||||
- Executes command plans and builds `CollectionReport`
|
||||
- `src/tai/rag_retriever.py`
|
||||
- Command-output chunking
|
||||
- Embedding wrapper structures
|
||||
- Similarity retrieval and scoring
|
||||
- `src/tai/runbook_store.py`
|
||||
- Persistent ChromaDB runbook indexing and querying
|
||||
- `src/tai/chroma_telemetry.py`
|
||||
- No-op telemetry adapter for Chroma local usage
|
||||
- `src/tai/prompt_builder.py`
|
||||
- Prompt assembly for full-context and retrieved-context paths
|
||||
- `src/tai/ai_client.py`
|
||||
- OpenAI-compatible completions and embeddings client
|
||||
- `src/tai/ai_guardrails.py`
|
||||
- Lightweight response guardrails and warnings
|
||||
- `src/tai/session_log.py`
|
||||
- Optional JSONL event logging
|
||||
|
||||
## Data Stores
|
||||
|
||||
- Runbook store (Tier 2): local ChromaDB path, default `~/.tai/runbooks`
|
||||
- Session logs: optional JSONL file configured by `--log-file`
|
||||
|
||||
## Retrieval Layers
|
||||
|
||||
- Tier 1 (implemented): in-memory semantic retrieval over diagnostic chunks
|
||||
- Tier 2 (implemented): persistent semantic retrieval over runbook corpus
|
||||
- Tier 3 (pending): persistent retrieval over prior sessions
|
||||
|
||||
## Safety Boundaries
|
||||
|
||||
Read-only policy is enforced before each remote command execution.
|
||||
|
||||
- Allowed command families are explicitly enumerated.
|
||||
- Shell composition operators are blocked.
|
||||
- Commands that fail execution are recorded and surfaced to the model as non-evidence.
|
||||
|
||||
## Failure and Fallback Behavior
|
||||
|
||||
- If RAG indexing fails, analysis falls back to full-context prompts.
|
||||
- If runbook store is unavailable, analysis proceeds without runbook context.
|
||||
- If AI call fails, CLI exits with non-zero status and displays an error.
|
||||
|
||||
## Test Coverage Highlights
|
||||
|
||||
- Planner behavior and service detection
|
||||
- Prompt formatting and guardrail-sensitive messaging
|
||||
- CLI command behavior and interactive loop controls
|
||||
- Runbook store parsing/index/query behavior (with mocked Chroma)
|
||||
- SSH policy validation and command execution contract
|
||||
Reference in New Issue
Block a user