feat: complete RAG runbook workflow and release docs
Some checks failed
CI / test (push) Failing after 15s

2026-05-06 04:48:41 +02:00
parent 450de24d28
commit 57f4c0efaa
26 changed files with 2510 additions and 137 deletions

@@ -18,10 +18,11 @@ These must be resolved before meaningful development can begin.
### AI Backend & Model
- [ ] Confirm use of [vLLM](https://github.com/vllm-project/vllm) as the inference backend
- [ ] Confirm `gemma4:a4b` as the default model (or select an alternative)
- [x] OpenAI-compatible backend client implemented (`AIClient`)
- [x] Default local backend profile wired for Ollama (`http://localhost:11434/v1`)
- [x] Default model profile set to `gemma3:4b` (override via `--model`)
- [ ] Define minimum hardware requirements for running the model locally
- [ ] Decide whether the AI backend is bundled, self-hosted externally, or user-supplied
- [x] AI backend is user-supplied/self-hosted
### SSH Strategy
@@ -38,7 +39,7 @@ These must be resolved before meaningful development can begin.
### Scope & Constraints
- [ ] Define the supported scope of issues (services, network, disk, kernel, etc.)
- [ ] Confirm read-only guarantee — document exactly what "read-only" means in practice
- [x] Read-only guarantee implemented with command allowlist + blocked shell operator policy
- [x] **Decision: interactive REPL mode for v0.1, full TUI for v0.2+**
- v0.1: chat-loop REPL launched from CLI; human can follow up, correct, and redirect the agent
- v0.2+: `textual`-based TUI with split panes (collected data | AI output | input bar)
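The read-only item above names two mechanisms: a command allowlist and a blocked shell operator policy. A minimal sketch of how the two checks could combine is shown here. The names `ALLOWED_COMMANDS`, `ALLOWED_SUBCOMMANDS`, `BLOCKED_TOKENS`, and `is_read_only` are hypothetical, and the entries are illustrative examples, not the project's actual policy.

```python
import shlex

# Hypothetical allowlist, seeded with commands the roadmap mentions.
ALLOWED_COMMANDS = {"journalctl", "systemctl", "cat", "df", "du", "ss"}
# Hypothetical read-only subcommands for binaries that can also change state.
ALLOWED_SUBCOMMANDS = {"systemctl": {"status", "show", "is-active", "list-units"}}
# Shell operators that could chain, redirect, or substitute commands.
BLOCKED_TOKENS = (";", "&", "|", ">", "<", "`", "$(", "\n")


def is_read_only(command: str) -> bool:
    """Return True only if the command passes both policy checks."""
    # Operator policy: reject the raw string before any parsing happens.
    if any(token in command for token in BLOCKED_TOKENS):
        return False
    try:
        argv = shlex.split(command)
    except ValueError:  # unbalanced quotes and similar parse failures
        return False
    # Allowlist policy: the executable itself must be known.
    if not argv or argv[0] not in ALLOWED_COMMANDS:
        return False
    subcommands = ALLOWED_SUBCOMMANDS.get(argv[0])
    if subcommands is not None:
        return len(argv) > 1 and argv[1] in subcommands
    return True
```

Under these assumptions, `is_read_only("systemctl status nginx")` passes while `is_read_only("systemctl restart nginx")` and `is_read_only("cat /etc/passwd; rm -rf /")` are rejected. The sketch does not inspect individual flags, so a real policy would likely need per-command argument rules as well.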
@@ -52,7 +53,7 @@ Basic project scaffolding and connectivity.
- [x] Finalise repository structure and language toolchain
- [x] Set up CI pipeline (linting, tests)
- [ ] Implement SSH connection module
- [x] Implement SSH connection module
- [x] Define SSH config model and probe interface scaffold
- [x] Connect to remote host
- [x] Execute read-only commands (e.g. `journalctl`, `systemctl status`, `cat`)
@@ -68,15 +69,15 @@ ______________________________________________________________________
Define what information the agent gathers and how.
- [ ] Identify the canonical set of data sources per issue type:
- [x] Identify a baseline canonical set of data sources per issue type:
- Service failures: `journalctl`, `systemctl`, service config files
- Network issues: `ip`, `ss`, `netstat`, firewall rules
- Disk issues: `df`, `du`, `dmesg`, `smartctl`
- General: `/var/log/syslog`, `/var/log/messages`, `dmesg`
- [ ] Implement pluggable "collector" modules per data source
- [ ] Implement directory traversal for user-specified paths (read-only)
- [x] Implement collectors and plan builder for baseline issue categories
- [x] Implement directory traversal for user-specified paths (read-only)
- [ ] Add support for per-distro variations (Ubuntu vs RHEL path differences, etc.)
- [ ] Write tests with mocked SSH output
- [x] Write tests with mocked SSH output
______________________________________________________________________
@@ -84,12 +85,12 @@ ______________________________________________________________________
Wire collected data into the local AI model.
- [ ] Implement vLLM client module
- [ ] Design prompt template: system context, collected data, issue description → diagnosis
- [ ] Implement response parsing and structured output (root cause + suggested steps)
- [ ] Tune context window usage — handle truncation for large log outputs
- [ ] Add streaming support for long AI responses
- [ ] Evaluate and test model output quality on common issue types
- [x] Implement OpenAI-compatible AI client module
- [x] Design prompt templates for initial and follow-up analysis
- [x] Implement response guardrail checks and structured response headings
- [x] Tune context usage with RAG retrieval and chunk/runbook truncation budgets
- [x] Implement reliable non-streaming completion path for local backends
- [ ] Continue output quality tuning and grounding evaluation on real hosts
______________________________________________________________________
@@ -97,11 +98,11 @@ ______________________________________________________________________
Polish the interface for real-world use.
- [ ] Design CLI interface (flags, subcommands, interactive prompts)
- [ ] Implement structured output: diagnosis, confidence, recommended actions
- [ ] Add `--verbose` / `--debug` mode showing raw collected data
- [x] Design CLI interface with run command, interactive prompts, and runbook subcommands
- [x] Implement structured output sections (Root Cause, Evidence, Recommended Actions)
- [x] Add RAG debug mode (`--rag-debug`) showing retrieval scores
- [ ] Support output to file or clipboard
- [ ] Write man page / `--help` documentation
- [x] Provide comprehensive `--help` command documentation via Typer options
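The structured output sections listed above (Root Cause, Evidence, Recommended Actions) lend themselves to a simple guardrail check. The sketch below assumes each section appears on its own line, optionally as a Markdown heading and optionally followed by a colon; that heading shape and the names `REQUIRED_SECTIONS` and `missing_sections` are assumptions for illustration.

```python
import re

REQUIRED_SECTIONS = ("Root Cause", "Evidence", "Recommended Actions")


def missing_sections(response: str) -> list[str]:
    """Return required section names with no matching heading line in the response."""
    missing = []
    for name in REQUIRED_SECTIONS:
        # Assumed heading shape: optional '#' marks, the name, optional colon.
        pattern = rf"^\s*(?:#+\s*)?{re.escape(name)}\s*:?\s*$"
        if not re.search(pattern, response, flags=re.IGNORECASE | re.MULTILINE):
            missing.append(name)
    return missing
```

A caller could treat a non-empty result as a failed guardrail and either re-prompt the model or flag the answer as incomplete.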
______________________________________________________________________
@@ -135,19 +136,21 @@ model weights alone. Three tiers of increasing capability, each buildable indepe
| Decision | Options | Recommendation | Status |
|---|---|---|---|
| Embedding model | `nomic-embed-text`, `mxbai-embed-large`, `all-minilm` | `nomic-embed-text` via Ollama (local, 274MB, strong perf) | ⬜ Pending |
| Vector store — Tier 1 | In-memory numpy cosine, `faiss-cpu` | numpy (zero deps) for session scope | ⬜ Pending |
| Vector store — Tier 2/3 | `chromadb`, `qdrant`, `weaviate`, `pgvector` | `chromadb` (embedded mode, no server needed) or `qdrant` (self-hosted, REST API, production-grade) | ⬜ Pending |
| Chunking strategy | Fixed token, sentence-aware, command-boundary | Command-boundary splitting (natural unit for diagnostics) | ⬜ Pending |
| Embedding model | `nomic-embed-text`, `mxbai-embed-large`, `all-minilm` | `nomic-embed-text` via Ollama (local, 274MB, strong perf) | ✅ Implemented |
| Vector store — Tier 1 | In-memory numpy cosine, `faiss-cpu` | numpy (zero deps) for session scope | ✅ Implemented |
| Vector store — Tier 2/3 | `chromadb`, `qdrant`, `weaviate`, `pgvector` | `chromadb` embedded mode | ✅ Tier 2 Implemented |
| Chunking strategy | Fixed token, sentence-aware, command-boundary | Command-boundary splitting (natural unit for diagnostics) | ✅ Implemented |
| Hybrid retrieval | Semantic only, BM25 only, hybrid | Hybrid (BM25 keyword + cosine semantic) for best recall | ⬜ Pending |
| Reranking | None, cross-encoder (`ms-marco-MiniLM`), LLM-as-judge | Cross-encoder rerank pass before prompt injection | ⬜ Pending |
| Runbook format | Markdown, YAML, JSON | Markdown (human-editable, version-controllable) | ⬜ Pending |
| Runbook format | Markdown, YAML, JSON | Markdown (human-editable, version-controllable) | ✅ Implemented |
| Session index storage | Local `~/.tai/`, configurable path | `~/.tai/sessions/` with ChromaDB collection | ⬜ Pending |
---
### Tier 1 — Diagnostic Chunk Retrieval (in-memory, per-session)
Status: ✅ Implemented
**Problem:** Current flow injects all collected output into the prompt as one block.
On busy hosts this floods the context window with irrelevant output, degrading quality.
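As a rough illustration of the alternative to one large block, the sketch below splits collected output at command boundaries and keeps only the chunks most similar to the issue description. It uses word counts as a stand-in for real embeddings and plain Python instead of numpy so that it runs without dependencies; the roadmap itself recommends `nomic-embed-text` and numpy cosine. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_by_command(outputs: dict[str, str]) -> list[str]:
    """One chunk per executed command: the command line followed by its output."""
    return [f"$ {command}\n{output}" for command, output in outputs.items()]


def embed(text: str) -> Counter:
    """Stand-in embedding: lowercase word counts (NOT a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, chunks: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Score every chunk against the query and return the k best (score, chunk) pairs."""
    query_vector = embed(query)
    scored = [(cosine(query_vector, embed(chunk)), chunk) for chunk in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
```

Given outputs from `df -h`, `systemctl status nginx`, and `ss -tlnp`, a query such as "nginx service failed to start" would rank the `systemctl` chunk first, so only that chunk needs to reach the prompt.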
@@ -180,6 +183,8 @@ On busy hosts this floods the context window with irrelevant output, degrading q
### Tier 2 — Runbook Knowledge Base (persistent, ChromaDB)
Status: ✅ Implemented
**Problem:** AI improvises remediation steps from training data, which may be wrong for
specific environments, distros, or internal conventions.
@@ -214,6 +219,8 @@ specific environments, distros, or internal conventions.
### Tier 3 — Session Memory Index (institutional learning)
Status: ⬜ Pending
**Problem:** Every session starts from zero. Repeat incidents on the same host or
same issue type get no benefit from past work.