feat: complete RAG runbook workflow and release docs

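This commit marks the read-only guarantee as implemented, using a command allowlist plus a blocked shell-operator policy. A minimal sketch can illustrate that policy. The allowlist, the operator list, the `systemctl` subcommand set and the `is_command_allowed` name are all hypothetical and do not come from the project's code. A real policy also needs vetted per-command flag checks, because tools such as `journalctl` (`--vacuum-size`) and `dmesg` (`-C`) have mutating options.

```python
import shlex

# Hypothetical allowlist of diagnostic binaries; systemctl gets an extra subcommand check below.
ALLOWED_COMMANDS = {"cat", "df", "du", "uptime", "systemctl"}

# Hypothetical read-only subcommands for systemctl, which can also stop or restart units.
SYSTEMCTL_READ_ONLY = {"status", "show", "list-units", "is-active", "is-failed"}

# Shell operators that could chain, pipe, redirect, background, or substitute commands.
BLOCKED_OPERATORS = (";", "&", "|", ">", "<", "`", "$(", "\n")


def is_command_allowed(command: str) -> bool:
    """Return True only for a single allowlisted, operator-free invocation."""
    # Reject shell operators before tokenising; "&" also covers "&&", "|" covers "||".
    if any(op in command for op in BLOCKED_OPERATORS):
        return False
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unbalanced quotes and similar parse errors are treated as unsafe.
        return False
    if not tokens or tokens[0] not in ALLOWED_COMMANDS:
        return False
    if tokens[0] == "systemctl":
        # Only a known read-only subcommand may follow; anything else (e.g. "--user") is rejected.
        return len(tokens) > 1 and tokens[1] in SYSTEMCTL_READ_ONLY
    return True
```

Rejecting operators by substring is deliberately conservative. It also refuses harmless arguments, such as a quoted `>` in a search pattern, which is the safer failure mode for a read-only guarantee.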
2026-05-06 04:48:41 +02:00
parent 450de24d28
commit 57f4c0efaa
26 changed files with 2510 additions and 137 deletions
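The ROADMAP.md diff below marks Tier 1 retrieval as implemented. Tier 1 splits collected output at command boundaries and ranks the chunks by cosine similarity to the issue description. The sketch below makes several assumptions:

- Collected output arrives as a hypothetical `{command: output}` dict.
- The embedding function is passed in as a parameter. In the project this would be `nomic-embed-text` via Ollama, but here a toy bag-of-words embedding stands in so the example runs offline.
- Plain Python replaces numpy to keep the sketch dependency-free.
- `chunk_by_command`, `top_k_chunks` and the `max_chars` truncation budget are illustrative names, not the project's API.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def chunk_by_command(outputs: Dict[str, str], max_chars: int = 2000) -> List[str]:
    """Hypothetical helper: one chunk per command, output truncated to a character budget."""
    return [f"$ {cmd}\n{out[:max_chars]}" for cmd, out in outputs.items()]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_chunks(
    query: str,
    chunks: List[str],
    embed: Callable[[str], Vector],
    k: int = 3,
) -> List[Tuple[float, str]]:
    """Rank chunks by cosine similarity to the query and keep the best k."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy stand-in for a real embedding model: word counts over a tiny fixed vocabulary.
VOCAB = ["disk", "full", "nginx", "failed", "network", "unreachable"]


def toy_embed(text: str) -> List[float]:
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in VOCAB]


if __name__ == "__main__":
    collected = {
        "df -h": "/dev/sda1 100% full disk",
        "systemctl status nginx": "nginx failed to start",
        "ip addr": "eth0 up",
    }
    chunks = chunk_by_command(collected)
    for score, chunk in top_k_chunks("nginx failed", chunks, toy_embed, k=2):
        # Print the score and the chunk's "$ command" header line.
        print(f"{score:.2f} {chunk.splitlines()[0]}")
```

Keeping one chunk per command preserves the natural diagnostic unit, so every retrieved chunk carries the command that produced it.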
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -18,10 +18,11 @@ These must be resolved before meaningful development can begin.

 ### AI Backend & Model

-- [ ] Confirm use of [vLLM](https://github.com/vllm-project/vllm) as the inference backend
-- [ ] Confirm `gemma4:a4b` as the default model (or select an alternative)
+- [x] OpenAI-compatible backend client implemented (`AIClient`)
+- [x] Default local backend profile wired for Ollama (`http://localhost:11434/v1`)
+- [x] Default model profile set to `gemma3:4b` (override via `--model`)
 - [ ] Define minimum hardware requirements for running the model locally
-- [ ] Decide whether the AI backend is bundled, self-hosted externally, or user-supplied
+- [x] AI backend is user-supplied/self-hosted

 ### SSH Strategy

@@ -38,7 +39,7 @@ These must be resolved before meaningful development can begin.
 ### Scope & Constraints

 - [ ] Define the supported scope of issues (services, network, disk, kernel, etc.)
-- [ ] Confirm read-only guarantee — document exactly what "read-only" means in practice
+- [x] Read-only guarantee implemented via a command allowlist and a blocked shell-operator policy
 - [x] **Decision: interactive REPL mode for v0.1, full TUI for v0.2+**
  - v0.1: chat-loop REPL launched from CLI; human can follow up, correct, and redirect the agent
  - v0.2+: `textual`-based TUI with split panes (collected data | AI output | input bar)
@@ -52,7 +53,7 @@ Basic project scaffolding and connectivity.

 - [x] Finalise repository structure and language toolchain
 - [x] Set up CI pipeline (linting, tests)
-- [ ] Implement SSH connection module
+- [x] Implement SSH connection module
  - [x] Define SSH config model and probe interface scaffold
  - [x] Connect to remote host
  - [x] Execute read-only commands (e.g. `journalctl`, `systemctl status`, `cat`)
@@ -68,15 +69,15 @@ ______________________________________________________________________

 Define what information the agent gathers and how.

-- [ ] Identify the canonical set of data sources per issue type:
+- [x] Identify a baseline canonical set of data sources per issue type:
  - Service failures: `journalctl`, `systemctl`, service config files
  - Network issues: `ip`, `ss`, `netstat`, firewall rules
  - Disk issues: `df`, `du`, `dmesg`, `smartctl`
  - General: `/var/log/syslog`, `/var/log/messages`, `dmesg`
-- [ ] Implement pluggable "collector" modules per data source
-- [ ] Implement directory traversal for user-specified paths (read-only)
+- [x] Implement collectors and plan builder for baseline issue categories
+- [x] Implement directory traversal for user-specified paths (read-only)
 - [ ] Add support for per-distro variations (Ubuntu vs RHEL path differences, etc.)
-- [ ] Write tests with mocked SSH output
+- [x] Write tests with mocked SSH output

 ______________________________________________________________________

@@ -84,12 +85,12 @@ ______________________________________________________________________

 Wire collected data into the local AI model.

-- [ ] Implement vLLM client module
-- [ ] Design prompt template: system context, collected data, issue description → diagnosis
-- [ ] Implement response parsing and structured output (root cause + suggested steps)
-- [ ] Tune context window usage — handle truncation for large log outputs
-- [ ] Add streaming support for long AI responses
-- [ ] Evaluate and test model output quality on common issue types
+- [x] Implement OpenAI-compatible AI client module
+- [x] Design prompt templates for initial and follow-up analysis
+- [x] Implement response guardrail checks and structured response headings
+- [x] Tune context usage with RAG retrieval and chunk/runbook truncation budgets
+- [x] Implement reliable non-streaming completion path for local backends
+- [ ] Continue output quality tuning and grounding evaluation on real hosts

 ______________________________________________________________________

@@ -97,11 +98,11 @@ ______________________________________________________________________

 Polish the interface for real-world use.

-- [ ] Design CLI interface (flags, subcommands, interactive prompts)
-- [ ] Implement structured output: diagnosis, confidence, recommended actions
-- [ ] Add `--verbose` / `--debug` mode showing raw collected data
+- [x] Design CLI interface with run command, interactive prompts, and runbook subcommands
+- [x] Implement structured output sections (Root Cause, Evidence, Recommended Actions)
+- [x] Add RAG debug mode (`--rag-debug`) showing retrieval scores
 - [ ] Support output to file or clipboard
-- [ ] Write man page / `--help` documentation
+- [x] Provide comprehensive `--help` command documentation via Typer options

 ______________________________________________________________________

@@ -135,19 +136,21 @@ model weights alone. Three tiers of increasing capability, each buildable indepe

 | Decision | Options | Recommendation | Status |
 |---|---|---|---|
-| Embedding model | `nomic-embed-text`, `mxbai-embed-large`, `all-minilm` | `nomic-embed-text` via Ollama (local, 274MB, strong perf) | ⬜ Pending |
-| Vector store — Tier 1 | In-memory numpy cosine, `faiss-cpu` | numpy (zero deps) for session scope | ⬜ Pending |
-| Vector store — Tier 2/3 | `chromadb`, `qdrant`, `weaviate`, `pgvector` | `chromadb` (embedded mode, no server needed) or `qdrant` (self-hosted, REST API, production-grade) | ⬜ Pending |
-| Chunking strategy | Fixed token, sentence-aware, command-boundary | Command-boundary splitting (natural unit for diagnostics) | ⬜ Pending |
+| Embedding model | `nomic-embed-text`, `mxbai-embed-large`, `all-minilm` | `nomic-embed-text` via Ollama (local, 274MB, strong perf) | ✅ Implemented |
+| Vector store — Tier 1 | In-memory numpy cosine, `faiss-cpu` | numpy (zero deps) for session scope | ✅ Implemented |
+| Vector store — Tier 2/3 | `chromadb`, `qdrant`, `weaviate`, `pgvector` | `chromadb` embedded mode | ✅ Tier 2 Implemented |
+| Chunking strategy | Fixed token, sentence-aware, command-boundary | Command-boundary splitting (natural unit for diagnostics) | ✅ Implemented |
 | Hybrid retrieval | Semantic only, BM25 only, hybrid | Hybrid (BM25 keyword + cosine semantic) for best recall | ⬜ Pending |
 | Reranking | None, cross-encoder (`ms-marco-MiniLM`), LLM-as-judge | Cross-encoder rerank pass before prompt injection | ⬜ Pending |
-| Runbook format | Markdown, YAML, JSON | Markdown (human-editable, version-controllable) | ⬜ Pending |
+| Runbook format | Markdown, YAML, JSON | Markdown (human-editable, version-controllable) | ✅ Implemented |
 | Session index storage | Local `~/.tai/`, configurable path | `~/.tai/sessions/` with ChromaDB collection | ⬜ Pending |

 ---

 ### Tier 1 — Diagnostic Chunk Retrieval (in-memory, per-session)

+Status: ✅ Implemented
+
 **Problem:** Current flow injects all collected output into the prompt as one block.
 On busy hosts this floods the context window with irrelevant output, degrading quality.

@@ -180,6 +183,8 @@ On busy hosts this floods the context window with irrelevant output, degrading q

 ### Tier 2 — Runbook Knowledge Base (persistent, ChromaDB)

+Status: ✅ Implemented
+
 **Problem:** AI improvises remediation steps from training data, which may be wrong for
 specific environments, distros, or internal conventions.

@@ -214,6 +219,8 @@ specific environments, distros, or internal conventions.

 ### Tier 3 — Session Memory Index (institutional learning)

+Status: ⬜ Pending
+
 **Problem:** Every session starts from zero. Repeat incidents on the same host or
 same issue type get no benefit from past work.