From e49670a6642e11839731e96e6b196412e0ce858b Mon Sep 17 00:00:00 2001 From: zphinx Date: Mon, 4 May 2026 18:23:33 +0200 Subject: [PATCH] docs(roadmap): add Phase 6 RAG & Knowledge Layer plan - Three-tier RAG architecture: diagnostic chunks, runbook KB, session memory - Technology decisions table with options and recommendations - Per-tier: approach, new modules, changes to existing code, companion features - Implementation order and effort estimates - New dependencies and optional pyproject.toml group - Decisions log entries for RAG choices pending confirmation --- ROADMAP.md | 169 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 169 insertions(+) diff --git a/ROADMAP.md b/ROADMAP.md index c20e242..208ae12 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -117,6 +117,170 @@ Prepare for broader use. ______________________________________________________________________ +## Phase 6 — RAG & Knowledge Layer + +Introduce Retrieval-Augmented Generation to ground AI responses in evidence rather than +model weights alone. Three tiers of increasing capability, each buildable independently. + +### Goals + +- Eliminate prompt flooding on hosts with large log output +- Ground recommendations in version-controlled runbooks, not model improvisation +- Build compounding institutional memory from past troubleshooting sessions +- Keep all data local — no embeddings or session content leaves the network + +--- + +### Technology Decisions Required + +| Decision | Options | Recommendation | Status | +|---|---|---|---| +| Embedding model | `nomic-embed-text`, `mxbai-embed-large`, `all-minilm` | `nomic-embed-text` via Ollama (local, 274MB, strong perf) | ⬜ Pending | +| Vector store — Tier 1 | In-memory numpy cosine, `faiss-cpu` | numpy (zero deps) for session scope | ⬜ Pending | +| Vector store — Tier 2/3 | `chromadb`, `qdrant`, `weaviate`, `pgvector` | `chromadb` (embedded mode, no server needed) or `qdrant` (self-hosted, REST API, production-grade) | ⬜ Pending | +| Chunking strategy | Fixed token, sentence-aware, command-boundary | Command-boundary splitting (natural unit for diagnostics) | ⬜ Pending | +| Hybrid retrieval | Semantic only, BM25 only, hybrid | Hybrid (BM25 keyword + cosine semantic) for best recall | ⬜ Pending | +| Reranking | None, cross-encoder (`ms-marco-MiniLM`), LLM-as-judge | Cross-encoder rerank pass before prompt injection | ⬜ Pending | +| Runbook format | Markdown, YAML, JSON | Markdown (human-editable, version-controllable) | ⬜ Pending | +| Session index storage | Local `~/.tai/`, configurable path | `~/.tai/sessions/` with ChromaDB collection | ⬜ Pending | + +--- + +### Tier 1 — Diagnostic Chunk Retrieval (in-memory, per-session) + +**Problem:** Current flow injects all collected output into the prompt as one block. +On busy hosts this floods the context window with irrelevant output, degrading quality. + +**Approach:** +- After collection, split each command's output into overlapping token chunks (e.g. 512 tokens, 64 overlap) +- Embed all chunks using `nomic-embed-text` via Ollama embeddings API +- On each question (initial + follow-up), embed the question and retrieve top-k chunks by cosine similarity +- Inject only retrieved chunks into the prompt, not the full dump + +**New module:** `src/tai/rag_retriever.py` +- `chunk_report(report) -> list[Chunk]` +- `embed_chunks(chunks) -> list[EmbeddedChunk]` +- `retrieve(question, embedded_chunks, top_k) -> list[Chunk]` + +**Changes to existing code:** +- `prompt_builder.py`: accept `retrieved_chunks` instead of full `CollectionReport` for RAG-mode prompts +- `cli.py`: embed report after collection, pass retriever to `_run_analysis` and `_run_followup_analysis` +- `ai_client.py`: add `embed(text) -> list[float]` method using Ollama `/api/embeddings` + +**Companion features buildable at same time:** +- `--no-rag` flag to bypass retrieval and use full dump (backwards compat) +- Token budget display: show user how many tokens are being sent vs. saved +- Per-chunk source attribution in AI response (which command produced the evidence) + +**Tests:** +- `tests/test_rag_retriever.py`: chunk splitting, cosine similarity ranking, top-k retrieval +- `tests/test_ai.py`: add `test_embed_returns_float_list()` + +--- + +### Tier 2 — Runbook Knowledge Base (persistent, ChromaDB) + +**Problem:** AI improvises remediation steps from training data, which may be wrong for +specific environments, distros, or internal conventions. + +**Approach:** +- Maintain a version-controlled corpus of Markdown runbooks in `runbooks/` directory +- On first run (or `tai runbooks --sync`), embed all runbooks and persist to ChromaDB collection +- On each analysis, retrieve top-3 relevant runbook chunks alongside diagnostic chunks +- Inject as a separate `## Runbook Context` section in the prompt + +**New module:** `src/tai/runbook_store.py` +- `RunbookStore`: wraps ChromaDB collection +- `sync(runbooks_dir) -> int` — embed and upsert all runbooks +- `query(question, top_k) -> list[RunbookChunk]` + +**New directory:** `runbooks/` +- `ssh.md`, `nginx.md`, `postgres.md`, `disk.md`, `kernel.md`, etc. +- Each runbook: YAML frontmatter (`service`, `symptoms`, `tags`) + Markdown body + +**New CLI command:** `tai runbooks --sync [--path ./runbooks]` + +**Changes to existing code:** +- `prompt_builder.py`: add `build_message_with_runbooks(retrieved_chunks, runbook_chunks)` +- `cli.py`: optionally load `RunbookStore`, query it per analysis turn + +**Companion features buildable at same time:** +- `tai runbooks --list` — show indexed runbooks and last sync time +- `tai runbooks --add ` — index a single runbook +- `/runbooks` slash command in interactive mode — show which runbooks were retrieved +- Runbook citation in AI output: "Based on runbook: `ssh.md#AuthenticationFailures`" + +--- + +### Tier 3 — Session Memory Index (institutional learning) + +**Problem:** Every session starts from zero. Repeat incidents on the same host or +same issue type get no benefit from past work. + +**Approach:** +- On session end, embed the session summary (issue + root cause + actions) and upsert into a persistent ChromaDB collection (`~/.tai/sessions/`) +- On session start, query for similar past sessions by issue text + hostname +- Inject top-2 past sessions as `## Prior Sessions` context +- Optionally: `/history` command in interactive mode to surface past sessions explicitly + +**New module:** `src/tai/session_store.py` +- `SessionStore`: wraps ChromaDB collection at `~/.tai/sessions/` +- `index_session(session_log_path)` — embed and store completed session +- `query_similar(issue, host, top_k) -> list[PastSession]` + +**Changes to existing code:** +- `session_log.py`: add `summarise() -> str` method (issue + final AI response) +- `cli.py`: query `SessionStore` at session start, index at session end + +**Companion features buildable at same time:** +- `tai history` CLI subcommand — search past sessions by keyword +- `tai history --host ` — all sessions for a host +- `tai history --export ` — export session summaries as Markdown report +- Auto-suggest: "Similar issue found from 2 weeks ago — load context? [y/N]" + +--- + +### Implementation Order + +``` +Tier 1 (diagnostic chunks) ← Start here. Zero new infra. Immediate prompt quality gain. + ↓ +Tier 2 (runbook KB) ← After Tier 1. Requires ChromaDB dep + runbook authoring. + ↓ +Tier 3 (session memory) ← Builds on Tier 2 infrastructure. Minimal extra work. +``` + +**Estimated effort:** +- Tier 1: 2–3 days (new module + prompt builder changes + tests) +- Tier 2: 3–4 days (ChromaDB + runbook authoring + CLI command + tests) +- Tier 3: 1–2 days (reuses Tier 2 infrastructure) + +### New Dependencies + +``` +# Tier 1 (zero new runtime deps — uses Ollama HTTP API already in use) +# No additions needed + +# Tier 2 + 3 +chromadb>=0.5,<1.0 # embedded vector store, no separate server +# OR +qdrant-client>=1.9,<2.0 # if self-hosted Qdrant preferred + +sentence-transformers>=3.0 # optional: cross-encoder reranking +``` + +### New pyproject.toml optional group + +```toml +[project.optional-dependencies] +rag = [ + "chromadb>=0.5,<1.0", + "sentence-transformers>=3.0,<4.0", +] +``` + +______________________________________________________________________ + ## Decisions Log | Date | Decision | Outcome | @@ -128,3 +292,8 @@ ______________________________________________________________________ | 2026-05-04 | Bastion host support | `--jump-host` flag via SSH native ProxyJump | | 2026-05-04 | SSH config behavior | Use `~/.ssh/config` by default; allow override via `--ignore-ssh-config` | | 2026-05-04 | CLI vs interactive mode | Interactive: REPL for v0.1, `textual` TUI for v0.2+ | +| 2026-05-04 | RAG embedding model | `nomic-embed-text` via Ollama (local, air-gapped safe) — ⬜ pending confirmation | +| 2026-05-04 | RAG vector store (Tier 1) | In-memory numpy cosine similarity — zero deps, session-scoped | +| 2026-05-04 | RAG vector store (Tier 2/3) | `chromadb` embedded mode (default) or `qdrant` self-hosted — ⬜ pending confirmation | +| 2026-05-04 | RAG chunking unit | Command-boundary splitting — each collected command = one or more chunks | +| 2026-05-04 | Runbook format | Markdown with YAML frontmatter, version-controlled in `runbooks/` directory |