feat: complete RAG runbook workflow and release docs
Some checks failed
CI / test (push) Failing after 15s
Some checks failed
CI / test (push) Failing after 15s
This commit is contained in:
55
ROADMAP.md
55
ROADMAP.md
@@ -18,10 +18,11 @@ These must be resolved before meaningful development can begin.
|
||||
|
||||
### AI Backend & Model
|
||||
|
||||
- [ ] Confirm use of [vLLM](https://github.com/vllm-project/vllm) as the inference backend
|
||||
- [ ] Confirm `gemma4:a4b` as the default model (or select an alternative)
|
||||
- [x] OpenAI-compatible backend client implemented (`AIClient`)
|
||||
- [x] Default local backend profile wired for Ollama (`http://localhost:11434/v1`)
|
||||
- [x] Default model profile set to `gemma3:4b` (override via `--model`)
|
||||
- [ ] Define minimum hardware requirements for running the model locally
|
||||
- [ ] Decide whether the AI backend is bundled, self-hosted externally, or user-supplied
|
||||
- [x] AI backend is user-supplied/self-hosted
|
||||
|
||||
### SSH Strategy
|
||||
|
||||
@@ -38,7 +39,7 @@ These must be resolved before meaningful development can begin.
|
||||
### Scope & Constraints
|
||||
|
||||
- [ ] Define the supported scope of issues (services, network, disk, kernel, etc.)
|
||||
- [ ] Confirm read-only guarantee — document exactly what "read-only" means in practice
|
||||
- [x] Read-only guarantee implemented with command allowlist + blocked shell operator policy
|
||||
- [x] **Decision: interactive REPL mode for v0.1, full TUI for v0.2+**
|
||||
- v0.1: chat-loop REPL launched from CLI; human can follow up, correct, and redirect the agent
|
||||
- v0.2+: `textual`-based TUI with split panes (collected data | AI output | input bar)
|
||||
@@ -52,7 +53,7 @@ Basic project scaffolding and connectivity.
|
||||
|
||||
- [x] Finalise repository structure and language toolchain
|
||||
- [x] Set up CI pipeline (linting, tests)
|
||||
- [ ] Implement SSH connection module
|
||||
- [x] Implement SSH connection module
|
||||
- [x] Define SSH config model and probe interface scaffold
|
||||
- [x] Connect to remote host
|
||||
- [x] Execute read-only commands (e.g. `journalctl`, `systemctl status`, `cat`)
|
||||
@@ -68,15 +69,15 @@ ______________________________________________________________________
|
||||
|
||||
Define what information the agent gathers and how.
|
||||
|
||||
- [ ] Identify the canonical set of data sources per issue type:
|
||||
- [x] Identify a baseline canonical set of data sources per issue type:
|
||||
- Service failures: `journalctl`, `systemctl`, service config files
|
||||
- Network issues: `ip`, `ss`, `netstat`, firewall rules
|
||||
- Disk issues: `df`, `du`, `dmesg`, `smartctl`
|
||||
- General: `/var/log/syslog`, `/var/log/messages`, `dmesg`
|
||||
- [ ] Implement pluggable "collector" modules per data source
|
||||
- [ ] Implement directory traversal for user-specified paths (read-only)
|
||||
- [x] Implement collectors and plan builder for baseline issue categories
|
||||
- [x] Implement directory traversal for user-specified paths (read-only)
|
||||
- [ ] Add support for per-distro variations (Ubuntu vs RHEL path differences, etc.)
|
||||
- [ ] Write tests with mocked SSH output
|
||||
- [x] Write tests with mocked SSH output
|
||||
|
||||
______________________________________________________________________
|
||||
|
||||
@@ -84,12 +85,12 @@ ______________________________________________________________________
|
||||
|
||||
Wire collected data into the local AI model.
|
||||
|
||||
- [ ] Implement vLLM client module
|
||||
- [ ] Design prompt template: system context, collected data, issue description → diagnosis
|
||||
- [ ] Implement response parsing and structured output (root cause + suggested steps)
|
||||
- [ ] Tune context window usage — handle truncation for large log outputs
|
||||
- [ ] Add streaming support for long AI responses
|
||||
- [ ] Evaluate and test model output quality on common issue types
|
||||
- [x] Implement OpenAI-compatible AI client module
|
||||
- [x] Design prompt templates for initial and follow-up analysis
|
||||
- [x] Implement response guardrail checks and structured response headings
|
||||
- [x] Tune context usage with RAG retrieval and chunk/runbook truncation budgets
|
||||
- [x] Implement reliable non-streaming completion path for local backends
|
||||
- [ ] Continue output quality tuning and grounding evaluation on real hosts
|
||||
|
||||
______________________________________________________________________
|
||||
|
||||
@@ -97,11 +98,11 @@ ______________________________________________________________________
|
||||
|
||||
Polish the interface for real-world use.
|
||||
|
||||
- [ ] Design CLI interface (flags, subcommands, interactive prompts)
|
||||
- [ ] Implement structured output: diagnosis, confidence, recommended actions
|
||||
- [ ] Add `--verbose` / `--debug` mode showing raw collected data
|
||||
- [x] Design CLI interface with run command, interactive prompts, and runbook subcommands
|
||||
- [x] Implement structured output sections (Root Cause, Evidence, Recommended Actions)
|
||||
- [x] Add RAG debug mode (`--rag-debug`) showing retrieval scores
|
||||
- [ ] Support output to file or clipboard
|
||||
- [ ] Write man page / `--help` documentation
|
||||
- [x] Provide comprehensive `--help` command documentation via Typer options
|
||||
|
||||
______________________________________________________________________
|
||||
|
||||
@@ -135,19 +136,21 @@ model weights alone. Three tiers of increasing capability, each buildable indepe
|
||||
|
||||
| Decision | Options | Recommendation | Status |
|
||||
|---|---|---|---|
|
||||
| Embedding model | `nomic-embed-text`, `mxbai-embed-large`, `all-minilm` | `nomic-embed-text` via Ollama (local, 274MB, strong perf) | ⬜ Pending |
|
||||
| Vector store — Tier 1 | In-memory numpy cosine, `faiss-cpu` | numpy (zero deps) for session scope | ⬜ Pending |
|
||||
| Vector store — Tier 2/3 | `chromadb`, `qdrant`, `weaviate`, `pgvector` | `chromadb` (embedded mode, no server needed) or `qdrant` (self-hosted, REST API, production-grade) | ⬜ Pending |
|
||||
| Chunking strategy | Fixed token, sentence-aware, command-boundary | Command-boundary splitting (natural unit for diagnostics) | ⬜ Pending |
|
||||
| Embedding model | `nomic-embed-text`, `mxbai-embed-large`, `all-minilm` | `nomic-embed-text` via Ollama (local, 274MB, strong perf) | ✅ Implemented |
|
||||
| Vector store — Tier 1 | In-memory numpy cosine, `faiss-cpu` | numpy (zero deps) for session scope | ✅ Implemented |
|
||||
| Vector store — Tier 2/3 | `chromadb`, `qdrant`, `weaviate`, `pgvector` | `chromadb` embedded mode | ✅ Tier 2 Implemented |
|
||||
| Chunking strategy | Fixed token, sentence-aware, command-boundary | Command-boundary splitting (natural unit for diagnostics) | ✅ Implemented |
|
||||
| Hybrid retrieval | Semantic only, BM25 only, hybrid | Hybrid (BM25 keyword + cosine semantic) for best recall | ⬜ Pending |
|
||||
| Reranking | None, cross-encoder (`ms-marco-MiniLM`), LLM-as-judge | Cross-encoder rerank pass before prompt injection | ⬜ Pending |
|
||||
| Runbook format | Markdown, YAML, JSON | Markdown (human-editable, version-controllable) | ⬜ Pending |
|
||||
| Runbook format | Markdown, YAML, JSON | Markdown (human-editable, version-controllable) | ✅ Implemented |
|
||||
| Session index storage | Local `~/.tai/`, configurable path | `~/.tai/sessions/` with ChromaDB collection | ⬜ Pending |
|
||||
|
||||
---
|
||||
|
||||
### Tier 1 — Diagnostic Chunk Retrieval (in-memory, per-session)
|
||||
|
||||
Status: ✅ Implemented
|
||||
|
||||
**Problem:** Current flow injects all collected output into the prompt as one block.
|
||||
On busy hosts this floods the context window with irrelevant output, degrading quality.
|
||||
|
||||
@@ -180,6 +183,8 @@ On busy hosts this floods the context window with irrelevant output, degrading q
|
||||
|
||||
### Tier 2 — Runbook Knowledge Base (persistent, ChromaDB)
|
||||
|
||||
Status: ✅ Implemented
|
||||
|
||||
**Problem:** AI improvises remediation steps from training data, which may be wrong for
|
||||
specific environments, distros, or internal conventions.
|
||||
|
||||
@@ -214,6 +219,8 @@ specific environments, distros, or internal conventions.
|
||||
|
||||
### Tier 3 — Session Memory Index (institutional learning)
|
||||
|
||||
Status: ⬜ Pending
|
||||
|
||||
**Problem:** Every session starts from zero. Repeat incidents on the same host or
|
||||
same issue type get no benefit from past work.
|
||||
|
||||
|
||||
Reference in New Issue
Block a user