# Roadmap

This document outlines the major decisions, milestones, and development phases required to bring `tai` from concept to a working tool.

______________________________________________________________________

## Phase 0 — Decisions & Prerequisites

These must be resolved before meaningful development can begin.

### Language Selection

- [x] **Decision: Python**
  - Key factors: native vLLM integration, mature SSH libraries (`paramiko` / `asyncssh`), strong text/log parsing, rapid development
  - Single binary distribution will be achieved via **Nuitka** (preferred for true compilation) or **PyInstaller** as a fallback
- [ ] Evaluate Nuitka vs PyInstaller for binary output quality and CI reproducibility
- [ ] Add binary build step to CI pipeline

### AI Backend & Model

- [x] OpenAI-compatible backend client implemented (`AIClient`)
- [x] Default local backend profile wired for Ollama (`http://localhost:11434/v1`)
- [x] Default model profile set to `gemma3:4b` (override via `--model`)
- [ ] Define minimum hardware requirements for running the model locally
- [x] AI backend is user-supplied/self-hosted

### SSH Strategy

- [x] **Decision: keypair authentication only** — no password auth; eliminates credential storage risk
  - Default key resolution: `~/.ssh/id_ed25519`, `~/.ssh/id_rsa` (in order of preference)
  - CLI override via `--identity-file `
  - No SSH agent forwarding needed — a shared key is distributed to all managed hosts via Puppet
- [x] **Known hosts: auto-accept new hosts; reject on key mismatch** — a changed host key triggers a hard stop with a MITM warning; unknown/new hosts are accepted silently on first connect
- [x] **Bastion/jump host: `--jump-host ` flag** — delegates to SSH's native ProxyJump functionality
- [x] **SSH config behavior: respect existing `~/.ssh/config` by default; allow CLI override**
  - Default: follow host settings from `~/.ssh/config` (for `User`, `Port`, `ProxyJump`, etc.)
  - Override switch: `--ignore-ssh-config` to bypass local SSH config when required

### Scope & Constraints

- [ ] Define the supported scope of issues (services, network, disk, kernel, etc.)
- [x] Read-only guarantee implemented with command allowlist + blocked shell operator policy
- [x] **Decision: interactive REPL mode for v0.1, full TUI for v0.2+**
  - v0.1: chat-loop REPL launched from CLI; human can follow up, correct, and redirect the agent
  - v0.2+: `textual`-based TUI with split panes (collected data | AI output | input bar)
  - Built-in slash commands: `/collect`, `/show logs`, `/clear`, `/host `, `/help`, `/quit`

______________________________________________________________________

## Phase 1 — Project Foundation

Basic project scaffolding and connectivity.

- [x] Finalise repository structure and language toolchain
- [x] Set up CI pipeline (linting, tests)
- [x] Implement SSH connection module
  - [x] Define SSH config model and probe interface scaffold
  - [x] Connect to remote host
  - [x] Execute read-only commands (e.g. `journalctl`, `systemctl status`, `cat`)
  - [x] Stream or collect command output safely (byte-limited output with truncation marker)
- [x] Implement basic input parsing (ticket text, hostname, target directories)
- [x] Write unit tests for SSH and input modules
  - [x] Input parser and CLI tests added
  - [x] SSH module tests added for command policy and SSH argv behavior

______________________________________________________________________

## Phase 2 — Data Collection Layer

Define what information the agent gathers and how.

- [x] Identify a baseline canonical set of data sources per issue type:
  - Service failures: `journalctl`, `systemctl`, service config files
  - Network issues: `ip`, `ss`, `netstat`, firewall rules
  - Disk issues: `df`, `du`, `dmesg`, `smartctl`
  - General: `/var/log/syslog`, `/var/log/messages`, `dmesg`
- [x] Implement collectors and plan builder for baseline issue categories
- [x] Implement directory traversal for user-specified paths (read-only)
- [ ] Add support for per-distro variations (Ubuntu vs RHEL path differences, etc.)
- [x] Write tests with mocked SSH output

______________________________________________________________________

## Phase 3 — AI Integration

Wire collected data into the local AI model.

- [x] Implement OpenAI-compatible AI client module
- [x] Design prompt templates for initial and follow-up analysis
- [x] Implement response guardrail checks and structured response headings
- [x] Tune context usage with RAG retrieval and chunk/runbook truncation budgets
- [x] Implement reliable non-streaming completion path for local backends
- [ ] Continue output quality tuning and grounding evaluation on real hosts

______________________________________________________________________

## Phase 4 — CLI & User Experience

Polish the interface for real-world use.

- [x] Design CLI interface with run command, interactive prompts, and runbook subcommands
- [x] Implement structured output sections (Root Cause, Evidence, Recommended Actions)
- [x] Add RAG debug mode (`--rag-debug`) showing retrieval scores
- [ ] Support output to file or clipboard
- [x] Provide comprehensive `--help` command documentation via Typer options

______________________________________________________________________

## Phase 5 — Hardening & Distribution

Prepare for broader use.

- [ ] Security review of SSH handling and credential storage
- [ ] Ensure no data is written to the remote system under any path
- [ ] Package for distribution (binary release, container image, or distro packages)
- [ ] Write installation and quickstart documentation
- [ ] End-to-end integration tests against a test VM

______________________________________________________________________

## Phase 6 — RAG & Knowledge Layer

Introduce Retrieval-Augmented Generation to ground AI responses in evidence rather than model weights alone. Three tiers of increasing capability, each buildable independently.

### Goals

- Eliminate prompt flooding on hosts with large log output
- Ground recommendations in version-controlled runbooks, not model improvisation
- Build compounding institutional memory from past troubleshooting sessions
- Keep all data local — no embeddings or session content leaves the network

______________________________________________________________________

### Technology Decisions Required

| Decision | Options | Recommendation | Status |
|---|---|---|---|
| Embedding model | `nomic-embed-text`, `mxbai-embed-large`, `all-minilm` | `nomic-embed-text` via Ollama (local, 274MB, strong perf) | ✅ Implemented |
| Vector store — Tier 1 | In-memory numpy cosine, `faiss-cpu` | numpy (zero deps) for session scope | ✅ Implemented |
| Vector store — Tier 2/3 | `chromadb`, `qdrant`, `weaviate`, `pgvector` | `chromadb` embedded mode | ✅ Tier 2 Implemented |
| Chunking strategy | Fixed token, sentence-aware, command-boundary | Command-boundary splitting (natural unit for diagnostics) | ✅ Implemented |
| Hybrid retrieval | Semantic only, BM25 only, hybrid | Hybrid (BM25 keyword + cosine semantic) for best recall | ⬜ Pending |
| Reranking | None, cross-encoder (`ms-marco-MiniLM`), LLM-as-judge | Cross-encoder rerank pass before prompt injection | ⬜ Pending |
| Runbook format | Markdown, YAML, JSON | Markdown (human-editable, version-controllable) | ✅ Implemented |
| Session index storage | Local `~/.tai/`, configurable path | `~/.tai/sessions/` with ChromaDB collection | ⬜ Pending |

______________________________________________________________________

### Tier 1 — Diagnostic Chunk Retrieval (in-memory, per-session)

Status: ✅ Implemented

**Problem:** Current flow injects all collected output into the prompt as one block. On busy hosts this floods the context window with irrelevant output, degrading quality.

**Approach:**

- After collection, split each command's output into overlapping token chunks (e.g. 512 tokens, 64 overlap)
- Embed all chunks using `nomic-embed-text` via Ollama embeddings API
- On each question (initial + follow-up), embed the question and retrieve top-k chunks by cosine similarity
- Inject only retrieved chunks into the prompt, not the full dump

**New module:** `src/tai/rag_retriever.py`

- `chunk_report(report) -> list[Chunk]`
- `embed_chunks(chunks) -> list[EmbeddedChunk]`
- `retrieve(question, embedded_chunks, top_k) -> list[Chunk]`

**Changes to existing code:**

- `prompt_builder.py`: accept `retrieved_chunks` instead of full `CollectionReport` for RAG-mode prompts
- `cli.py`: embed report after collection, pass retriever to `_run_analysis` and `_run_followup_analysis`
- `ai_client.py`: add `embed(text) -> list[float]` method using Ollama `/api/embeddings`

**Companion features buildable at same time:**

- `--no-rag` flag to bypass retrieval and use full dump (backwards compat)
- Token budget display: show user how many tokens are being sent vs. saved
- Per-chunk source attribution in AI response (which command produced the evidence)

**Tests:**

- `tests/test_rag_retriever.py`: chunk splitting, cosine similarity ranking, top-k retrieval
- `tests/test_ai.py`: add `test_embed_returns_float_list()`

______________________________________________________________________

### Tier 2 — Runbook Knowledge Base (persistent, ChromaDB)

Status: ✅ Implemented

**Problem:** AI improvises remediation steps from training data, which may be wrong for specific environments, distros, or internal conventions.

**Approach:**

- Maintain a version-controlled corpus of Markdown runbooks in `runbooks/` directory
- On first run (or `tai runbooks --sync`), embed all runbooks and persist to ChromaDB collection
- On each analysis, retrieve top-3 relevant runbook chunks alongside diagnostic chunks
- Inject as a separate `## Runbook Context` section in the prompt

**New module:** `src/tai/runbook_store.py`

- `RunbookStore`: wraps ChromaDB collection
- `sync(runbooks_dir) -> int` — embed and upsert all runbooks
- `query(question, top_k) -> list[RunbookChunk]`

**New directory:** `runbooks/`

- `ssh.md`, `nginx.md`, `postgres.md`, `disk.md`, `kernel.md`, etc.
- Each runbook: YAML frontmatter (`service`, `symptoms`, `tags`) + Markdown body

**New CLI command:** `tai runbooks --sync [--path ./runbooks]`

**Changes to existing code:**

- `prompt_builder.py`: add `build_message_with_runbooks(retrieved_chunks, runbook_chunks)`
- `cli.py`: optionally load `RunbookStore`, query it per analysis turn

**Companion features buildable at same time:**

- `tai runbooks --list` — show indexed runbooks and last sync time
- `tai runbooks --add ` — index a single runbook
- `/runbooks` slash command in interactive mode — show which runbooks were retrieved
- Runbook citation in AI output: "Based on runbook: `ssh.md#AuthenticationFailures`"

______________________________________________________________________

### Tier 3 — Session Memory Index (institutional learning)

Status: ⬜ Pending

**Problem:** Every session starts from zero. Repeat incidents on the same host or same issue type get no benefit from past work.

**Approach:**

- On session end, embed the session summary (issue + root cause + actions) and upsert into a persistent ChromaDB collection (`~/.tai/sessions/`)
- On session start, query for similar past sessions by issue text + hostname
- Inject top-2 past sessions as `## Prior Sessions` context
- Optionally: `/history` command in interactive mode to surface past sessions explicitly

**New module:** `src/tai/session_store.py`

- `SessionStore`: wraps ChromaDB collection at `~/.tai/sessions/`
- `index_session(session_log_path)` — embed and store completed session
- `query_similar(issue, host, top_k) -> list[PastSession]`

**Changes to existing code:**

- `session_log.py`: add `summarise() -> str` method (issue + final AI response)
- `cli.py`: query `SessionStore` at session start, index at session end

**Companion features buildable at same time:**

- `tai history` CLI subcommand — search past sessions by keyword
- `tai history --host ` — all sessions for a host
- `tai history --export ` — export session summaries as Markdown report
- Auto-suggest: "Similar issue found from 2 weeks ago — load context? [y/N]"

______________________________________________________________________

### Implementation Order

```
Tier 1 (diagnostic chunks)   ← Start here. Zero new infra. Immediate prompt quality gain.
        ↓
Tier 2 (runbook KB)          ← After Tier 1. Requires ChromaDB dep + runbook authoring.
        ↓
Tier 3 (session memory)      ← Builds on Tier 2 infrastructure. Minimal extra work.
```

**Estimated effort:**

- Tier 1: 2–3 days (new module + prompt builder changes + tests)
- Tier 2: 3–4 days (ChromaDB + runbook authoring + CLI command + tests)
- Tier 3: 1–2 days (reuses Tier 2 infrastructure)

### New Dependencies

```
# Tier 1 (zero new runtime deps — uses Ollama HTTP API already in use)
# No additions needed

# Tier 2 + 3
chromadb>=0.5,<1.0                 # embedded vector store, no separate server
# OR qdrant-client>=1.9,<2.0       # if self-hosted Qdrant preferred
sentence-transformers>=3.0,<4.0    # optional: cross-encoder reranking
```

### New pyproject.toml optional group

```toml
[project.optional-dependencies]
rag = [
    "chromadb>=0.5,<1.0",
    "sentence-transformers>=3.0,<4.0",
]
```

______________________________________________________________________

## Decisions Log

| Date | Decision | Outcome |
|------|----------|---------|
| 2026-05-04 | Implementation language | Python — with single distributable binary via Nuitka |
| — | AI inference backend | vLLM (provisional) |
| — | Default model | `gemma4:a4b` (provisional) |
| 2026-05-04 | SSH auth methods | Keypair only (ed25519/RSA); auto-accept new hosts; reject on key change (MITM) |
| 2026-05-04 | Bastion host support | `--jump-host` flag via SSH native ProxyJump |
| 2026-05-04 | SSH config behavior | Use `~/.ssh/config` by default; allow override via `--ignore-ssh-config` |
| 2026-05-04 | CLI vs interactive mode | Interactive: REPL for v0.1, `textual` TUI for v0.2+ |
| 2026-05-04 | RAG embedding model | `nomic-embed-text` via Ollama (local, air-gapped safe) — ⬜ pending confirmation |
| 2026-05-04 | RAG vector store (Tier 1) | In-memory numpy cosine similarity — zero deps, session-scoped |
| 2026-05-04 | RAG vector store (Tier 2/3) | `chromadb` embedded mode (default) or `qdrant` self-hosted — ⬜ pending confirmation |
| 2026-05-04 | RAG chunking unit | Command-boundary splitting — each collected command = one or more chunks |
| 2026-05-04 | Runbook format | Markdown with YAML frontmatter, version-controlled in `runbooks/` directory |
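
The two Tier 1 decisions logged above (command-boundary chunking and in-memory cosine ranking) can be illustrated with a minimal sketch. It is not the code in `src/tai/rag_retriever.py`. The names `Chunk`, `chunk_report`, and `retrieve` are borrowed from the roadmap's module outline, but the signatures here are simplified assumptions: the report is a plain `dict` of command to output rather than a `CollectionReport`, `retrieve` takes an already-embedded question vector instead of the question text, "tokens" are whitespace-separated words rather than model tokens, and `cosine` is a hypothetical helper written in plain Python instead of numpy so the example has no dependencies.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    command: str  # command whose output produced this chunk (source attribution)
    text: str     # the chunk body


def chunk_report(report: dict[str, str], size: int = 512, overlap: int = 64) -> list[Chunk]:
    """Split each command's output into overlapping windows of whitespace tokens.

    Chunks never span two commands, so every chunk keeps a single source.
    Assumption: whitespace tokens stand in for real model tokens.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    chunks: list[Chunk] = []
    for command, output in report.items():
        tokens = output.split()
        for start in range(0, len(tokens), step):
            chunks.append(Chunk(command, " ".join(tokens[start:start + size])))
            if start + size >= len(tokens):
                break  # the last window already reached the end of the output
    return chunks


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    question_vec: list[float],
    embedded_chunks: list[tuple[Chunk, list[float]]],
    top_k: int = 3,
) -> list[Chunk]:
    """Return the top_k chunks ranked by cosine similarity to the question vector."""
    ranked = sorted(
        embedded_chunks,
        key=lambda pair: cosine(question_vec, pair[1]),
        reverse=True,
    )
    return [chunk for chunk, _ in ranked[:top_k]]


if __name__ == "__main__":
    # Hand-written 3-dimensional vectors stand in for real embeddings.
    demo = [
        (Chunk("df -h", "/dev/sda1 98% /"), [0.9, 0.1, 0.0]),
        (Chunk("ss -tlnp", "LISTEN 0 128 *:22"), [0.0, 1.0, 0.0]),
    ]
    for hit in retrieve([1.0, 0.0, 0.0], demo, top_k=1):
        print(hit.command, "->", hit.text)
```

In a real pipeline the vectors would come from the embedding model, and because each `Chunk` carries its originating command, the per-chunk source attribution listed among the Tier 1 companion features falls out of the data structure.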