tai/ROADMAP.md
zphinx e49670a664
docs(roadmap): add Phase 6 RAG & Knowledge Layer plan
- Three-tier RAG architecture: diagnostic chunks, runbook KB, session memory
- Technology decisions table with options and recommendations
- Per-tier: approach, new modules, changes to existing code, companion features
- Implementation order and effort estimates
- New dependencies and optional pyproject.toml group
- Decisions log entries for RAG choices pending confirmation
2026-05-04 18:23:33 +02:00


Roadmap

This document outlines the major decisions, milestones, and development phases required to bring tai from concept to a working tool.


Phase 0 — Decisions & Prerequisites

These must be resolved before meaningful development can begin.

Language Selection

  • Decision: Python
  • Key factors: native vLLM integration, mature SSH libraries (paramiko / asyncssh), strong text/log parsing, rapid development
  • Single binary distribution will be achieved via Nuitka (preferred for true compilation) or PyInstaller as a fallback
  • Evaluate Nuitka vs PyInstaller for binary output quality and CI reproducibility
  • Add binary build step to CI pipeline

AI Backend & Model

  • Confirm use of vLLM as the inference backend
  • Confirm gemma4:a4b as the default model (or select an alternative)
  • Define minimum hardware requirements for running the model locally
  • Decide whether the AI backend is bundled, self-hosted externally, or user-supplied

SSH Strategy

  • Decision: keypair authentication only — no password auth; eliminates credential storage risk
    • Default key resolution: ~/.ssh/id_ed25519, ~/.ssh/id_rsa (in order of preference)
    • CLI override via --identity-file <path>
    • No SSH agent forwarding needed — a shared key is distributed to all managed hosts via Puppet
  • Known hosts: auto-accept new hosts; reject on key mismatch — a changed host key triggers a hard stop with a MITM warning; unknown/new hosts are accepted silently on first connect
  • Bastion/jump host: --jump-host <host> flag — delegates to SSH's native ProxyJump functionality
  • SSH config behavior: respect existing ~/.ssh/config by default; allow CLI override
    • Default: follow host settings from ~/.ssh/config (for User, Port, ProxyJump, etc.)
    • Override switch: --ignore-ssh-config to bypass local SSH config when required
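
The decisions above map fairly directly onto OpenSSH client options. The following is a minimal sketch, assuming tai shells out to the system `ssh` binary rather than using paramiko or asyncssh; `build_ssh_argv` and its parameter names are hypothetical, and `StrictHostKeyChecking=accept-new` requires OpenSSH 7.6 or later.

```python
from pathlib import Path
from typing import Optional


def build_ssh_argv(
    host: str,
    command: str,
    identity_file: Optional[Path] = None,
    jump_host: Optional[str] = None,
    ignore_ssh_config: bool = False,
) -> list[str]:
    """Assemble an OpenSSH command line reflecting the Phase 0 decisions."""
    argv = ["ssh"]
    if ignore_ssh_config:
        # Point ssh at an empty config file so ~/.ssh/config is not read.
        argv += ["-F", "/dev/null"]
    argv += [
        # Keypair only: never prompt for or attempt a password.
        "-o", "BatchMode=yes",
        "-o", "PasswordAuthentication=no",
        # Accept unknown hosts on first connect, refuse changed host keys.
        "-o", "StrictHostKeyChecking=accept-new",
    ]
    if identity_file is not None:
        argv += ["-i", str(identity_file)]
    if jump_host is not None:
        # -J is the command-line form of ProxyJump.
        argv += ["-J", jump_host]
    argv += [host, command]
    return argv
```

When no identity file is passed, ssh falls back to its own default key search, which would need to be checked against the preference order listed above.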

Scope & Constraints

  • Define the supported scope of issues (services, network, disk, kernel, etc.)
  • Confirm read-only guarantee — document exactly what "read-only" means in practice
  • Decision: interactive REPL mode for v0.1, full TUI for v0.2+
    • v0.1: chat-loop REPL launched from CLI; human can follow up, correct, and redirect the agent
    • v0.2+: textual-based TUI with split panes (collected data | AI output | input bar)
    • Built-in slash commands: /collect, /show logs, /clear, /host <hostname>, /help, /quit
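
One way the REPL could tell slash commands apart from free-text questions is sketched below. `ReplInput` and `parse_repl_line` are hypothetical names, and routing unknown slash commands to `/help` is an assumption rather than a decision recorded here.

```python
from dataclasses import dataclass
from typing import Optional

# Commands listed in the roadmap. "/show logs" is a two-word command,
# so it is matched as a whole phrase.
KNOWN_COMMANDS = ("/show logs", "/collect", "/clear", "/host", "/help", "/quit")


@dataclass(frozen=True)
class ReplInput:
    command: Optional[str]  # matched slash command, or None for free text
    argument: str           # rest of the line, e.g. the hostname for /host


def parse_repl_line(line: str) -> ReplInput:
    text = line.strip()
    if not text.startswith("/"):
        # Anything else is treated as a follow-up question for the agent.
        return ReplInput(command=None, argument=text)
    for name in KNOWN_COMMANDS:
        if text == name or text.startswith(name + " "):
            return ReplInput(command=name, argument=text[len(name):].strip())
    # Assumption: unknown slash commands fall back to the help text.
    return ReplInput(command="/help", argument=text)
```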

Phase 1 — Project Foundation

Basic project scaffolding and connectivity.

  • Finalise repository structure and language toolchain
  • Set up CI pipeline (linting, tests)
  • Implement SSH connection module
    • Define SSH config model and probe interface scaffold
    • Connect to remote host
    • Execute read-only commands (e.g. journalctl, systemctl status, cat)
    • Stream or collect command output safely (byte-limited output with truncation marker)
  • Implement basic input parsing (ticket text, hostname, target directories)
  • Write unit tests for SSH and input modules
    • Input parser and CLI tests added
    • SSH module tests added for command policy and SSH argv behavior
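
The command policy and byte-limited output mentioned above could look roughly like this. It is a sketch under stated assumptions: the allowlist, the set of permitted `systemctl` verbs, and the truncation marker text are all illustrative, and a real policy would also need per-command argument checks (for example, some `journalctl` options modify the journal).

```python
import shlex

# Hypothetical allowlist; the real set would come from the collector design.
READ_ONLY_COMMANDS = frozenset({"journalctl", "systemctl", "cat", "df"})
# systemctl has mutating verbs, so only a few read-only ones are permitted.
SYSTEMCTL_READ_VERBS = frozenset({"status", "show", "is-active", "is-enabled", "cat"})
# Reject anything that could chain, redirect, or substitute commands.
SHELL_METACHARACTERS = frozenset(";|&><`$\n")
TRUNCATION_MARKER = b"\n[... output truncated ...]\n"


def is_allowed(command: str) -> bool:
    """Return True only for commands the policy considers read-only."""
    if any(ch in SHELL_METACHARACTERS for ch in command):
        return False
    try:
        argv = shlex.split(command)
    except ValueError:  # unbalanced quotes
        return False
    if not argv or argv[0] not in READ_ONLY_COMMANDS:
        return False
    if argv[0] == "systemctl":
        return len(argv) > 1 and argv[1] in SYSTEMCTL_READ_VERBS
    return True


def limit_output(data: bytes, max_bytes: int) -> bytes:
    """Cap collected output and append a marker when bytes were dropped."""
    if len(data) <= max_bytes:
        return data
    return data[:max_bytes] + TRUNCATION_MARKER
```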

Phase 2 — Data Collection Layer

Define what information the agent gathers and how.

  • Identify the canonical set of data sources per issue type:
    • Service failures: journalctl, systemctl, service config files
    • Network issues: ip, ss, netstat, firewall rules
    • Disk issues: df, du, dmesg, smartctl
    • General: /var/log/syslog, /var/log/messages, dmesg
  • Implement pluggable "collector" modules per data source
  • Implement directory traversal for user-specified paths (read-only)
  • Add support for per-distro variations (Ubuntu vs RHEL path differences, etc.)
  • Write tests with mocked SSH output
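
A pluggable collector layer might be structured as below. This is only a sketch: the `Collector` protocol, `REGISTRY`, and `DiskCollector` are hypothetical, and the `run` callable stands in for whatever the SSH module exposes, which is also what makes mocked-output tests straightforward.

```python
from typing import Callable, Protocol

# A function that runs one command on the remote host and returns its output.
RunCommand = Callable[[str], str]


class Collector(Protocol):
    name: str
    issue_types: tuple[str, ...]

    def collect(self, run: RunCommand) -> dict[str, str]:
        ...


class DiskCollector:
    """Gathers a subset of the disk data sources listed above."""

    name = "disk"
    issue_types = ("disk",)
    commands = ("df -h", "dmesg")

    def collect(self, run: RunCommand) -> dict[str, str]:
        # Map each command to its raw output.
        return {command: run(command) for command in self.commands}


REGISTRY: dict[str, Collector] = {}


def register(collector: Collector) -> None:
    REGISTRY[collector.name] = collector


def collectors_for(issue_type: str) -> list[Collector]:
    """Select every registered collector relevant to an issue type."""
    return [c for c in REGISTRY.values() if issue_type in c.issue_types]


register(DiskCollector())
```

In a test, `run` can be a lambda returning canned text, so no SSH connection is needed.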

Phase 3 — AI Integration

Wire collected data into the local AI model.

  • Implement vLLM client module
  • Design prompt template: system context, collected data, issue description → diagnosis
  • Implement response parsing and structured output (root cause + suggested steps)
  • Tune context window usage — handle truncation for large log outputs
  • Add streaming support for long AI responses
  • Evaluate and test model output quality on common issue types
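
For the truncation item above, one simple policy is to keep only the newest log lines that fit a budget. The sketch below assumes a character budget as a rough proxy for tokens and assumes the most recent lines matter most; `tail_within_budget` is a hypothetical helper, not part of the existing code.

```python
def tail_within_budget(log_text: str, max_chars: int) -> str:
    """Keep the newest whole lines that fit within max_chars."""
    lines = log_text.splitlines()
    kept: list[str] = []
    used = 0
    # Walk backwards so the most recent entries are kept first.
    for line in reversed(lines):
        cost = len(line) + 1  # +1 accounts for the newline
        if used + cost > max_chars:
            break
        kept.append(line)
        used += cost
    kept.reverse()
    omitted = len(lines) - len(kept)
    # Tell the model that earlier context exists but was dropped.
    header = f"[{omitted} earlier lines omitted]\n" if omitted else ""
    return header + "\n".join(kept)
```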

Phase 4 — CLI & User Experience

Polish the interface for real-world use.

  • Design CLI interface (flags, subcommands, interactive prompts)
  • Implement structured output: diagnosis, confidence, recommended actions
  • Add --verbose / --debug mode showing raw collected data
  • Support output to file or clipboard
  • Write man page / --help documentation

Phase 5 — Hardening & Distribution

Prepare for broader use.

  • Security review of SSH handling and credential storage
  • Ensure no data is written to the remote system under any path
  • Package for distribution (binary release, container image, or distro packages)
  • Write installation and quickstart documentation
  • End-to-end integration tests against a test VM

Phase 6 — RAG & Knowledge Layer

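Introduce Retrieval-Augmented Generation to ground AI responses in evidence rather than model weights alone. The work is split into three tiers of increasing capability; each tier delivers value on its own, although Tier 3 reuses the storage infrastructure introduced in Tier 2.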

Goals

  • Eliminate prompt flooding on hosts with large log output
  • Ground recommendations in version-controlled runbooks, not model improvisation
  • Build compounding institutional memory from past troubleshooting sessions
  • Keep all data local — no embeddings or session content leaves the network

Technology Decisions Required

| Decision | Options | Recommendation | Status |
|---|---|---|---|
| Embedding model | nomic-embed-text, mxbai-embed-large, all-minilm | nomic-embed-text via Ollama (local, 274MB, strong perf) | Pending |
| Vector store, Tier 1 | In-memory numpy cosine, faiss-cpu | numpy (zero deps) for session scope | Pending |
| Vector store, Tier 2/3 | chromadb, qdrant, weaviate, pgvector | chromadb (embedded mode, no server needed) or qdrant (self-hosted, REST API, production-grade) | Pending |
| Chunking strategy | Fixed token, sentence-aware, command-boundary | Command-boundary splitting (natural unit for diagnostics) | Pending |
| Hybrid retrieval | Semantic only, BM25 only, hybrid | Hybrid (BM25 keyword + cosine semantic) for best recall | Pending |
| Reranking | None, cross-encoder (ms-marco-MiniLM), LLM-as-judge | Cross-encoder rerank pass before prompt injection | Pending |
| Runbook format | Markdown, YAML, JSON | Markdown (human-editable, version-controllable) | Pending |
| Session index storage | Local ~/.tai/, configurable path | ~/.tai/sessions/ with ChromaDB collection | Pending |
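
The hybrid retrieval option needs some way to merge a BM25 ranking with a cosine ranking. The roadmap does not say how; reciprocal rank fusion (RRF) is one common technique and is shown here purely as an illustration. The function name and the constant `k = 60` are assumptions, not project decisions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of chunk IDs into one fused ranking.

    Each list contributes 1 / (k + rank) per item, so items ranked
    highly by more than one retriever rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))
```

RRF only needs rank positions, so BM25 scores and cosine similarities never have to be normalised onto a common scale.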

Tier 1 — Diagnostic Chunk Retrieval (in-memory, per-session)

Problem: Current flow injects all collected output into the prompt as one block. On busy hosts this floods the context window with irrelevant output, degrading quality.

Approach:

  • After collection, split each command's output into overlapping token chunks (e.g. 512 tokens, 64 overlap)
  • Embed all chunks using nomic-embed-text via Ollama embeddings API
  • On each question (initial + follow-up), embed the question and retrieve top-k chunks by cosine similarity
  • Inject only retrieved chunks into the prompt, not the full dump
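
The overlapping-window split in the first step can be sketched as follows. Whitespace-separated words stand in for model tokens here, which is a simplification (it also discards line breaks); the `Chunk` fields and `chunk_output` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    command: str  # command whose output produced this chunk
    index: int    # position of the chunk within that output
    text: str


def chunk_output(command: str, output: str, size: int = 512, overlap: int = 64) -> list[Chunk]:
    """Split one command's output into overlapping windows of tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    tokens = output.split()  # assumption: whitespace tokens approximate model tokens
    step = size - overlap
    chunks: list[Chunk] = []
    for index, start in enumerate(range(0, len(tokens), step)):
        window = tokens[start:start + size]
        chunks.append(Chunk(command=command, index=index, text=" ".join(window)))
        if start + size >= len(tokens):
            break  # the last window already reached the end
    return chunks
```

Keeping the originating command on every chunk is what would later allow per-chunk source attribution.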

New module: src/tai/rag_retriever.py

  • chunk_report(report) -> list[Chunk]
  • embed_chunks(chunks) -> list[EmbeddedChunk]
  • retrieve(question, embedded_chunks, top_k) -> list[Chunk]
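
A minimal sketch of the ranking step inside `retrieve` is shown below. It is written in plain Python so the logic is easy to follow; the numpy approach recommended in the decisions table would vectorise the same computation. Unlike the signature listed above, this version takes an already-embedded question vector, and chunks are represented as hypothetical `(text, vector)` pairs.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    question_vector: Sequence[float],
    embedded_chunks: Sequence[tuple[str, Sequence[float]]],
    top_k: int = 5,
) -> list[str]:
    """Return the text of the top_k chunks most similar to the question."""
    ranked = sorted(
        embedded_chunks,
        key=lambda item: cosine_similarity(question_vector, item[1]),
        reverse=True,
    )
    return [text for text, _vector in ranked[:top_k]]
```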

Changes to existing code:

  • prompt_builder.py: accept retrieved_chunks instead of full CollectionReport for RAG-mode prompts
  • cli.py: embed report after collection, pass retriever to _run_analysis and _run_followup_analysis
  • ai_client.py: add embed(text) -> list[float] method using Ollama /api/embeddings
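
The `embed` method could be a thin wrapper around Ollama's `/api/embeddings` endpoint, which accepts a JSON body with `model` and `prompt` and returns an `embedding` array. The sketch below uses only the standard library; the base URL is Ollama's default local address, and `build_embed_request` is a hypothetical helper split out so the payload can be tested without a running server.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default; assumed configurable in tai


def build_embed_request(
    text: str, model: str = "nomic-embed-text", base_url: str = OLLAMA_URL
) -> urllib.request.Request:
    """Build the POST request for a single embedding."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(text: str, timeout: float = 30.0) -> list[float]:
    """Send the request and return the embedding vector as floats."""
    with urllib.request.urlopen(build_embed_request(text), timeout=timeout) as response:
        body = json.load(response)
    return [float(value) for value in body["embedding"]]
```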

Companion features buildable at same time:

  • --no-rag flag to bypass retrieval and use full dump (backwards compat)
  • Token budget display: show user how many tokens are being sent vs. saved
  • Per-chunk source attribution in AI response (which command produced the evidence)

Tests:

  • tests/test_rag_retriever.py: chunk splitting, cosine similarity ranking, top-k retrieval
  • tests/test_ai.py: add test_embed_returns_float_list()

Tier 2 — Runbook Knowledge Base (persistent, ChromaDB)

Problem: AI improvises remediation steps from training data, which may be wrong for specific environments, distros, or internal conventions.

Approach:

  • Maintain a version-controlled corpus of Markdown runbooks in runbooks/ directory
  • On first run (or tai runbooks --sync), embed all runbooks and persist to ChromaDB collection
  • On each analysis, retrieve top-3 relevant runbook chunks alongside diagnostic chunks
  • Inject as a separate ## Runbook Context section in the prompt

New module: src/tai/runbook_store.py

  • RunbookStore: wraps ChromaDB collection
  • sync(runbooks_dir) -> int — embed and upsert all runbooks
  • query(question, top_k) -> list[RunbookChunk]
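
For `sync` to be safe to re-run, each chunk needs a stable ID so that an upsert overwrites rather than duplicates. The sketch below illustrates that idea only: a plain dict stands in for the ChromaDB collection, embedding is omitted, and `chunk_id` and the `chunks_by_file` input shape are hypothetical.

```python
import hashlib
from pathlib import Path


def chunk_id(runbook_path: Path, chunk_index: int) -> str:
    """Derive a stable ID: the same file and position always give the same key."""
    digest = hashlib.sha256(runbook_path.as_posix().encode("utf-8")).hexdigest()[:12]
    return f"{runbook_path.stem}-{digest}-{chunk_index}"


def sync(chunks_by_file: dict[Path, list[str]], store: dict[str, str]) -> int:
    """Upsert every chunk into the store and return how many were written."""
    count = 0
    for path, chunks in chunks_by_file.items():
        for index, text in enumerate(chunks):
            # Assigning by key overwrites an existing entry (upsert semantics).
            store[chunk_id(path, index)] = text
            count += 1
    return count
```

A position-based ID leaves stale entries behind when a runbook shrinks, so a real implementation would also need to delete chunks that no longer exist.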

New directory: runbooks/

  • ssh.md, nginx.md, postgres.md, disk.md, kernel.md, etc.
  • Each runbook: YAML frontmatter (service, symptoms, tags) + Markdown body
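
Separating the frontmatter from the Markdown body could be done along these lines. This is a deliberately limited sketch: it only handles flat `key: value` pairs and keeps values as raw strings, so list-valued fields such as symptoms or tags would need a real YAML parser. `split_frontmatter` is a hypothetical name.

```python
def split_frontmatter(document: str) -> tuple[dict[str, str], str]:
    """Return (metadata, body) for a runbook with a leading '---' block."""
    lines = document.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, document
    # Find the closing delimiter of the frontmatter block.
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        return {}, document
    metadata: dict[str, str] = {}
    for line in lines[1:end]:
        key, separator, value = line.partition(":")
        if separator:
            metadata[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).lstrip("\n")
    return metadata, body
```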

New CLI command: tai runbooks --sync [--path ./runbooks]

Changes to existing code:

  • prompt_builder.py: add build_message_with_runbooks(retrieved_chunks, runbook_chunks)
  • cli.py: optionally load RunbookStore, query it per analysis turn
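
The prompt-builder change might amount to assembling labelled sections, as sketched below. Chunks are treated as plain strings for simplicity, and the `## Diagnostic Evidence` heading is an assumption; only `## Runbook Context` is named in this roadmap.

```python
def build_message_with_runbooks(
    retrieved_chunks: list[str], runbook_chunks: list[str]
) -> str:
    """Combine diagnostic and runbook chunks into one prompt body."""
    sections: list[str] = []
    if retrieved_chunks:
        sections.append("## Diagnostic Evidence\n" + "\n\n".join(retrieved_chunks))
    if runbook_chunks:
        # Kept separate so the model can tell evidence from guidance.
        sections.append("## Runbook Context\n" + "\n\n".join(runbook_chunks))
    return "\n\n".join(sections)
```

Omitting an empty section entirely avoids handing the model a heading with nothing under it.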

Companion features buildable at same time:

  • tai runbooks --list — show indexed runbooks and last sync time
  • tai runbooks --add <file> — index a single runbook
  • /runbooks slash command in interactive mode — show which runbooks were retrieved
  • Runbook citation in AI output: "Based on runbook: ssh.md#AuthenticationFailures"

Tier 3 — Session Memory Index (institutional learning)

Problem: Every session starts from zero. Repeat incidents on the same host or same issue type get no benefit from past work.

Approach:

  • On session end, embed the session summary (issue + root cause + actions) and upsert into a persistent ChromaDB collection (~/.tai/sessions/)
  • On session start, query for similar past sessions by issue text + hostname
  • Inject top-2 past sessions as ## Prior Sessions context
  • Optionally: /history command in interactive mode to surface past sessions explicitly

New module: src/tai/session_store.py

  • SessionStore: wraps ChromaDB collection at ~/.tai/sessions/
  • index_session(session_log_path) — embed and store completed session
  • query_similar(issue, host, top_k) -> list[PastSession]

Changes to existing code:

  • session_log.py: add summarise() -> str method (issue + final AI response)
  • cli.py: query SessionStore at session start, index at session end

Companion features buildable at same time:

  • tai history CLI subcommand — search past sessions by keyword
  • tai history --host <hostname> — all sessions for a host
  • tai history --export <file> — export session summaries as Markdown report
  • Auto-suggest: "Similar issue found from 2 weeks ago — load context? [y/N]"

Implementation Order

Tier 1 (diagnostic chunks)     ← Start here. Zero new infra. Immediate prompt quality gain.
       ↓
Tier 2 (runbook KB)            ← After Tier 1. Requires ChromaDB dep + runbook authoring.
       ↓
Tier 3 (session memory)        ← Builds on Tier 2 infrastructure. Minimal extra work.

Estimated effort:

  • Tier 1: 2-3 days (new module + prompt builder changes + tests)
  • Tier 2: 3-4 days (ChromaDB + runbook authoring + CLI command + tests)
  • Tier 3: 1-2 days (reuses Tier 2 infrastructure)

New Dependencies

# Tier 1 (zero new runtime deps — uses Ollama HTTP API already in use)
# No additions needed

# Tier 2 + 3
chromadb>=0.5,<1.0          # embedded vector store, no separate server
# OR
qdrant-client>=1.9,<2.0     # if self-hosted Qdrant preferred

sentence-transformers>=3.0,<4.0  # optional: cross-encoder reranking

New pyproject.toml optional group

[project.optional-dependencies]
rag = [
  "chromadb>=0.5,<1.0",
  "sentence-transformers>=3.0,<4.0",
]

Decisions Log

| Date | Decision | Outcome |
|---|---|---|
| 2026-05-04 | Implementation language | Python, with single distributable binary via Nuitka |
| | AI inference backend | vLLM (provisional) |
| | Default model | gemma4:a4b (provisional) |
| 2026-05-04 | SSH auth methods | Keypair only (ed25519/RSA); auto-accept new hosts; reject on key change (MITM) |
| 2026-05-04 | Bastion host support | --jump-host flag via SSH native ProxyJump |
| 2026-05-04 | SSH config behavior | Use ~/.ssh/config by default; allow override via --ignore-ssh-config |
| 2026-05-04 | CLI vs interactive mode | Interactive: REPL for v0.1, textual TUI for v0.2+ |
| 2026-05-04 | RAG embedding model | nomic-embed-text via Ollama (local, air-gapped safe); pending confirmation |
| 2026-05-04 | RAG vector store (Tier 1) | In-memory numpy cosine similarity; zero deps, session-scoped |
| 2026-05-04 | RAG vector store (Tier 2/3) | chromadb embedded mode (default) or qdrant self-hosted; pending confirmation |
| 2026-05-04 | RAG chunking unit | Command-boundary splitting: each collected command = one or more chunks |
| 2026-05-04 | Runbook format | Markdown with YAML frontmatter, version-controlled in runbooks/ directory |