zphinx/tai

Files

CI / test (push) Failing after 15s

Details

docs(roadmap): add Phase 6 RAG & Knowledge Layer plan

- Three-tier RAG architecture: diagnostic chunks, runbook KB, session memory
- Technology decisions table with options and recommendations
- Per-tier: approach, new modules, changes to existing code, companion features
- Implementation order and effort estimates
- New dependencies and optional pyproject.toml group
- Decisions log entries for RAG choices pending confirmation

2026-05-04 18:23:33 +02:00

14 KiB

Raw Permalink Blame History

Roadmap

This document outlines the major decisions, milestones, and development phases required to bring tai from concept to a working tool.

Phase 0 — Decisions & Prerequisites

These must be resolved before meaningful development can begin.

Language Selection

Decision: Python
Key factors: native vLLM integration, mature SSH libraries (paramiko / asyncssh), strong text/log parsing, rapid development
Single binary distribution will be achieved via Nuitka (preferred for true compilation) or PyInstaller as a fallback
Evaluate Nuitka vs PyInstaller for binary output quality and CI reproducibility
Add binary build step to CI pipeline

AI Backend & Model

Confirm use of vLLM as the inference backend
Confirm gemma4:a4b as the default model (or select an alternative)
Define minimum hardware requirements for running the model locally
Decide whether the AI backend is bundled, self-hosted externally, or user-supplied

SSH Strategy

Decision: keypair authentication only — no password auth; eliminates credential storage risk
- Default key resolution: ~/.ssh/id_ed25519, ~/.ssh/id_rsa (in order of preference)
- CLI override via --identity-file <path>
- No SSH agent forwarding needed — a shared key is distributed to all managed hosts via Puppet
Known hosts: auto-accept new hosts; reject on key mismatch — a changed host key triggers a hard stop with a MITM warning; unknown/new hosts are accepted silently on first connect
Bastion/jump host: --jump-host <host> flag — delegates to SSH's native ProxyJump functionality
SSH config behavior: respect existing ~/.ssh/config by default; allow CLI override
- Default: follow host settings from ~/.ssh/config (for User, Port, ProxyJump, etc.)
- Override switch: --ignore-ssh-config to bypass local SSH config when required

Scope & Constraints

Define the supported scope of issues (services, network, disk, kernel, etc.)
Confirm read-only guarantee — document exactly what "read-only" means in practice
Decision: interactive REPL mode for v0.1, full TUI for v0.2+
- v0.1: chat-loop REPL launched from CLI; human can follow up, correct, and redirect the agent
- v0.2+: textual-based TUI with split panes (collected data | AI output | input bar)
- Built-in slash commands: /collect, /show logs, /clear, /host <hostname>, /help, /quit

Phase 1 — Project Foundation

Basic project scaffolding and connectivity.

Finalise repository structure and language toolchain
Set up CI pipeline (linting, tests)
Implement SSH connection module
- Define SSH config model and probe interface scaffold
- Connect to remote host
- Execute read-only commands (e.g. journalctl, systemctl status, cat)
- Stream or collect command output safely (byte-limited output with truncation marker)
Implement basic input parsing (ticket text, hostname, target directories)
Write unit tests for SSH and input modules
- Input parser and CLI tests added
- SSH module tests added for command policy and SSH argv behavior

Phase 2 — Data Collection Layer

Define what information the agent gathers and how.

Identify the canonical set of data sources per issue type:
- Service failures: journalctl, systemctl, service config files
- Network issues: ip, ss, netstat, firewall rules
- Disk issues: df, du, dmesg, smartctl
- General: /var/log/syslog, /var/log/messages, dmesg
Implement pluggable "collector" modules per data source
Implement directory traversal for user-specified paths (read-only)
Add support for per-distro variations (Ubuntu vs RHEL path differences, etc.)
Write tests with mocked SSH output

Phase 3 — AI Integration

Wire collected data into the local AI model.

Implement vLLM client module
Design prompt template: system context, collected data, issue description → diagnosis
Implement response parsing and structured output (root cause + suggested steps)
Tune context window usage — handle truncation for large log outputs
Add streaming support for long AI responses
Evaluate and test model output quality on common issue types

Phase 4 — CLI & User Experience

Polish the interface for real-world use.

Design CLI interface (flags, subcommands, interactive prompts)
Implement structured output: diagnosis, confidence, recommended actions
Add --verbose / --debug mode showing raw collected data
Support output to file or clipboard
Write man page / --help documentation

Phase 5 — Hardening & Distribution

Prepare for broader use.

Security review of SSH handling and credential storage
Ensure no data is written to the remote system under any path
Package for distribution (binary release, container image, or distro packages)
Write installation and quickstart documentation
End-to-end integration tests against a test VM

Phase 6 — RAG & Knowledge Layer

Introduce Retrieval-Augmented Generation to ground AI responses in evidence rather than model weights alone. Three tiers of increasing capability, each buildable independently.

Goals

Eliminate prompt flooding on hosts with large log output
Ground recommendations in version-controlled runbooks, not model improvisation
Build compounding institutional memory from past troubleshooting sessions
Keep all data local — no embeddings or session content leaves the network

Technology Decisions Required

Decision	Options	Recommendation	Status
Embedding model	`nomic-embed-text`, `mxbai-embed-large`, `all-minilm`	`nomic-embed-text` via Ollama (local, 274MB, strong perf)	⬜ Pending
Vector store — Tier 1	In-memory numpy cosine, `faiss-cpu`	numpy (zero deps) for session scope	⬜ Pending
Vector store — Tier 2/3	`chromadb`, `qdrant`, `weaviate`, `pgvector`	`chromadb` (embedded mode, no server needed) or `qdrant` (self-hosted, REST API, production-grade)	⬜ Pending
Chunking strategy	Fixed token, sentence-aware, command-boundary	Command-boundary splitting (natural unit for diagnostics)	⬜ Pending
Hybrid retrieval	Semantic only, BM25 only, hybrid	Hybrid (BM25 keyword + cosine semantic) for best recall	⬜ Pending
Reranking	None, cross-encoder (`ms-marco-MiniLM`), LLM-as-judge	Cross-encoder rerank pass before prompt injection	⬜ Pending
Runbook format	Markdown, YAML, JSON	Markdown (human-editable, version-controllable)	⬜ Pending
Session index storage	Local `~/.tai/`, configurable path	`~/.tai/sessions/` with ChromaDB collection	⬜ Pending

Tier 1 — Diagnostic Chunk Retrieval (in-memory, per-session)

Problem: Current flow injects all collected output into the prompt as one block. On busy hosts this floods the context window with irrelevant output, degrading quality.

Approach:

After collection, split each command's output into overlapping token chunks (e.g. 512 tokens, 64 overlap)
Embed all chunks using nomic-embed-text via Ollama embeddings API
On each question (initial + follow-up), embed the question and retrieve top-k chunks by cosine similarity
Inject only retrieved chunks into the prompt, not the full dump

New module: src/tai/rag_retriever.py

chunk_report(report) -> list[Chunk]
embed_chunks(chunks) -> list[EmbeddedChunk]
retrieve(question, embedded_chunks, top_k) -> list[Chunk]

Changes to existing code:

prompt_builder.py: accept retrieved_chunks instead of full CollectionReport for RAG-mode prompts
cli.py: embed report after collection, pass retriever to _run_analysis and _run_followup_analysis
ai_client.py: add embed(text) -> list[float] method using Ollama /api/embeddings

Companion features buildable at same time:

--no-rag flag to bypass retrieval and use full dump (backwards compat)
Token budget display: show user how many tokens are being sent vs. saved
Per-chunk source attribution in AI response (which command produced the evidence)

Tests:

tests/test_rag_retriever.py: chunk splitting, cosine similarity ranking, top-k retrieval
tests/test_ai.py: add test_embed_returns_float_list()

Tier 2 — Runbook Knowledge Base (persistent, ChromaDB)

Problem: AI improvises remediation steps from training data, which may be wrong for specific environments, distros, or internal conventions.

Approach:

Maintain a version-controlled corpus of Markdown runbooks in runbooks/ directory
On first run (or tai runbooks --sync), embed all runbooks and persist to ChromaDB collection
On each analysis, retrieve top-3 relevant runbook chunks alongside diagnostic chunks
Inject as a separate ## Runbook Context section in the prompt

New module: src/tai/runbook_store.py

RunbookStore: wraps ChromaDB collection
sync(runbooks_dir) -> int — embed and upsert all runbooks
query(question, top_k) -> list[RunbookChunk]

New directory: runbooks/

ssh.md, nginx.md, postgres.md, disk.md, kernel.md, etc.
Each runbook: YAML frontmatter (service, symptoms, tags) + Markdown body

New CLI command: tai runbooks --sync [--path ./runbooks]

Changes to existing code:

prompt_builder.py: add build_message_with_runbooks(retrieved_chunks, runbook_chunks)
cli.py: optionally load RunbookStore, query it per analysis turn

Companion features buildable at same time:

tai runbooks --list — show indexed runbooks and last sync time
tai runbooks --add <file> — index a single runbook
/runbooks slash command in interactive mode — show which runbooks were retrieved
Runbook citation in AI output: "Based on runbook: ssh.md#AuthenticationFailures"

Tier 3 — Session Memory Index (institutional learning)

Problem: Every session starts from zero. Repeat incidents on the same host or same issue type get no benefit from past work.

Approach:

On session end, embed the session summary (issue + root cause + actions) and upsert into a persistent ChromaDB collection (~/.tai/sessions/)
On session start, query for similar past sessions by issue text + hostname
Inject top-2 past sessions as ## Prior Sessions context
Optionally: /history command in interactive mode to surface past sessions explicitly

New module: src/tai/session_store.py

SessionStore: wraps ChromaDB collection at ~/.tai/sessions/
index_session(session_log_path) — embed and store completed session
query_similar(issue, host, top_k) -> list[PastSession]

Changes to existing code:

session_log.py: add summarise() -> str method (issue + final AI response)
cli.py: query SessionStore at session start, index at session end

Companion features buildable at same time:

tai history CLI subcommand — search past sessions by keyword
tai history --host <hostname> — all sessions for a host
tai history --export <file> — export session summaries as Markdown report
Auto-suggest: "Similar issue found from 2 weeks ago — load context? [y/N]"

Implementation Order

Tier 1 (diagnostic chunks)     ← Start here. Zero new infra. Immediate prompt quality gain.
       ↓
Tier 2 (runbook KB)            ← After Tier 1. Requires ChromaDB dep + runbook authoring.
       ↓
Tier 3 (session memory)        ← Builds on Tier 2 infrastructure. Minimal extra work.

Estimated effort:

Tier 1: 2–3 days (new module + prompt builder changes + tests)
Tier 2: 3–4 days (ChromaDB + runbook authoring + CLI command + tests)
Tier 3: 1–2 days (reuses Tier 2 infrastructure)

New Dependencies

# Tier 1 (zero new runtime deps — uses Ollama HTTP API already in use)
# No additions needed

# Tier 2 + 3
chromadb>=0.5,<1.0          # embedded vector store, no separate server
# OR
qdrant-client>=1.9,<2.0     # if self-hosted Qdrant preferred

sentence-transformers>=3.0  # optional: cross-encoder reranking

New pyproject.toml optional group

[project.optional-dependencies]
rag = [
  "chromadb>=0.5,<1.0",
  "sentence-transformers>=3.0,<4.0",
]

Decisions Log

Date	Decision	Outcome
2026-05-04	Implementation language	Python — with single distributable binary via Nuitka
—	AI inference backend	vLLM (provisional)
—	Default model	`gemma4:a4b` (provisional)
2026-05-04	SSH auth methods	Keypair only (ed25519/RSA); auto-accept new hosts; reject on key change (MITM)
2026-05-04	Bastion host support	`--jump-host` flag via SSH native ProxyJump
2026-05-04	SSH config behavior	Use `~/.ssh/config` by default; allow override via `--ignore-ssh-config`
2026-05-04	CLI vs interactive mode	Interactive: REPL for v0.1, `textual` TUI for v0.2+
2026-05-04	RAG embedding model	`nomic-embed-text` via Ollama (local, air-gapped safe) — ⬜ pending confirmation
2026-05-04	RAG vector store (Tier 1)	In-memory numpy cosine similarity — zero deps, session-scoped
2026-05-04	RAG vector store (Tier 2/3)	`chromadb` embedded mode (default) or `qdrant` self-hosted — ⬜ pending confirmation
2026-05-04	RAG chunking unit	Command-boundary splitting — each collected command = one or more chunks
2026-05-04	Runbook format	Markdown with YAML frontmatter, version-controlled in `runbooks/` directory

14 KiB Raw Permalink Blame History Unescape Escape

Roadmap

Phase 0 — Decisions & Prerequisites

Language Selection

AI Backend & Model

SSH Strategy

Scope & Constraints

Phase 1 — Project Foundation

Phase 2 — Data Collection Layer

Phase 3 — AI Integration

Phase 4 — CLI & User Experience

Phase 5 — Hardening & Distribution

Phase 6 — RAG & Knowledge Layer

Goals

Technology Decisions Required

Tier 1 — Diagnostic Chunk Retrieval (in-memory, per-session)

Tier 2 — Runbook Knowledge Base (persistent, ChromaDB)

Tier 3 — Session Memory Index (institutional learning)

Implementation Order

New Dependencies

New pyproject.toml optional group

Decisions Log

14 KiB

Raw Permalink Blame History