14 KiB
Roadmap
This document outlines the major decisions, milestones, and development phases required to bring tai from concept to a working tool.
Phase 0 — Decisions & Prerequisites
These must be resolved before meaningful development can begin.
Language Selection
- Decision: Python
- Key factors: native vLLM integration, mature SSH libraries (
paramiko/asyncssh), strong text/log parsing, rapid development - Single binary distribution will be achieved via Nuitka (preferred for true compilation) or PyInstaller as a fallback
- Evaluate Nuitka vs PyInstaller for binary output quality and CI reproducibility
- Add binary build step to CI pipeline
AI Backend & Model
- OpenAI-compatible backend client implemented (
AIClient) - Default local backend profile wired for Ollama (
http://localhost:11434/v1) - Default model profile set to
gemma3:4b(override via--model) - Define minimum hardware requirements for running the model locally
- AI backend is user-supplied/self-hosted
SSH Strategy
- Decision: keypair authentication only — no password auth; eliminates credential storage risk
- Default key resolution:
~/.ssh/id_ed25519,~/.ssh/id_rsa(in order of preference) - CLI override via
--identity-file <path> - No SSH agent forwarding needed — a shared key is distributed to all managed hosts via Puppet
- Default key resolution:
- Known hosts: auto-accept new hosts; reject on key mismatch — a changed host key triggers a hard stop with a MITM warning; unknown/new hosts are accepted silently on first connect
- Bastion/jump host:
--jump-host <host>flag — delegates to SSH's native ProxyJump functionality - SSH config behavior: respect existing
~/.ssh/configby default; allow CLI override- Default: follow host settings from
~/.ssh/config(forUser,Port,ProxyJump, etc.) - Override switch:
--ignore-ssh-configto bypass local SSH config when required
- Default: follow host settings from
Scope & Constraints
- Define the supported scope of issues (services, network, disk, kernel, etc.)
- Read-only guarantee implemented with command allowlist + blocked shell operator policy
- Decision: interactive REPL mode for v0.1, full TUI for v0.2+
- v0.1: chat-loop REPL launched from CLI; human can follow up, correct, and redirect the agent
- v0.2+:
textual-based TUI with split panes (collected data | AI output | input bar) - Built-in slash commands:
/collect,/show logs,/clear,/host <hostname>,/help,/quit
Phase 1 — Project Foundation
Basic project scaffolding and connectivity.
- Finalise repository structure and language toolchain
- Set up CI pipeline (linting, tests)
- Implement SSH connection module
- Define SSH config model and probe interface scaffold
- Connect to remote host
- Execute read-only commands (e.g.
journalctl,systemctl status,cat) - Stream or collect command output safely (byte-limited output with truncation marker)
- Implement basic input parsing (ticket text, hostname, target directories)
- Write unit tests for SSH and input modules
- Input parser and CLI tests added
- SSH module tests added for command policy and SSH argv behavior
Phase 2 — Data Collection Layer
Define what information the agent gathers and how.
- Identify a baseline canonical set of data sources per issue type:
- Service failures:
journalctl,systemctl, service config files - Network issues:
ip,ss,netstat, firewall rules - Disk issues:
df,du,dmesg,smartctl - General:
/var/log/syslog,/var/log/messages,dmesg
- Service failures:
- Implement collectors and plan builder for baseline issue categories
- Implement directory traversal for user-specified paths (read-only)
- Add support for per-distro variations (Ubuntu vs RHEL path differences, etc.)
- Write tests with mocked SSH output
Phase 3 — AI Integration
Wire collected data into the local AI model.
- Implement OpenAI-compatible AI client module
- Design prompt templates for initial and follow-up analysis
- Implement response guardrail checks and structured response headings
- Tune context usage with RAG retrieval and chunk/runbook truncation budgets
- Implement reliable non-streaming completion path for local backends
- Continue output quality tuning and grounding evaluation on real hosts
Phase 4 — CLI & User Experience
Polish the interface for real-world use.
- Design CLI interface with run command, interactive prompts, and runbook subcommands
- Implement structured output sections (Root Cause, Evidence, Recommended Actions)
- Add RAG debug mode (
--rag-debug) showing retrieval scores - Support output to file or clipboard
- Provide comprehensive
--helpcommand documentation via Typer options
Phase 5 — Hardening & Distribution
Prepare for broader use.
- Security review of SSH handling and credential storage
- Ensure no data is written to the remote system under any path
- Package for distribution (binary release, container image, or distro packages)
- Write installation and quickstart documentation
- End-to-end integration tests against a test VM
Phase 6 — RAG & Knowledge Layer
Introduce Retrieval-Augmented Generation to ground AI responses in evidence rather than model weights alone. Three tiers of increasing capability, each buildable independently.
Goals
- Eliminate prompt flooding on hosts with large log output
- Ground recommendations in version-controlled runbooks, not model improvisation
- Build compounding institutional memory from past troubleshooting sessions
- Keep all data local — no embeddings or session content leaves the network
Technology Decisions Required
| Decision | Options | Recommendation | Status |
|---|---|---|---|
| Embedding model | nomic-embed-text, mxbai-embed-large, all-minilm |
nomic-embed-text via Ollama (local, 274MB, strong perf) |
✅ Implemented |
| Vector store — Tier 1 | In-memory numpy cosine, faiss-cpu |
numpy (zero deps) for session scope | ✅ Implemented |
| Vector store — Tier 2/3 | chromadb, qdrant, weaviate, pgvector |
chromadb embedded mode |
✅ Tier 2 Implemented |
| Chunking strategy | Fixed token, sentence-aware, command-boundary | Command-boundary splitting (natural unit for diagnostics) | ✅ Implemented |
| Hybrid retrieval | Semantic only, BM25 only, hybrid | Hybrid (BM25 keyword + cosine semantic) for best recall | ⬜ Pending |
| Reranking | None, cross-encoder (ms-marco-MiniLM), LLM-as-judge |
Cross-encoder rerank pass before prompt injection | ⬜ Pending |
| Runbook format | Markdown, YAML, JSON | Markdown (human-editable, version-controllable) | ✅ Implemented |
| Session index storage | Local ~/.tai/, configurable path |
~/.tai/sessions/ with ChromaDB collection |
⬜ Pending |
Tier 1 — Diagnostic Chunk Retrieval (in-memory, per-session)
Status: ✅ Implemented
Problem: Current flow injects all collected output into the prompt as one block. On busy hosts this floods the context window with irrelevant output, degrading quality.
Approach:
- After collection, split each command's output into overlapping token chunks (e.g. 512 tokens, 64 overlap)
- Embed all chunks using
nomic-embed-textvia Ollama embeddings API - On each question (initial + follow-up), embed the question and retrieve top-k chunks by cosine similarity
- Inject only retrieved chunks into the prompt, not the full dump
New module: src/tai/rag_retriever.py
chunk_report(report) -> list[Chunk]embed_chunks(chunks) -> list[EmbeddedChunk]retrieve(question, embedded_chunks, top_k) -> list[Chunk]
Changes to existing code:
prompt_builder.py: acceptretrieved_chunksinstead of fullCollectionReportfor RAG-mode promptscli.py: embed report after collection, pass retriever to_run_analysisand_run_followup_analysisai_client.py: addembed(text) -> list[float]method using Ollama/api/embeddings
Companion features buildable at same time:
--no-ragflag to bypass retrieval and use full dump (backwards compat)- Token budget display: show user how many tokens are being sent vs. saved
- Per-chunk source attribution in AI response (which command produced the evidence)
Tests:
tests/test_rag_retriever.py: chunk splitting, cosine similarity ranking, top-k retrievaltests/test_ai.py: addtest_embed_returns_float_list()
Tier 2 — Runbook Knowledge Base (persistent, ChromaDB)
Status: ✅ Implemented
Problem: AI improvises remediation steps from training data, which may be wrong for specific environments, distros, or internal conventions.
Approach:
- Maintain a version-controlled corpus of Markdown runbooks in
runbooks/directory - On first run (or
tai runbooks --sync), embed all runbooks and persist to ChromaDB collection - On each analysis, retrieve top-3 relevant runbook chunks alongside diagnostic chunks
- Inject as a separate
## Runbook Contextsection in the prompt
New module: src/tai/runbook_store.py
RunbookStore: wraps ChromaDB collectionsync(runbooks_dir) -> int— embed and upsert all runbooksquery(question, top_k) -> list[RunbookChunk]
New directory: runbooks/
ssh.md,nginx.md,postgres.md,disk.md,kernel.md, etc.- Each runbook: YAML frontmatter (
service,symptoms,tags) + Markdown body
New CLI command: tai runbooks --sync [--path ./runbooks]
Changes to existing code:
prompt_builder.py: addbuild_message_with_runbooks(retrieved_chunks, runbook_chunks)cli.py: optionally loadRunbookStore, query it per analysis turn
Companion features buildable at same time:
tai runbooks --list— show indexed runbooks and last sync timetai runbooks --add <file>— index a single runbook/runbooksslash command in interactive mode — show which runbooks were retrieved- Runbook citation in AI output: "Based on runbook:
ssh.md#AuthenticationFailures"
Tier 3 — Session Memory Index (institutional learning)
Status: ⬜ Pending
Problem: Every session starts from zero. Repeat incidents on the same host or same issue type get no benefit from past work.
Approach:
- On session end, embed the session summary (issue + root cause + actions) and upsert into a persistent ChromaDB collection (
~/.tai/sessions/) - On session start, query for similar past sessions by issue text + hostname
- Inject top-2 past sessions as
## Prior Sessionscontext - Optionally:
/historycommand in interactive mode to surface past sessions explicitly
New module: src/tai/session_store.py
SessionStore: wraps ChromaDB collection at~/.tai/sessions/index_session(session_log_path)— embed and store completed sessionquery_similar(issue, host, top_k) -> list[PastSession]
Changes to existing code:
session_log.py: addsummarise() -> strmethod (issue + final AI response)cli.py: querySessionStoreat session start, index at session end
Companion features buildable at same time:
tai historyCLI subcommand — search past sessions by keywordtai history --host <hostname>— all sessions for a hosttai history --export <file>— export session summaries as Markdown report- Auto-suggest: "Similar issue found from 2 weeks ago — load context? [y/N]"
Implementation Order
Tier 1 (diagnostic chunks) ← Start here. Zero new infra. Immediate prompt quality gain.
↓
Tier 2 (runbook KB) ← After Tier 1. Requires ChromaDB dep + runbook authoring.
↓
Tier 3 (session memory) ← Builds on Tier 2 infrastructure. Minimal extra work.
Estimated effort:
- Tier 1: 2–3 days (new module + prompt builder changes + tests)
- Tier 2: 3–4 days (ChromaDB + runbook authoring + CLI command + tests)
- Tier 3: 1–2 days (reuses Tier 2 infrastructure)
New Dependencies
# Tier 1 (zero new runtime deps — uses Ollama HTTP API already in use)
# No additions needed
# Tier 2 + 3
chromadb>=0.5,<1.0 # embedded vector store, no separate server
# OR
qdrant-client>=1.9,<2.0 # if self-hosted Qdrant preferred
sentence-transformers>=3.0 # optional: cross-encoder reranking
New pyproject.toml optional group
[project.optional-dependencies]
rag = [
"chromadb>=0.5,<1.0",
"sentence-transformers>=3.0,<4.0",
]
Decisions Log
| Date | Decision | Outcome |
|---|---|---|
| 2026-05-04 | Implementation language | Python — with single distributable binary via Nuitka |
| — | AI inference backend | vLLM (provisional) |
| — | Default model | gemma4:a4b (provisional) |
| 2026-05-04 | SSH auth methods | Keypair only (ed25519/RSA); auto-accept new hosts; reject on key change (MITM) |
| 2026-05-04 | Bastion host support | --jump-host flag via SSH native ProxyJump |
| 2026-05-04 | SSH config behavior | Use ~/.ssh/config by default; allow override via --ignore-ssh-config |
| 2026-05-04 | CLI vs interactive mode | Interactive: REPL for v0.1, textual TUI for v0.2+ |
| 2026-05-04 | RAG embedding model | nomic-embed-text via Ollama (local, air-gapped safe) — ⬜ pending confirmation |
| 2026-05-04 | RAG vector store (Tier 1) | In-memory numpy cosine similarity — zero deps, session-scoped |
| 2026-05-04 | RAG vector store (Tier 2/3) | chromadb embedded mode (default) or qdrant self-hosted — ⬜ pending confirmation |
| 2026-05-04 | RAG chunking unit | Command-boundary splitting — each collected command = one or more chunks |
| 2026-05-04 | Runbook format | Markdown with YAML frontmatter, version-controlled in runbooks/ directory |