Compare commits
4 Commits
feature/re
...
feature/ra
| Author | SHA1 | Date |
|---|---|---|
| | e49670a664 | |
| | 4870bd3bfe | |
| | 5798d87993 | |
| | 2c738579bd | |
@@ -1,9 +1,9 @@
-name: Release
+name: Tag Build

 on:
   push:
     tags:
-      - "v*"
+      - "[0-9]*"

 jobs:
   build:
@@ -61,8 +61,8 @@ jobs:
         run: |
           if command -v apt-get >/dev/null 2>&1; then
             apt-get update
-            apt-get install -y python3.12 python3.12-venv python3-pip patchelf ccache || \
-              apt-get install -y python3 python3-pip python3-venv patchelf ccache
+            apt-get install -y python3.12 python3.12-venv python3-pip patchelf ccache zip || \
+              apt-get install -y python3 python3-pip python3-venv patchelf ccache zip
           elif command -v dnf >/dev/null 2>&1; then
             dnf install -y python3 python3-pip python3-devel patchelf ccache
           elif command -v yum >/dev/null 2>&1; then
169
ROADMAP.md
@@ -117,6 +117,170 @@ Prepare for broader use.

______________________________________________________________________

## Phase 6 — RAG & Knowledge Layer

Introduce Retrieval-Augmented Generation to ground AI responses in evidence rather than
model weights alone. Three tiers of increasing capability, each buildable independently.

### Goals

- Eliminate prompt flooding on hosts with large log output
- Ground recommendations in version-controlled runbooks, not model improvisation
- Build compounding institutional memory from past troubleshooting sessions
- Keep all data local — no embeddings or session content leaves the network

---

### Technology Decisions Required

| Decision | Options | Recommendation | Status |
|---|---|---|---|
| Embedding model | `nomic-embed-text`, `mxbai-embed-large`, `all-minilm` | `nomic-embed-text` via Ollama (local, 274MB, strong perf) | ⬜ Pending |
| Vector store — Tier 1 | In-memory numpy cosine, `faiss-cpu` | numpy (zero deps) for session scope | ⬜ Pending |
| Vector store — Tier 2/3 | `chromadb`, `qdrant`, `weaviate`, `pgvector` | `chromadb` (embedded mode, no server needed) or `qdrant` (self-hosted, REST API, production-grade) | ⬜ Pending |
| Chunking strategy | Fixed token, sentence-aware, command-boundary | Command-boundary splitting (natural unit for diagnostics) | ⬜ Pending |
| Hybrid retrieval | Semantic only, BM25 only, hybrid | Hybrid (BM25 keyword + cosine semantic) for best recall | ⬜ Pending |
| Reranking | None, cross-encoder (`ms-marco-MiniLM`), LLM-as-judge | Cross-encoder rerank pass before prompt injection | ⬜ Pending |
| Runbook format | Markdown, YAML, JSON | Markdown (human-editable, version-controllable) | ⬜ Pending |
| Session index storage | Local `~/.tai/`, configurable path | `~/.tai/sessions/` with ChromaDB collection | ⬜ Pending |

---

### Tier 1 — Diagnostic Chunk Retrieval (in-memory, per-session)

**Problem:** Current flow injects all collected output into the prompt as one block.
On busy hosts this floods the context window with irrelevant output, degrading quality.

**Approach:**
- After collection, split each command's output into overlapping token chunks (e.g. 512 tokens, 64 overlap)
- Embed all chunks using `nomic-embed-text` via Ollama embeddings API
- On each question (initial + follow-up), embed the question and retrieve top-k chunks by cosine similarity
- Inject only retrieved chunks into the prompt, not the full dump
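
The splitting step could look roughly like the sketch below. It assumes tokens can be approximated by whitespace-separated words, since the roadmap does not name a tokenizer. The `Chunk` fields and the `chunk_output` helper are hypothetical names for illustration, not the planned `chunk_report` API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """One retrievable slice of a single command's output (hypothetical shape)."""

    command: str  # name of the command that produced the text
    index: int    # position of this chunk within that command's output
    text: str


def chunk_output(command: str, output: str, size: int = 512, overlap: int = 64) -> list[Chunk]:
    """Split one command's output into overlapping windows of `size` tokens.

    Assumption: a "token" is a whitespace-separated word. A real tokenizer
    would count differently, and line breaks are collapsed to spaces here.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    tokens = output.split()
    step = size - overlap  # how far each window advances
    chunks: list[Chunk] = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + size]
        chunks.append(Chunk(command, len(chunks), " ".join(window)))
        if start + size >= len(tokens):
            break  # this window already reached the end of the output
    return chunks
```

Chunking per command keeps every chunk attributable to exactly one command, which is what the per-chunk source attribution feature would need.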

**New module:** `src/tai/rag_retriever.py`
- `chunk_report(report) -> list[Chunk]`
- `embed_chunks(chunks) -> list[EmbeddedChunk]`
- `retrieve(question, embedded_chunks, top_k) -> list[Chunk]`
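
A possible shape for the ranking step is sketched below in pure Python, so that it runs without numpy; the roadmap's recommended numpy variant would compute the same scores in vectorized form. Unlike the planned `retrieve(question, ...)` signature, this sketch assumes the question has already been embedded, and the `EmbeddedChunk` fields are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddedChunk:
    """A chunk of text paired with its embedding vector (hypothetical shape)."""

    text: str
    vector: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    question_vector: list[float],
    embedded_chunks: list[EmbeddedChunk],
    top_k: int = 3,
) -> list[EmbeddedChunk]:
    """Return the top_k chunks most similar to the question, best match first."""
    ranked = sorted(
        embedded_chunks,
        key=lambda chunk: cosine(question_vector, chunk.vector),
        reverse=True,
    )
    return ranked[:top_k]
```

For a session-scoped index of a few hundred chunks, a full sort per question is cheap; an approximate index such as `faiss-cpu` would only matter at much larger scale.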

**Changes to existing code:**
- `prompt_builder.py`: accept `retrieved_chunks` instead of full `CollectionReport` for RAG-mode prompts
- `cli.py`: embed report after collection, pass retriever to `_run_analysis` and `_run_followup_analysis`
- `ai_client.py`: add `embed(text) -> list[float]` method using Ollama `/api/embeddings`
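
The `embed` addition might be sketched as follows. Ollama's `/api/embeddings` endpoint takes a JSON body with `model` and `prompt` and returns an `embedding` array. Everything else here is an assumption for illustration: the standalone function (rather than a method on `AIClient`), the use of `urllib` instead of whatever HTTP client `ai_client.py` already uses, the default host, and the injectable `post` parameter that lets tests avoid the network.

```python
import json
import urllib.request
from typing import Callable


def _post_json(url: str, payload: dict) -> dict:
    """POST a JSON payload and decode the JSON response (stdlib only)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def embed(
    text: str,
    host: str = "http://localhost:11434",  # assumed default Ollama address
    model: str = "nomic-embed-text",
    post: Callable[[str, dict], dict] = _post_json,  # hypothetical test seam
) -> list[float]:
    """Return the embedding vector for `text` from Ollama's /api/embeddings."""
    body = post(f"{host}/api/embeddings", {"model": model, "prompt": text})
    vector = body.get("embedding")
    if not isinstance(vector, list) or not vector:
        raise ValueError("Ollama response contained no embedding")
    return [float(value) for value in vector]
```

A test along the lines of `test_embed_returns_float_list()` could pass a fake `post` callable and assert on the returned list, without a running Ollama instance.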

**Companion features buildable at same time:**
- `--no-rag` flag to bypass retrieval and use full dump (backwards compat)
- Token budget display: show user how many tokens are being sent vs. saved
- Per-chunk source attribution in AI response (which command produced the evidence)

**Tests:**
- `tests/test_rag_retriever.py`: chunk splitting, cosine similarity ranking, top-k retrieval
- `tests/test_ai.py`: add `test_embed_returns_float_list()`

---

### Tier 2 — Runbook Knowledge Base (persistent, ChromaDB)

**Problem:** AI improvises remediation steps from training data, which may be wrong for
specific environments, distros, or internal conventions.

**Approach:**
- Maintain a version-controlled corpus of Markdown runbooks in `runbooks/` directory
- On first run (or `tai runbooks --sync`), embed all runbooks and persist to ChromaDB collection
- On each analysis, retrieve top-3 relevant runbook chunks alongside diagnostic chunks
- Inject as a separate `## Runbook Context` section in the prompt

**New module:** `src/tai/runbook_store.py`
- `RunbookStore`: wraps ChromaDB collection
- `sync(runbooks_dir) -> int` — embed and upsert all runbooks
- `query(question, top_k) -> list[RunbookChunk]`

**New directory:** `runbooks/`
- `ssh.md`, `nginx.md`, `postgres.md`, `disk.md`, `kernel.md`, etc.
- Each runbook: YAML frontmatter (`service`, `symptoms`, `tags`) + Markdown body
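
To illustrate the runbook layout, here is a minimal sketch of separating the frontmatter from the Markdown body before embedding. It assumes a deliberately small subset of YAML (flat `key: value` pairs and inline `[a, b]` lists) so that it runs on the standard library alone; a real implementation would more likely use a YAML parser. `split_frontmatter` is a hypothetical helper name.

```python
def split_frontmatter(text: str) -> tuple[dict[str, object], str]:
    """Split a runbook into (metadata, markdown_body).

    Assumption: frontmatter is delimited by '---' lines and contains only
    flat 'key: value' pairs or inline lists such as 'tags: [a, b]'.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter at all
    try:
        end = next(i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---")
    except StopIteration:
        return {}, text  # no closing delimiter: treat the whole file as body

    meta: dict[str, object] = {}
    for line in lines[1:end]:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a key/value line; ignored in this sketch
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            items = value[1:-1].split(",")
            meta[key.strip()] = [item.strip().strip("'\"") for item in items if item.strip()]
        else:
            meta[key.strip()] = value.strip("'\"")

    body = "\n".join(lines[end + 1:]).lstrip("\n")
    return meta, body
```

The parsed `service`, `symptoms`, and `tags` values could then be stored as per-chunk metadata alongside the embedded body text.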

**New CLI command:** `tai runbooks --sync [--path ./runbooks]`

**Changes to existing code:**
- `prompt_builder.py`: add `build_message_with_runbooks(retrieved_chunks, runbook_chunks)`
- `cli.py`: optionally load `RunbookStore`, query it per analysis turn

**Companion features buildable at same time:**
- `tai runbooks --list` — show indexed runbooks and last sync time
- `tai runbooks --add <file>` — index a single runbook
- `/runbooks` slash command in interactive mode — show which runbooks were retrieved
- Runbook citation in AI output: "Based on runbook: `ssh.md#AuthenticationFailures`"

---

### Tier 3 — Session Memory Index (institutional learning)

**Problem:** Every session starts from zero. Repeat incidents on the same host or
same issue type get no benefit from past work.

**Approach:**
- On session end, embed the session summary (issue + root cause + actions) and upsert into a persistent ChromaDB collection (`~/.tai/sessions/`)
- On session start, query for similar past sessions by issue text + hostname
- Inject top-2 past sessions as `## Prior Sessions` context
- Optionally: `/history` command in interactive mode to surface past sessions explicitly

**New module:** `src/tai/session_store.py`
- `SessionStore`: wraps ChromaDB collection at `~/.tai/sessions/`
- `index_session(session_log_path)` — embed and store completed session
- `query_similar(issue, host, top_k) -> list[PastSession]`

**Changes to existing code:**
- `session_log.py`: add `summarise() -> str` method (issue + final AI response)
- `cli.py`: query `SessionStore` at session start, index at session end

**Companion features buildable at same time:**
- `tai history` CLI subcommand — search past sessions by keyword
- `tai history --host <hostname>` — all sessions for a host
- `tai history --export <file>` — export session summaries as Markdown report
- Auto-suggest: "Similar issue found from 2 weeks ago — load context? [y/N]"

---

### Implementation Order

```
Tier 1 (diagnostic chunks) ← Start here. Zero new infra. Immediate prompt quality gain.
↓
Tier 2 (runbook KB) ← After Tier 1. Requires ChromaDB dep + runbook authoring.
↓
Tier 3 (session memory) ← Builds on Tier 2 infrastructure. Minimal extra work.
```

**Estimated effort:**
- Tier 1: 2–3 days (new module + prompt builder changes + tests)
- Tier 2: 3–4 days (ChromaDB + runbook authoring + CLI command + tests)
- Tier 3: 1–2 days (reuses Tier 2 infrastructure)

### New Dependencies

```
# Tier 1 (zero new runtime deps — uses Ollama HTTP API already in use)
# No additions needed

# Tier 2 + 3
chromadb>=0.5,<1.0 # embedded vector store, no separate server
# OR
qdrant-client>=1.9,<2.0 # if self-hosted Qdrant preferred

sentence-transformers>=3.0 # optional: cross-encoder reranking
```

### New pyproject.toml optional group

```toml
[project.optional-dependencies]
rag = [
    "chromadb>=0.5,<1.0",
    "sentence-transformers>=3.0,<4.0",
]
```

______________________________________________________________________

## Decisions Log

| Date | Decision | Outcome |
@@ -128,3 +292,8 @@ ______________________________________________________________________
| 2026-05-04 | Bastion host support | `--jump-host` flag via SSH native ProxyJump |
| 2026-05-04 | SSH config behavior | Use `~/.ssh/config` by default; allow override via `--ignore-ssh-config` |
| 2026-05-04 | CLI vs interactive mode | Interactive: REPL for v0.1, `textual` TUI for v0.2+ |
| 2026-05-04 | RAG embedding model | `nomic-embed-text` via Ollama (local, air-gapped safe) — ⬜ pending confirmation |
| 2026-05-04 | RAG vector store (Tier 1) | In-memory numpy cosine similarity — zero deps, session-scoped |
| 2026-05-04 | RAG vector store (Tier 2/3) | `chromadb` embedded mode (default) or `qdrant` self-hosted — ⬜ pending confirmation |
| 2026-05-04 | RAG chunking unit | Command-boundary splitting — each collected command = one or more chunks |
| 2026-05-04 | Runbook format | Markdown with YAML frontmatter, version-controlled in `runbooks/` directory |

@@ -8,6 +8,9 @@ from typing import Annotated
 import typer
 from rich.console import Console
 from rich.markdown import Markdown
+from rich.panel import Panel
+from rich.rule import Rule
+from rich.text import Text

 from tai.ai_client import DEFAULT_AI_HOST, DEFAULT_MODEL, AIClient, AIConfig
 from tai.ai_guardrails import validate_ai_response
@@ -119,11 +122,12 @@ def run(
     )

     summary = SSHClient(config).summary()
-    console.print("[bold green]tai[/bold green]")
-    console.print(f"Issue: {req.issue}")
-    console.print(f"SSH: {summary}")
+    console.print(Rule("[bold green]tai[/bold green]", style="green"))
+    console.print(f" [bold]Issue:[/bold] {req.issue}")
+    console.print(f" [bold]SSH:[/bold] {summary}")
     if req.target_paths:
-        console.print(f"Paths: {', '.join(str(p) for p in req.target_paths)}")
+        console.print(f" [bold]Paths:[/bold] {', '.join(str(p) for p in req.target_paths)}")
+    console.print()

     if not (probe or collect or analyze or interactive):
         return # nothing SSH-related requested
@@ -227,15 +231,20 @@ async def _interactive_loop(
 ) -> None:
     """Run a follow-up loop for collecting and conversational analysis."""
     console.print(
-        "[cyan]Interactive mode:[/cyan] "
-        "ask questions directly, or use /collect, /analyze, /help, /quit"
+        Panel(
+            "Ask questions directly, or use [bold]/collect[/bold], "
+            "[bold]/analyze[/bold], [bold]/help[/bold], [bold]/quit[/bold]",
+            title="[bold cyan]Interactive Mode[/bold cyan]",
+            border_style="cyan",
+            padding=(0, 1),
+        )
     )

     prior_questions: list[str] = []

     while True:
         try:
-            command = input("tai> ").strip()
+            command = console.input("\n[bold cyan]tai[/bold cyan][dim] >[/dim] ").strip()
         except (EOFError, KeyboardInterrupt):
             console.print("\n[yellow]Exiting interactive mode.[/yellow]")
             if logger is not None:
@@ -252,8 +261,18 @@ async def _interactive_loop(
             return

         if command == "/help":
-            console.print("Commands: /collect, /analyze, /help, /quit")
-            console.print("Tip: any non-slash text is treated as a follow-up AI question.")
+            console.print(
+                Panel(
+                    "[bold]/collect[/bold] — re-run diagnostics\n"
+                    "[bold]/analyze[/bold] — re-analyze current diagnostics\n"
+                    "[bold]/help[/bold] — show this message\n"
+                    "[bold]/quit[/bold] — end session\n"
+                    "[dim]Anything else is sent directly to the AI as a question.[/dim]",
+                    title="[bold]Commands[/bold]",
+                    border_style="dim",
+                    padding=(0, 1),
+                )
+            )
             continue

         if command == "/collect":
@@ -319,26 +338,32 @@ async def _interactive_loop(

 def _handle_probe_result(result: SSHCommandResult) -> None:
     """Handle and render probe output for success or failure."""
-    console.print("[cyan]Running SSH probe:[/cyan] uname -a")
+    console.print("[dim]▶ SSH probe:[/dim] uname -a")
     if result.exit_code != 0:
         details = result.stderr or result.stdout or "no error output from ssh"
-        console.print(f"[red]Probe failed (exit {result.exit_code}):[/red] {details}")
+        console.print(f"[bold red]✗ Probe failed[/bold red] (exit {result.exit_code}): {details}")
         raise typer.Exit(code=1)
     output = result.stdout or "(no output)"
-    console.print("[bold green]Probe succeeded.[/bold green]")
-    console.print(f"Remote: {output}")
+    console.print("[bold green]✓ Probe succeeded.[/bold green]")
+    console.print(f" [dim]{output}[/dim]")


 def _handle_collection_report(report: CollectionReport) -> None:
     """Render collected command status and truncation hints."""
-    console.print(
-        f"[bold]Collection complete:[/bold] {report.total} commands, {report.failed} failed"
+    failed_label = (
+        f"[red]{report.failed} failed[/red]" if report.failed else "[green]0 failed[/green]"
     )
+    console.print(f"[bold]Collection complete:[/bold] {report.total} commands, {failed_label}")
     for item in report.items:
-        status = "ok" if item.result.exit_code == 0 else f"exit {item.result.exit_code}"
         truncated = item.result.stdout_truncated or item.result.stderr_truncated
-        trunc = " (truncated)" if truncated else ""
-        console.print(f"- {item.name}: {status}{trunc}")
+        trunc_label = " [dim](truncated)[/dim]" if truncated else ""
+        if item.result.exit_code == 0:
+            console.print(f" [green]✓[/green] [dim]{item.name}[/dim]{trunc_label}")
+        else:
+            console.print(
+                f" [red]✗[/red] {item.name} "
+                f"[red](exit {item.result.exit_code})[/red]{trunc_label}"
+            )


 def _run_analysis(
@@ -349,7 +374,9 @@ def _run_analysis(
     logger: SessionLogger | None,
 ) -> None:
     """Send collected data to the AI and stream the analysis to stdout."""
-    console.print("[cyan]Analyzing...[/cyan]\n")
+    console.print()
+    console.print(Rule("[bold cyan]Analysis[/bold cyan]", style="cyan"))
+    console.print()
     ai = AIClient(ai_config)
     system_prompt = build_system_prompt()
     user_message = build_user_message(issue, report)
@@ -362,7 +389,10 @@ def _run_analysis(

     warnings = validate_ai_response(response)
     for item in warnings:
-        console.print(f"[yellow]Guardrail warning:[/yellow] {item}")
+        warn_text = Text()
+        warn_text.append("⚠ Guardrail: ", style="bold yellow")
+        warn_text.append(item, style="yellow")
+        console.print(warn_text)

     if logger is not None:
         logger.log_event(
@@ -390,7 +420,9 @@ def _run_followup_analysis(
     logger: SessionLogger | None,
 ) -> str:
     """Run grounded follow-up analysis re-anchored to current diagnostics."""
-    console.print("[cyan]Analyzing...[/cyan]\n")
+    console.print()
+    console.print(Rule("[bold cyan]AI Response[/bold cyan]", style="cyan"))
+    console.print()
     ai = AIClient(ai_config)
     system_prompt = build_system_prompt()
     user_message = build_followup_message(issue, report, question, prior_questions)
@@ -401,10 +433,14 @@ def _run_followup_analysis(
         chunks.append(chunk)
     response = "".join(chunks)
     console.print(Markdown(response))
+    console.print(Rule(style="dim"))

     warnings = validate_ai_response(response)
     for item in warnings:
-        console.print(f"[yellow]Guardrail warning:[/yellow] {item}")
+        warn_text = Text()
+        warn_text.append("⚠ Guardrail: ", style="bold yellow")
+        warn_text.append(item, style="yellow")
+        console.print(warn_text)

     if logger is not None:
         logger.log_event(

@@ -137,8 +137,9 @@ def test_collect_success_prints_summary(monkeypatch) -> None: # type: ignore[no

     assert result.exit_code == 0
     assert "Collection complete" in result.stdout
-    assert "kernel: ok" in result.stdout
-    assert "journal: ok (truncated)" in result.stdout
+    assert "kernel" in result.stdout
+    assert "journal" in result.stdout
+    assert "truncated" in result.stdout


 def test_interactive_collect_then_quit(monkeypatch) -> None: # type: ignore[no-untyped-def]
@@ -163,7 +164,7 @@ def test_interactive_collect_then_quit(monkeypatch) -> None: # type: ignore[no-
     commands = iter(["/collect", "/quit"])

     monkeypatch.setattr("tai.cli.collect_from_plan", fake_collect_from_plan)
-    monkeypatch.setattr("builtins.input", lambda _prompt: next(commands))
+    monkeypatch.setattr("tai.cli.console.input", lambda _prompt: next(commands))

     runner = CliRunner()
     result = runner.invoke(
@@ -180,7 +181,7 @@ def test_interactive_collect_then_quit(monkeypatch) -> None: # type: ignore[no-
     )

     assert result.exit_code == 0
-    assert "Interactive mode" in result.stdout
+    assert "ask questions directly" in result.stdout.lower()
     assert "Collection complete" in result.stdout
     assert "Bye." in result.stdout

@@ -210,7 +211,7 @@ def test_interactive_unknown_command_prints_hint(monkeypatch) -> None: # type:
         "tai.cli.AIClient.stream",
         lambda *_args, **_kwargs: iter(["Check logs."]),
     )
-    monkeypatch.setattr("builtins.input", lambda _prompt: next(commands))
+    monkeypatch.setattr("tai.cli.console.input", lambda _prompt: next(commands))

     runner = CliRunner()
     result = runner.invoke(
@@ -227,5 +228,5 @@ def test_interactive_unknown_command_prints_hint(monkeypatch) -> None: # type:
     )

     assert result.exit_code == 0
-    assert "Analyzing..." in result.stdout
+    assert "AI Response" in result.stdout
     assert "Check logs." in result.stdout