zphinx/tai

Files

zphinx 013410999a feat: finalize package presence branch and docs alignment

2026-05-11 20:38:16 +02:00

5.4 KiB

Raw Blame History

tai - Linux AI Troubleshooting Agent

tai is a read-only Linux troubleshooting assistant that connects to remote hosts via SSH, collects diagnostics, and runs grounded AI analysis using local models.

The project is designed for operators who want AI speed without losing operational safety or evidence traceability.

What tai Does

Runs safe, read-only remote checks over SSH
Builds a diagnostics collection plan from issue text
Supports one-shot analysis and interactive follow-up mode
Uses local AI backends (OpenAI-compatible endpoint, typically Ollama)
Uses RAG over collected diagnostics (Tier 1)
Uses persistent runbook retrieval with ChromaDB (Tier 2)
Emits structured Markdown analysis with evidence and actions
Can log session and retrieval telemetry locally as JSONL

Safety Model

tai enforces read-only command policy on all remote commands.

Allowlist based command validation
Blocked shell operators (>, >>, <, |, &&, ||, ;)
No write/mutation actions are executed on target hosts

The tool may suggest remediation commands in output, but does not execute them.

Current Feature Set

Core CLI

tai run ... main troubleshooting entrypoint
SSH options: host, port, identity file, jump host, SSH config control
Live probe mode (uname -a)
Diagnostics collection mode
AI analysis mode
Interactive loop with /collect, /analyze, /help, /quit

AI and Prompting

OpenAI-compatible AI client
Configurable model, timeout, token budget
Guardrails to keep responses evidence-based
Initial and follow-up prompts grounded in collected diagnostics
Non-streaming completion path for local backend reliability

RAG and Knowledge

Tier 1: semantic retrieval of diagnostic chunks per question
Tier 2: persistent runbook knowledge base with ChromaDB
Runbook retrieval injected as separate prompt context
Retrieval debug output (--rag-debug)
Full-context fallback if retrieval/indexing fails

Runbook Management

tai runbooks sync --path ./runbooks --store ~/.tai/runbooks
tai runbooks list --store ~/.tai/runbooks
tai runbooks add <file> --store ~/.tai/runbooks

Presence and Absence Signals

For recognized services/subsystems (for example sssd, docker, x2go, xorg, wayland, selinux, apparmor), collection includes:

service unit-file discovery (systemctl list-unit-files ...)
binary presence checks via ls -l <expected path>
service status and journals
selected config path probes where defined

This improves analysis quality for "component missing/not installed" scenarios.

Repository Layout

src/tai/
  cli.py                # CLI commands and orchestration
  ssh_client.py         # SSH execution + read-only policy
  collectors.py         # execution of collection plans
  plan.py               # issue -> command plan builder
  ai_client.py          # OpenAI-compatible AI + embeddings client
  ai_guardrails.py      # response guardrails/validation
  prompt_builder.py     # prompt composition
  rag_retriever.py      # diagnostic chunk retrieval
  runbook_store.py      # persistent ChromaDB runbook index/query
  chroma_telemetry.py   # no-op Chroma telemetry client
  session_log.py        # JSONL session logging
  input_parser.py       # CLI input validation
  models.py             # domain request models

runbooks/
  *.md                  # Markdown runbooks with frontmatter

tests/
  test_*.py             # unit and CLI coverage

Installation

python -m venv .venv
source .venv/bin/activate
pip install -e .

RAG runbook storage requires optional dependencies:

pip install -e .[rag]

Development dependencies:

pip install -e .[dev]

AI Backend Setup (Ollama)

tai expects an OpenAI-compatible API endpoint, defaulting to http://localhost:11434/v1.

ollama pull gemma3:4b
ollama pull nomic-embed-text

Quick backend check:

curl http://localhost:11434/api/generate \
  -d '{"model":"gemma3:4b","prompt":"hello","stream":false}'

Usage

Basic Probe and Collect

tai run "nginx failing to start" \
  --host web01 \
  --probe \
  --collect

Analyze with RAG and Runbooks

tai run "why isnt sssd working?" \
  --host ssh.archflux.net \
  --port 5566 \
  --probe --collect --analyze \
  --runbooks ~/.tai/runbooks \
  --rag-debug \
  --ai-timeout-seconds 45 \
  --ai-max-tokens 300

Interactive Session

tai run "docker daemon keeps failing" \
  --host app01 \
  --collect \
  --interactive \
  --runbooks ~/.tai/runbooks

Runbook Workflow

Write Markdown runbooks in runbooks/ with frontmatter keys: service, symptoms, tags.
Sync the store.
Pass --runbooks <store-path> to tai run.

Example:

tai runbooks sync --path ./runbooks --store ~/.tai/runbooks
tai runbooks list --store ~/.tai/runbooks

Testing

pytest

Focused suites:

pytest tests/test_plan.py tests/test_ai.py tests/test_cli.py

Known Limits

Deep service-specific probes (known binary/config/package aliases) are richer for recognized services than generic service names.
Session memory is available via --session-memory, but dedicated history UX commands (tai history, /history) are not implemented yet.

Changelog and Roadmap

See CHANGELOG.md for release history.
See ROADMAP.md for phase status and next milestones.
See docs/ARCHITECTURE.md for module-level architecture and data flow.

5.4 KiB Raw Blame History