5.8 KiB
tai - Linux AI Troubleshooting Agent
tai is a read-only Linux troubleshooting assistant that connects to remote hosts via SSH, collects diagnostics, and runs grounded AI analysis using local models.
The project is designed for operators who want AI speed without losing operational safety or evidence traceability.
What tai Does
- Runs safe, read-only remote checks over SSH
- Builds a diagnostics collection plan from issue text
- Supports one-shot analysis and interactive follow-up mode
- Uses local AI backends (OpenAI-compatible endpoint, typically Ollama)
- Uses RAG over collected diagnostics (Tier 1)
- Uses persistent runbook retrieval with ChromaDB (Tier 2)
- Emits structured Markdown analysis with evidence and actions
- Can log session and retrieval telemetry locally as JSONL
Safety Model
tai enforces read-only command policy on all remote commands.
- Allowlist based command validation
- Blocked shell operators (
>,>>,<,|,&&,||,;) - No write/mutation actions are executed on target hosts
The tool may suggest remediation commands in output, but does not execute them.
Current Feature Set
Core CLI
tai run ...main troubleshooting entrypoint- SSH options: host, port, identity file, jump host, SSH config control
- Live probe mode (
uname -a) - Diagnostics collection mode
- AI analysis mode
- Optional analysis export via
--output-file <path>(--output-format markdown|json) - Interactive loop with
/collect,/analyze,/help,/quit
AI and Prompting
- OpenAI-compatible AI client
- Configurable model, timeout, token budget
- Guardrails to keep responses evidence-based
- Initial and follow-up prompts grounded in collected diagnostics
- Non-streaming completion path for local backend reliability
RAG and Knowledge
- Tier 1: semantic retrieval of diagnostic chunks per question
- Tier 2: persistent runbook knowledge base with ChromaDB
- Runbook retrieval injected as separate prompt context
- Retrieval debug output (
--rag-debug) - Full-context fallback if retrieval/indexing fails
Runbook Management
tai runbooks sync --path ./runbooks --store ~/.tai/runbookstai runbooks list --store ~/.tai/runbookstai runbooks add <file> --store ~/.tai/runbooks
Presence and Absence Signals
For recognized services/subsystems (for example sssd, docker, x2go, xorg, wayland, selinux, apparmor), collection includes:
- service unit-file discovery (
systemctl list-unit-files ...) - binary presence checks via
ls -l <expected path> - service status and journals
- selected config path probes where defined
This improves analysis quality for "component missing/not installed" scenarios.
Repository Layout
src/tai/
cli.py # CLI commands and orchestration
ssh_client.py # SSH execution + read-only policy
collectors.py # execution of collection plans
plan.py # issue -> command plan builder
ai_client.py # OpenAI-compatible AI + embeddings client
ai_guardrails.py # response guardrails/validation
prompt_builder.py # prompt composition
rag_retriever.py # diagnostic chunk retrieval
runbook_store.py # persistent ChromaDB runbook index/query
chroma_telemetry.py # no-op Chroma telemetry client
session_log.py # JSONL session logging
input_parser.py # CLI input validation
models.py # domain request models
runbooks/
*.md # Markdown runbooks with frontmatter
tests/
test_*.py # unit and CLI coverage
Installation
python -m venv .venv
source .venv/bin/activate
pip install -e .
RAG runbook storage requires optional dependencies:
pip install -e .[rag]
Development dependencies:
pip install -e .[dev]
AI Backend Setup (Ollama)
tai expects an OpenAI-compatible API endpoint, defaulting to http://localhost:11434/v1.
ollama pull gemma3:4b
ollama pull nomic-embed-text
Quick backend check:
curl http://localhost:11434/api/generate \
-d '{"model":"gemma3:4b","prompt":"hello","stream":false}'
Usage
Basic Probe and Collect
tai run "nginx failing to start" \
--host web01 \
--probe \
--collect
Analyze with RAG and Runbooks
tai run "why isnt sssd working?" \
--host ssh.archflux.net \
--port 5566 \
--probe --collect --analyze \
--runbooks ~/.tai/runbooks \
--rag-debug \
--ai-timeout-seconds 45 \
--ai-max-tokens 300
Interactive Session
tai run "docker daemon keeps failing" \
--host app01 \
--collect \
--interactive \
--runbooks ~/.tai/runbooks
Write Analysis to File
tai run "sshd authentication failed" \
--host bastion01 \
--collect --analyze \
--output-file ./reports/sshd-analysis.md
JSON export:
tai run "sshd authentication failed" \
--host bastion01 \
--collect --analyze \
--output-file ./reports/sshd-analysis.json \
--output-format json
Runbook Workflow
- Write Markdown runbooks in
runbooks/with frontmatter keys:service,symptoms,tags. - Sync the store.
- Pass
--runbooks <store-path>totai run.
Example:
tai runbooks sync --path ./runbooks --store ~/.tai/runbooks
tai runbooks list --store ~/.tai/runbooks
Testing
pytest
Focused suites:
pytest tests/test_plan.py tests/test_ai.py tests/test_cli.py
Known Limits
- Deep service-specific probes (known binary/config/package aliases) are richer for recognized services than generic service names.
- Clipboard export is intentionally not implemented.
Changelog and Roadmap
- See
CHANGELOG.mdfor release history. - See
ROADMAP.mdfor phase status and next milestones. - See
docs/ARCHITECTURE.mdfor module-level architecture and data flow.