# tai - Linux AI Troubleshooting Agent

`tai` is a read-only Linux troubleshooting assistant that connects to remote hosts via SSH, collects diagnostics, and runs grounded AI analysis using local models.

The project is designed for operators who want AI speed without losing operational safety or evidence traceability.

## What tai Does

- Runs safe, read-only remote checks over SSH
- Builds a diagnostics collection plan from issue text
- Supports one-shot analysis and interactive follow-up mode
- Uses local AI backends (OpenAI-compatible endpoint, typically Ollama)
- Uses RAG over collected diagnostics (Tier 1)
- Uses persistent runbook retrieval with ChromaDB (Tier 2)
- Emits structured Markdown analysis with evidence and actions
- Can log session and retrieval telemetry locally as JSONL

## Safety Model

`tai` enforces a read-only command policy on all remote commands.

- Allowlist-based command validation
- Blocked shell operators (`>`, `>>`, `<`, `|`, `&&`, `||`, `;`)
- No write/mutation actions are executed on target hosts

The tool may suggest remediation commands in output, but does not execute them.
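
To make the policy above concrete, here is a minimal sketch of how an allowlist check combined with operator blocking could look. It is not the code in `ssh_client.py`: the `ALLOWED_COMMANDS` set and the `is_command_allowed` function are hypothetical names chosen for illustration, and only the list of blocked operators comes from this README.

```python
import shlex

# Hypothetical allowlist for illustration; the real tai allowlist is not shown here.
ALLOWED_COMMANDS = {"uname", "systemctl", "journalctl", "ls", "cat"}

# Shell operators listed as blocked in the Safety Model section.
BLOCKED_OPERATORS = (">>", ">", "<", "||", "&&", "|", ";")


def is_command_allowed(command: str) -> bool:
    """Return True only if the command passes both policy checks."""
    # Conservative check: reject the operator anywhere, even inside quotes.
    if any(op in command for op in BLOCKED_OPERATORS):
        return False
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unbalanced quotes and similar parse errors are rejected.
        return False
    # The executable (first token) must be on the allowlist.
    return bool(tokens) and tokens[0] in ALLOWED_COMMANDS


if __name__ == "__main__":
    for candidate in ("uname -a", "cat /etc/hosts > /tmp/x", "rm -rf /tmp/x"):
        print(candidate, "->", is_command_allowed(candidate))
```

A real policy would likely also need to inspect subcommands and arguments, since an allowlisted executable such as `systemctl` has both read-only and mutating modes.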
## Current Feature Set

### Core CLI

- `tai run ...` main troubleshooting entrypoint
- SSH options: host, port, identity file, jump host, SSH config control
- Live probe mode (`uname -a`)
- Diagnostics collection mode
- AI analysis mode
- Optional analysis export via `--output-file ` (`--output-format markdown|json`)
- Automatic host history persistence/read via database (`--history-db`, `--history/--no-history`)
- Interactive loop with `/collect`, `/analyze`, `/help`, `/quit`

### AI and Prompting

- OpenAI-compatible AI client
- Configurable model, timeout, token budget
- Guardrails to keep responses evidence-based
- Initial and follow-up prompts grounded in collected diagnostics
- Non-streaming completion path for local backend reliability

### RAG and Knowledge

- Tier 1: semantic retrieval of diagnostic chunks per question
- Tier 2: persistent runbook knowledge base with ChromaDB
- Runbook retrieval injected as separate prompt context
- Retrieval debug output (`--rag-debug`)
- Full-context fallback if retrieval/indexing fails

### Runbook Management

- `tai runbooks sync --path ./runbooks --store ~/.tai/runbooks`
- `tai runbooks list --store ~/.tai/runbooks`
- `tai runbooks add --store ~/.tai/runbooks`

### Presence and Absence Signals

For recognized services/subsystems (for example `sssd`, `docker`, `x2go`, `xorg`, `wayland`, `selinux`, `apparmor`), collection includes:

- service unit-file discovery (`systemctl list-unit-files ...`)
- binary presence checks via `ls -l `
- service status and journals
- selected config path probes where defined

This improves analysis quality for "component missing/not installed" scenarios.
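
As a rough illustration of the presence and absence checks above, the sketch below expands a service description into read-only probe commands. `ServiceProfile` and `presence_commands` are hypothetical names, not the structures in `plan.py`. The exact `systemctl` and `journalctl` arguments, and the use of `ls -l` for config paths, are also assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ServiceProfile:
    """Hypothetical description of a recognized service (not tai's real model)."""
    name: str
    binaries: List[str] = field(default_factory=list)
    config_paths: List[str] = field(default_factory=list)


def presence_commands(profile: ServiceProfile) -> List[str]:
    """Build read-only probes that reveal whether a component exists at all."""
    unit = f"{profile.name}.service"
    # Unit-file discovery: shows whether the unit is installed.
    commands = [f"systemctl list-unit-files {unit} --no-pager"]
    # Binary presence checks: a failing ls is itself an absence signal.
    commands += [f"ls -l {path}" for path in profile.binaries]
    # Service status and recent journal entries (arguments are assumptions).
    commands.append(f"systemctl status {unit} --no-pager")
    commands.append(f"journalctl -u {unit} -n 50 --no-pager")
    # Config path probes, only where paths are defined for the service.
    commands += [f"ls -l {path}" for path in profile.config_paths]
    return commands


if __name__ == "__main__":
    sssd = ServiceProfile("sssd", ["/usr/sbin/sssd"], ["/etc/sssd/sssd.conf"])
    for command in presence_commands(sssd):
        print(command)
```

None of the generated commands uses a blocked shell operator, so a plan built this way stays within the read-only policy.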
## Repository Layout

```text
src/tai/
  cli.py                # CLI commands and orchestration
  ssh_client.py         # SSH execution + read-only policy
  collectors.py         # execution of collection plans
  plan.py               # issue -> command plan builder
  ai_client.py          # OpenAI-compatible AI + embeddings client
  ai_guardrails.py      # response guardrails/validation
  prompt_builder.py     # prompt composition
  rag_retriever.py      # diagnostic chunk retrieval
  runbook_store.py      # persistent ChromaDB runbook index/query
  chroma_telemetry.py   # no-op Chroma telemetry client
  session_log.py        # JSONL session logging
  input_parser.py       # CLI input validation
  models.py             # domain request models
runbooks/
  *.md                  # Markdown runbooks with frontmatter
tests/
  test_*.py             # unit and CLI coverage
```

## Installation

```bash
python -m venv .venv
source .venv/bin/activate
pip install -e .
```

RAG runbook storage requires optional dependencies:

```bash
pip install -e .[rag]
```

Development dependencies:

```bash
pip install -e .[dev]
```

## AI Backend Setup (Ollama)

`tai` expects an OpenAI-compatible API endpoint, defaulting to `http://localhost:11434/v1`.

```bash
ollama pull gemma3:4b
ollama pull nomic-embed-text
```

Quick backend check:

```bash
curl http://localhost:11434/api/generate \
  -d '{"model":"gemma3:4b","prompt":"hello","stream":false}'
```

## Usage

### Basic Probe and Collect

```bash
tai run "nginx failing to start" \
  --host web01 \
  --probe \
  --collect
```

### Analyze with RAG and Runbooks

```bash
tai run "why isnt sssd working?" \
  --host ssh.archflux.net \
  --port 5566 \
  --probe --collect --analyze \
  --runbooks ~/.tai/runbooks \
  --rag-debug \
  --ai-timeout-seconds 45 \
  --ai-max-tokens 300
```

### Interactive Session

```bash
tai run "docker daemon keeps failing" \
  --host app01 \
  --collect \
  --interactive \
  --runbooks ~/.tai/runbooks
```

### Write Analysis to File

```bash
tai run "sshd authentication failed" \
  --host bastion01 \
  --collect --analyze \
  --output-file ./reports/sshd-analysis.md
```

JSON export:

```bash
tai run "sshd authentication failed" \
  --host bastion01 \
  --collect --analyze \
  --output-file ./reports/sshd-analysis.json \
  --output-format json
```

JSON export includes host-specific run metadata:

- `schema` and `generated_at`
- `issue`, `host`, `model`
- `collection` summary (`total`, `failed`, `succeeded`)
- `token_usage` (`prompt_tokens`, `completion_tokens`, `total_tokens`) when available from backend
- `analysis` text

By default, each analyzed run is also written to the history database and prior sessions for the same host are read and injected as historical context.

Database targets supported by `--history-db`:

- SQLite file path (for example `~/.tai/history.db`)
- SQLite URL (for example `sqlite:////tmp/tai-history.db`)
- PostgreSQL DSN (for example `postgresql://user:pass@dbhost:5432/tai`)

Example using a remote PostgreSQL history database:

```bash
tai run "sshd authentication failed" \
  --host bastion01 \
  --collect --analyze \
  --history-db postgresql://tai_user:secret@db.internal:5432/tai
```

Credential options for an external history DB:

- `--history-db-user `
- `--history-db-password `
- `--env-file ` (loads dotenv values)

Dotenv keys for history DB credentials:

- `TAI_HISTORY_DB_USER`
- `TAI_HISTORY_DB_PASSWORD`

Runbook store targets supported by `--runbooks` and `tai runbooks --store`:

- Local embedded ChromaDB path (default)
- Remote ChromaDB URL (for example `http://chroma.internal:8000`)

Example using a remote ChromaDB runbook store at analysis time:

```bash
tai run "nginx failing after reboot" \
  --host web01 \
  --collect --analyze \
  --runbooks http://chroma.internal:8000
```

Credential options for a remote runbook store:

- `--runbooks-user ` / `--runbooks-password ` on `tai run`
- `--store-user ` / `--store-password ` on `tai runbooks ...`
- `--env-file ` (loads dotenv values)

Dotenv keys for runbook store credentials:

- `TAI_RUNBOOK_STORE_USER`
- `TAI_RUNBOOK_STORE_PASSWORD`

Remote runbook (playbook) sources supported by `tai runbooks sync --path`:

- Local directory path (for example `./runbooks`)
- SSH directory URI (for example `ssh://ops@ssh.archflux.net/opt/tai/runbooks`)
- HTTP/HTTPS webroot URL that exposes `.md` links (for example `https://kb.example/runbooks/`)

Webroot hardening rules:

- Only `.md` links are considered for download.
- Downloaded payload must look like real Markdown (HTML wrappers are ignored).
- Non-markdown payloads are discarded.
- Downloaded content is never executed. It is stored as plain text and only parsed for AI retrieval context.
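
The webroot hardening rules above could be approximated as follows. This is a sketch under stated assumptions rather than tai's actual filter: the function names and the `HTML_MARKERS` heuristic are hypothetical, and how tai decides that a payload "looks like real Markdown" is not documented here.

```python
from typing import Optional
from urllib.parse import urlparse

# Heuristic markers of an HTML wrapper page (an assumption for this sketch).
HTML_MARKERS = ("<!doctype html", "<html", "<head", "<body")


def is_markdown_link(url: str) -> bool:
    """Only links whose URL path ends in .md are considered for download."""
    return urlparse(url).path.lower().endswith(".md")


def looks_like_markdown(payload: bytes) -> bool:
    """Accept non-empty UTF-8 text that does not start like an HTML page."""
    try:
        text = payload.decode("utf-8")
    except UnicodeDecodeError:
        return False
    head = text.lstrip()[:512].lower()
    if not head:
        return False
    return not any(marker in head for marker in HTML_MARKERS)


def accept_runbook(url: str, payload: bytes) -> Optional[str]:
    """Return the runbook text to store, or None if it should be discarded."""
    if not is_markdown_link(url) or not looks_like_markdown(payload):
        return None
    # The payload is only decoded and returned as plain text; nothing is executed.
    return payload.decode("utf-8")


if __name__ == "__main__":
    url = "https://kb.example/runbooks/nginx.md"
    print(accept_runbook(url, b"# Nginx\n\nCheck the unit status first."))
    print(accept_runbook(url, b"<!DOCTYPE html><html><body>Login</body></html>"))
```

Checking the parsed URL path rather than the raw string means a query string after `.md` does not defeat the extension check.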

Single runbook (playbook) sources supported by `tai runbooks add`:

- Local file path
- SSH file URI (for example `ssh://ops@ssh.archflux.net/opt/tai/runbooks/nginx.md`)
- HTTP/HTTPS URL to a Markdown file

For HTTP/HTTPS single-file add, the source URL must end in `.md` and resolve to Markdown content.

Examples:

```bash
# Sync from SSH-hosted runbooks directory into remote ChromaDB
tai runbooks sync \
  --path ssh://ops@ssh.archflux.net/opt/tai/runbooks \
  --store http://chroma.internal:8000

# Sync from HTTPS webroot listing Markdown runbooks
tai runbooks sync \
  --path https://kb.example/runbooks/ \
  --store ~/.tai/runbooks

# Add one runbook directly from HTTPS
tai runbooks add https://kb.example/runbooks/nginx.md --store ~/.tai/runbooks
```

## Runbook Workflow

1. Write Markdown runbooks in `runbooks/` with frontmatter keys: `service`, `symptoms`, `tags`.
1. Sync the store.
1. Pass `--runbooks ` to `tai run`.

Example:

```bash
tai runbooks sync --path ./runbooks --store ~/.tai/runbooks
tai runbooks list --store ~/.tai/runbooks
```

## Testing

```bash
pytest
```

Focused suites:

```bash
pytest tests/test_plan.py tests/test_ai.py tests/test_cli.py
```

## Man Page

A manual page is available at `docs/tai.1`.

Render it locally:

```bash
man ./docs/tai.1
```

## Known Limits

- Deep service-specific probes (known binary/config/package aliases) are richer for recognized services than generic service names.
- Clipboard export is intentionally not implemented.

## Changelog and Roadmap

- See `CHANGELOG.md` for release history.
- See `ROADMAP.md` for phase status and next milestones.
- See `docs/ARCHITECTURE.md` for module-level architecture and data flow.