333 lines
9.1 KiB
Markdown
333 lines
9.1 KiB
Markdown
# tai - Linux AI Troubleshooting Agent
|
|
|
|
`tai` is a read-only Linux troubleshooting assistant that connects to remote hosts via SSH, collects diagnostics, and runs grounded AI analysis using local models.
|
|
|
|
The project is designed for operators who want AI speed without losing operational safety or evidence traceability.
|
|
|
|
## What tai Does
|
|
|
|
- Runs safe, read-only remote checks over SSH
|
|
- Builds a diagnostics collection plan from issue text
|
|
- Supports one-shot analysis and interactive follow-up mode
|
|
- Uses local AI backends (OpenAI-compatible endpoint, typically Ollama)
|
|
- Uses RAG over collected diagnostics (Tier 1)
|
|
- Uses persistent runbook retrieval with ChromaDB (Tier 2)
|
|
- Emits structured Markdown analysis with evidence and actions
|
|
- Can log session and retrieval telemetry locally as JSONL
|
|
|
|
## Safety Model
|
|
|
|
`tai` enforces read-only command policy on all remote commands.
|
|
|
|
- Allowlist based command validation
|
|
- Blocked shell operators (`>`, `>>`, `<`, `|`, `&&`, `||`, `;`)
|
|
- No write/mutation actions are executed on target hosts
|
|
|
|
The tool may suggest remediation commands in output, but does not execute them.
|
|
|
|
## Current Feature Set
|
|
|
|
### Core CLI
|
|
|
|
- `tai run ...` main troubleshooting entrypoint
|
|
- SSH options: host, port, identity file, jump host, SSH config control
|
|
- Live probe mode (`uname -a`)
|
|
- Diagnostics collection mode
|
|
- AI analysis mode
|
|
- Optional analysis export via `--output-file <path>` (`--output-format markdown|json`)
|
|
- Automatic host history persistence/read via database (`--history-db`, `--history/--no-history`)
|
|
- Interactive loop with `/collect`, `/analyze`, `/help`, `/quit`
|
|
|
|
### AI and Prompting
|
|
|
|
- OpenAI-compatible AI client
|
|
- Configurable model, timeout, token budget
|
|
- Guardrails to keep responses evidence-based
|
|
- Initial and follow-up prompts grounded in collected diagnostics
|
|
- Non-streaming completion path for local backend reliability
|
|
|
|
### RAG and Knowledge
|
|
|
|
- Tier 1: semantic retrieval of diagnostic chunks per question
|
|
- Tier 2: persistent runbook knowledge base with ChromaDB
|
|
- Runbook retrieval injected as separate prompt context
|
|
- Retrieval debug output (`--rag-debug`)
|
|
- Full-context fallback if retrieval/indexing fails
|
|
|
|
### Runbook Management
|
|
|
|
- `tai runbooks sync --path ./runbooks --store ~/.tai/runbooks`
|
|
- `tai runbooks list --store ~/.tai/runbooks`
|
|
- `tai runbooks add <file> --store ~/.tai/runbooks`
|
|
|
|
### Presence and Absence Signals
|
|
|
|
For recognized services/subsystems (for example `sssd`, `docker`, `x2go`, `xorg`, `wayland`, `selinux`, `apparmor`), collection includes:
|
|
|
|
- service unit-file discovery (`systemctl list-unit-files ...`)
|
|
- binary presence checks via `ls -l <expected path>`
|
|
- service status and journals
|
|
- selected config path probes where defined
|
|
|
|
This improves analysis quality for "component missing/not installed" scenarios.
|
|
|
|
## Repository Layout
|
|
|
|
```text
|
|
src/tai/
|
|
cli.py # CLI commands and orchestration
|
|
ssh_client.py # SSH execution + read-only policy
|
|
collectors.py # execution of collection plans
|
|
plan.py # issue -> command plan builder
|
|
ai_client.py # OpenAI-compatible AI + embeddings client
|
|
ai_guardrails.py # response guardrails/validation
|
|
prompt_builder.py # prompt composition
|
|
rag_retriever.py # diagnostic chunk retrieval
|
|
runbook_store.py # persistent ChromaDB runbook index/query
|
|
chroma_telemetry.py # no-op Chroma telemetry client
|
|
session_log.py # JSONL session logging
|
|
input_parser.py # CLI input validation
|
|
models.py # domain request models
|
|
|
|
runbooks/
|
|
*.md # Markdown runbooks with frontmatter
|
|
|
|
tests/
|
|
test_*.py # unit and CLI coverage
|
|
```
|
|
|
|
## Installation
|
|
|
|
```bash
|
|
python -m venv .venv
|
|
source .venv/bin/activate
|
|
pip install -e .
|
|
```
|
|
|
|
RAG runbook storage requires optional dependencies:
|
|
|
|
```bash
|
|
pip install -e .[rag]
|
|
```
|
|
|
|
Development dependencies:
|
|
|
|
```bash
|
|
pip install -e .[dev]
|
|
```
|
|
|
|
## AI Backend Setup (Ollama)
|
|
|
|
`tai` expects an OpenAI-compatible API endpoint, defaulting to `http://localhost:11434/v1`.
|
|
|
|
```bash
|
|
ollama pull gemma3:4b
|
|
ollama pull nomic-embed-text
|
|
```
|
|
|
|
Quick backend check:
|
|
|
|
```bash
|
|
curl http://localhost:11434/api/generate \
|
|
-d '{"model":"gemma3:4b","prompt":"hello","stream":false}'
|
|
```
|
|
|
|
## Usage
|
|
|
|
### Basic Probe and Collect
|
|
|
|
```bash
|
|
tai run "nginx failing to start" \
|
|
--host web01 \
|
|
--probe \
|
|
--collect
|
|
```
|
|
|
|
### Analyze with RAG and Runbooks
|
|
|
|
```bash
|
|
tai run "why isnt sssd working?" \
|
|
--host ssh.archflux.net \
|
|
--port 5566 \
|
|
--probe --collect --analyze \
|
|
--runbooks ~/.tai/runbooks \
|
|
--rag-debug \
|
|
--ai-timeout-seconds 45 \
|
|
--ai-max-tokens 300
|
|
```
|
|
|
|
### Interactive Session
|
|
|
|
```bash
|
|
tai run "docker daemon keeps failing" \
|
|
--host app01 \
|
|
--collect \
|
|
--interactive \
|
|
--runbooks ~/.tai/runbooks
|
|
```
|
|
|
|
### Write Analysis to File
|
|
|
|
```bash
|
|
tai run "sshd authentication failed" \
|
|
--host bastion01 \
|
|
--collect --analyze \
|
|
--output-file ./reports/sshd-analysis.md
|
|
```
|
|
|
|
JSON export:
|
|
|
|
```bash
|
|
tai run "sshd authentication failed" \
|
|
--host bastion01 \
|
|
--collect --analyze \
|
|
--output-file ./reports/sshd-analysis.json \
|
|
--output-format json
|
|
```
|
|
|
|
JSON export includes host-specific run metadata:
|
|
|
|
- `schema` and `generated_at`
|
|
- `issue`, `host`, `model`
|
|
- `collection` summary (`total`, `failed`, `succeeded`)
|
|
- `token_usage` (`prompt_tokens`, `completion_tokens`, `total_tokens`) when available from backend
|
|
- `analysis` text
|
|
|
|
By default, each analyzed run is also written to the history database and prior
|
|
sessions for the same host are read and injected as historical context.
|
|
|
|
Database targets supported by `--history-db`:
|
|
|
|
- SQLite file path (for example `~/.tai/history.db`)
|
|
- SQLite URL (for example `sqlite:////tmp/tai-history.db`)
|
|
- PostgreSQL DSN (for example `postgresql://user:pass@dbhost:5432/tai`)
|
|
|
|
Example using remote PostgreSQL history database:
|
|
|
|
```bash
|
|
tai run "sshd authentication failed" \
|
|
--host bastion01 \
|
|
--collect --analyze \
|
|
--history-db postgresql://tai_user:secret@db.internal:5432/tai
|
|
```
|
|
|
|
Credential options for external history DB:
|
|
|
|
- `--history-db-user <user>`
|
|
- `--history-db-password <password>`
|
|
- `--env-file <path>` (loads dotenv values)
|
|
|
|
Dotenv keys for history DB credentials:
|
|
|
|
- `TAI_HISTORY_DB_USER`
|
|
- `TAI_HISTORY_DB_PASSWORD`
|
|
|
|
Runbook store targets supported by `--runbooks` and `tai runbooks --store`:
|
|
|
|
- Local embedded ChromaDB path (default)
|
|
- Remote ChromaDB URL (for example `http://chroma.internal:8000`)
|
|
|
|
Example using remote ChromaDB runbook store at analysis time:
|
|
|
|
```bash
|
|
tai run "nginx failing after reboot" \
|
|
--host web01 \
|
|
--collect --analyze \
|
|
--runbooks http://chroma.internal:8000
|
|
```
|
|
|
|
Credential options for remote runbook store:
|
|
|
|
- `--runbooks-user <user>` / `--runbooks-password <password>` on `tai run`
|
|
- `--store-user <user>` / `--store-password <password>` on `tai runbooks ...`
|
|
- `--env-file <path>` (loads dotenv values)
|
|
|
|
Dotenv keys for runbook store credentials:
|
|
|
|
- `TAI_RUNBOOK_STORE_USER`
|
|
- `TAI_RUNBOOK_STORE_PASSWORD`
|
|
|
|
Remote runbook (playbook) sources supported by `tai runbooks sync --path`:
|
|
|
|
- Local directory path (for example `./runbooks`)
|
|
- SSH directory URI (for example `ssh://ops@ssh.archflux.net/opt/tai/runbooks`)
|
|
- HTTP/HTTPS webroot URL that exposes `.md` links (for example `https://kb.example/runbooks/`)
|
|
|
|
Webroot hardening rules:
|
|
|
|
- Only `.md` links are considered for download.
|
|
- Downloaded payload must look like real Markdown (HTML wrappers are ignored).
|
|
- Non-markdown payloads are discarded.
|
|
- Downloaded content is never executed. It is stored as plain text and only parsed for AI retrieval context.
|
|
|
|
Single runbook (playbook) sources supported by `tai runbooks add`:
|
|
|
|
- Local file path
|
|
- SSH file URI (for example `ssh://ops@ssh.archflux.net/opt/tai/runbooks/nginx.md`)
|
|
- HTTP/HTTPS URL to a Markdown file
|
|
|
|
For HTTP/HTTPS single-file add, the source URL must end in `.md` and resolve to Markdown content.
|
|
|
|
Examples:
|
|
|
|
```bash
|
|
# Sync from SSH-hosted runbooks directory into remote ChromaDB
|
|
tai runbooks sync \
|
|
--path ssh://ops@ssh.archflux.net/opt/tai/runbooks \
|
|
--store http://chroma.internal:8000
|
|
|
|
# Sync from HTTPS webroot listing Markdown runbooks
|
|
tai runbooks sync \
|
|
--path https://kb.example/runbooks/ \
|
|
--store ~/.tai/runbooks
|
|
|
|
# Add one runbook directly from HTTPS
|
|
tai runbooks add https://kb.example/runbooks/nginx.md --store ~/.tai/runbooks
|
|
```
|
|
|
|
## Runbook Workflow
|
|
|
|
1. Write Markdown runbooks in `runbooks/` with frontmatter keys: `service`, `symptoms`, `tags`.
|
|
1. Sync the store.
|
|
1. Pass `--runbooks <store-path>` to `tai run`.
|
|
|
|
Example:
|
|
|
|
```bash
|
|
tai runbooks sync --path ./runbooks --store ~/.tai/runbooks
|
|
tai runbooks list --store ~/.tai/runbooks
|
|
```
|
|
|
|
## Testing
|
|
|
|
```bash
|
|
pytest
|
|
```
|
|
|
|
Focused suites:
|
|
|
|
```bash
|
|
pytest tests/test_plan.py tests/test_ai.py tests/test_cli.py
|
|
```
|
|
|
|
## Man Page
|
|
|
|
A manual page is available at `docs/tai.1`.
|
|
|
|
Render it locally:
|
|
|
|
```bash
|
|
man ./docs/tai.1
|
|
```
|
|
|
|
## Known Limits
|
|
|
|
- Deep service-specific probes (known binary/config/package aliases) are richer for recognized services than generic service names.
|
|
- Clipboard export is intentionally not implemented.
|
|
|
|
## Changelog and Roadmap
|
|
|
|
- See `CHANGELOG.md` for release history.
|
|
- See `ROADMAP.md` for phase status and next milestones.
|
|
- See `docs/ARCHITECTURE.md` for module-level architecture and data flow.
|