202 lines
5.4 KiB
Markdown
202 lines
5.4 KiB
Markdown
# tai - Linux AI Troubleshooting Agent
|
|
|
|
`tai` is a read-only Linux troubleshooting assistant that connects to remote hosts via SSH, collects diagnostics, and runs grounded AI analysis using local models.
|
|
|
|
The project is designed for operators who want AI speed without losing operational safety or evidence traceability.
|
|
|
|
## What tai Does
|
|
|
|
- Runs safe, read-only remote checks over SSH
|
|
- Builds a diagnostics collection plan from issue text
|
|
- Supports one-shot analysis and interactive follow-up mode
|
|
- Uses local AI backends (OpenAI-compatible endpoint, typically Ollama)
|
|
- Uses RAG over collected diagnostics (Tier 1)
|
|
- Uses persistent runbook retrieval with ChromaDB (Tier 2)
|
|
- Emits structured Markdown analysis with evidence and actions
|
|
- Can log session and retrieval telemetry locally as JSONL
|
|
|
|
## Safety Model
|
|
|
|
`tai` enforces read-only command policy on all remote commands.
|
|
|
|
- Allowlist based command validation
|
|
- Blocked shell operators (`>`, `>>`, `<`, `|`, `&&`, `||`, `;`)
|
|
- No write/mutation actions are executed on target hosts
|
|
|
|
The tool may suggest remediation commands in output, but does not execute them.
|
|
|
|
## Current Feature Set
|
|
|
|
### Core CLI
|
|
|
|
- `tai run ...` main troubleshooting entrypoint
|
|
- SSH options: host, port, identity file, jump host, SSH config control
|
|
- Live probe mode (`uname -a`)
|
|
- Diagnostics collection mode
|
|
- AI analysis mode
|
|
- Interactive loop with `/collect`, `/analyze`, `/help`, `/quit`
|
|
|
|
### AI and Prompting
|
|
|
|
- OpenAI-compatible AI client
|
|
- Configurable model, timeout, token budget
|
|
- Guardrails to keep responses evidence-based
|
|
- Initial and follow-up prompts grounded in collected diagnostics
|
|
- Non-streaming completion path for local backend reliability
|
|
|
|
### RAG and Knowledge
|
|
|
|
- Tier 1: semantic retrieval of diagnostic chunks per question
|
|
- Tier 2: persistent runbook knowledge base with ChromaDB
|
|
- Runbook retrieval injected as separate prompt context
|
|
- Retrieval debug output (`--rag-debug`)
|
|
- Full-context fallback if retrieval/indexing fails
|
|
|
|
### Runbook Management
|
|
|
|
- `tai runbooks sync --path ./runbooks --store ~/.tai/runbooks`
|
|
- `tai runbooks list --store ~/.tai/runbooks`
|
|
- `tai runbooks add <file> --store ~/.tai/runbooks`
|
|
|
|
### Presence and Absence Signals
|
|
|
|
For recognized services/subsystems (for example `sssd`, `docker`, `x2go`, `xorg`, `wayland`, `selinux`, `apparmor`), collection includes:
|
|
|
|
- service unit-file discovery (`systemctl list-unit-files ...`)
|
|
- binary presence checks via `ls -l <expected path>`
|
|
- service status and journals
|
|
- selected config path probes where defined
|
|
|
|
This improves analysis quality for "component missing/not installed" scenarios.
|
|
|
|
## Repository Layout
|
|
|
|
```text
|
|
src/tai/
|
|
cli.py # CLI commands and orchestration
|
|
ssh_client.py # SSH execution + read-only policy
|
|
collectors.py # execution of collection plans
|
|
plan.py # issue -> command plan builder
|
|
ai_client.py # OpenAI-compatible AI + embeddings client
|
|
ai_guardrails.py # response guardrails/validation
|
|
prompt_builder.py # prompt composition
|
|
rag_retriever.py # diagnostic chunk retrieval
|
|
runbook_store.py # persistent ChromaDB runbook index/query
|
|
chroma_telemetry.py # no-op Chroma telemetry client
|
|
session_log.py # JSONL session logging
|
|
input_parser.py # CLI input validation
|
|
models.py # domain request models
|
|
|
|
runbooks/
|
|
*.md # Markdown runbooks with frontmatter
|
|
|
|
tests/
|
|
test_*.py # unit and CLI coverage
|
|
```
|
|
|
|
## Installation
|
|
|
|
```bash
|
|
python -m venv .venv
|
|
source .venv/bin/activate
|
|
pip install -e .
|
|
```
|
|
|
|
RAG runbook storage requires optional dependencies:
|
|
|
|
```bash
|
|
pip install -e .[rag]
|
|
```
|
|
|
|
Development dependencies:
|
|
|
|
```bash
|
|
pip install -e .[dev]
|
|
```
|
|
|
|
## AI Backend Setup (Ollama)
|
|
|
|
`tai` expects an OpenAI-compatible API endpoint, defaulting to `http://localhost:11434/v1`.
|
|
|
|
```bash
|
|
ollama pull gemma3:4b
|
|
ollama pull nomic-embed-text
|
|
```
|
|
|
|
Quick backend check:
|
|
|
|
```bash
|
|
curl http://localhost:11434/api/generate \
|
|
-d '{"model":"gemma3:4b","prompt":"hello","stream":false}'
|
|
```
|
|
|
|
## Usage
|
|
|
|
### Basic Probe and Collect
|
|
|
|
```bash
|
|
tai run "nginx failing to start" \
|
|
--host web01 \
|
|
--probe \
|
|
--collect
|
|
```
|
|
|
|
### Analyze with RAG and Runbooks
|
|
|
|
```bash
|
|
tai run "why isnt sssd working?" \
|
|
--host ssh.archflux.net \
|
|
--port 5566 \
|
|
--probe --collect --analyze \
|
|
--runbooks ~/.tai/runbooks \
|
|
--rag-debug \
|
|
--ai-timeout-seconds 45 \
|
|
--ai-max-tokens 300
|
|
```
|
|
|
|
### Interactive Session
|
|
|
|
```bash
|
|
tai run "docker daemon keeps failing" \
|
|
--host app01 \
|
|
--collect \
|
|
--interactive \
|
|
--runbooks ~/.tai/runbooks
|
|
```
|
|
|
|
## Runbook Workflow
|
|
|
|
1. Write Markdown runbooks in `runbooks/` with frontmatter keys: `service`, `symptoms`, `tags`.
|
|
1. Sync the store.
|
|
1. Pass `--runbooks <store-path>` to `tai run`.
|
|
|
|
Example:
|
|
|
|
```bash
|
|
tai runbooks sync --path ./runbooks --store ~/.tai/runbooks
|
|
tai runbooks list --store ~/.tai/runbooks
|
|
```
|
|
|
|
## Testing
|
|
|
|
```bash
|
|
pytest
|
|
```
|
|
|
|
Focused suites:
|
|
|
|
```bash
|
|
pytest tests/test_plan.py tests/test_ai.py tests/test_cli.py
|
|
```
|
|
|
|
## Known Limits
|
|
|
|
- Deep service-specific probes (known binary/config/package aliases) are richer for recognized services than generic service names.
|
|
- Session memory is available via `--session-memory`, but dedicated history UX commands (`tai history`, `/history`) are not implemented yet.
|
|
|
|
## Changelog and Roadmap
|
|
|
|
- See `CHANGELOG.md` for release history.
|
|
- See `ROADMAP.md` for phase status and next milestones.
|
|
- See `docs/ARCHITECTURE.md` for module-level architecture and data flow.
|