tai/README.md

# tai - Linux AI Troubleshooting Agent

`tai` is a read-only Linux troubleshooting assistant that connects to remote hosts via SSH, collects diagnostics, and runs grounded AI analysis using local models.

The project is designed for operators who want AI speed without losing operational safety or evidence traceability.

## What tai Does

- Runs safe, read-only remote checks over SSH
- Builds a diagnostics collection plan from issue text
- Supports one-shot analysis and interactive follow-up mode
- Uses local AI backends (OpenAI-compatible endpoint, typically Ollama)
- Uses RAG over collected diagnostics (Tier 1)
- Uses persistent runbook retrieval with ChromaDB (Tier 2)
- Emits structured Markdown analysis with evidence and actions
- Can log session and retrieval telemetry locally as JSONL

## Safety Model

`tai` enforces read-only command policy on all remote commands.

- Allowlist based command validation
- Blocked shell operators (`>`, `>>`, `<`, `|`, `&&`, `||`, `;`)
- No write/mutation actions are executed on target hosts

The tool may suggest remediation commands in output, but does not execute them.

## Current Feature Set

### Core CLI

- `tai run ...` main troubleshooting entrypoint
- SSH options: host, port, identity file, jump host, SSH config control
- Live probe mode (`uname -a`)
- Diagnostics collection mode
- AI analysis mode
- Interactive loop with `/collect`, `/analyze`, `/help`, `/quit`

### AI and Prompting

- OpenAI-compatible AI client
- Configurable model, timeout, token budget
- Guardrails to keep responses evidence-based
- Initial and follow-up prompts grounded in collected diagnostics
- Non-streaming completion path for local backend reliability

### RAG and Knowledge

- Tier 1: semantic retrieval of diagnostic chunks per question
- Tier 2: persistent runbook knowledge base with ChromaDB
- Runbook retrieval injected as separate prompt context
- Retrieval debug output (`--rag-debug`)
- Full-context fallback if retrieval/indexing fails

### Runbook Management

- `tai runbooks sync --path ./runbooks --store ~/.tai/runbooks`
- `tai runbooks list --store ~/.tai/runbooks`
- `tai runbooks add <file> --store ~/.tai/runbooks`

### Presence and Absence Signals

For recognized services/subsystems (for example `sssd`, `docker`, `x2go`, `xorg`, `wayland`, `selinux`, `apparmor`), collection includes:

- service unit-file discovery (`systemctl list-unit-files ...`)
- binary presence checks via `ls -l <expected path>`
- service status and journals
- selected config path probes where defined

This improves analysis quality for "component missing/not installed" scenarios.

## Repository Layout

```text
src/tai/
  cli.py                # CLI commands and orchestration
  ssh_client.py         # SSH execution + read-only policy
  collectors.py         # execution of collection plans
  plan.py               # issue -> command plan builder
  ai_client.py          # OpenAI-compatible AI + embeddings client
  ai_guardrails.py      # response guardrails/validation
  prompt_builder.py     # prompt composition
  rag_retriever.py      # diagnostic chunk retrieval
  runbook_store.py      # persistent ChromaDB runbook index/query
  chroma_telemetry.py   # no-op Chroma telemetry client
  session_log.py        # JSONL session logging
  input_parser.py       # CLI input validation
  models.py             # domain request models

runbooks/
  *.md                  # Markdown runbooks with frontmatter

tests/
  test_*.py             # unit and CLI coverage
```

## Installation

```bash
python -m venv .venv
source .venv/bin/activate
pip install -e .
```

RAG runbook storage requires optional dependencies:

```bash
pip install -e .[rag]
```

Development dependencies:

```bash
pip install -e .[dev]
```

## AI Backend Setup (Ollama)

`tai` expects an OpenAI-compatible API endpoint, defaulting to `http://localhost:11434/v1`.

```bash
ollama pull gemma3:4b
ollama pull nomic-embed-text
```

Quick backend check:

```bash
curl http://localhost:11434/api/generate \
  -d '{"model":"gemma3:4b","prompt":"hello","stream":false}'
```

## Usage

### Basic Probe and Collect

```bash
tai run "nginx failing to start" \
  --host web01 \
  --probe \
  --collect
```

### Analyze with RAG and Runbooks

```bash
tai run "why isnt sssd working?" \
  --host ssh.archflux.net \
  --port 5566 \
  --probe --collect --analyze \
  --runbooks ~/.tai/runbooks \
  --rag-debug \
  --ai-timeout-seconds 45 \
  --ai-max-tokens 300
```

### Interactive Session

```bash
tai run "docker daemon keeps failing" \
  --host app01 \
  --collect \
  --interactive \
  --runbooks ~/.tai/runbooks
```

## Runbook Workflow

1. Write Markdown runbooks in `runbooks/` with frontmatter keys: `service`, `symptoms`, `tags`.
1. Sync the store.
1. Pass `--runbooks <store-path>` to `tai run`.

Example:

```bash
tai runbooks sync --path ./runbooks --store ~/.tai/runbooks
tai runbooks list --store ~/.tai/runbooks
```

## Testing

```bash
pytest
```

Focused suites:

```bash
pytest tests/test_plan.py tests/test_ai.py tests/test_cli.py
```

## Known Limits

- Deep service-specific probes (known binary/config/package aliases) are richer for recognized services than generic service names.
- Session memory is available via `--session-memory`, but dedicated history UX commands (`tai history`, `/history`) are not implemented yet.

## Changelog and Roadmap

- See `CHANGELOG.md` for release history.
- See `ROADMAP.md` for phase status and next milestones.
- See `docs/ARCHITECTURE.md` for module-level architecture and data flow.