2026-05-04 04:54:50 +02:00
2 changed files with 62 additions and 3 deletions
--- a/README.md
+++ b/README.md
@@ -29,7 +29,65 @@ A troubleshooter receives a ticket reporting that the Apache service on a remote

 | Component | Tool |
 |-----------|------|
-| AI inference backend | [vLLM](https://github.com/vllm-project/vllm) |
-| Model | `gemma4:a4b` |
+| AI inference backend | [Ollama](https://ollama.com) |
+| Model | `gemma3:4b`, `llama3.1:8b`, or `qwen2.5:7b` |
+| Language | Python 3.11+ |

-> **Note:** A suitable implementation language for this project is yet to be determined.
+---
+
+## How-To: Setting Up the AI Backend (Arch Linux + RTX 3080)
+
+`tai` uses [Ollama](https://ollama.com) as its local AI backend. It exposes an OpenAI-compatible HTTP API that `tai` talks to — no cloud services, no data leaving your machine.
+
+An RTX 3080 (10 GB VRAM) comfortably runs 7–8B parameter models at 4-bit quantisation.
+
+### 1. Install CUDA and Ollama
+
+```bash
+# CUDA runtime (skip if already installed)
+sudo pacman -S cuda
+
+# Ollama with CUDA support from the AUR
+yay -S ollama-cuda
+# or: paru -S ollama-cuda
+
+# Enable and start the service
+sudo systemctl enable --now ollama
+```
+
+### 2. Pull a model
+
+```bash
+ollama pull gemma3:4b       # ~3 GB — fast, good for sysadmin tasks
+ollama pull llama3.1:8b     # ~5 GB — stronger reasoning
+ollama pull qwen2.5:7b      # ~4.5 GB — strong structured output
+```
+
+### 3. Verify the model works
+
+```bash
+ollama run gemma3:4b "what causes a systemd service to enter failed state?"
+```
+
+### 4. Verify the HTTP API is running
+
+`tai` communicates with Ollama over its OpenAI-compatible REST API:
+
+```bash
+curl http://localhost:11434/api/generate \
+  -d '{"model":"gemma3:4b","prompt":"hello","stream":false}'
+```
+
+A JSON response with a `response` field confirms everything is working.
+
+### 5. Point tai at your Ollama instance
+
+Once `tai` AI integration is complete, use these flags:
+
+```bash
+tai "nginx failing to start" --host web01 \
+  --ai-host http://localhost:11434 \
+  --model gemma3:4b
+```
+
+The default values for `--ai-host` and `--model` will be `http://localhost:11434` and `gemma3:4b` respectively, so for local use you won't need to specify them explicitly.
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -15,6 +15,7 @@ dependencies = [
  "typer>=0.12,<1.0",
  "rich>=13.7,<14.0",
  "asyncssh>=2.14,<3.0",
+  "openai>=1.30,<2.0",
 ]

 [project.optional-dependencies]