tai — Linux AI Troubleshooting Agent
tai is an agentic AI-driven troubleshooting tool for Linux systems. It autonomously investigates issues on remote hosts via SSH, analyzes relevant logs and configuration files, and provides a clear diagnosis along with suggested remediation steps — all without making any changes to the target system.
Overview
Given a problem description and a target hostname, tai connects to the remote system over SSH, gathers relevant data (logs, configuration files, service status, etc.), and uses a locally-hosted AI model to reason about the root cause and recommend solutions.
The agent operates in read-only mode at all times. It never modifies the target system; every suggestion is presented to the human troubleshooter for review and action.
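One way a read-only guarantee could be enforced is to vet every command before it is sent over SSH. The sketch below is an assumption about how such a guard might look, not tai's actual mechanism: the allowlist contents, the `is_read_only` name, and the rejected option prefixes are all hypothetical.

```python
import shlex

# Hypothetical allowlist: the real tool may enforce read-only access differently.
READ_ONLY_COMMANDS = {"cat", "journalctl", "ls", "stat", "ss", "df", "uname"}
# systemctl sub-commands that only report state.
SYSTEMCTL_READ_ONLY = {"status", "show", "is-active", "is-enabled", "list-units", "cat"}
# Shell metacharacters that could chain, substitute, or redirect into a write.
FORBIDDEN_CHARS = set(";|&><`$\n")
# Options that make an otherwise harmless command mutate state (journalctl examples).
MUTATING_OPTION_PREFIXES = ("--vacuum", "--rotate", "--flush", "--relinquish-var")


def is_read_only(command: str) -> bool:
    """Return True only for allowlisted commands without shell metacharacters."""
    if any(ch in FORBIDDEN_CHARS for ch in command):
        return False
    try:
        argv = shlex.split(command)
    except ValueError:  # unbalanced quotes
        return False
    if not argv:
        return False
    if any(arg.startswith(MUTATING_OPTION_PREFIXES) for arg in argv[1:]):
        return False
    if argv[0] == "systemctl":
        # Deliberately strict: the sub-command must come first.
        return len(argv) > 1 and argv[1] in SYSTEMCTL_READ_ONLY
    return argv[0] in READ_ONLY_COMMANDS


print(is_read_only("journalctl -u apache2 --no-pager"))
print(is_read_only("systemctl restart apache2"))
```

An allowlist fails closed: anything not explicitly recognised is rejected, which suits a tool that must never change the target. Connecting as an unprivileged SSH user would add a second layer of protection.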
Supported Distributions
- Ubuntu
- Debian
- RHEL
- Rocky Linux
Example Workflow
A troubleshooter receives a ticket reporting that the Apache service on a remote server has failed to start. They provide tai with:
- The ticket description or error message
- The hostname of the affected system
- Any relevant directories to focus on
tai then connects to the host, reads through system logs, service configurations, and any other related files, and returns a structured analysis of the likely cause along with recommended next steps.
Suggested Tooling
| Component | Tool |
|---|---|
| AI inference backend | Ollama |
| Chat model | gemma3:4b, llama3.1:8b, or qwen2.5:7b |
| Embedding model | nomic-embed-text (via Ollama) |
| Vector store | ChromaDB (embedded, local) |
| Language | Python 3.11+ |
How-To: Setting Up the AI Backend (Arch Linux + RTX 3080)
tai uses Ollama as its local AI backend. Ollama exposes an OpenAI-compatible HTTP API that tai talks to, so no cloud services are involved and no data leaves your machine.
An RTX 3080 (10 GB VRAM) comfortably runs 7–8B parameter models at 4-bit quantisation.
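The arithmetic behind that claim can be sketched as follows. This is a rough estimate of weight storage only; the helper name `weight_memory_gb` is made up for illustration. Real quantised model files are somewhat larger, and the KV cache and runtime buffers need additional VRAM on top.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in decimal gigabytes."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


for name, params in [("7B", 7.0), ("8B", 8.0)]:
    # Prints ~3.5 GB for 7B and ~4.0 GB for 8B at 4 bits per weight.
    print(f"{name} @ 4-bit: ~{weight_memory_gb(params, 4):.1f} GB of weights")
```

Even with a few gigabytes of overhead for context and buffers, both sizes stay below the card's 10 GB.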
1. Install CUDA and Ollama
# CUDA runtime (skip if already installed)
sudo pacman -S cuda
# Ollama with CUDA support from the AUR
yay -S ollama-cuda
# or: paru -S ollama-cuda
# Enable and start the service
sudo systemctl enable --now ollama
2. Pull a chat model
ollama pull gemma3:4b # ~3 GB — fast, good for sysadmin tasks
ollama pull llama3.1:8b # ~5 GB — stronger reasoning
ollama pull qwen2.5:7b # ~4.5 GB — strong structured output
3. Pull the embedding model
tai uses nomic-embed-text to embed diagnostic data and runbooks for semantic retrieval (RAG). Pull it on the same host as Ollama:
ollama pull nomic-embed-text # ~274 MB
Verify that it responds:
curl http://localhost:11434/api/embeddings \
-d '{"model":"nomic-embed-text","prompt":"test"}'
A JSON response with an "embedding" array confirms it is ready.
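To illustrate what semantic retrieval does with these embeddings, here is a minimal sketch that ranks chunks by cosine similarity to a query vector. The chunk names and the three-dimensional toy vectors are invented for illustration; real embeddings have hundreds of dimensions, and in tai the vector store performs the search, possibly with a different distance metric.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_chunks(query_vec, chunks, top_k=2):
    """Return the top_k (name, score) pairs, highest similarity first."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in chunks.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Toy data standing in for embedded diagnostic chunks (hypothetical names).
chunks = {
    "apache-error.log": [0.9, 0.1, 0.0],
    "sshd_config": [0.0, 0.2, 0.9],
    "runbook-httpd.md": [0.7, 0.6, 0.1],
}
query = [1.0, 0.2, 0.0]

for name, score in rank_chunks(query, chunks):
    print(f"{name}: {score:.3f}")
```

With these toy vectors the Apache log ranks first and the runbook second, while the unrelated SSH configuration is dropped.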
4. Verify the chat model works
ollama run gemma3:4b "what causes a systemd service to enter failed state?"
5. Verify the HTTP API is running
tai communicates with Ollama over HTTP. This check uses Ollama's native /api/generate endpoint (the OpenAI-compatible endpoints live under /v1):
curl http://localhost:11434/api/generate \
-d '{"model":"gemma3:4b","prompt":"hello","stream":false}'
A JSON response with a "response" field confirms everything is working.
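The same check can be done from Python with only the standard library. This is a sketch rather than tai's own client code: the function names are hypothetical, and it assumes the native /api/generate endpoint with streaming disabled, as in the curl command above.

```python
import json
import urllib.request


def build_generate_request(host: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        host.rstrip("/") + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_generate_response(body: bytes) -> str:
    """Extract the generated text; raises KeyError if the field is missing."""
    return json.loads(body)["response"]


def check_generate(host: str = "http://localhost:11434", model: str = "gemma3:4b") -> bool:
    """Return True if the backend answers with a well-formed response."""
    request = build_generate_request(host, model, "hello")
    try:
        with urllib.request.urlopen(request, timeout=30) as resp:
            parse_generate_response(resp.read())
        return True
    except (OSError, ValueError, KeyError):
        # Connection problems, invalid JSON, or a missing "response" field.
        return False
```

Calling `check_generate()` returns `True` when Ollama is reachable and the model answers, and `False` otherwise.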
6. Point tai at your Ollama instance
Once tai's AI integration is complete, pass these flags:
tai "nginx failing to start" --host web01 \
--ai-host http://localhost:11434 \
--model gemma3:4b
The default values for --ai-host and --model will be http://localhost:11434 and gemma3:4b respectively, so for local use you won't need to specify them explicitly.
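As a sketch of how those defaults could be wired up, assuming the CLI is built with argparse (the parser layout, the help strings, and treating --host as required are assumptions; only the flag names and default values come from the text above):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown in the example command."""
    parser = argparse.ArgumentParser(
        prog="tai", description="Linux AI troubleshooting agent"
    )
    parser.add_argument("problem", help="ticket description or error message")
    # Assumption: a target host is always required.
    parser.add_argument("--host", required=True, help="hostname of the affected system")
    parser.add_argument("--ai-host", default="http://localhost:11434",
                        help="base URL of the Ollama instance")
    parser.add_argument("--model", default="gemma3:4b", help="chat model to use")
    return parser


# Local use: neither --ai-host nor --model is given, so the defaults apply.
args = build_parser().parse_args(["nginx failing to start", "--host", "web01"])
print(args.ai_host, args.model)
```

argparse converts `--ai-host` into the attribute `ai_host`, so the rest of the program would read `args.ai_host` and `args.model`.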