zphinx/tai

Files

CI / test (push) Successful in 14s

Details

update

Co-authored-by: Copilot <copilot@github.com>

2026-05-04 04:08:50 +02:00

6.2 KiB

Raw Blame History

Roadmap

This document outlines the major decisions, milestones, and development phases required to bring tai from concept to a working tool.

Phase 0 — Decisions & Prerequisites

These must be resolved before meaningful development can begin.

Language Selection

Decision: Python
Key factors: native vLLM integration, mature SSH libraries (paramiko / asyncssh), strong text/log parsing, rapid development
Single binary distribution will be achieved via Nuitka (preferred for true compilation) or PyInstaller as a fallback
Evaluate Nuitka vs PyInstaller for binary output quality and CI reproducibility
Add binary build step to CI pipeline

AI Backend & Model

Confirm use of vLLM as the inference backend
Confirm gemma4:a4b as the default model (or select an alternative)
Define minimum hardware requirements for running the model locally
Decide whether the AI backend is bundled, self-hosted externally, or user-supplied

SSH Strategy

Decision: keypair authentication only — no password auth; eliminates credential storage risk
- Default key resolution: ~/.ssh/id_ed25519, ~/.ssh/id_rsa (in order of preference)
- CLI override via --identity-file <path>
- No SSH agent forwarding needed — a shared key is distributed to all managed hosts via Puppet
Known hosts: auto-accept new hosts; reject on key mismatch — a changed host key triggers a hard stop with a MITM warning; unknown/new hosts are accepted silently on first connect
Bastion/jump host: --jump-host <host> flag — delegates to SSH's native ProxyJump functionality
SSH config behavior: respect existing ~/.ssh/config by default; allow CLI override
- Default: follow host settings from ~/.ssh/config (for User, Port, ProxyJump, etc.)
- Override switch: --ignore-ssh-config to bypass local SSH config when required

Scope & Constraints

Define the supported scope of issues (services, network, disk, kernel, etc.)
Confirm read-only guarantee — document exactly what "read-only" means in practice
Decision: interactive REPL mode for v0.1, full TUI for v0.2+
- v0.1: chat-loop REPL launched from CLI; human can follow up, correct, and redirect the agent
- v0.2+: textual-based TUI with split panes (collected data | AI output | input bar)
- Built-in slash commands: /collect, /show logs, /clear, /host <hostname>, /help, /quit

Phase 1 — Project Foundation

Basic project scaffolding and connectivity.

Finalise repository structure and language toolchain
Set up CI pipeline (linting, tests)
Implement SSH connection module
- Define SSH config model and probe interface scaffold
- Connect to remote host
- Execute read-only commands (e.g. journalctl, systemctl status, cat)
- Stream or collect command output safely
Implement basic input parsing (ticket text, hostname, target directories)
Write unit tests for SSH and input modules
- Input parser and CLI tests added
- SSH module tests added for command policy and SSH argv behavior

Phase 2 — Data Collection Layer

Define what information the agent gathers and how.

Identify the canonical set of data sources per issue type:
- Service failures: journalctl, systemctl, service config files
- Network issues: ip, ss, netstat, firewall rules
- Disk issues: df, du, dmesg, smartctl
- General: /var/log/syslog, /var/log/messages, dmesg
Implement pluggable "collector" modules per data source
Implement directory traversal for user-specified paths (read-only)
Add support for per-distro variations (Ubuntu vs RHEL path differences, etc.)
Write tests with mocked SSH output

Phase 3 — AI Integration

Wire collected data into the local AI model.

Implement vLLM client module
Design prompt template: system context, collected data, issue description → diagnosis
Implement response parsing and structured output (root cause + suggested steps)
Tune context window usage — handle truncation for large log outputs
Add streaming support for long AI responses
Evaluate and test model output quality on common issue types

Phase 4 — CLI & User Experience

Polish the interface for real-world use.

Design CLI interface (flags, subcommands, interactive prompts)
Implement structured output: diagnosis, confidence, recommended actions
Add --verbose / --debug mode showing raw collected data
Support output to file or clipboard
Write man page / --help documentation

Phase 5 — Hardening & Distribution

Prepare for broader use.

Security review of SSH handling and credential storage
Ensure no data is written to the remote system under any path
Package for distribution (binary release, container image, or distro packages)
Write installation and quickstart documentation
End-to-end integration tests against a test VM

Decisions Log

Date	Decision	Outcome
2026-05-04	Implementation language	Python — with single distributable binary via Nuitka
—	AI inference backend	vLLM (provisional)
—	Default model	`gemma4:a4b` (provisional)
2026-05-04	SSH auth methods	Keypair only (ed25519/RSA); auto-accept new hosts; reject on key change (MITM)
2026-05-04	Bastion host support	`--jump-host` flag via SSH native ProxyJump
2026-05-04	SSH config behavior	Use `~/.ssh/config` by default; allow override via `--ignore-ssh-config`
2026-05-04	CLI vs interactive mode	Interactive: REPL for v0.1, `textual` TUI for v0.2+

6.2 KiB Raw Blame History