Files
tai/ROADMAP.md
zphinx 65c74dde5a
All checks were successful
CI / test (push) Successful in 14s
update
Co-authored-by: Copilot <copilot@github.com>
2026-05-04 04:08:50 +02:00

6.2 KiB

Roadmap

This document outlines the major decisions, milestones, and development phases required to bring tai from concept to a working tool.


Phase 0 — Decisions & Prerequisites

These must be resolved before meaningful development can begin.

Language Selection

  • Decision: Python
  • Key factors: native vLLM integration, mature SSH libraries (paramiko / asyncssh), strong text/log parsing, rapid development
  • Single binary distribution will be achieved via Nuitka (preferred for true compilation) or PyInstaller as a fallback
  • Evaluate Nuitka vs PyInstaller for binary output quality and CI reproducibility
  • Add binary build step to CI pipeline

AI Backend & Model

  • Confirm use of vLLM as the inference backend
  • Confirm gemma4:a4b as the default model (or select an alternative)
  • Define minimum hardware requirements for running the model locally
  • Decide whether the AI backend is bundled, self-hosted externally, or user-supplied

SSH Strategy

  • Decision: keypair authentication only — no password auth; eliminates credential storage risk
    • Default key resolution: ~/.ssh/id_ed25519, ~/.ssh/id_rsa (in order of preference)
    • CLI override via --identity-file <path>
    • No SSH agent forwarding needed — a shared key is distributed to all managed hosts via Puppet
  • Known hosts: auto-accept new hosts; reject on key mismatch — a changed host key triggers a hard stop with a MITM warning; unknown/new hosts are accepted silently on first connect
  • Bastion/jump host: --jump-host <host> flag — delegates to SSH's native ProxyJump functionality
  • SSH config behavior: respect existing ~/.ssh/config by default; allow CLI override
    • Default: follow host settings from ~/.ssh/config (for User, Port, ProxyJump, etc.)
    • Override switch: --ignore-ssh-config to bypass local SSH config when required

Scope & Constraints

  • Define the supported scope of issues (services, network, disk, kernel, etc.)
  • Confirm read-only guarantee — document exactly what "read-only" means in practice
  • Decision: interactive REPL mode for v0.1, full TUI for v0.2+
    • v0.1: chat-loop REPL launched from CLI; human can follow up, correct, and redirect the agent
    • v0.2+: textual-based TUI with split panes (collected data | AI output | input bar)
    • Built-in slash commands: /collect, /show logs, /clear, /host <hostname>, /help, /quit

Phase 1 — Project Foundation

Basic project scaffolding and connectivity.

  • Finalise repository structure and language toolchain
  • Set up CI pipeline (linting, tests)
  • Implement SSH connection module
    • Define SSH config model and probe interface scaffold
    • Connect to remote host
    • Execute read-only commands (e.g. journalctl, systemctl status, cat)
    • Stream or collect command output safely
  • Implement basic input parsing (ticket text, hostname, target directories)
  • Write unit tests for SSH and input modules
    • Input parser and CLI tests added
    • SSH module tests added for command policy and SSH argv behavior

Phase 2 — Data Collection Layer

Define what information the agent gathers and how.

  • Identify the canonical set of data sources per issue type:
    • Service failures: journalctl, systemctl, service config files
    • Network issues: ip, ss, netstat, firewall rules
    • Disk issues: df, du, dmesg, smartctl
    • General: /var/log/syslog, /var/log/messages, dmesg
  • Implement pluggable "collector" modules per data source
  • Implement directory traversal for user-specified paths (read-only)
  • Add support for per-distro variations (Ubuntu vs RHEL path differences, etc.)
  • Write tests with mocked SSH output

Phase 3 — AI Integration

Wire collected data into the local AI model.

  • Implement vLLM client module
  • Design prompt template: system context, collected data, issue description → diagnosis
  • Implement response parsing and structured output (root cause + suggested steps)
  • Tune context window usage — handle truncation for large log outputs
  • Add streaming support for long AI responses
  • Evaluate and test model output quality on common issue types

Phase 4 — CLI & User Experience

Polish the interface for real-world use.

  • Design CLI interface (flags, subcommands, interactive prompts)
  • Implement structured output: diagnosis, confidence, recommended actions
  • Add --verbose / --debug mode showing raw collected data
  • Support output to file or clipboard
  • Write man page / --help documentation

Phase 5 — Hardening & Distribution

Prepare for broader use.

  • Security review of SSH handling and credential storage
  • Ensure no data is written to the remote system under any path
  • Package for distribution (binary release, container image, or distro packages)
  • Write installation and quickstart documentation
  • End-to-end integration tests against a test VM

Decisions Log

Date Decision Outcome
2026-05-04 Implementation language Python — with single distributable binary via Nuitka
AI inference backend vLLM (provisional)
Default model gemma4:a4b (provisional)
2026-05-04 SSH auth methods Keypair only (ed25519/RSA); auto-accept new hosts; reject on key change (MITM)
2026-05-04 Bastion host support --jump-host flag via SSH native ProxyJump
2026-05-04 SSH config behavior Use ~/.ssh/config by default; allow override via --ignore-ssh-config
2026-05-04 CLI vs interactive mode Interactive: REPL for v0.1, textual TUI for v0.2+