Some checks failed
CI / test (push) Failing after 1s
Co-authored-by: Copilot <copilot@github.com>
5.7 KiB
5.7 KiB
Roadmap
This document outlines the major decisions, milestones, and development phases required to bring tai from concept to a working tool.
Phase 0 — Decisions & Prerequisites
These must be resolved before meaningful development can begin.
Language Selection
- Decision: Python
- Key factors: native vLLM integration, mature SSH libraries (
paramiko/asyncssh), strong text/log parsing, rapid development - Single binary distribution will be achieved via Nuitka (preferred for true compilation) or PyInstaller as a fallback
- Evaluate Nuitka vs PyInstaller for binary output quality and CI reproducibility
- Add binary build step to CI pipeline
AI Backend & Model
- Confirm use of vLLM as the inference backend
- Confirm
gemma4:a4bas the default model (or select an alternative) - Define minimum hardware requirements for running the model locally
- Decide whether the AI backend is bundled, self-hosted externally, or user-supplied
SSH Strategy
- Decision: keypair authentication only — no password auth; eliminates credential storage risk
- Default key resolution:
~/.ssh/id_ed25519,~/.ssh/id_rsa(in order of preference) - CLI override via
--identity-file <path> - No SSH agent forwarding needed — a shared key is distributed to all managed hosts via Puppet
- Default key resolution:
- Known hosts: auto-accept new hosts; reject on key mismatch — a changed host key triggers a hard stop with a MITM warning; unknown/new hosts are accepted silently on first connect
- Bastion/jump host:
--jump-host <host>flag — delegates to SSH's native ProxyJump functionality - SSH config behavior: respect existing
~/.ssh/configby default; allow CLI override- Default: follow host settings from
~/.ssh/config(forUser,Port,ProxyJump, etc.) - Override switch:
--ignore-ssh-configto bypass local SSH config when required
- Default: follow host settings from
Scope & Constraints
- Define the supported scope of issues (services, network, disk, kernel, etc.)
- Confirm read-only guarantee — document exactly what "read-only" means in practice
- Decision: interactive REPL mode for v0.1, full TUI for v0.2+
- v0.1: chat-loop REPL launched from CLI; human can follow up, correct, and redirect the agent
- v0.2+:
textual-based TUI with split panes (collected data | AI output | input bar) - Built-in slash commands:
/collect,/show logs,/clear,/host <hostname>,/help,/quit
Phase 1 — Project Foundation
Basic project scaffolding and connectivity.
- Finalise repository structure and language toolchain
- Set up CI pipeline (linting, tests)
- Implement SSH connection module
- Define SSH config model and probe interface scaffold
- Connect to remote host
- Execute read-only commands (e.g.
journalctl,systemctl status,cat) - Stream or collect command output safely
- Implement basic input parsing (ticket text, hostname, target directories)
- Write unit tests for SSH and input modules
- Input parser and CLI tests added
- SSH module tests added for command policy and SSH argv behavior
Phase 2 — Data Collection Layer
Define what information the agent gathers and how.
- Identify the canonical set of data sources per issue type:
- Service failures:
journalctl,systemctl, service config files - Network issues:
ip,ss,netstat, firewall rules - Disk issues:
df,du,dmesg,smartctl - General:
/var/log/syslog,/var/log/messages,dmesg
- Service failures:
- Implement pluggable "collector" modules per data source
- Implement directory traversal for user-specified paths (read-only)
- Add support for per-distro variations (Ubuntu vs RHEL path differences, etc.)
- Write tests with mocked SSH output
Phase 3 — AI Integration
Wire collected data into the local AI model.
- Implement vLLM client module
- Design prompt template: system context, collected data, issue description → diagnosis
- Implement response parsing and structured output (root cause + suggested steps)
- Tune context window usage — handle truncation for large log outputs
- Add streaming support for long AI responses
- Evaluate and test model output quality on common issue types
Phase 4 — CLI & User Experience
Polish the interface for real-world use.
- Design CLI interface (flags, subcommands, interactive prompts)
- Implement structured output: diagnosis, confidence, recommended actions
- Add
--verbose/--debugmode showing raw collected data - Support output to file or clipboard
- Write man page /
--helpdocumentation
Phase 5 — Hardening & Distribution
Prepare for broader use.
- Security review of SSH handling and credential storage
- Ensure no data is written to the remote system under any path
- Package for distribution (binary release, container image, or distro packages)
- Write installation and quickstart documentation
- End-to-end integration tests against a test VM
Decisions Log
| Date | Decision | Outcome |
|---|---|---|
| 2026-05-04 | Implementation language | Python — with single distributable binary via Nuitka |
| — | AI inference backend | vLLM (provisional) |
| — | Default model | gemma4:a4b (provisional) |
| 2026-05-04 | SSH auth methods | Keypair only (ed25519/RSA); auto-accept new hosts; reject on key change (MITM) |
| 2026-05-04 | Bastion host support | --jump-host flag via SSH native ProxyJump |
| 2026-05-04 | SSH config behavior | Use ~/.ssh/config by default; allow override via --ignore-ssh-config |
| 2026-05-04 | CLI vs interactive mode | Interactive: REPL for v0.1, textual TUI for v0.2+ |