All checks were successful
CI / test (push) Successful in 14s
Co-authored-by: Copilot <copilot@github.com>
131 lines
6.2 KiB
Markdown
131 lines
6.2 KiB
Markdown
# Roadmap
|
|
|
|
This document outlines the major decisions, milestones, and development phases required to bring `tai` from concept to a working tool.
|
|
|
|
______________________________________________________________________
|
|
|
|
## Phase 0 — Decisions & Prerequisites
|
|
|
|
These must be resolved before meaningful development can begin.
|
|
|
|
### Language Selection
|
|
|
|
- [x] **Decision: Python**
|
|
- Key factors: native vLLM integration, mature SSH libraries (`paramiko` / `asyncssh`), strong text/log parsing, rapid development
|
|
- Single binary distribution will be achieved via **Nuitka** (preferred for true compilation) or **PyInstaller** as a fallback
|
|
- [ ] Evaluate Nuitka vs PyInstaller for binary output quality and CI reproducibility
|
|
- [ ] Add binary build step to CI pipeline
|
|
|
|
### AI Backend & Model
|
|
|
|
- [ ] Confirm use of [vLLM](https://github.com/vllm-project/vllm) as the inference backend
|
|
- [ ] Confirm `gemma4:a4b` as the default model (or select an alternative)
|
|
- [ ] Define minimum hardware requirements for running the model locally
|
|
- [ ] Decide whether the AI backend is bundled, self-hosted externally, or user-supplied
|
|
|
|
### SSH Strategy
|
|
|
|
- [x] **Decision: keypair authentication only** — no password auth; eliminates credential storage risk
|
|
- Default key resolution: `~/.ssh/id_ed25519`, `~/.ssh/id_rsa` (in order of preference)
|
|
- CLI override via `--identity-file <path>`
|
|
- No SSH agent forwarding needed — a shared key is distributed to all managed hosts via Puppet
|
|
- [x] **Known hosts: auto-accept new hosts; reject on key mismatch** — a changed host key triggers a hard stop with a MITM warning; unknown/new hosts are accepted silently on first connect
|
|
- [x] **Bastion/jump host: `--jump-host <host>` flag** — delegates to SSH's native ProxyJump functionality
|
|
- [x] **SSH config behavior: respect existing `~/.ssh/config` by default; allow CLI override**
|
|
- Default: follow host settings from `~/.ssh/config` (for `User`, `Port`, `ProxyJump`, etc.)
|
|
- Override switch: `--ignore-ssh-config` to bypass local SSH config when required
|
|
|
|
### Scope & Constraints
|
|
|
|
- [ ] Define the supported scope of issues (services, network, disk, kernel, etc.)
|
|
- [ ] Confirm read-only guarantee — document exactly what "read-only" means in practice
|
|
- [x] **Decision: interactive REPL mode for v0.1, full TUI for v0.2+**
|
|
- v0.1: chat-loop REPL launched from CLI; human can follow up, correct, and redirect the agent
|
|
- v0.2+: `textual`-based TUI with split panes (collected data | AI output | input bar)
|
|
- Built-in slash commands: `/collect`, `/show logs`, `/clear`, `/host <hostname>`, `/help`, `/quit`
|
|
|
|
______________________________________________________________________
|
|
|
|
## Phase 1 — Project Foundation
|
|
|
|
Basic project scaffolding and connectivity.
|
|
|
|
- [x] Finalise repository structure and language toolchain
|
|
- [x] Set up CI pipeline (linting, tests)
|
|
- [ ] Implement SSH connection module
|
|
- [x] Define SSH config model and probe interface scaffold
|
|
- [x] Connect to remote host
|
|
- [x] Execute read-only commands (e.g. `journalctl`, `systemctl status`, `cat`)
|
|
- [ ] Stream or collect command output safely
|
|
- [x] Implement basic input parsing (ticket text, hostname, target directories)
|
|
- [x] Write unit tests for SSH and input modules
|
|
- [x] Input parser and CLI tests added
|
|
- [x] SSH module tests added for command policy and SSH argv behavior
|
|
|
|
______________________________________________________________________
|
|
|
|
## Phase 2 — Data Collection Layer
|
|
|
|
Define what information the agent gathers and how.
|
|
|
|
- [ ] Identify the canonical set of data sources per issue type:
|
|
- Service failures: `journalctl`, `systemctl`, service config files
|
|
- Network issues: `ip`, `ss`, `netstat`, firewall rules
|
|
- Disk issues: `df`, `du`, `dmesg`, `smartctl`
|
|
- General: `/var/log/syslog`, `/var/log/messages`, `dmesg`
|
|
- [ ] Implement pluggable "collector" modules per data source
|
|
- [ ] Implement directory traversal for user-specified paths (read-only)
|
|
- [ ] Add support for per-distro variations (Ubuntu vs RHEL path differences, etc.)
|
|
- [ ] Write tests with mocked SSH output
|
|
|
|
______________________________________________________________________
|
|
|
|
## Phase 3 — AI Integration
|
|
|
|
Wire collected data into the local AI model.
|
|
|
|
- [ ] Implement vLLM client module
|
|
- [ ] Design prompt template: system context, collected data, issue description → diagnosis
|
|
- [ ] Implement response parsing and structured output (root cause + suggested steps)
|
|
- [ ] Tune context window usage — handle truncation for large log outputs
|
|
- [ ] Add streaming support for long AI responses
|
|
- [ ] Evaluate and test model output quality on common issue types
|
|
|
|
______________________________________________________________________
|
|
|
|
## Phase 4 — CLI & User Experience
|
|
|
|
Polish the interface for real-world use.
|
|
|
|
- [ ] Design CLI interface (flags, subcommands, interactive prompts)
|
|
- [ ] Implement structured output: diagnosis, confidence, recommended actions
|
|
- [ ] Add `--verbose` / `--debug` mode showing raw collected data
|
|
- [ ] Support output to file or clipboard
|
|
- [ ] Write man page / `--help` documentation
|
|
|
|
______________________________________________________________________
|
|
|
|
## Phase 5 — Hardening & Distribution
|
|
|
|
Prepare for broader use.
|
|
|
|
- [ ] Security review of SSH handling and credential storage
|
|
- [ ] Ensure no data is written to the remote system under any path
|
|
- [ ] Package for distribution (binary release, container image, or distro packages)
|
|
- [ ] Write installation and quickstart documentation
|
|
- [ ] End-to-end integration tests against a test VM
|
|
|
|
______________________________________________________________________
|
|
|
|
## Decisions Log
|
|
|
|
| Date | Decision | Outcome |
|
|
|------|----------|---------|
|
|
| 2026-05-04 | Implementation language | Python — with single distributable binary via Nuitka |
|
|
| — | AI inference backend | vLLM (provisional) |
|
|
| — | Default model | `gemma4:a4b` (provisional) |
|
|
| 2026-05-04 | SSH auth methods | Keypair only (ed25519/RSA); auto-accept new hosts; reject on key change (MITM) |
|
|
| 2026-05-04 | Bastion host support | `--jump-host` flag via SSH native ProxyJump |
|
|
| 2026-05-04 | SSH config behavior | Use `~/.ssh/config` by default; allow override via `--ignore-ssh-config` |
|
|
| 2026-05-04 | CLI vs interactive mode | Interactive: REPL for v0.1, `textual` TUI for v0.2+ |
|