feat: add history UX and expand retention-focused roadmap

2026-05-11 21:07:39 +02:00
parent 964aee3481
commit 7749a02706
5 changed files with 552 additions and 1 deletion
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -321,3 +321,194 @@ ______________________________________________________________________
 | 2026-05-04 | RAG vector store (Tier 2/3) | `chromadb` embedded mode (default) or `qdrant` self-hosted |
 | 2026-05-04 | RAG chunking unit | Command-boundary splitting — each collected command = one or more chunks |
 | 2026-05-04 | Runbook format | Markdown with YAML frontmatter, version-controlled in `runbooks/` directory |
+
+______________________________________________________________________
+
+## End-State UX Goal
+
+Once the current CLI and memory roadmap phases are stable, the long-term UX goal is a full-screen, ncurses-style TUI.
+
+### Target End-State
+
+- Split-pane troubleshooting workspace (diagnostics, AI output, and command/input area)
+- Live command/probe status with clear success/failure indicators
+- In-session history browser for prior questions, retrieved evidence, and related past sessions
+- Keyboard-first navigation for operators running in SSH-only environments
+
+### Delivery Approach
+
+- Keep shipping incremental CLI features first (current roadmap order remains unchanged)
+- Promote stable workflows into TUI panels once behavior is proven in CLI mode
+- Treat the TUI as a final UX consolidation milestone, not a blocker for core troubleshooting capabilities
+
+______________________________________________________________________
+
+## Container Distribution Goal (Docker)
+
+After core CLI/TUI workflows stabilize, provide an official Docker image as an additional distribution target.
+
+### Container Execution Model (Decision)
+
+- Docker is a one-shot invocation target, not a daemon/service mode
+- Each run executes a single `tai` command and exits
+- State is persisted only through mounted host volumes
+
+### Why Docker Is Valuable Here
+
+- Reproducible runtime: pin Python and dependency versions to remove host-level drift
+- Faster operator onboarding: run with one command instead of local Python setup
+- Cleaner CI/CD release path: publish versioned images aligned with git tags
+- Safer local footprint: isolate dependencies from the host OS package manager
+
+### Subgoals
+
+1. Base image and runtime hardening
+
+- Multi-stage Dockerfile with slim runtime image
+- Non-root runtime user and minimal filesystem permissions
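+- Smoke check that the CLI starts and the version command succeeds (a Docker `HEALTHCHECK` targets long-running containers, so it does not fit the one-shot model)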
+
+2. Runtime integration for SSH workflows
+
+- Documented mounts for `~/.ssh` (read-only where possible) and known-hosts handling
+- Pass-through of the host SSH config when needed, with `--ignore-ssh-config` behavior documented
+- Clear guidance for jump-host and bastion scenarios from inside the container
+- Documented one-shot run examples for `tai run` and `tai history`
+
+3. Persistent data strategy
+
+- Required volume mount guidance for runbook store (`~/.tai/runbooks`)
+- Required volume mount guidance for session memory/history (`~/.tai/sessions`)
+- Optional bind mount for JSONL logs and report export artifacts
+- Clear defaults for container paths and equivalent host path mappings
+
+4. Release and quality gates
+
+- Build and publish image on tagged releases
+- Smoke tests in CI: probe mode, collect mode, and history command against mocked endpoints
+- Version labeling (image tags and OCI metadata) tied to changelog/release tags
+
+### Data Retention and Lifecycle Policy
+
+Retention behavior must be explicit and configurable at runtime. Defaults should be conservative and documented.
+
+1. Retention classes
+
+- Session memory store (`~/.tai/sessions`): keep semantically indexed summaries for troubleshooting continuity
+- Runbook store (`~/.tai/runbooks`): retain until explicitly replaced or pruned by sync policy
+- JSONL logs and exported reports: operator-controlled retention with optional TTL cleanup
+
+2. Retention controls
+
+- Add CLI controls for age-based pruning (for example `--retain-days` on cleanup commands)
+- Add host-scoped cleanup (delete history for one host) and full-store cleanup (all hosts)
+- Add dry-run cleanup mode to show what would be deleted before applying changes
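
The retention controls above (age-based pruning, host-scoped versus full-store cleanup, and dry-run) could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the `<store>/<host>/<session>.json` layout, the function names, and the use of file modification time as the session age are all assumptions made for the example.

```python
from __future__ import annotations

import time
from pathlib import Path

SECONDS_PER_DAY = 86_400


def plan_cleanup(store: Path, retain_days: int, host: str | None = None,
                 now: float | None = None) -> list[Path]:
    """Return session files older than retain_days, optionally for one host.

    Assumes the hypothetical layout <store>/<host>/<session>.json.
    """
    now = time.time() if now is None else now
    cutoff = now - retain_days * SECONDS_PER_DAY
    # Host-scoped cleanup inspects one subdirectory; full-store cleanup inspects all.
    if host:
        host_dirs = [store / host]
    else:
        host_dirs = [p for p in store.iterdir() if p.is_dir()]
    expired = []
    for host_dir in host_dirs:
        if not host_dir.is_dir():
            continue
        for session in host_dir.glob("*.json"):
            # Modification time stands in for session age (an assumption).
            if session.stat().st_mtime < cutoff:
                expired.append(session)
    return sorted(expired)


def cleanup(store: Path, retain_days: int, host: str | None = None,
            dry_run: bool = True, now: float | None = None) -> list[Path]:
    """Delete expired sessions, or only report them when dry_run is True."""
    expired = plan_cleanup(store, retain_days, host, now)
    if not dry_run:
        for path in expired:
            path.unlink()
    return expired
```

Defaulting `dry_run` to `True` keeps the conservative behavior the policy asks for: a caller has to opt in explicitly before anything is deleted.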
+
+3. No-persist mode
+
+- Add a documented ephemeral mode where no session memory or logs are written
+- Ensure one-shot diagnostics can run in read-only operational contexts
+
+### Configuration and State Persistence Model
+
+Configuration and retained state should be predictable across container upgrades and host environments.
+
+1. Mount and path contract
+
+- Define canonical container paths for `~/.tai/runbooks`, `~/.tai/sessions`, and optional log/export paths
+- Document required versus optional mounts and expected permissions for each
+- Document UID/GID mapping guidance to prevent host volume ownership issues
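
To make the mount and path contract concrete, here is a hedged sketch of how a one-shot invocation might be assembled. The Docker flags used (`--rm`, `--read-only`, `--user`, `-v src:dst[:ro]`) are standard `docker run` options, but the container home `/home/tai`, the image name in the usage note, and the assumption that the image's entrypoint is `tai` are hypothetical; the roadmap does not define them yet.

```python
from __future__ import annotations

# Hypothetical canonical container home; the roadmap leaves the real paths undefined.
CONTAINER_HOME = "/home/tai"


def docker_run_argv(image: str, tai_args: list[str], host_state: str,
                    host_ssh: str, uid: int, gid: int) -> list[str]:
    """Build a `docker run` argv for a single one-shot `tai` invocation.

    Assumes the image's ENTRYPOINT is `tai`, so trailing args go to the CLI.
    """
    return [
        "docker", "run",
        "--rm",                     # one-shot: remove the container on exit
        "--read-only",              # read-only root filesystem
        "--user", f"{uid}:{gid}",   # match host ownership of mounted volumes
        # Required writable mounts for retained data.
        "-v", f"{host_state}/runbooks:{CONTAINER_HOME}/.tai/runbooks",
        "-v", f"{host_state}/sessions:{CONTAINER_HOME}/.tai/sessions",
        # SSH material is mounted read-only.
        "-v", f"{host_ssh}:{CONTAINER_HOME}/.ssh:ro",
        image,
        *tai_args,
    ]
```

For example, `docker_run_argv("example/tai:1.0", ["history"], "/srv/tai", "/home/op/.ssh", 1000, 1000)` yields an argv that can be passed to `subprocess.run`. Passing the caller's UID/GID is one way to avoid root-owned files appearing in the host volume.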
+
+2. Schema and compatibility
+
+- Introduce explicit storage schema version metadata for persistent stores
+- Define upgrade behavior for older stores (migrate, re-index, or fail with clear guidance)
+- Add compatibility notes for image upgrades and rollback expectations
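
One possible shape for the schema version check is sketched below. The metadata file name `schema.json`, the `schema_version` key, and the version numbers are placeholders chosen for illustration; the point is only that opening a store resolves to exactly one of the outcomes listed above (use as-is, migrate, or fail with guidance).

```python
import json
from enum import Enum
from pathlib import Path

CURRENT_SCHEMA = 3      # hypothetical version written by this build
OLDEST_MIGRATABLE = 2   # hypothetical oldest version with a migration path


class StoreAction(Enum):
    USE = "use"
    MIGRATE = "migrate"
    FAIL = "fail"


def check_store(store: Path) -> StoreAction:
    """Decide how to treat a persistent store based on its version metadata."""
    meta_file = store / "schema.json"  # hypothetical metadata file name
    if not meta_file.exists():
        # Unversioned store: refuse rather than guess its format.
        return StoreAction.FAIL
    version = json.loads(meta_file.read_text())["schema_version"]
    if version == CURRENT_SCHEMA:
        return StoreAction.USE
    if OLDEST_MIGRATABLE <= version < CURRENT_SCHEMA:
        return StoreAction.MIGRATE
    # Too old to migrate, or written by a newer image (the rollback case).
    return StoreAction.FAIL
```

Treating a store written by a newer image as a failure makes rollback expectations explicit: an older image never silently reads data it does not understand.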
+
+3. Backup and recovery
+
+- Provide export/import workflows for session memory and runbook indexes
+- Document minimal backup set and restore order for disaster recovery
+
+### Security and Privacy for Retained Data
+
+Persisted troubleshooting evidence can include sensitive operational data and must be handled accordingly.
+
+1. Data minimization
+
+- Add optional redaction hooks for common sensitive patterns before persistence
+- Keep prompt-only transient data separate from persisted summary/index content
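
A redaction hook could be as simple as a function from text to text that runs before anything is persisted. The sketch below assumes that model; the hook names and the two patterns are illustrative only and far from a complete list of sensitive formats.

```python
import re
from typing import Callable, List

# A hook takes collected text and returns a redacted copy.
RedactionHook = Callable[[str], str]

# Illustrative patterns only; real deployments would need a broader set.
_SECRET_KV = re.compile(
    r"(?i)\b(password|passwd|token|secret|api[_-]?key)(\s*[=:]\s*)(\S+)"
)
_BEARER = re.compile(r"(?i)\b(bearer\s+)[A-Za-z0-9._~+/=-]+")


def redact_key_values(text: str) -> str:
    """Mask the value in key=value or key: value pairs with secret-like keys."""
    return _SECRET_KV.sub(r"\1\2[REDACTED]", text)


def redact_bearer(text: str) -> str:
    """Mask bearer tokens while keeping the scheme name visible."""
    return _BEARER.sub(r"\1[REDACTED]", text)


def apply_hooks(text: str, hooks: List[RedactionHook]) -> str:
    """Run each hook in order; the result is what would be persisted."""
    for hook in hooks:
        text = hook(text)
    return text
```

Keeping hooks as plain callables means operators could add site-specific patterns without changing the persistence code path.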
+
+2. Runtime hardening
+
+- Target non-root container execution with read-only root filesystem by default
+- Require explicit writable mounts only for retained data locations
+
+3. Auditable behavior
+
+- Log retention-affecting operations (cleanup, purge, export/import) with timestamps and scope
+- Define stable exit codes for cleanup and retention workflows to support automation
+
+### Kubernetes Position
+
+Kubernetes is out of scope for this delivery plan.
+
+- `tai` is currently an operator-invoked troubleshooting client, not a long-running service
+- AI inference is external to `tai` (OpenAI-compatible endpoint), reducing the need for in-cluster model orchestration
+- SSH key/config handling and per-operator context are simpler with local or single-container execution
+
+Revisit Kubernetes only if `tai` evolves into a centralized multi-user service with queueing, RBAC, and shared tenancy requirements.
+
+______________________________________________________________________
+
+## Final Long-Term Goal: Full Rust Migration
+
+This is a final-stage roadmap goal and remains explicitly out of near-term scope.
+Work should begin only after the Python implementation, TUI direction, Docker one-shot model,
+and retention/persistence policies are stable and proven in production use.
+
+### Why This Is the Final Goal
+
+- Reduce execution latency and startup time for both native runs and container one-shot invocations
+- Produce a single, portable native binary with minimal runtime dependency footprint
+- Strengthen reliability and memory safety under heavy log parsing and concurrent workflows
+- Simplify long-term packaging and distribution across Linux targets
+
+### Migration Objectives
+
+1. Preserve feature parity first
+
+- Match existing CLI behavior, interactive workflows, RAG integration, runbook management, and history/session-memory features
+- Keep command semantics and safety boundaries equivalent during transition
+
+2. Target both distribution modes
+
+- Native Rust binary for direct operator use
+- Docker image built around the Rust binary for one-shot execution with mounted persistent volumes
+
+3. Keep compatibility guardrails
+
+- Define persistent data format compatibility or migration tooling for runbook/session stores
+- Preserve operator-visible flags where practical to reduce migration friction
+
+### Suggested Delivery Phases
+
+1. Build baseline Rust CLI scaffold with feature-flagged parity checkpoints
+2. Port SSH execution and read-only policy enforcement modules
+3. Port planner, collectors, prompt composition, and AI client adapters
+4. Port session memory/history and runbook workflows with migration tests
+5. Port interactive UX/TUI layer and deprecate Python runtime path
+
+### Rust Toolchain End-State
+
+- Standardize on Cargo-based build/test/lint pipeline (`cargo fmt`, `cargo clippy`, `cargo test`)
+- Add release profile optimization and reproducible build settings
+- Publish signed native artifacts and Docker images derived from Rust release binaries
+
+### Decision Gate Before Starting
+
+Begin Rust migration only when:
+
+- Python roadmap milestones are complete and stable
+- Container distribution and retention policy workflows are operationally validated
+- A parity test matrix exists to prove behavior equivalence during migration
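
As a rough idea of what a parity test matrix might look like in practice, the sketch below runs the same argument lists through two command prefixes and compares exit codes and standard output. The function and class names are hypothetical, and a real matrix would also need to normalize timestamps, compare stderr, and run against mocked endpoints.

```python
import subprocess
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ParityResult:
    args: List[str]
    same_exit_code: bool
    same_stdout: bool

    @property
    def ok(self) -> bool:
        return self.same_exit_code and self.same_stdout


def _run(cmd: Sequence[str]) -> subprocess.CompletedProcess:
    # Capture output as text; the timeout guards against a hung implementation.
    return subprocess.run(list(cmd), capture_output=True, text=True, timeout=60)


def check_parity(python_cmd: Sequence[str], rust_cmd: Sequence[str],
                 matrix: Sequence[Sequence[str]]) -> List[ParityResult]:
    """Run every argument list against both implementations and compare."""
    results = []
    for args in matrix:
        py = _run([*python_cmd, *args])
        rs = _run([*rust_cmd, *args])
        results.append(ParityResult(
            args=list(args),
            same_exit_code=py.returncode == rs.returncode,
            same_stdout=py.stdout == rs.stdout,
        ))
    return results
```

Each row of the matrix is one CLI invocation, so the decision gate can be expressed as "every `ParityResult.ok` is true for the agreed matrix".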