feat: add history UX and expand retention-focused roadmap
Some checks failed
CI / test (push) Failing after 15s

2026-05-11 21:07:39 +02:00
parent 964aee3481
commit 7749a02706
5 changed files with 552 additions and 1 deletions


@@ -321,3 +321,194 @@ ______________________________________________________________________
| 2026-05-04 | RAG vector store (Tier 2/3) | `chromadb` embedded mode (default) or `qdrant` self-hosted |
| 2026-05-04 | RAG chunking unit | Command-boundary splitting — each collected command = one or more chunks |
| 2026-05-04 | Runbook format | Markdown with YAML frontmatter, version-controlled in `runbooks/` directory |
______________________________________________________________________
## End-State UX Goal
After the current CLI and memory roadmap phases are stable, the long-term UX goal is a full-screen TUI with an ncurses-style workflow.
### Target End-State
- Split-pane troubleshooting workspace (diagnostics, AI output, and command/input area)
- Live command/probe status with clear success/failure indicators
- In-session history browser for prior questions, retrieved evidence, and related past sessions
- Keyboard-first navigation for operators running in SSH-only environments
### Delivery Approach
- Keep shipping incremental CLI features first (current roadmap order remains unchanged)
- Promote stable workflows into TUI panels once behavior is proven in CLI mode
- Treat the TUI as a final UX consolidation milestone, not a blocker for core troubleshooting capabilities
______________________________________________________________________
## Container Distribution Goal (Docker)
After core CLI/TUI workflows stabilize, provide an official Docker image as an additional distribution target.
### Container Execution Model (Decision)
- Docker is a one-shot invocation target, not a daemon/service mode
- Each run executes a single `tai` command and exits
- State is persisted only through mounted host volumes
### Why Docker Is Valuable Here
- Reproducible runtime: pin Python and dependency versions to remove host-level drift
- Faster operator onboarding: run with one command instead of local Python setup
- Cleaner CI/CD release path: publish versioned images aligned with git tags
- Safer local footprint: isolate dependencies from the host OS package manager
### Subgoals
1. Base image and runtime hardening
   - Multi-stage Dockerfile with slim runtime image
   - Non-root runtime user and minimal filesystem permissions
   - Healthcheck for CLI startup and version command
2. Runtime integration for SSH workflows
   - Documented mounts for `~/.ssh` (read-only where possible) and known-hosts handling
   - Pass-through for SSH config when needed (`--ignore-ssh-config` behavior documented)
   - Clear guidance for jump-host and bastion scenarios from inside the container
   - Documented one-shot run examples for `tai run` and `tai history`
3. Persistent data strategy
   - Required volume mount guidance for runbook store (`~/.tai/runbooks`)
   - Required volume mount guidance for session memory/history (`~/.tai/sessions`)
   - Optional bind mount for JSONL logs and report export artifacts
   - Clear defaults for container paths and equivalent host path mappings
4. Release and quality gates
   - Build and publish image on tagged releases
   - Smoke tests in CI: probe mode, collect mode, and history command against mocked endpoints
   - Version labeling (image tags and OCI metadata) tied to changelog/release tags
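
As a rough illustration of the one-shot model and the mounts listed above, the sketch below assembles a `docker run` argument list for a single `tai` command. The image name `example/tai:latest`, the container home `/home/tai`, and the assumption that the image needs `tai` passed explicitly as the command are all hypothetical; the roadmap does not fix any of them.

```python
from pathlib import PurePosixPath

# Hypothetical values: the roadmap does not name the image or container paths.
IMAGE = "example/tai:latest"
CONTAINER_HOME = "/home/tai"


def build_docker_argv(tai_args, host_home, persist=True):
    """Build the argv for one `docker run` that executes one tai command."""
    home = PurePosixPath(host_home)
    # --rm removes the container after exit, matching the one-shot model.
    argv = ["docker", "run", "--rm"]
    # SSH keys and config are mounted read-only.
    argv += ["-v", f"{home / '.ssh'}:{CONTAINER_HOME}/.ssh:ro"]
    if persist:
        # State survives only through these host mounts.
        for name in ("runbooks", "sessions"):
            argv += ["-v", f"{home / '.tai' / name}:{CONTAINER_HOME}/.tai/{name}"]
    # Assumption: the image has no entrypoint, so `tai` is passed explicitly.
    argv += [IMAGE, "tai", *tai_args]
    return argv


if __name__ == "__main__":
    print(" ".join(build_docker_argv(["history"], "/home/operator")))
```

Passing `persist=False` leaves out the writable mounts, so nothing the run writes outlives the container.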
### Data Retention and Lifecycle Policy
Retention behavior must be explicit and configurable at runtime. Defaults should be conservative and documented.
1. Retention classes
   - Session memory store (`~/.tai/sessions`): keep semantically indexed summaries for troubleshooting continuity
   - Runbook store (`~/.tai/runbooks`): retain until explicitly replaced or pruned by sync policy
   - JSONL logs and exported reports: operator-controlled retention with optional TTL cleanup
2. Retention controls
   - Add CLI controls for age-based pruning (for example `--retain-days` on cleanup commands)
   - Add host-scoped cleanup (delete history for one host) and full-store cleanup (all hosts)
   - Add dry-run cleanup mode to show what would be deleted before applying changes
3. No-persist mode
   - Add a documented ephemeral mode where no session memory or logs are written
   - Ensure one-shot diagnostics can run in read-only operational contexts
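
The retention controls above (age-based pruning, host scoping, dry run) could look roughly like the following sketch. It assumes a hypothetical on-disk layout of `<store>/<host>/<session>.json` and uses file modification time as the age signal; neither is specified by the roadmap, and `prune_sessions` is an illustrative name, not an existing `tai` function.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def prune_sessions(store, retain_days, host=None, dry_run=True, now=None):
    """Return session files older than retain_days; delete them unless dry_run.

    host=None covers every host (full-store cleanup); a host name limits the
    scan to that host's directory (host-scoped cleanup).
    """
    now = time.time() if now is None else now
    cutoff = now - retain_days * SECONDS_PER_DAY
    # Assumed layout: <store>/<host>/<session>.json
    pattern = f"{host}/*.json" if host else "*/*.json"
    expired = sorted(
        path for path in Path(store).glob(pattern)
        if path.stat().st_mtime < cutoff
    )
    if not dry_run:
        for path in expired:
            path.unlink()
    return expired
```

Defaulting `dry_run` to `True` is a deliberate choice in this sketch: a caller has to opt in to deletion, which fits the conservative defaults the policy asks for.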
### Configuration and State Persistence Model
Configuration and retained state should be predictable across container upgrades and host environments.
1. Mount and path contract
   - Define canonical container paths for `~/.tai/runbooks`, `~/.tai/sessions`, and optional log/export paths
   - Document required versus optional mounts and expected permissions for each
   - Document UID/GID mapping guidance to prevent host volume ownership issues
2. Schema and compatibility
   - Introduce explicit storage schema version metadata for persistent stores
   - Define upgrade behavior for older stores (migrate, re-index, or fail with clear guidance)
   - Add compatibility notes for image upgrades and rollback expectations
3. Backup and recovery
   - Provide export/import workflows for session memory and runbook indexes
   - Document minimal backup set and restore order for disaster recovery
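
One way to picture the schema-version item is a small check that runs before a store is opened and decides between the three upgrade behaviors named above. Everything concrete here is an assumption for illustration: the `schema.json` file name, the `schema_version` key, the version numbers, and the choice to re-index when metadata is missing.

```python
import json
from pathlib import Path

# Hypothetical version numbers; the roadmap does not define a schema yet.
CURRENT_SCHEMA = 2
MIGRATABLE_FROM = {1}


class IncompatibleStoreError(RuntimeError):
    """Raised when a store cannot be used or upgraded by this release."""


def plan_store_upgrade(store):
    """Return 'ok', 'migrate', or 'reindex' for a persistent store directory."""
    meta_path = Path(store) / "schema.json"  # hypothetical metadata file
    if not meta_path.exists():
        # Assumption: a store without metadata predates versioning and its
        # derived index can be rebuilt from the retained source data.
        return "reindex"
    version = json.loads(meta_path.read_text())["schema_version"]
    if version == CURRENT_SCHEMA:
        return "ok"
    if version in MIGRATABLE_FROM:
        return "migrate"
    # Unknown or newer versions (for example after an image rollback) fail
    # with guidance instead of being modified.
    raise IncompatibleStoreError(
        f"store at {store} has schema {version}, this release expects "
        f"{CURRENT_SCHEMA}; restore a backup or use a matching image"
    )
```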
### Security and Privacy for Retained Data
Persisted troubleshooting evidence can include sensitive operational data and must be handled accordingly.
1. Data minimization
   - Add optional redaction hooks for common sensitive patterns before persistence
   - Keep prompt-only transient data separate from persisted summary/index content
2. Runtime hardening
   - Target non-root container execution with read-only root filesystem by default
   - Require explicit writable mounts only for retained data locations
3. Auditable behavior
   - Log retention-affecting operations (cleanup, purge, export/import) with timestamps and scope
   - Define stable exit codes for cleanup and retention workflows to support automation
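
A redaction hook of the kind listed under data minimization might be as simple as a list of pattern and replacement pairs applied before anything is written to disk. The patterns below are illustrative only and far from complete; `redact` and `REDACTION_PATTERNS` are hypothetical names, not part of `tai`.

```python
import re

# Illustrative patterns only; a real deployment would maintain its own list.
REDACTION_PATTERNS = [
    # key=value or key: value pairs with a sensitive-looking key
    (
        re.compile(r"(?i)\b(password|passwd|token|secret)\s*[=:]\s*\S+"),
        r"\1=[REDACTED]",
    ),
    # PEM-style private key blocks, including multi-line bodies
    (
        re.compile(
            r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----",
            re.S,
        ),
        "[REDACTED PRIVATE KEY]",
    ),
]


def redact(text, patterns=REDACTION_PATTERNS):
    """Apply every redaction pattern to text before it is persisted."""
    for pattern, replacement in patterns:
        text = pattern.sub(replacement, text)
    return text
```

Keeping the patterns in a plain list makes the hook optional and extensible: passing an empty list turns redaction off, and operators can append site-specific patterns.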
### Kubernetes Position
Kubernetes is out of scope for this delivery plan.
- `tai` is currently an operator-invoked troubleshooting client, not a long-running service
- AI inference is external to `tai` (OpenAI-compatible endpoint), reducing the need for in-cluster model orchestration
- SSH key/config handling and per-operator context are simpler with local or single-container execution
Kubernetes can be revisited only if `tai` evolves into a centralized multi-user service with queueing, RBAC, and shared tenancy requirements.
______________________________________________________________________
## Final Long-Term Goal: Full Rust Migration
This is a final-stage roadmap goal and remains explicitly out of near-term scope.
It should begin only after the Python implementation, TUI direction, Docker one-shot model,
and retention/persistence policies are stable and proven in production usage.
### Why This Is the Final Goal
- Improve execution latency and startup speed for both native runs and container one-shot invocations
- Produce a single, portable native binary with minimal runtime dependency footprint
- Strengthen reliability and memory safety under heavy log parsing and concurrent workflows
- Simplify long-term packaging and distribution across Linux targets
### Migration Objectives
1. Preserve feature parity first
   - Match existing CLI behavior, interactive workflows, RAG integration, runbook management, and history/session-memory features
   - Keep command semantics and safety boundaries equivalent during transition
2. Target both distribution modes
   - Native Rust binary for direct operator use
   - Docker image built around the Rust binary for one-shot execution with mounted persistent volumes
3. Keep compatibility guardrails
   - Define persistent data format compatibility or migration tooling for runbook/session stores
   - Preserve operator-visible flags where practical to reduce migration friction
### Suggested Delivery Phases
1. Build baseline Rust CLI scaffold with feature-flagged parity checkpoints
2. Port SSH execution and read-only policy enforcement modules
3. Port planner, collectors, prompt composition, and AI client adapters
4. Port session memory/history and runbook workflows with migration tests
5. Port interactive UX/TUI layer and deprecate Python runtime path
### Rust Toolchain End-State
- Standardize on Cargo-based build/test/lint pipeline (`cargo fmt`, `cargo clippy`, `cargo test`)
- Add release profile optimization and reproducible build settings
- Publish signed native artifacts and Docker images derived from Rust release binaries
### Decision Gate Before Starting
Begin Rust migration only when:
- Python roadmap milestones are complete and stable
- Container distribution and retention policy workflows are operationally validated
- A parity test matrix exists to prove behavior equivalence during migration
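
The parity test matrix in the last gate could start as a small harness that runs the same arguments through both implementations and reports any case where exit code or standard output differ. This is a minimal sketch under that assumption: the command prefixes and case list are supplied by the caller, and comparing only exit code and stdout is a simplification that a real matrix would likely extend to stderr and written files.

```python
import subprocess


def run_case(cmd_prefix, args, timeout=30):
    """Run one command and return (exit code, stdout)."""
    proc = subprocess.run(
        [*cmd_prefix, *args],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout


def parity_report(python_cmd, rust_cmd, cases):
    """Return the argument lists for which the two implementations disagree."""
    mismatches = []
    for args in cases:
        if run_case(python_cmd, args) != run_case(rust_cmd, args):
            mismatches.append(args)
    return mismatches
```

An empty result from `parity_report` over the full case list would be the signal that the two implementations behave equivalently for the commands covered.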