feat: add history UX and expand retention-focused roadmap
Some checks failed
CI / test (push) Failing after 15s
191
ROADMAP.md
@@ -321,3 +321,194 @@ ______________________________________________________________________
| 2026-05-04 | RAG vector store (Tier 2/3) | `chromadb` embedded mode (default) or `qdrant` self-hosted |
| 2026-05-04 | RAG chunking unit | Command-boundary splitting: each collected command = one or more chunks |
| 2026-05-04 | Runbook format | Markdown with YAML frontmatter, version-controlled in `runbooks/` directory |

______________________________________________________________________

## End-State UX Goal

After the current CLI and memory roadmap phases are stable, the long-term UX goal is a full-screen terminal UI (TUI) with an ncurses-style workflow.

### Target End-State

- Split-pane troubleshooting workspace (diagnostics, AI output, and command/input area)
- Live command/probe status with clear success/failure indicators
- In-session history browser for prior questions, retrieved evidence, and related past sessions
- Keyboard-first navigation for operators running in SSH-only environments

### Delivery Approach

- Keep shipping incremental CLI features first (current roadmap order remains unchanged)
- Promote stable workflows into TUI panels once behavior is proven in CLI mode
- Treat the TUI as a final UX consolidation milestone, not a blocker for core troubleshooting capabilities

______________________________________________________________________

## Container Distribution Goal (Docker)

After core CLI/TUI workflows stabilize, provide an official Docker image as an additional distribution target.

### Container Execution Model (Decision)

- Docker is a one-shot invocation target, not a daemon/service mode
- Each run executes a single `tai` command and exits
- State is persisted only through mounted host volumes

### Why Docker Is Valuable Here

- Reproducible runtime: pin Python and dependency versions to remove host-level drift
- Faster operator onboarding: run with one command instead of local Python setup
- Cleaner CI/CD release path: publish versioned images aligned with git tags
- Safer local footprint: isolate dependencies from the host OS package manager

### Subgoals

1. Base image and runtime hardening

   - Multi-stage Dockerfile with slim runtime image
   - Non-root runtime user and minimal filesystem permissions
   - Healthcheck for CLI startup and version command

2. Runtime integration for SSH workflows

   - Documented mounts for `~/.ssh` (read-only where possible) and known-hosts handling
   - Pass-through for SSH config when needed (`--ignore-ssh-config` behavior documented)
   - Clear guidance for jump-host and bastion scenarios from inside the container
   - Documented one-shot run examples for `tai run` and `tai history`

3. Persistent data strategy

   - Required volume mount guidance for runbook store (`~/.tai/runbooks`)
   - Required volume mount guidance for session memory/history (`~/.tai/sessions`)
   - Optional bind mount for JSONL logs and report export artifacts
   - Clear defaults for container paths and equivalent host path mappings

4. Release and quality gates

   - Build and publish image on tagged releases
   - Smoke tests in CI: probe mode, collect mode, and history command against mocked endpoints
   - Version labeling (image tags and OCI metadata) tied to changelog/release tags

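As a rough illustration of the one-shot model and the mount guidance above, the sketch below assembles a `docker run` argument list for a single `tai` command. The image name `example/tai:latest`, the container home `/home/tai`, the default UID/GID, and the helper `build_docker_argv` are all assumptions made for this example; none of them are decided by this roadmap.

```python
from pathlib import PurePosixPath


def build_docker_argv(command, home, image="example/tai:latest", uid=1000, gid=1000):
    """Build a one-shot `docker run` argv for a single command.

    The image name, container paths, and UID/GID defaults are placeholders.
    """
    home = PurePosixPath(home)
    container_home = "/home/tai"  # hypothetical container home directory
    mounts = [
        # (host path, container path, mount read-only?)
        (home / ".ssh", f"{container_home}/.ssh", True),
        (home / ".tai" / "runbooks", f"{container_home}/.tai/runbooks", False),
        (home / ".tai" / "sessions", f"{container_home}/.tai/sessions", False),
    ]
    argv = [
        "docker", "run",
        "--rm",                    # one-shot: remove the container when it exits
        "--read-only",             # read-only root filesystem
        "--user", f"{uid}:{gid}",  # non-root user, aligned with host volume ownership
    ]
    for host_path, container_path, read_only in mounts:
        spec = f"{host_path}:{container_path}"
        if read_only:
            spec += ":ro"
        argv += ["-v", spec]
    return argv + [image, *command]
```

For example, `build_docker_argv(["tai", "history"], "/home/op")` yields an argv that mounts `/home/op/.ssh` read-only and ends with `example/tai:latest tai history`. Whether the real image sets an entrypoint, which would change how the command is passed, is left open here.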
### Data Retention and Lifecycle Policy

Retention behavior must be explicit and configurable at runtime. Defaults should be conservative and documented.

1. Retention classes

   - Session memory store (`~/.tai/sessions`): keep semantically indexed summaries for troubleshooting continuity
   - Runbook store (`~/.tai/runbooks`): retain until explicitly replaced or pruned by sync policy
   - JSONL logs and exported reports: operator-controlled retention with optional TTL cleanup

2. Retention controls

   - Add CLI controls for age-based pruning (for example `--retain-days` on cleanup commands)
   - Add host-scoped cleanup (delete history for one host) and full-store cleanup (all hosts)
   - Add dry-run cleanup mode to show what would be deleted before applying changes

3. No-persist mode

   - Add a documented ephemeral mode where no session memory or logs are written
   - Ensure one-shot diagnostics can run in read-only operational contexts

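The retention controls above (age-based pruning, host scoping, and dry-run) could combine roughly as in the following sketch. `SessionRecord`, `select_for_cleanup`, and `cleanup` are hypothetical names, and the in-memory list stands in for whatever index the session store ends up using.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class SessionRecord:
    """Hypothetical index entry for one stored session."""
    session_id: str
    host: str
    created_at: datetime


def select_for_cleanup(records, retain_days, host=None, now=None):
    """Return records older than `retain_days`, optionally limited to one host."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retain_days)
    return [
        r for r in records
        if r.created_at < cutoff and (host is None or r.host == host)
    ]


def cleanup(records, retain_days, host=None, dry_run=True, now=None):
    """Return (kept, deleted). With dry_run=True nothing is removed from `kept`."""
    doomed = select_for_cleanup(records, retain_days, host=host, now=now)
    if dry_run:
        return list(records), doomed  # report only; the store is unchanged
    doomed_ids = {r.session_id for r in doomed}
    kept = [r for r in records if r.session_id not in doomed_ids]
    return kept, doomed
```

Defaulting `dry_run` to `True` is one way to keep the default conservative: an operator has to opt in before anything is deleted.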
### Configuration and State Persistence Model

Configuration and retained state should be predictable across container upgrades and host environments.

1. Mount and path contract

   - Define canonical container paths for `~/.tai/runbooks`, `~/.tai/sessions`, and optional log/export paths
   - Document required versus optional mounts and expected permissions for each
   - Document UID/GID mapping guidance to prevent host volume ownership issues

2. Schema and compatibility

   - Introduce explicit storage schema version metadata for persistent stores
   - Define upgrade behavior for older stores (migrate, re-index, or fail with clear guidance)
   - Add compatibility notes for image upgrades and rollback expectations

3. Backup and recovery

   - Provide export/import workflows for session memory and runbook indexes
   - Document minimal backup set and restore order for disaster recovery

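One possible shape for the schema version check described above is sketched below. The version numbers, the `schema_version` metadata key, and the names `StoreAction` and `decide_store_action` are assumptions for illustration; the roadmap does not yet define the metadata format, and a re-index path could be added as a fourth action.

```python
from enum import Enum

CURRENT_SCHEMA_VERSION = 2     # hypothetical version written by this build
OLDEST_MIGRATABLE_VERSION = 1  # hypothetical oldest version with a migration path


class StoreAction(Enum):
    OK = "ok"
    MIGRATE = "migrate"
    FAIL = "fail"


def decide_store_action(metadata):
    """Map a store's metadata dict (or None) to an action and an operator message."""
    version = metadata.get("schema_version") if metadata else None
    if not isinstance(version, int):
        return StoreAction.FAIL, "store has no readable schema version metadata"
    if version == CURRENT_SCHEMA_VERSION:
        return StoreAction.OK, "store is up to date"
    if version > CURRENT_SCHEMA_VERSION:
        # Typical after an image rollback: the store was written by a newer build.
        return StoreAction.FAIL, (
            f"store schema v{version} is newer than supported "
            f"v{CURRENT_SCHEMA_VERSION}; upgrade the image or restore a backup"
        )
    if version >= OLDEST_MIGRATABLE_VERSION:
        return StoreAction.MIGRATE, (
            f"migrate store from v{version} to v{CURRENT_SCHEMA_VERSION}"
        )
    return StoreAction.FAIL, f"store schema v{version} is too old to migrate"
```

Treating a newer-than-supported store as a hard failure, rather than attempting to read it, is a deliberate choice in this sketch so that rollbacks never silently corrupt retained data.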
### Security and Privacy for Retained Data

Persisted troubleshooting evidence can include sensitive operational data and must be handled accordingly.

1. Data minimization

   - Add optional redaction hooks for common sensitive patterns before persistence
   - Keep prompt-only transient data separate from persisted summary/index content

2. Runtime hardening

   - Target non-root container execution with read-only root filesystem by default
   - Require explicit writable mounts only for retained data locations

3. Auditable behavior

   - Log retention-affecting operations (cleanup, purge, export/import) with timestamps and scope
   - Define stable exit codes for cleanup and retention workflows to support automation

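A redaction hook of the kind mentioned under data minimization might look like the sketch below. The pattern list is illustrative only and far from exhaustive, and `REDACTION_PATTERNS` and `redact` are hypothetical names rather than an existing `tai` API.

```python
import re

# Illustrative patterns only; a real deployment would need a reviewed, broader set.
REDACTION_PATTERNS = [
    (
        "private_key",
        re.compile(
            r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----",
            re.DOTALL,
        ),
    ),
    ("bearer_token", re.compile(r"(?i)\bbearer\s+[A-Za-z0-9._~+/=-]+")),
    (
        "password_assignment",
        re.compile(r"(?i)\b(password|passwd|secret)\s*[=:]\s*\S+"),
    ),
]


def redact(text, patterns=REDACTION_PATTERNS):
    """Replace each match with a labeled placeholder before the text is persisted."""
    for label, pattern in patterns:
        text = pattern.sub(f"[REDACTED:{label}]", text)
    return text
```

Applying the hook to collected output just before it is written to the session store would keep prompt-only transient data unaffected, which matches the separation described above.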
### Kubernetes Position

Kubernetes is out of scope for this delivery plan.

- `tai` is currently an operator-invoked troubleshooting client, not a long-running service
- AI inference is external to `tai` (OpenAI-compatible endpoint), reducing the need for in-cluster model orchestration
- SSH key/config handling and per-operator context are simpler with local or single-container execution

Kubernetes can be revisited only if `tai` evolves into a centralized multi-user service with queueing, RBAC, and shared tenancy requirements.

______________________________________________________________________

## Final Long-Term Goal: Full Rust Migration

This is a final-stage roadmap goal and remains explicitly out of near-term scope.
It should begin only after the Python implementation, TUI direction, Docker one-shot model,
and retention/persistence policies are stable and proven in production usage.

### Why This Is the Final Goal

- Improve execution latency and startup speed for both native runs and container one-shot invocations
- Produce a single, portable native binary with minimal runtime dependency footprint
- Strengthen reliability and memory safety under heavy log parsing and concurrent workflows
- Simplify long-term packaging and distribution across Linux targets

### Migration Objectives

1. Preserve feature parity first

   - Match existing CLI behavior, interactive workflows, RAG integration, runbook management, and history/session-memory features
   - Keep command semantics and safety boundaries equivalent during transition

2. Target both distribution modes

   - Native Rust binary for direct operator use
   - Docker image built around the Rust binary for one-shot execution with mounted persistent volumes

3. Keep compatibility guardrails

   - Define persistent data format compatibility or migration tooling for runbook/session stores
   - Preserve operator-visible flags where practical to reduce migration friction

### Suggested Delivery Phases

1. Build baseline Rust CLI scaffold with feature-flagged parity checkpoints
2. Port SSH execution and read-only policy enforcement modules
3. Port planner, collectors, prompt composition, and AI client adapters
4. Port session memory/history and runbook workflows with migration tests
5. Port interactive UX/TUI layer and deprecate Python runtime path

### Rust Toolchain End-State

- Standardize on Cargo-based build/test/lint pipeline (`cargo fmt`, `cargo clippy`, `cargo test`)
- Add release profile optimization and reproducible build settings
- Publish signed native artifacts and Docker images derived from Rust release binaries

### Decision Gate Before Starting

Begin Rust migration only when:

- Python roadmap milestones are complete and stable
- Container distribution and retention policy workflows are operationally validated
- A parity test matrix exists to prove behavior equivalence during migration

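A parity test matrix could start as something as small as the sketch below, which runs the same arguments through two implementations and compares exit code and standard output. `ParityResult`, `check_parity`, and `run_matrix` are hypothetical names, and comparing raw stdout is an assumption: real parity checks would likely need to normalize timestamps and other non-deterministic output first.

```python
import subprocess
from dataclasses import dataclass


@dataclass(frozen=True)
class ParityResult:
    """Outcome of running one argument list through both implementations."""
    args: tuple
    same_exit_code: bool
    same_stdout: bool

    @property
    def ok(self):
        return self.same_exit_code and self.same_stdout


def check_parity(python_cmd, rust_cmd, args, timeout=30):
    """Run both command prefixes with the same args and compare the results."""
    first, second = (
        subprocess.run([*cmd, *args], capture_output=True, text=True, timeout=timeout)
        for cmd in (python_cmd, rust_cmd)
    )
    return ParityResult(
        args=tuple(args),
        same_exit_code=first.returncode == second.returncode,
        same_stdout=first.stdout == second.stdout,
    )


def run_matrix(python_cmd, rust_cmd, matrix):
    """Check every argument list in `matrix`; callers can filter on `.ok`."""
    return [check_parity(python_cmd, rust_cmd, args) for args in matrix]
```

Here `python_cmd` and `rust_cmd` are command prefixes (for example, the path to each binary as a one-element list), and `matrix` is a list of argument lists covering the commands that must behave identically.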