docs(02-01): complete config backup primitives plan
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
104
.planning/REQUIREMENTS.md
Normal file
@@ -0,0 +1,104 @@
# Requirements: RouterOS Config Backup & Change Tracking

**Defined:** 2026-03-12
**Core Value:** Operators can see exactly what changed on a router and when, with reliable config snapshots for download

## v1 Requirements

### Collection

- [x] **COLL-01**: Poller collects RouterOS config via SSH `/export show-sensitive` on a configurable interval (default 6h)
- [x] **COLL-02**: Poller normalizes config output (trim whitespace, normalize line endings, remove timestamp headers)
- [ ] **COLL-03**: Poller sends config snapshot to API via NATS subject `config.snapshot.create`
- [ ] **COLL-04**: Manual backup trigger via POST `/api/tenants/{tenant_id}/devices/{device_id}/backup`
- [ ] **COLL-05**: Unreachable routers log warning and retry next interval
- [x] **COLL-06**: Collection interval configurable via `CONFIG_BACKUP_INTERVAL` environment variable
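
COLL-01 and COLL-06 together imply a lookup with a 6h fallback. The following is a minimal Python sketch of that behaviour, not the poller's actual code. It assumes the variable holds a whole number of seconds (the requirement does not state a unit), and the function name `backup_interval_seconds` is hypothetical.

```python
import os

# 6h default taken from COLL-01.
DEFAULT_INTERVAL_SECONDS = 6 * 60 * 60


def backup_interval_seconds(env=None):
    """Return the collection interval in seconds.

    Assumption: CONFIG_BACKUP_INTERVAL is a positive integer number of
    seconds. Unset, empty, non-numeric, or non-positive values fall back
    to the default instead of raising.
    """
    env = os.environ if env is None else env
    raw = env.get("CONFIG_BACKUP_INTERVAL", "").strip()
    try:
        value = int(raw)
    except ValueError:
        return DEFAULT_INTERVAL_SECONDS
    return value if value > 0 else DEFAULT_INTERVAL_SECONDS
```

Passing a plain dict as `env` keeps the function testable without touching the process environment.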

### Storage

- [x] **STOR-01**: API stores config snapshots in `router_config_snapshots` table with SHA256 hash
- [ ] **STOR-02**: Duplicate snapshots (same hash as previous) are skipped, no diff generated
- [ ] **STOR-03**: Snapshots retained for 90 days (configurable via `CONFIG_RETENTION_DAYS`)
- [ ] **STOR-04**: Older snapshots automatically deleted by retention cleanup
- [x] **STOR-05**: Snapshots encrypted at rest, accessible only through RBAC
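
The duplicate check in STOR-02 reduces to comparing the new snapshot's SHA256 hash with the previous one. A minimal Python sketch under that reading follows; the helper names `sha256_hex` and `should_store` are hypothetical and the real ingestion code is not shown in this commit.

```python
import hashlib


def sha256_hex(config_text: str) -> str:
    """Hex SHA256 of the (already normalized) plaintext config."""
    return hashlib.sha256(config_text.encode("utf-8")).hexdigest()


def should_store(new_config: str, previous_hash):
    """Return (store, new_hash).

    store is False when the new hash equals the previous snapshot's hash,
    in which case the snapshot is skipped and no diff is generated.
    previous_hash is None when the device has no snapshot yet.
    """
    new_hash = sha256_hex(new_config)
    return new_hash != previous_hash, new_hash
```

Hashing the plaintext means the comparison needs no decryption of the stored snapshot.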

### Diff & Parsing

- [ ] **DIFF-01**: Unified diff generated when new snapshot differs from previous
- [ ] **DIFF-02**: Diffs stored in `router_config_diffs` table linking snapshot pairs
- [ ] **DIFF-03**: Structured change parser extracts component, summary, and raw line as JSON
- [ ] **DIFF-04**: Parsed changes stored in `router_config_changes` table
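
For DIFF-01, a unified diff between consecutive snapshots can be produced with Python's standard `difflib`. This is only a sketch of the idea: the function name `unified_config_diff` and the `previous`/`current` file labels are assumptions, and the project may well use a different diff implementation.

```python
import difflib


def unified_config_diff(previous: str, current: str) -> str:
    """Return a unified diff of two config texts, or "" if they are equal."""
    lines = difflib.unified_diff(
        previous.splitlines(),
        current.splitlines(),
        fromfile="previous",   # hypothetical labels for the two snapshots
        tofile="current",
        lineterm="",           # inputs carry no newlines, so emit none
    )
    return "\n".join(lines)
```

An empty result lines up with STOR-02: identical snapshots produce no diff at all.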

### API

- [ ] **API-01**: GET `/api/tenants/{tid}/devices/{did}/config-history` returns change timeline
- [ ] **API-02**: GET `/api/tenants/{tid}/devices/{did}/config/{snapshot_id}` returns full snapshot
- [ ] **API-03**: GET `/api/tenants/{tid}/devices/{did}/config/{snapshot_id}/diff` returns unified diff
- [ ] **API-04**: RBAC enforced: operator+ can trigger backups, viewers can read history
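
The rule in API-04 can be read as a ranked role check. The sketch below illustrates that reading only: the `admin` role, the numeric ranks, and both function names are assumptions, since the requirement names only `viewer` and `operator`.

```python
# Assumed role ordering; only "viewer" and "operator" appear in API-04.
ROLE_RANK = {"viewer": 0, "operator": 1, "admin": 2}


def can_read_history(role: str) -> bool:
    """Any known role, including viewer, may read config history."""
    return role in ROLE_RANK


def can_trigger_backup(role: str) -> bool:
    """Only operator and higher-ranked roles may trigger a manual backup."""
    return ROLE_RANK.get(role, -1) >= ROLE_RANK["operator"]
```

Unknown roles are denied by default because they have no rank.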

### Frontend

- [ ] **UI-01**: Device page shows Configuration History section below Remote Access
- [ ] **UI-02**: Timeline displays change entries with component, summary, and timestamp
- [ ] **UI-03**: Diff viewer shows unified diff with add/remove highlighting
- [ ] **UI-04**: User can download snapshot as `router-{device_name}-{timestamp}.rsc`

### Observability

- [ ] **OBS-01**: Audit events logged: `config_snapshot_created`, `config_snapshot_skipped_duplicate`
- [ ] **OBS-02**: Audit events logged: `config_diff_generated`, `config_backup_manual_trigger`

## v2 Requirements

### Restore

- **REST-01**: User can restore a config snapshot to a router via SSH
- **REST-02**: Restore confirmation dialog with diff preview

## Out of Scope

| Feature | Reason |
|---------|--------|
| Config restore | Explicitly deferred per v9.6 spec |
| Non-RouterOS device backup | Spec scopes to RouterOS only initially |
| Real-time change detection | Polling-based by design, not event-driven |
| Config comparison between arbitrary snapshots | Only consecutive snapshot diffs in v1 |

## Traceability

| Requirement | Phase | Status |
|-------------|-------|--------|
| COLL-01 | Phase 2: Poller Config Collection | Complete |
| COLL-02 | Phase 2: Poller Config Collection | Complete |
| COLL-03 | Phase 2: Poller Config Collection | Pending |
| COLL-04 | Phase 4: Manual Backup Trigger | Pending |
| COLL-05 | Phase 2: Poller Config Collection | Pending |
| COLL-06 | Phase 2: Poller Config Collection | Complete |
| STOR-01 | Phase 1: Database Schema | Complete |
| STOR-02 | Phase 3: Snapshot Ingestion | Pending |
| STOR-03 | Phase 9: Retention & Cleanup | Pending |
| STOR-04 | Phase 9: Retention & Cleanup | Pending |
| STOR-05 | Phase 1: Database Schema | Complete |
| DIFF-01 | Phase 5: Diff Engine | Pending |
| DIFF-02 | Phase 5: Diff Engine | Pending |
| DIFF-03 | Phase 5: Diff Engine | Pending |
| DIFF-04 | Phase 5: Diff Engine | Pending |
| API-01 | Phase 6: History API | Pending |
| API-02 | Phase 6: History API | Pending |
| API-03 | Phase 6: History API | Pending |
| API-04 | Phase 6: History API | Pending |
| UI-01 | Phase 7: Config History UI | Pending |
| UI-02 | Phase 7: Config History UI | Pending |
| UI-03 | Phase 8: Diff Viewer & Download | Pending |
| UI-04 | Phase 8: Diff Viewer & Download | Pending |
| OBS-01 | Phase 10: Audit & Observability | Pending |
| OBS-02 | Phase 10: Audit & Observability | Pending |

**Coverage:**
- v1 requirements: 25 total
- Mapped to phases: 25
- Unmapped: 0

---
*Requirements defined: 2026-03-12*
*Last updated: 2026-03-12 after roadmap creation*
81
.planning/STATE.md
Normal file
@@ -0,0 +1,81 @@
---
gsd_state_version: 1.0
milestone: v9.6
milestone_name: milestone
status: in_progress
stopped_at: Completed 02-01-PLAN.md
last_updated: "2026-03-13T01:49:00Z"
last_activity: 2026-03-13 -- Completed 02-01 config backup primitives (SSH executor, normalizer, NATS event, migration)
progress:
  total_phases: 10
  completed_phases: 1
  total_plans: 3
  completed_plans: 2
  percent: 67
---

# Project State

## Project Reference

See: .planning/PROJECT.md (updated 2026-03-12)

**Core value:** Operators can see exactly what changed on a router and when, with reliable config snapshots for download
**Current focus:** Phase 2: Poller Config Collection

## Current Position

Phase: 2 of 10 (Poller Config Collection)
Plan: 1 of 2 in current phase (02-01 complete)
Status: Phase 2 in progress
Last activity: 2026-03-13 -- Completed 02-01 config backup primitives (SSH executor, normalizer, NATS event, migration)

Progress: [███████░░░] 67%
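
The 67% figure is consistent with 2 completed plans out of 3 total, as recorded in the front matter. The sketch below reproduces the bar from those two numbers. It assumes that is how the figure is derived; the function name `progress_bar` and the 10-cell width are illustrative.

```python
def progress_bar(completed: int, total: int, width: int = 10) -> str:
    """Render a text progress bar such as "[███████░░░] 67%"."""
    fraction = completed / total if total else 0.0
    filled = round(fraction * width)      # number of filled cells
    percent = round(fraction * 100)       # whole-number percentage
    return "[" + "█" * filled + "░" * (width - filled) + f"] {percent}%"
```

With `completed=2` and `total=3`, the fraction 0.667 rounds to 7 filled cells and 67 percent.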

## Performance Metrics

**Velocity:**
- Total plans completed: 2
- Average duration: 4min
- Total execution time: 0.13 hours

**By Phase:**

| Phase | Plans | Total | Avg/Plan |
|-------|-------|-------|----------|
| 01-database-schema | 1 | 3min | 3min |
| 02-poller-config-collection | 1 | 5min | 5min |

**Recent Trend:**
- Last 5 plans: none
- Trend: N/A

*Updated after each plan completion*

## Accumulated Context

### Decisions

Decisions are logged in PROJECT.md Key Decisions table.
Recent decisions affecting current work:

- [01-01] Models added to existing config_backup.py (same domain, consistent pattern)
- [01-01] config_text stores Transit ciphertext (vault:v1:...), plaintext never in DB
- [01-01] sha256_hash is of plaintext config for deduplication without decryption
- [02-01] TOFU fingerprint format matches ssh-keygen: SHA256:base64(sha256(pubkey))
- [02-01] NormalizationVersion=1 constant in NATS payloads for future re-processing
- [02-01] UpdateSSHHostKey uses COALESCE on first_seen to preserve original observation time
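
The COALESCE decision can be illustrated with a small runnable sketch. It uses an in-memory SQLite database purely so the example executes on its own; the production database and the exact statement in `UpdateSSHHostKey` are not shown in this commit, so the table layout and the Python function names here are assumptions.

```python
import sqlite3

# COALESCE keeps the existing first_seen value and only fills it when NULL.
UPDATE_SQL = """
UPDATE devices
SET ssh_host_key_fingerprint = :fingerprint,
    ssh_host_key_first_seen = COALESCE(ssh_host_key_first_seen, :now)
WHERE id = :id
"""


def update_ssh_host_key(conn, device_id, fingerprint, now):
    """Store the fingerprint; set first_seen only on the first observation."""
    conn.execute(UPDATE_SQL, {"fingerprint": fingerprint, "now": now, "id": device_id})


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE devices ("
        "id INTEGER PRIMARY KEY, "
        "ssh_host_key_fingerprint TEXT, "
        "ssh_host_key_first_seen TEXT)"
    )
    conn.execute("INSERT INTO devices (id) VALUES (1)")
    # The first call fills first_seen; the second call must not overwrite it.
    update_ssh_host_key(conn, 1, "SHA256:example", "2026-03-13T01:00:00Z")
    update_ssh_host_key(conn, 1, "SHA256:example", "2026-03-13T07:00:00Z")
    row = conn.execute(
        "SELECT ssh_host_key_first_seen FROM devices WHERE id = 1"
    ).fetchone()
    return row[0]
```

Calling `demo()` returns the timestamp of the first update, which is the property the decision is meant to guarantee.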

### Pending Todos

None yet.

### Blockers/Concerns

- OpenBao dev instance loses Transit keys on data wipe -- device creds need re-entry (from project memory, may affect snapshot encryption testing)

## Session Continuity

Last session: 2026-03-13T01:49:00Z
Stopped at: Completed 02-01-PLAN.md
Resume file: .planning/phases/02-poller-config-collection/02-02-PLAN.md
128
.planning/phases/02-poller-config-collection/02-01-SUMMARY.md
Normal file
@@ -0,0 +1,128 @@
---
phase: 02-poller-config-collection
plan: 01
subsystem: poller
tags: [ssh, tofu, routeros, config-normalization, sha256, nats, prometheus, alembic]

requires:
  - phase: 01-database-schema
    provides: router_config_snapshots table for storing backup data
provides:
  - SSH command executor with TOFU host key verification and typed error classification
  - Config normalizer with deterministic SHA256 hashing
  - ConfigSnapshotEvent NATS event type and PublishConfigSnapshot method
  - Config backup environment variables (interval, concurrency, timeout)
  - Device model SSH fields (port, host key fingerprint) with UpdateSSHHostKey method
  - Alembic migration 028 for devices table SSH columns
  - Prometheus metrics for config backup observability
affects: [02-02-backup-scheduler, 03-backend-subscriber]

tech-stack:
  added: []
  patterns:
    - "TOFU host key verification via SHA256 fingerprint comparison"
    - "Config normalization pipeline: line endings, timestamp strip, whitespace trim, blank collapse"
    - "SSH error classification into typed SSHErrorKind enum"

key-files:
  created:
    - poller/internal/device/ssh_executor.go
    - poller/internal/device/ssh_executor_test.go
    - poller/internal/device/normalize.go
    - poller/internal/device/normalize_test.go
    - backend/alembic/versions/028_device_ssh_host_key.py
  modified:
    - poller/internal/config/config.go
    - poller/internal/bus/publisher.go
    - poller/internal/store/devices.go
    - poller/internal/observability/metrics.go

key-decisions:
  - "TOFU fingerprint format matches ssh-keygen: SHA256:base64(sha256(pubkey))"
  - "NormalizationVersion=1 constant included in NATS payloads for future re-processing"
  - "UpdateSSHHostKey sets first_seen via COALESCE to preserve original observation time"

patterns-established:
  - "SSH error classification: classifySSHError inspects error strings for auth/hostkey/timeout/refused patterns"
  - "Config normalization: version-tracked deterministic pipeline for RouterOS export output"

requirements-completed: [COLL-01, COLL-02, COLL-06]

duration: 5min
completed: 2026-03-13
---

# Phase 02 Plan 01: Config Backup Primitives Summary

**SSH executor with TOFU host key verification, RouterOS config normalizer with SHA256 hashing, NATS snapshot event, and Alembic migration for device SSH columns**

## Performance

- **Duration:** 5 min
- **Started:** 2026-03-13T01:43:33Z
- **Completed:** 2026-03-13T01:48:38Z
- **Tasks:** 2
- **Files modified:** 9

## Accomplishments
- SSH RunCommand executor with context-aware dialing, TOFU host key callback, and 6-kind typed error classification
- Deterministic config normalizer: strips RouterOS timestamps, normalizes line endings, trims whitespace, collapses blanks, computes SHA256 hash
- 22 unit tests covering error classification, TOFU flows (first connect/match/mismatch), normalization edge cases, idempotency
- Config backup env vars, NATS ConfigSnapshotEvent, device model SSH extensions, migration 028, Prometheus metrics
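
The normalizer steps listed above (line endings, timestamp strip, whitespace trim, blank collapse, SHA256) can be sketched as follows. This is a Python illustration of the pipeline, not a port of `normalize.go`: the header regex is an assumption about what a RouterOS export timestamp line looks like, and the function names are hypothetical.

```python
import hashlib
import re

# Assumed header shape, e.g. "# 2026-03-13 01:43:33 by RouterOS 7.16".
TIMESTAMP_HEADER = re.compile(r"^# .* by RouterOS ")


def normalize_config(raw: str) -> str:
    """Return a deterministic form of a RouterOS export."""
    text = raw.replace("\r\n", "\n").replace("\r", "\n")  # normalize line endings
    out = []
    for line in text.split("\n"):
        line = line.rstrip()                   # trim trailing whitespace
        if TIMESTAMP_HEADER.match(line):
            continue                           # drop the timestamp header
        if line == "" and (not out or out[-1] == ""):
            continue                           # collapse blank runs, skip leading blanks
        out.append(line)
    while out and out[-1] == "":
        out.pop()                              # drop trailing blank lines
    return "\n".join(out) + "\n" if out else ""


def hash_config(normalized: str) -> str:
    """Hex SHA256 of the normalized text."""
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

Two exports that differ only in timestamp, line endings, or stray blank lines then hash to the same value, and running the normalizer on its own output changes nothing.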

## Task Commits

Each task was committed atomically:

1. **Task 1: SSH executor, normalizer, and their tests** - `f1abb75` (feat)
2. **Task 2: Config env vars, NATS event type, device model extensions, Alembic migration, metrics** - `4ae39d2` (feat)

_Note: Task 1 used TDD -- tests written first (RED), implementation second (GREEN)._

## Files Created/Modified
- `poller/internal/device/ssh_executor.go` - RunCommand SSH executor with TOFU host key verification and typed errors
- `poller/internal/device/ssh_executor_test.go` - Unit tests for SSH error classification, TOFU callbacks, CommandResult
- `poller/internal/device/normalize.go` - NormalizeConfig and HashConfig for RouterOS export output
- `poller/internal/device/normalize_test.go` - Table-driven tests for normalization pipeline edge cases
- `poller/internal/config/config.go` - Added ConfigBackupIntervalSeconds, ConfigBackupMaxConcurrent, ConfigBackupCommandTimeoutSeconds
- `poller/internal/bus/publisher.go` - Added ConfigSnapshotEvent type, PublishConfigSnapshot method, config.snapshot.> stream subject
- `poller/internal/store/devices.go` - Added SSHPort/SSHHostKeyFingerprint fields, UpdateSSHHostKey method, updated queries
- `poller/internal/observability/metrics.go` - Added ConfigBackupTotal, ConfigBackupDuration, ConfigBackupActive metrics
- `backend/alembic/versions/028_device_ssh_host_key.py` - Migration adding ssh_port, ssh_host_key_fingerprint, timestamp columns
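
The typed error classification in `ssh_executor.go` is described as string inspection for auth, hostkey, timeout, and refused patterns. The Python sketch below illustrates that approach under stated assumptions: the substrings matched are guesses at typical SSH and dial error text, and only the four named kinds plus an unknown fallback are modelled, because the summary does not enumerate all six kinds.

```python
from enum import Enum


class SSHErrorKind(Enum):
    AUTH = "auth"
    HOST_KEY = "hostkey"
    TIMEOUT = "timeout"
    REFUSED = "refused"
    UNKNOWN = "unknown"   # fallback; the remaining real kinds are not listed


# Ordered patterns: host key checks first so they are not shadowed.
_PATTERNS = [
    (SSHErrorKind.HOST_KEY, ("host key",)),
    (SSHErrorKind.AUTH, ("unable to authenticate", "permission denied")),
    (SSHErrorKind.TIMEOUT, ("timeout", "timed out", "deadline exceeded")),
    (SSHErrorKind.REFUSED, ("connection refused",)),
]


def classify_ssh_error(message: str) -> SSHErrorKind:
    """Map an error message to a kind by case-insensitive substring match."""
    lowered = message.lower()
    for kind, needles in _PATTERNS:
        if any(needle in lowered for needle in needles):
            return kind
    return SSHErrorKind.UNKNOWN
```

A typed kind lets callers decide between retrying at the next interval (timeout, refused) and raising an alert (auth, host key) without re-parsing strings.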

## Decisions Made
- TOFU fingerprint format uses SHA256:base64(sha256(pubkey)) to match ssh-keygen output format
- NormalizationVersion=1 constant is included in NATS payloads so consumers can detect algorithm changes
- UpdateSSHHostKey uses COALESCE on ssh_host_key_first_seen to preserve original observation timestamp
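
The fingerprint format and the three TOFU outcomes covered by the tests (first connect, match, mismatch) can be sketched in Python as below. The function names are hypothetical, and the unpadded base64 is an assumption based on how `ssh-keygen` prints SHA256 fingerprints; the Go implementation is not reproduced here.

```python
import base64
import hashlib


def fingerprint_sha256(pubkey_blob: bytes) -> str:
    """Return "SHA256:" + unpadded base64 of sha256(pubkey wire bytes)."""
    digest = hashlib.sha256(pubkey_blob).digest()
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


def verify_tofu(stored_fingerprint, presented_blob: bytes):
    """Trust on first use: return (outcome, presented_fingerprint).

    outcome is "first_connect" when nothing is stored yet, "match" when the
    presented key equals the stored fingerprint, and "mismatch" otherwise.
    """
    presented = fingerprint_sha256(presented_blob)
    if not stored_fingerprint:
        return "first_connect", presented
    if stored_fingerprint == presented:
        return "match", presented
    return "mismatch", presented
```

On `first_connect` the caller would persist the returned fingerprint; on `mismatch` it would refuse the connection rather than overwrite the stored value.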

## Deviations from Plan

### Auto-fixed Issues

**1. [Rule 1 - Bug] Fixed test key generation approach**
- **Found during:** Task 1 (GREEN phase)
- **Issue:** Embedded OpenSSH PEM test key had padding errors ("ssh: padding not as expected")
- **Fix:** Switched to programmatic ed25519 key generation via crypto/ed25519.GenerateKey
- **Files modified:** poller/internal/device/ssh_executor_test.go
- **Verification:** All 22 tests pass
- **Committed in:** f1abb75 (Task 1 commit)

---

**Total deviations:** 1 auto-fixed (1 bug)
**Impact on plan:** Minimal -- test infrastructure fix only, no production code change.

## Issues Encountered
None beyond the test key generation fix documented above.

## User Setup Required
None - no external service configuration required.

## Next Phase Readiness
- All primitives ready for Plan 02 (backup scheduler) to wire together
- SSH executor, normalizer, NATS event, device model, config, and metrics are independently tested and compilable
- Migration 028 ready to apply before deploying the backup scheduler

---
*Phase: 02-poller-config-collection*
*Completed: 2026-03-13*