Files
the-other-dude/.planning/phases/02-poller-config-collection/02-01-SUMMARY.md
Jason Staack 7ff3178b84 docs(02-01): complete config backup primitives plan
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 20:50:27 -05:00

129 lines
5.8 KiB
Markdown

---
phase: 02-poller-config-collection
plan: 01
subsystem: poller
tags: [ssh, tofu, routeros, config-normalization, sha256, nats, prometheus, alembic]
requires:
- phase: 01-database-schema
provides: router_config_snapshots table for storing backup data
provides:
- SSH command executor with TOFU host key verification and typed error classification
- Config normalizer with deterministic SHA256 hashing
- ConfigSnapshotEvent NATS event type and PublishConfigSnapshot method
- Config backup environment variables (interval, concurrency, timeout)
- Device model SSH fields (port, host key fingerprint) with UpdateSSHHostKey method
- Alembic migration 028 for devices table SSH columns
- Prometheus metrics for config backup observability
affects: [02-02-backup-scheduler, 03-backend-subscriber]
tech-stack:
added: []
patterns:
- "TOFU host key verification via SHA256 fingerprint comparison"
- "Config normalization pipeline: line endings, timestamp strip, whitespace trim, blank collapse"
- "SSH error classification into typed SSHErrorKind enum"
key-files:
created:
- poller/internal/device/ssh_executor.go
- poller/internal/device/ssh_executor_test.go
- poller/internal/device/normalize.go
- poller/internal/device/normalize_test.go
- backend/alembic/versions/028_device_ssh_host_key.py
modified:
- poller/internal/config/config.go
- poller/internal/bus/publisher.go
- poller/internal/store/devices.go
- poller/internal/observability/metrics.go
key-decisions:
- "TOFU fingerprint format matches ssh-keygen: SHA256:base64(sha256(pubkey))"
- "NormalizationVersion=1 constant included in NATS payloads for future re-processing"
- "UpdateSSHHostKey sets first_seen via COALESCE to preserve original observation time"
patterns-established:
- "SSH error classification: classifySSHError inspects error strings for auth/hostkey/timeout/refused patterns"
- "Config normalization: version-tracked deterministic pipeline for RouterOS export output"
requirements-completed: [COLL-01, COLL-02, COLL-06]
duration: 5min
completed: 2026-03-13
---
# Phase 02 Plan 01: Config Backup Primitives Summary
**SSH executor with TOFU host key verification, RouterOS config normalizer with SHA256 hashing, NATS snapshot event, and Alembic migration for device SSH columns**
## Performance
- **Duration:** 5 min
- **Started:** 2026-03-13T01:43:33Z
- **Completed:** 2026-03-13T01:48:38Z
- **Tasks:** 2
- **Files modified:** 9
## Accomplishments
- SSH RunCommand executor with context-aware dialing, TOFU host key callback, and 6-kind typed error classification
- Deterministic config normalizer: strips RouterOS timestamps, normalizes line endings, trims whitespace, collapses blanks, computes SHA256 hash
- 22 unit tests covering error classification, TOFU flows (first connect/match/mismatch), normalization edge cases, idempotency
- Config backup env vars, NATS ConfigSnapshotEvent, device model SSH extensions, migration 028, Prometheus metrics
## Task Commits
Each task was committed atomically:
1. **Task 1: SSH executor, normalizer, and their tests** - `f1abb75` (feat)
2. **Task 2: Config env vars, NATS event type, device model extensions, Alembic migration, metrics** - `4ae39d2` (feat)
_Note: Task 1 used TDD -- tests written first (RED), implementation second (GREEN)._
## Files Created/Modified
- `poller/internal/device/ssh_executor.go` - RunCommand SSH executor with TOFU host key verification and typed errors
- `poller/internal/device/ssh_executor_test.go` - Unit tests for SSH error classification, TOFU callbacks, CommandResult
- `poller/internal/device/normalize.go` - NormalizeConfig and HashConfig for RouterOS export output
- `poller/internal/device/normalize_test.go` - Table-driven tests for normalization pipeline edge cases
- `poller/internal/config/config.go` - Added ConfigBackupIntervalSeconds, ConfigBackupMaxConcurrent, ConfigBackupCommandTimeoutSeconds
- `poller/internal/bus/publisher.go` - Added ConfigSnapshotEvent type, PublishConfigSnapshot method, config.snapshot.> stream subject
- `poller/internal/store/devices.go` - Added SSHPort/SSHHostKeyFingerprint fields, UpdateSSHHostKey method, updated queries
- `poller/internal/observability/metrics.go` - Added ConfigBackupTotal, ConfigBackupDuration, ConfigBackupActive metrics
- `backend/alembic/versions/028_device_ssh_host_key.py` - Migration adding ssh_port, ssh_host_key_fingerprint, timestamp columns
## Decisions Made
- TOFU fingerprint format uses SHA256:base64(sha256(pubkey)) to match ssh-keygen output format
- NormalizationVersion=1 constant is included in NATS payloads so consumers can detect algorithm changes
- UpdateSSHHostKey uses COALESCE on ssh_host_key_first_seen to preserve original observation timestamp
## Deviations from Plan
### Auto-fixed Issues
**1. [Rule 1 - Bug] Fixed test key generation approach**
- **Found during:** Task 1 (GREEN phase)
- **Issue:** Embedded OpenSSH PEM test key had padding errors ("ssh: padding not as expected")
- **Fix:** Switched to programmatic ed25519 key generation via crypto/ed25519.GenerateKey
- **Files modified:** poller/internal/device/ssh_executor_test.go
- **Verification:** All 22 tests pass
- **Committed in:** f1abb75 (Task 1 commit)
---
**Total deviations:** 1 auto-fixed (1 bug)
**Impact on plan:** Minimal -- test infrastructure fix only, no production code change.
## Issues Encountered
None beyond the test key generation fix documented above.
## User Setup Required
None - no external service configuration required.
## Next Phase Readiness
- All primitives ready for Plan 02 (backup scheduler) to wire together
- SSH executor, normalizer, NATS event, device model, config, and metrics are independently tested and compilable
- Migration 028 ready to apply before deploying the backup scheduler
---
*Phase: 02-poller-config-collection*
*Completed: 2026-03-13*