129 lines
5.8 KiB
Markdown
129 lines
5.8 KiB
Markdown
---
|
|
phase: 02-poller-config-collection
|
|
plan: 01
|
|
subsystem: poller
|
|
tags: [ssh, tofu, routeros, config-normalization, sha256, nats, prometheus, alembic]
|
|
|
|
requires:
|
|
- phase: 01-database-schema
|
|
provides: router_config_snapshots table for storing backup data
|
|
provides:
|
|
- SSH command executor with TOFU host key verification and typed error classification
|
|
- Config normalizer with deterministic SHA256 hashing
|
|
- ConfigSnapshotEvent NATS event type and PublishConfigSnapshot method
|
|
- Config backup environment variables (interval, concurrency, timeout)
|
|
- Device model SSH fields (port, host key fingerprint) with UpdateSSHHostKey method
|
|
- Alembic migration 028 for devices table SSH columns
|
|
- Prometheus metrics for config backup observability
|
|
affects: [02-02-backup-scheduler, 03-backend-subscriber]
|
|
|
|
tech-stack:
|
|
added: []
|
|
patterns:
|
|
- "TOFU host key verification via SHA256 fingerprint comparison"
|
|
- "Config normalization pipeline: line endings, timestamp strip, whitespace trim, blank collapse"
|
|
- "SSH error classification into typed SSHErrorKind enum"
|
|
|
|
key-files:
|
|
created:
|
|
- poller/internal/device/ssh_executor.go
|
|
- poller/internal/device/ssh_executor_test.go
|
|
- poller/internal/device/normalize.go
|
|
- poller/internal/device/normalize_test.go
|
|
- backend/alembic/versions/028_device_ssh_host_key.py
|
|
modified:
|
|
- poller/internal/config/config.go
|
|
- poller/internal/bus/publisher.go
|
|
- poller/internal/store/devices.go
|
|
- poller/internal/observability/metrics.go
|
|
|
|
key-decisions:
|
|
- "TOFU fingerprint format matches ssh-keygen: SHA256:base64(sha256(pubkey))"
|
|
- "NormalizationVersion=1 constant included in NATS payloads for future re-processing"
|
|
- "UpdateSSHHostKey sets first_seen via COALESCE to preserve original observation time"
|
|
|
|
patterns-established:
|
|
- "SSH error classification: classifySSHError inspects error strings for auth/hostkey/timeout/refused patterns"
|
|
- "Config normalization: version-tracked deterministic pipeline for RouterOS export output"
|
|
|
|
requirements-completed: [COLL-01, COLL-02, COLL-06]
|
|
|
|
duration: 5min
|
|
completed: 2026-03-13
|
|
---
|
|
|
|
# Phase 02 Plan 01: Config Backup Primitives Summary
|
|
|
|
**SSH executor with TOFU host key verification, RouterOS config normalizer with SHA256 hashing, NATS snapshot event, and Alembic migration for device SSH columns**
|
|
|
|
## Performance
|
|
|
|
- **Duration:** 5 min
|
|
- **Started:** 2026-03-13T01:43:33Z
|
|
- **Completed:** 2026-03-13T01:48:38Z
|
|
- **Tasks:** 2
|
|
- **Files modified:** 9
|
|
|
|
## Accomplishments
|
|
- SSH RunCommand executor with context-aware dialing, TOFU host key callback, and 6-kind typed error classification
|
|
- Deterministic config normalizer: strips RouterOS timestamps, normalizes line endings, trims whitespace, collapses blanks, computes SHA256 hash
|
|
- 22 unit tests covering error classification, TOFU flows (first connect/match/mismatch), normalization edge cases, idempotency
|
|
- Config backup env vars, NATS ConfigSnapshotEvent, device model SSH extensions, migration 028, Prometheus metrics
|
|
|
|
## Task Commits
|
|
|
|
Each task was committed atomically:
|
|
|
|
1. **Task 1: SSH executor, normalizer, and their tests** - `f1abb75` (feat)
|
|
2. **Task 2: Config env vars, NATS event type, device model extensions, Alembic migration, metrics** - `4ae39d2` (feat)
|
|
|
|
_Note: Task 1 used TDD -- tests written first (RED), implementation second (GREEN)._
|
|
|
|
## Files Created/Modified
|
|
- `poller/internal/device/ssh_executor.go` - RunCommand SSH executor with TOFU host key verification and typed errors
|
|
- `poller/internal/device/ssh_executor_test.go` - Unit tests for SSH error classification, TOFU callbacks, CommandResult
|
|
- `poller/internal/device/normalize.go` - NormalizeConfig and HashConfig for RouterOS export output
|
|
- `poller/internal/device/normalize_test.go` - Table-driven tests for normalization pipeline edge cases
|
|
- `poller/internal/config/config.go` - Added ConfigBackupIntervalSeconds, ConfigBackupMaxConcurrent, ConfigBackupCommandTimeoutSeconds
|
|
- `poller/internal/bus/publisher.go` - Added ConfigSnapshotEvent type, PublishConfigSnapshot method, config.snapshot.> stream subject
|
|
- `poller/internal/store/devices.go` - Added SSHPort/SSHHostKeyFingerprint fields, UpdateSSHHostKey method, updated queries
|
|
- `poller/internal/observability/metrics.go` - Added ConfigBackupTotal, ConfigBackupDuration, ConfigBackupActive metrics
|
|
- `backend/alembic/versions/028_device_ssh_host_key.py` - Migration adding ssh_port, ssh_host_key_fingerprint, timestamp columns
|
|
|
|
## Decisions Made
|
|
- TOFU fingerprint format uses SHA256:base64(sha256(pubkey)) to match ssh-keygen output format
|
|
- NormalizationVersion=1 constant is included in NATS payloads so consumers can detect algorithm changes
|
|
- UpdateSSHHostKey uses COALESCE on ssh_host_key_first_seen to preserve original observation timestamp
|
|
|
|
## Deviations from Plan
|
|
|
|
### Auto-fixed Issues
|
|
|
|
**1. [Rule 1 - Bug] Fixed test key generation approach**
|
|
- **Found during:** Task 1 (GREEN phase)
|
|
- **Issue:** Embedded OpenSSH PEM test key had padding errors ("ssh: padding not as expected")
|
|
- **Fix:** Switched to programmatic ed25519 key generation via crypto/ed25519.GenerateKey
|
|
- **Files modified:** poller/internal/device/ssh_executor_test.go
|
|
- **Verification:** All 22 tests pass
|
|
- **Committed in:** f1abb75 (Task 1 commit)
|
|
|
|
---
|
|
|
|
**Total deviations:** 1 auto-fixed (1 bug)
|
|
**Impact on plan:** Minimal -- test infrastructure fix only, no production code change.
|
|
|
|
## Issues Encountered
|
|
None beyond the test key generation fix documented above.
|
|
|
|
## User Setup Required
|
|
None - no external service configuration required.
|
|
|
|
## Next Phase Readiness
|
|
- All primitives ready for Plan 02 (backup scheduler) to wire together
|
|
- SSH executor, normalizer, NATS event, device model, config, and metrics are independently tested and compilable
|
|
- Migration 028 ready to apply before deploying the backup scheduler
|
|
|
|
---
|
|
*Phase: 02-poller-config-collection*
|
|
*Completed: 2026-03-13*
|