the-other-dude/.planning/phases/02-poller-config-collection/02-02-SUMMARY.md
Jason Staack d456fe58e9 docs(02-02): complete backup scheduler plan
- SUMMARY.md with execution metrics and decisions
- STATE.md updated: Phase 2 complete, 3 plans done
- ROADMAP.md updated: Phase 2 marked complete
- REQUIREMENTS.md: COLL-03, COLL-05 marked complete

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 20:57:47 -05:00

---
phase: 02-poller-config-collection
plan: 02
subsystem: poller
tags: [ssh, backup, scheduler, nats, routeros, concurrency, tofu, redis]
requires:
- phase: 02-poller-config-collection/01
provides: SSH executor, config normalizer, NATS ConfigSnapshotEvent, Prometheus metrics, config fields
provides:
- BackupScheduler with per-device goroutines managing periodic SSH config collection
- Concurrency-limited config backup pipeline (SSH -> normalize -> hash -> NATS publish)
- TOFU host key verification with persistent fingerprint storage
- Auth/hostkey error blocking with transient error exponential backoff
- SSHHostKeyUpdater consumer-side interface
affects: [03-backend-snapshot-consumer, api, poller]
tech-stack:
added: []
patterns: [per-device goroutine lifecycle, buffered channel semaphore, Redis online gating]
key-files:
created:
- poller/internal/poller/backup_scheduler.go
- poller/internal/poller/backup_scheduler_test.go
modified:
- poller/internal/poller/interfaces.go
- poller/cmd/poller/main.go
key-decisions:
- "BackupScheduler runs independently from status poll scheduler with separate goroutines"
- "Semaphore uses buffered channel pattern matching existing codebase style"
- "Device with no Redis status key assumed potentially online (first poll not yet completed)"
patterns-established:
- "Backup goroutine pattern: jitter -> initial backup -> ticker loop with gating checks"
- "Error classification: auth/hostkey block retries, transient errors use exponential backoff"
requirements-completed: [COLL-01, COLL-03, COLL-05, COLL-06]
duration: 4min
completed: 2026-03-13
---
# Phase 2 Plan 2: Backup Scheduler Summary
**BackupScheduler orchestrating periodic SSH config collection with per-device goroutines, concurrency semaphore, TOFU verification, and NATS publishing**
## Performance
- **Duration:** 4 min
- **Started:** 2026-03-13T01:51:27Z
- **Completed:** 2026-03-13T01:55:37Z
- **Tasks:** 2
- **Files modified:** 4
## Accomplishments
- BackupScheduler manages per-device backup goroutines with 30-300s initial jitter
- Concurrency limited by configurable buffered channel semaphore (default 10)
- Auth failures and host key mismatches permanently block retries with clear log warnings
- Transient errors retry on a stepped backoff schedule: 5m, then 15m, then 1h (the cap)
- Full pipeline wired into main.go running parallel to existing status poll scheduler
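
The timing and retry rules listed above can be sketched as follows. This is a minimal illustration, not the code in `backup_scheduler.go`: the names `initialJitter`, `backoffFor`, `isBlocking`, and the two sentinel errors are hypothetical, and only the 30-300s jitter range and the 5m/15m/1h steps come from this summary.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// Hypothetical sentinel errors; the real poller may classify SSH failures differently.
var (
	ErrAuthFailed      = errors.New("ssh authentication failed")
	ErrHostKeyMismatch = errors.New("ssh host key mismatch")
)

const (
	minJitter = 30 * time.Second
	maxJitter = 300 * time.Second
)

// backoffSteps mirrors the 5m/15m/1h schedule; the last entry acts as the cap.
var backoffSteps = []time.Duration{5 * time.Minute, 15 * time.Minute, time.Hour}

// initialJitter returns a random delay in [minJitter, maxJitter] so that
// devices do not all start their first backup at the same moment.
func initialJitter() time.Duration {
	return minJitter + time.Duration(rand.Int63n(int64(maxJitter-minJitter)+1))
}

// backoffFor returns how long to wait after the given number of
// consecutive transient failures.
func backoffFor(failures int) time.Duration {
	if failures <= 0 {
		return 0
	}
	if failures > len(backoffSteps) {
		failures = len(backoffSteps)
	}
	return backoffSteps[failures-1]
}

// isBlocking reports whether an error should stop retries entirely
// instead of scheduling another attempt.
func isBlocking(err error) bool {
	return errors.Is(err, ErrAuthFailed) || errors.Is(err, ErrHostKeyMismatch)
}

func main() {
	fmt.Println("initial jitter:", initialJitter())
	for n := 1; n <= 4; n++ {
		fmt.Printf("failure %d -> wait %v\n", n, backoffFor(n))
	}
	wrapped := fmt.Errorf("collect config: %w", ErrAuthFailed)
	fmt.Println("auth error blocks retries:", isBlocking(wrapped))
}
```

Using `errors.Is` keeps the classification working when the SSH layer wraps the underlying error with extra context.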
## Task Commits
Each task was committed atomically:
1. **Task 1: BackupScheduler with per-device goroutines** - `a884b09` (test) + `2653a32` (feat) -- TDD red/green
2. **Task 2: Wire BackupScheduler into main.go** - `d34817a` (feat)
## Files Created/Modified
- `poller/internal/poller/backup_scheduler.go` - BackupScheduler with per-device goroutines, concurrency control, SSH collection, NATS publishing
- `poller/internal/poller/backup_scheduler_test.go` - Unit tests for jitter, backoff, retry blocking, online gating, semaphore, reconciliation
- `poller/internal/poller/interfaces.go` - Added SSHHostKeyUpdater consumer-side interface
- `poller/cmd/poller/main.go` - BackupScheduler initialization and goroutine startup
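
For readers unfamiliar with the per-device goroutine lifecycle and the reconciliation covered by the tests, here is one way it could look. The `scheduler` type and its methods are hypothetical names for illustration only; the real BackupScheduler presumably derives its contexts from the poller's parent context rather than `context.Background()`.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// scheduler is a hypothetical stand-in for a per-device goroutine manager.
type scheduler struct {
	mu      sync.Mutex
	cancels map[string]context.CancelFunc // one entry per running device goroutine
	wg      sync.WaitGroup
	run     func(ctx context.Context, deviceID string)
}

func newScheduler(run func(context.Context, string)) *scheduler {
	return &scheduler{cancels: make(map[string]context.CancelFunc), run: run}
}

// reconcile starts a goroutine for every device that is not yet running
// and cancels the goroutine of every device missing from deviceIDs.
func (s *scheduler) reconcile(deviceIDs []string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	wanted := make(map[string]bool, len(deviceIDs))
	for _, id := range deviceIDs {
		wanted[id] = true
		if _, running := s.cancels[id]; running {
			continue
		}
		ctx, cancel := context.WithCancel(context.Background())
		s.cancels[id] = cancel
		s.wg.Add(1)
		go func(id string) {
			defer s.wg.Done()
			s.run(ctx, id)
		}(id)
	}
	for id, cancel := range s.cancels {
		if !wanted[id] {
			cancel()
			delete(s.cancels, id)
		}
	}
}

// active returns the number of devices with a running goroutine.
func (s *scheduler) active() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.cancels)
}

// stop cancels every device goroutine and waits for all of them to exit.
func (s *scheduler) stop() {
	s.reconcile(nil)
	s.wg.Wait()
}

// blockUntilCancelled stands in for the real backup loop.
func blockUntilCancelled(ctx context.Context, _ string) { <-ctx.Done() }

func main() {
	s := newScheduler(blockUntilCancelled)
	s.reconcile([]string{"dev-a", "dev-b", "dev-c"})
	fmt.Println("active after first reconcile:", s.active())
	s.reconcile([]string{"dev-b"})
	fmt.Println("active after second reconcile:", s.active())
	s.stop()
	fmt.Println("active after stop:", s.active())
}
```

Storing one `context.CancelFunc` per device lets a single device be stopped without touching the goroutines of the others.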
## Decisions Made
- BackupScheduler runs independently from status poll scheduler -- separate goroutine pool, no shared state
- Semaphore uses buffered channel pattern (consistent with Go idioms, no external deps)
- Devices with no Redis status key assumed potentially online to avoid blocking first backup
- Locker nil-check allows tests to run without Redis lock infrastructure
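
Two of these decisions are easy to show in isolation. The sketch below is illustrative only: `peakConcurrency` and `shouldAttemptBackup` are hypothetical names, and the `"online"` status value is an assumption about how the Redis status key is encoded.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// peakConcurrency runs `jobs` fake backups gated by a buffered-channel
// semaphore of size `limit` and returns the highest number that were
// observed running at the same time.
func peakConcurrency(jobs, limit int) int {
	sem := make(chan struct{}, limit)
	var wg sync.WaitGroup
	var current, peak int64
	for i := 0; i < jobs; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sem <- struct{}{}        // acquire: blocks while `limit` jobs are running
			defer func() { <-sem }() // release the slot when the job ends
			n := atomic.AddInt64(&current, 1)
			for {
				p := atomic.LoadInt64(&peak)
				if n <= p || atomic.CompareAndSwapInt64(&peak, p, n) {
					break
				}
			}
			time.Sleep(time.Millisecond) // stand-in for the SSH collection
			atomic.AddInt64(&current, -1)
		}()
	}
	wg.Wait()
	return int(atomic.LoadInt64(&peak))
}

// shouldAttemptBackup applies the online gating rule. A missing status key
// (found == false) is treated as "potentially online" so that the first
// backup is not blocked before the first status poll has completed.
// The "online" value is an assumed encoding, not taken from the codebase.
func shouldAttemptBackup(status string, found bool) bool {
	if !found {
		return true
	}
	return status == "online"
}

func main() {
	fmt.Println("peak concurrency within limit:", peakConcurrency(50, 10) <= 10)
	fmt.Println("no status key:", shouldAttemptBackup("", false))
	fmt.Println("offline device:", shouldAttemptBackup("offline", true))
}
```

A buffered channel gives the limit for free: a send blocks once the buffer holds `limit` tokens, so no extra dependency or explicit counter is needed for correctness.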
## Deviations from Plan
None - plan executed exactly as written.
## Issues Encountered
None
## User Setup Required
None - no external service configuration required.
## Next Phase Readiness
- Config backup pipeline complete: SSH -> normalize -> hash -> NATS publish
- Backend snapshot consumer (Phase 3) can subscribe to `config.snapshot.create.>` to receive snapshots
- Pre-existing integration test failures in poller package (missing certificate_authorities table) are unrelated to this work
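
As a reminder of what the `>` wildcard in that subscription subject covers: in NATS, `*` matches exactly one dot-separated token and a trailing `>` matches one or more tokens. The matcher below is a stdlib-only sketch of those rules for illustration, not the NATS client's implementation, and the tenant/device tokens in the example subjects are hypothetical since this summary does not state the full subject layout.

```go
package main

import (
	"fmt"
	"strings"
)

// subjectMatches reports whether subject matches a NATS-style pattern,
// where "*" matches a single token and a final ">" matches one or more.
func subjectMatches(pattern, subject string) bool {
	p := strings.Split(pattern, ".")
	s := strings.Split(subject, ".")
	for i, tok := range p {
		if tok == ">" {
			// ">" is only valid as the last token and needs at least one token to match.
			return i == len(p)-1 && len(s) > i
		}
		if i >= len(s) {
			return false
		}
		if tok != "*" && tok != s[i] {
			return false
		}
	}
	return len(p) == len(s)
}

func main() {
	pattern := "config.snapshot.create.>"
	subjects := []string{
		"config.snapshot.create.tenant-1.device-42", // hypothetical layout
		"config.snapshot.create",
		"config.snapshot.delete.tenant-1",
	}
	for _, subj := range subjects {
		fmt.Printf("%-45s %v\n", subj, subjectMatches(pattern, subj))
	}
}
```

Note that the bare subject `config.snapshot.create` does not match, because `>` requires at least one further token.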
---
*Phase: 02-poller-config-collection*
*Completed: 2026-03-13*