docs(02-02): complete backup scheduler plan
- SUMMARY.md with execution metrics and decisions - STATE.md updated: Phase 2 complete, 3 plans done - ROADMAP.md updated: Phase 2 marked complete - REQUIREMENTS.md: COLL-03, COLL-05 marked complete Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
100
.planning/phases/02-poller-config-collection/02-02-SUMMARY.md
Normal file
100
.planning/phases/02-poller-config-collection/02-02-SUMMARY.md
Normal file
@@ -0,0 +1,100 @@
|
||||
---
|
||||
phase: 02-poller-config-collection
|
||||
plan: 02
|
||||
subsystem: poller
|
||||
tags: [ssh, backup, scheduler, nats, routeros, concurrency, tofu, redis]
|
||||
|
||||
requires:
|
||||
- phase: 02-poller-config-collection/01
|
||||
provides: SSH executor, config normalizer, NATS ConfigSnapshotEvent, Prometheus metrics, config fields
|
||||
provides:
|
||||
- BackupScheduler with per-device goroutines managing periodic SSH config collection
|
||||
- Concurrency-limited config backup pipeline (SSH -> normalize -> hash -> NATS publish)
|
||||
- TOFU host key verification with persistent fingerprint storage
|
||||
- Auth/hostkey error blocking with transient error exponential backoff
|
||||
- SSHHostKeyUpdater consumer-side interface
|
||||
affects: [03-backend-snapshot-consumer, api, poller]
|
||||
|
||||
tech-stack:
|
||||
added: []
|
||||
patterns: [per-device goroutine lifecycle, buffered channel semaphore, Redis online gating]
|
||||
|
||||
key-files:
|
||||
created:
|
||||
- poller/internal/poller/backup_scheduler.go
|
||||
- poller/internal/poller/backup_scheduler_test.go
|
||||
modified:
|
||||
- poller/internal/poller/interfaces.go
|
||||
- poller/cmd/poller/main.go
|
||||
|
||||
key-decisions:
|
||||
- "BackupScheduler runs independently from status poll scheduler with separate goroutines"
|
||||
- "Semaphore uses buffered channel pattern matching existing codebase style"
|
||||
- "Device with no Redis status key assumed potentially online (first poll not yet completed)"
|
||||
|
||||
patterns-established:
|
||||
- "Backup goroutine pattern: jitter -> initial backup -> ticker loop with gating checks"
|
||||
- "Error classification: auth/hostkey block retries, transient errors use exponential backoff"
|
||||
|
||||
requirements-completed: [COLL-01, COLL-03, COLL-05, COLL-06]
|
||||
|
||||
duration: 4min
|
||||
completed: 2026-03-13
|
||||
---
|
||||
|
||||
# Phase 2 Plan 2: Backup Scheduler Summary
|
||||
|
||||
**BackupScheduler orchestrating periodic SSH config collection with per-device goroutines, concurrency semaphore, TOFU verification, and NATS publishing**
|
||||
|
||||
## Performance
|
||||
|
||||
- **Duration:** 4 min
|
||||
- **Started:** 2026-03-13T01:51:27Z
|
||||
- **Completed:** 2026-03-13T01:55:37Z
|
||||
- **Tasks:** 2
|
||||
- **Files modified:** 4
|
||||
|
||||
## Accomplishments
|
||||
- BackupScheduler manages per-device backup goroutines with 30-300s initial jitter
|
||||
- Concurrency limited by configurable buffered channel semaphore (default 10)
|
||||
- Auth failures and host key mismatches permanently block retries with clear log warnings
|
||||
- Transient errors use stepped backoff (5m/15m/1h cap)
|
||||
- Full pipeline wired into main.go running parallel to existing status poll scheduler
|
||||
|
||||
## Task Commits
|
||||
|
||||
Each task was committed atomically:
|
||||
|
||||
1. **Task 1: BackupScheduler with per-device goroutines** - `a884b09` (test) + `2653a32` (feat) -- TDD red/green
|
||||
2. **Task 2: Wire BackupScheduler into main.go** - `d34817a` (feat)
|
||||
|
||||
## Files Created/Modified
|
||||
- `poller/internal/poller/backup_scheduler.go` - BackupScheduler with per-device goroutines, concurrency control, SSH collection, NATS publishing
|
||||
- `poller/internal/poller/backup_scheduler_test.go` - Unit tests for jitter, backoff, retry blocking, online gating, semaphore, reconciliation
|
||||
- `poller/internal/poller/interfaces.go` - Added SSHHostKeyUpdater consumer-side interface
|
||||
- `poller/cmd/poller/main.go` - BackupScheduler initialization and goroutine startup
|
||||
|
||||
## Decisions Made
|
||||
- BackupScheduler runs independently from status poll scheduler -- separate goroutine pool, no shared state
|
||||
- Semaphore uses buffered channel pattern (consistent with Go idioms, no external deps)
|
||||
- Devices with no Redis status key assumed potentially online to avoid blocking first backup
|
||||
- Locker nil-check allows tests to run without Redis lock infrastructure
|
||||
|
||||
## Deviations from Plan
|
||||
|
||||
None - plan executed exactly as written.
|
||||
|
||||
## Issues Encountered
|
||||
None
|
||||
|
||||
## User Setup Required
|
||||
None - no external service configuration required.
|
||||
|
||||
## Next Phase Readiness
|
||||
- Config backup pipeline complete: SSH -> normalize -> hash -> NATS publish
|
||||
- Backend snapshot consumer (Phase 3) can subscribe to config.snapshot.create.> to receive snapshots
|
||||
- Pre-existing integration test failures in poller package (missing certificate_authorities table) are unrelated to this work
|
||||
|
||||
---
|
||||
*Phase: 02-poller-config-collection*
|
||||
*Completed: 2026-03-13*
|
||||
Reference in New Issue
Block a user