docs(02-02): complete backup scheduler plan
- SUMMARY.md with execution metrics and decisions
- STATE.md updated: Phase 2 complete, 3 plans done
- ROADMAP.md updated: Phase 2 marked complete
- REQUIREMENTS.md: COLL-03, COLL-05 marked complete

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@@ -9,9 +9,9 @@
|
|||||||
|
|
||||||
- [x] **COLL-01**: Poller collects RouterOS config via SSH `/export show-sensitive` on a configurable interval (default 6h)
|
- [x] **COLL-01**: Poller collects RouterOS config via SSH `/export show-sensitive` on a configurable interval (default 6h)
|
||||||
- [x] **COLL-02**: Poller normalizes config output (trim whitespace, normalize line endings, remove timestamp headers)
|
- [x] **COLL-02**: Poller normalizes config output (trim whitespace, normalize line endings, remove timestamp headers)
|
||||||
- [ ] **COLL-03**: Poller sends config snapshot to API via NATS subject `config.snapshot.create`
|
- [x] **COLL-03**: Poller sends config snapshot to API via NATS subject `config.snapshot.create`
|
||||||
- [ ] **COLL-04**: Manual backup trigger via POST `/api/tenants/{tenant_id}/devices/{device_id}/backup`
|
- [ ] **COLL-04**: Manual backup trigger via POST `/api/tenants/{tenant_id}/devices/{device_id}/backup`
|
||||||
- [ ] **COLL-05**: Unreachable routers log warning and retry next interval
|
- [x] **COLL-05**: Unreachable routers log warning and retry next interval
|
||||||
- [x] **COLL-06**: Collection interval configurable via `CONFIG_BACKUP_INTERVAL` environment variable
|
- [x] **COLL-06**: Collection interval configurable via `CONFIG_BACKUP_INTERVAL` environment variable
|
||||||
|
|
||||||
### Storage
|
### Storage
|
||||||
@@ -70,9 +70,9 @@
|
|||||||
|-------------|-------|--------|
|
|-------------|-------|--------|
|
||||||
| COLL-01 | Phase 2: Poller Config Collection | Complete |
|
| COLL-01 | Phase 2: Poller Config Collection | Complete |
|
||||||
| COLL-02 | Phase 2: Poller Config Collection | Complete |
|
| COLL-02 | Phase 2: Poller Config Collection | Complete |
|
||||||
| COLL-03 | Phase 2: Poller Config Collection | Pending |
|
| COLL-03 | Phase 2: Poller Config Collection | Complete |
|
||||||
| COLL-04 | Phase 4: Manual Backup Trigger | Pending |
|
| COLL-04 | Phase 4: Manual Backup Trigger | Pending |
|
||||||
| COLL-05 | Phase 2: Poller Config Collection | Pending |
|
| COLL-05 | Phase 2: Poller Config Collection | Complete |
|
||||||
| COLL-06 | Phase 2: Poller Config Collection | Complete |
|
| COLL-06 | Phase 2: Poller Config Collection | Complete |
|
||||||
| STOR-01 | Phase 1: Database Schema | Complete |
|
| STOR-01 | Phase 1: Database Schema | Complete |
|
||||||
| STOR-02 | Phase 3: Snapshot Ingestion | Pending |
|
| STOR-02 | Phase 3: Snapshot Ingestion | Pending |
|
||||||
|
|||||||
@@ -13,7 +13,7 @@ This roadmap delivers automated RouterOS configuration backup and change trackin
|
|||||||
Decimal phases appear between their surrounding integers in numeric order.
|
Decimal phases appear between their surrounding integers in numeric order.
|
||||||
|
|
||||||
- [x] **Phase 1: Database Schema** - Config snapshot, diff, and change tables with encryption and RLS (completed 2026-03-13)
|
- [x] **Phase 1: Database Schema** - Config snapshot, diff, and change tables with encryption and RLS (completed 2026-03-13)
|
||||||
- [ ] **Phase 2: Poller Config Collection** - SSH export, normalization, and NATS publishing from Go poller
|
- [x] **Phase 2: Poller Config Collection** - SSH export, normalization, and NATS publishing from Go poller (completed 2026-03-13)
|
||||||
- [ ] **Phase 3: Snapshot Ingestion** - Backend NATS subscriber stores snapshots with SHA256 deduplication
|
- [ ] **Phase 3: Snapshot Ingestion** - Backend NATS subscriber stores snapshots with SHA256 deduplication
|
||||||
- [ ] **Phase 4: Manual Backup Trigger** - API endpoint for on-demand config backup via poller
|
- [ ] **Phase 4: Manual Backup Trigger** - API endpoint for on-demand config backup via poller
|
||||||
- [ ] **Phase 5: Diff Engine** - Unified diff generation and structured change parsing
|
- [ ] **Phase 5: Diff Engine** - Unified diff generation and structured change parsing
|
||||||
@@ -175,7 +175,7 @@ Note: Phase 9 depends only on Phase 3 and Phase 10 depends on Phases 3/4/5, so P
|
|||||||
| Phase | Plans Complete | Status | Completed |
|
| Phase | Plans Complete | Status | Completed |
|
||||||
|-------|----------------|--------|-----------|
|
|-------|----------------|--------|-----------|
|
||||||
| 1. Database Schema | 1/1 | Complete | 2026-03-13 |
|
| 1. Database Schema | 1/1 | Complete | 2026-03-13 |
|
||||||
| 2. Poller Config Collection | 0/2 | Not started | - |
|
| 2. Poller Config Collection | 2/2 | Complete | 2026-03-13 |
|
||||||
| 3. Snapshot Ingestion | 0/1 | Not started | - |
|
| 3. Snapshot Ingestion | 0/1 | Not started | - |
|
||||||
| 4. Manual Backup Trigger | 0/1 | Not started | - |
|
| 4. Manual Backup Trigger | 0/1 | Not started | - |
|
||||||
| 5. Diff Engine | 0/2 | Not started | - |
|
| 5. Diff Engine | 0/2 | Not started | - |
|
||||||
|
|||||||
@@ -3,14 +3,14 @@ gsd_state_version: 1.0
|
|||||||
milestone: v9.6
|
milestone: v9.6
|
||||||
milestone_name: milestone
|
milestone_name: milestone
|
||||||
status: in_progress
|
status: in_progress
|
||||||
stopped_at: Completed 02-01-PLAN.md
|
stopped_at: Completed 02-02-PLAN.md
|
||||||
last_updated: "2026-03-13T01:49:00Z"
|
last_updated: "2026-03-13T01:55:37Z"
|
||||||
last_activity: 2026-03-13 -- Completed 02-01 config backup primitives (SSH executor, normalizer, NATS event, migration)
|
last_activity: 2026-03-13 -- Completed 02-02 backup scheduler (per-device goroutines, concurrency, main.go wiring)
|
||||||
progress:
|
progress:
|
||||||
total_phases: 10
|
total_phases: 10
|
||||||
completed_phases: 1
|
completed_phases: 2
|
||||||
total_plans: 3
|
total_plans: 3
|
||||||
completed_plans: 2
|
completed_plans: 3
|
||||||
percent: 100
|
percent: 100
|
||||||
---
|
---
|
||||||
|
|
||||||
@@ -25,26 +25,26 @@ See: .planning/PROJECT.md (updated 2026-03-12)
|
|||||||
|
|
||||||
## Current Position
|
## Current Position
|
||||||
|
|
||||||
Phase: 2 of 10 (Poller Config Collection)
|
Phase: 2 of 10 (Poller Config Collection) -- COMPLETE
|
||||||
Plan: 1 of 2 in current phase (02-01 complete)
|
Plan: 2 of 2 in current phase (02-02 complete)
|
||||||
Status: Phase 2 in progress
|
Status: Phase 2 complete
|
||||||
Last activity: 2026-03-13 -- Completed 02-01 config backup primitives (SSH executor, normalizer, NATS event, migration)
|
Last activity: 2026-03-13 -- Completed 02-02 backup scheduler with per-device goroutines and main.go wiring
|
||||||
|
|
||||||
Progress: [███████░░░] 67%
|
Progress: [██████████] 100%
|
||||||
|
|
||||||
## Performance Metrics
|
## Performance Metrics
|
||||||
|
|
||||||
**Velocity:**
|
**Velocity:**
|
||||||
- Total plans completed: 2
|
- Total plans completed: 3
|
||||||
- Average duration: 4min
|
- Average duration: 4min
|
||||||
- Total execution time: 0.13 hours
|
- Total execution time: 0.20 hours
|
||||||
|
|
||||||
**By Phase:**
|
**By Phase:**
|
||||||
|
|
||||||
| Phase | Plans | Total | Avg/Plan |
|
| Phase | Plans | Total | Avg/Plan |
|
||||||
|-------|-------|-------|----------|
|
|-------|-------|-------|----------|
|
||||||
| 01-database-schema | 1 | 3min | 3min |
|
| 01-database-schema | 1 | 3min | 3min |
|
||||||
| 02-poller-config-collection | 1 | 5min | 5min |
|
| 02-poller-config-collection | 2 | 9min | 4.5min |
|
||||||
|
|
||||||
**Recent Trend:**
|
**Recent Trend:**
|
||||||
- Last 5 plans: none
|
- Last 5 plans: none
|
||||||
@@ -65,6 +65,9 @@ Recent decisions affecting current work:
|
|||||||
- [02-01] TOFU fingerprint format matches ssh-keygen: SHA256:base64(sha256(pubkey))
|
- [02-01] TOFU fingerprint format matches ssh-keygen: SHA256:base64(sha256(pubkey))
|
||||||
- [02-01] NormalizationVersion=1 constant in NATS payloads for future re-processing
|
- [02-01] NormalizationVersion=1 constant in NATS payloads for future re-processing
|
||||||
- [02-01] UpdateSSHHostKey uses COALESCE on first_seen to preserve original observation time
|
- [02-01] UpdateSSHHostKey uses COALESCE on first_seen to preserve original observation time
|
||||||
|
- [02-02] BackupScheduler runs independently from status poll scheduler with separate goroutines
|
||||||
|
- [02-02] Buffered channel semaphore for concurrency control (Go idiom, no external deps)
|
||||||
|
- [02-02] Devices with no Redis status key assumed potentially online for first backup
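
The TOFU fingerprint format recorded in the 02-01 decision above, `SHA256:base64(sha256(pubkey))`, can be sketched with the standard library alone. This is a minimal illustration, not the poller's actual code: the function name `fingerprintSHA256` and the sample key bytes are hypothetical, and it assumes the unpadded base64 form that `ssh-keygen` prints.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// fingerprintSHA256 is a hypothetical helper. It hashes the wire-format
// public key blob with SHA-256 and renders it as "SHA256:" followed by
// unpadded standard base64, the form ssh-keygen prints.
func fingerprintSHA256(pubKeyBlob []byte) string {
	sum := sha256.Sum256(pubKeyBlob)
	return "SHA256:" + base64.RawStdEncoding.EncodeToString(sum[:])
}

func main() {
	// Placeholder bytes; a real caller would pass the marshaled host key.
	blob := []byte("example-host-key-bytes")
	fmt.Println(fingerprintSHA256(blob))
}
```

The `golang.org/x/crypto/ssh` package also offers `ssh.FingerprintSHA256`, which produces the same format from an `ssh.PublicKey`; whether the poller uses it is not stated here.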
|
||||||
|
|
||||||
### Pending Todos
|
### Pending Todos
|
||||||
|
|
||||||
@@ -76,6 +79,6 @@ None yet.
|
|||||||
|
|
||||||
## Session Continuity
|
## Session Continuity
|
||||||
|
|
||||||
Last session: 2026-03-13T01:49:00Z
|
Last session: 2026-03-13T01:55:37Z
|
||||||
Stopped at: Completed 02-01-PLAN.md
|
Stopped at: Completed 02-02-PLAN.md (Phase 2 complete)
|
||||||
Resume file: .planning/phases/02-poller-config-collection/02-02-PLAN.md
|
Resume file: Next phase (03)
|
||||||
|
|||||||
100
.planning/phases/02-poller-config-collection/02-02-SUMMARY.md
Normal file
@@ -0,0 +1,100 @@
|
|||||||
|
---
|
||||||
|
phase: 02-poller-config-collection
|
||||||
|
plan: 02
|
||||||
|
subsystem: poller
|
||||||
|
tags: [ssh, backup, scheduler, nats, routeros, concurrency, tofu, redis]
|
||||||
|
|
||||||
|
requires:
|
||||||
|
- phase: 02-poller-config-collection/01
|
||||||
|
provides: SSH executor, config normalizer, NATS ConfigSnapshotEvent, Prometheus metrics, config fields
|
||||||
|
provides:
|
||||||
|
- BackupScheduler with per-device goroutines managing periodic SSH config collection
|
||||||
|
- Concurrency-limited config backup pipeline (SSH -> normalize -> hash -> NATS publish)
|
||||||
|
- TOFU host key verification with persistent fingerprint storage
|
||||||
|
- Auth/hostkey error blocking with transient error exponential backoff
|
||||||
|
- SSHHostKeyUpdater consumer-side interface
|
||||||
|
affects: [03-backend-snapshot-consumer, api, poller]
|
||||||
|
|
||||||
|
tech-stack:
|
||||||
|
added: []
|
||||||
|
patterns: [per-device goroutine lifecycle, buffered channel semaphore, Redis online gating]
|
||||||
|
|
||||||
|
key-files:
|
||||||
|
created:
|
||||||
|
- poller/internal/poller/backup_scheduler.go
|
||||||
|
- poller/internal/poller/backup_scheduler_test.go
|
||||||
|
modified:
|
||||||
|
- poller/internal/poller/interfaces.go
|
||||||
|
- poller/cmd/poller/main.go
|
||||||
|
|
||||||
|
key-decisions:
|
||||||
|
- "BackupScheduler runs independently from status poll scheduler with separate goroutines"
|
||||||
|
- "Semaphore uses buffered channel pattern matching existing codebase style"
|
||||||
|
- "Device with no Redis status key assumed potentially online (first poll not yet completed)"
|
||||||
|
|
||||||
|
patterns-established:
|
||||||
|
- "Backup goroutine pattern: jitter -> initial backup -> ticker loop with gating checks"
|
||||||
|
- "Error classification: auth/hostkey block retries, transient errors use exponential backoff"
|
||||||
|
|
||||||
|
requirements-completed: [COLL-01, COLL-03, COLL-05, COLL-06]
|
||||||
|
|
||||||
|
duration: 4min
|
||||||
|
completed: 2026-03-13
|
||||||
|
---
|
||||||
|
|
||||||
|
# Phase 2 Plan 2: Backup Scheduler Summary
|
||||||
|
|
||||||
|
**BackupScheduler orchestrating periodic SSH config collection with per-device goroutines, concurrency semaphore, TOFU verification, and NATS publishing**
|
||||||
|
|
||||||
|
## Performance
|
||||||
|
|
||||||
|
- **Duration:** 4 min
|
||||||
|
- **Started:** 2026-03-13T01:51:27Z
|
||||||
|
- **Completed:** 2026-03-13T01:55:37Z
|
||||||
|
- **Tasks:** 2
|
||||||
|
- **Files modified:** 4
|
||||||
|
|
||||||
|
## Accomplishments
|
||||||
|
- BackupScheduler manages per-device backup goroutines with 30-300s initial jitter
|
||||||
|
- Concurrency limited by configurable buffered channel semaphore (default 10)
|
||||||
|
- Auth failures and host key mismatches permanently block retries with clear log warnings
|
||||||
|
- Transient errors use stepped backoff (5m/15m/1h cap)
|
||||||
|
- Full pipeline wired into main.go running parallel to existing status poll scheduler
|
||||||
|
|
||||||
|
## Task Commits
|
||||||
|
|
||||||
|
Each task was committed atomically:
|
||||||
|
|
||||||
|
1. **Task 1: BackupScheduler with per-device goroutines** - `a884b09` (test) + `2653a32` (feat) -- TDD red/green
|
||||||
|
2. **Task 2: Wire BackupScheduler into main.go** - `d34817a` (feat)
|
||||||
|
|
||||||
|
## Files Created/Modified
|
||||||
|
- `poller/internal/poller/backup_scheduler.go` - BackupScheduler with per-device goroutines, concurrency control, SSH collection, NATS publishing
|
||||||
|
- `poller/internal/poller/backup_scheduler_test.go` - Unit tests for jitter, backoff, retry blocking, online gating, semaphore, reconciliation
|
||||||
|
- `poller/internal/poller/interfaces.go` - Added SSHHostKeyUpdater consumer-side interface
|
||||||
|
- `poller/cmd/poller/main.go` - BackupScheduler initialization and goroutine startup
|
||||||
|
|
||||||
|
## Decisions Made
|
||||||
|
- BackupScheduler runs independently from status poll scheduler -- separate goroutine pool, no shared state
|
||||||
|
- Semaphore uses buffered channel pattern (consistent with Go idioms, no external deps)
|
||||||
|
- Devices with no Redis status key assumed potentially online to avoid blocking first backup
|
||||||
|
- Locker nil-check allows tests to run without Redis lock infrastructure
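
The buffered channel semaphore mentioned in these decisions is a common Go pattern and can be sketched as follows. The helper `runLimited` and its peak-concurrency bookkeeping are hypothetical and exist only to make the limit observable; the real scheduler's structure is not shown in this commit page.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// runLimited runs every job in its own goroutine but lets at most
// `limit` of them execute at once. It returns the peak number of jobs
// that were running simultaneously.
func runLimited(limit int, jobs []func()) int32 {
	sem := make(chan struct{}, limit) // capacity is the concurrency cap
	var wg sync.WaitGroup
	var running, peak int32

	for _, job := range jobs {
		wg.Add(1)
		go func(job func()) {
			defer wg.Done()
			sem <- struct{}{}        // acquire: blocks while the buffer is full
			defer func() { <-sem }() // release the slot when done

			n := atomic.AddInt32(&running, 1)
			for { // record a new peak if this is the highest count seen
				p := atomic.LoadInt32(&peak)
				if n <= p || atomic.CompareAndSwapInt32(&peak, p, n) {
					break
				}
			}
			job()
			atomic.AddInt32(&running, -1)
		}(job)
	}
	wg.Wait()
	return atomic.LoadInt32(&peak)
}

func main() {
	jobs := make([]func(), 25)
	for i := range jobs {
		jobs[i] = func() { time.Sleep(10 * time.Millisecond) }
	}
	// 10 matches the default limit quoted in the summary.
	fmt.Println("peak concurrency:", runLimited(10, jobs))
}
```

Sending into the channel acquires a slot and receiving frees it, so the cap is enforced by the channel's capacity with no dependency beyond the standard library.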
|
||||||
|
|
||||||
|
## Deviations from Plan
|
||||||
|
|
||||||
|
None - plan executed exactly as written.
|
||||||
|
|
||||||
|
## Issues Encountered
|
||||||
|
None
|
||||||
|
|
||||||
|
## User Setup Required
|
||||||
|
None - no external service configuration required.
|
||||||
|
|
||||||
|
## Next Phase Readiness
|
||||||
|
- Config backup pipeline complete: SSH -> normalize -> hash -> NATS publish
|
||||||
|
- Backend snapshot consumer (Phase 3) can subscribe to config.snapshot.create.> to receive snapshots
|
||||||
|
- Pre-existing integration test failures in poller package (missing certificate_authorities table) are unrelated to this work
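
The normalize and hash stages of the pipeline above can be illustrated with a small sketch. Everything here is an assumption for illustration: `normalize` and `contentHash` are hypothetical names, and the header test (a `#` comment containing ` by RouterOS `) is a guess at what a timestamp header looks like, not the normalizer delivered in 02-01.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// normalize converts line endings to LF, trims trailing whitespace on
// each line, and drops comment lines that look like export timestamp
// headers (assumed format: "# <date> <time> by RouterOS <version>").
func normalize(raw string) string {
	raw = strings.ReplaceAll(raw, "\r\n", "\n")
	raw = strings.ReplaceAll(raw, "\r", "\n")
	var out []string
	for _, line := range strings.Split(raw, "\n") {
		line = strings.TrimRight(line, " \t")
		if strings.HasPrefix(line, "#") && strings.Contains(line, " by RouterOS ") {
			continue // timestamp header changes on every export
		}
		out = append(out, line)
	}
	return strings.TrimSpace(strings.Join(out, "\n")) + "\n"
}

// contentHash returns the hex-encoded SHA-256 of the normalized config.
func contentHash(normalized string) string {
	sum := sha256.Sum256([]byte(normalized))
	return hex.EncodeToString(sum[:])
}

func main() {
	a := "# 2026-03-13 01:00:00 by RouterOS 7.14\r\n/ip address\r\nadd address=10.0.0.1/24  \r\n"
	b := "# 2026-03-13 07:00:00 by RouterOS 7.14\n/ip address\nadd address=10.0.0.1/24\n"
	fmt.Println(contentHash(normalize(a)) == contentHash(normalize(b)))
}
```

Hashing only after normalization means two exports that differ solely in timestamp or line endings yield the same digest, which is what lets a downstream consumer deduplicate on SHA256.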
|
||||||
|
|
||||||
|
---
|
||||||
|
*Phase: 02-poller-config-collection*
|
||||||
|
*Completed: 2026-03-13*
|
||||||