diff --git a/.planning/REQUIREMENTS.md b/.planning/REQUIREMENTS.md
index 2a5ae4e..fb3f416 100644
--- a/.planning/REQUIREMENTS.md
+++ b/.planning/REQUIREMENTS.md
@@ -17,7 +17,7 @@
 ### Storage
 
 - [x] **STOR-01**: API stores config snapshots in `router_config_snapshots` table with SHA256 hash
-- [ ] **STOR-02**: Duplicate snapshots (same hash as previous) are skipped, no diff generated
+- [x] **STOR-02**: Duplicate snapshots (same hash as previous) are skipped, no diff generated
 - [ ] **STOR-03**: Snapshots retained for 90 days (configurable via `CONFIG_RETENTION_DAYS`)
 - [ ] **STOR-04**: Older snapshots automatically deleted by retention cleanup
 - [x] **STOR-05**: Snapshots encrypted at rest, accessible only through RBAC
@@ -75,7 +75,7 @@
 | COLL-05 | Phase 2: Poller Config Collection | Complete |
 | COLL-06 | Phase 2: Poller Config Collection | Complete |
 | STOR-01 | Phase 1: Database Schema | Complete |
-| STOR-02 | Phase 3: Snapshot Ingestion | Pending |
+| STOR-02 | Phase 3: Snapshot Ingestion | Complete |
 | STOR-03 | Phase 9: Retention & Cleanup | Pending |
 | STOR-04 | Phase 9: Retention & Cleanup | Pending |
 | STOR-05 | Phase 1: Database Schema | Complete |
diff --git a/.planning/ROADMAP.md b/.planning/ROADMAP.md
index d2fdc55..a9c6066 100644
--- a/.planning/ROADMAP.md
+++ b/.planning/ROADMAP.md
@@ -56,17 +56,17 @@ Plans:
 - [ ] 02-02-PLAN.md — Backup scheduler with per-device goroutines, concurrency control, retry logic, and main.go wiring
 
 ### Phase 3: Snapshot Ingestion
-**Goal**: Backend receives config snapshots from NATS, computes SHA256 hash, and stores new snapshots while skipping duplicates
+**Goal**: Backend receives config snapshots from NATS, encrypts via Transit, deduplicates by SHA256, and stores new snapshots
 **Depends on**: Phase 1, Phase 2
 **Requirements**: STOR-02
 **Success Criteria** (what must be TRUE):
   1. Backend NATS subscriber consumes `config.snapshot.create` messages and persists snapshots to `router_config_snapshots`
   2. When a snapshot has the same SHA256 hash as the device's most recent snapshot, it is skipped (no new row, no diff)
   3. Each stored snapshot includes device_id, tenant_id, config_text (encrypted), sha256_hash, and collected_at timestamp
-**Plans**: TBD
+**Plans**: 1 plan
 
 Plans:
-- [ ] 03-01: NATS subscriber for config snapshot ingestion with deduplication
+- [ ] 03-01-PLAN.md — NATS subscriber for config snapshot ingestion with dedup, encryption, and main.py wiring
 
 ### Phase 4: Manual Backup Trigger
 **Goal**: Operators can trigger an immediate config backup for a specific device through the API
diff --git a/.planning/STATE.md b/.planning/STATE.md
index dc367f1..4da4c6b 100644
--- a/.planning/STATE.md
+++ b/.planning/STATE.md
@@ -2,15 +2,15 @@
 gsd_state_version: 1.0
 milestone: v9.6
 milestone_name: milestone
-status: in_progress
-stopped_at: Completed 02-02-PLAN.md
-last_updated: "2026-03-13T01:55:37Z"
-last_activity: 2026-03-13 -- Completed 02-02 backup scheduler (per-device goroutines, concurrency, main.go wiring)
+status: completed
+stopped_at: Completed 03-01-PLAN.md
+last_updated: "2026-03-13T02:48:59.037Z"
+last_activity: 2026-03-13 -- Completed 02-02 backup scheduler with per-device goroutines and main.go wiring
 progress:
   total_phases: 10
-  completed_phases: 2
-  total_plans: 3
-  completed_plans: 3
+  completed_phases: 3
+  total_plans: 4
+  completed_plans: 4
   percent: 100
 ---
 
@@ -21,23 +21,23 @@ progress:
 See: .planning/PROJECT.md (updated 2026-03-12)
 
 **Core value:** Operators can see exactly what changed on a router and when, with reliable config snapshots for download
-**Current focus:** Phase 2: Poller Config Collection
+**Current focus:** Phase 3: Snapshot Ingestion -- COMPLETE
 
 ## Current Position
 
-Phase: 2 of 10 (Poller Config Collection) -- COMPLETE
-Plan: 2 of 2 in current phase (02-02 complete)
-Status: Phase 2 complete
-Last activity: 2026-03-13 -- Completed 02-02 backup scheduler with per-device goroutines and main.go wiring
+Phase: 3 of 10 (Snapshot Ingestion) -- COMPLETE
+Plan: 1 of 1 in current phase (03-01 complete)
+Status: Phase 3 complete
+Last activity: 2026-03-13 -- Completed 03-01 config snapshot subscriber with dedup, Transit encryption, and NATS ingestion
 
 Progress: [██████████] 100%
 
 ## Performance Metrics
 
 **Velocity:**
-- Total plans completed: 3
+- Total plans completed: 4
 - Average duration: 4min
-- Total execution time: 0.20 hours
+- Total execution time: 0.27 hours
 
 **By Phase:**
 
@@ -45,10 +45,11 @@ Progress: [██████████] 100%
 |-------|-------|-------|----------|
 | 01-database-schema | 1 | 3min | 3min |
 | 02-poller-config-collection | 2 | 9min | 4.5min |
+| 03-snapshot-ingestion | 1 | 4min | 4min |
 
 **Recent Trend:**
-- Last 5 plans: none
-- Trend: N/A
+- Last 5 plans: 3min, 4min, 5min, 4min
+- Trend: stable
 
 *Updated after each plan completion*
 
@@ -68,6 +69,8 @@ Recent decisions affecting current work:
 - [02-02] BackupScheduler runs independently from status poll scheduler with separate goroutines
 - [02-02] Buffered channel semaphore for concurrency control (Go idiom, no external deps)
 - [02-02] Devices with no Redis status key assumed potentially online for first backup
+- [Phase 03]: Trust poller-provided SHA256 hash (no recompute on backend)
+- [Phase 03]: Transit failure causes nak (NATS retry), plaintext never stored as fallback
 
 ### Pending Todos
 
@@ -79,6 +82,6 @@ None yet.
 
 ## Session Continuity
 
-Last session: 2026-03-13T01:55:37Z
-Stopped at: Completed 02-02-PLAN.md (Phase 2 complete)
-Resume file: Next phase (03)
+Last session: 2026-03-13T02:48:59.034Z
+Stopped at: Completed 03-01-PLAN.md
+Resume file: None
diff --git a/.planning/phases/03-snapshot-ingestion/03-01-SUMMARY.md b/.planning/phases/03-snapshot-ingestion/03-01-SUMMARY.md
new file mode 100644
index 0000000..134e78f
--- /dev/null
+++ b/.planning/phases/03-snapshot-ingestion/03-01-SUMMARY.md
@@ -0,0 +1,108 @@
+---
+phase: 03-snapshot-ingestion
+plan: 01
+subsystem: api
+tags: [nats, jetstream, openbao, transit, encryption, postgresql, prometheus, dedup]
+
+# Dependency graph
+requires:
+  - phase: 01-database-schema
+    provides: RouterConfigSnapshot model and router_config_snapshots table
+  - phase: 02-poller-config-collection
+    provides: Go poller publishes config.snapshot.> NATS messages
+provides:
+  - NATS subscriber consuming config.snapshot.> messages
+  - SHA256 dedup preventing duplicate snapshot storage
+  - OpenBao Transit encryption of config text before INSERT
+  - Prometheus metrics for ingestion monitoring
+affects: [04-diff-engine, snapshot-api, config-timeline]
+
+# Tech tracking
+tech-stack:
+  added: [prometheus_client]
+  patterns: [nats-subscriber-with-dedup, transit-encrypt-before-insert]
+
+key-files:
+  created:
+    - backend/app/services/config_snapshot_subscriber.py
+    - backend/tests/test_config_snapshot_subscriber.py
+  modified:
+    - backend/app/main.py
+
+key-decisions:
+  - "Trust poller-provided SHA256 hash (no recompute on backend)"
+  - "Raw SQL for dedup SELECT and INSERT (consistent with nats_subscriber.py pattern)"
+  - "OpenBao Transit service instantiated per-message with close() for connection hygiene"
+
+patterns-established:
+  - "Config snapshot ingestion: dedup by SHA256 -> encrypt -> INSERT -> ack"
+  - "Transit failure causes nak (NATS retry), plaintext never stored as fallback"
+
+requirements-completed: [STOR-02]
+
+# Metrics
+duration: 4min
+completed: 2026-03-13
+---
+
+# Phase 3 Plan 1: Config Snapshot Subscriber Summary
+
+**NATS subscriber ingesting config snapshots with SHA256 dedup, OpenBao Transit encryption, and Prometheus metrics**
+
+## Performance
+
+- **Duration:** 4 min
+- **Started:** 2026-03-13T02:44:01Z
+- **Completed:** 2026-03-13T02:48:08Z
+- **Tasks:** 2
+- **Files modified:** 3
+
+## Accomplishments
+- NATS subscriber consuming config.snapshot.> on DEVICE_EVENTS stream with durable consumer
+- SHA256 dedup: duplicate snapshots silently skipped at debug level with Prometheus counter
+- OpenBao Transit encryption: plaintext never stored in PostgreSQL, Transit failure causes nak
+- Malformed and orphan device messages acked and discarded safely with warning logs
+- 6 unit tests covering all handler paths (new, duplicate, encrypt fail, malformed, orphan, first)
+- Wired into main.py lifespan with non-fatal startup pattern
+
+## Task Commits
+
+Each task was committed atomically:
+
+1. **Task 1 (RED): Failing tests** - `9d82741` (test)
+2. **Task 1 (GREEN): Config snapshot subscriber** - `3ab9f27` (feat)
+3. **Task 2: Wire into main.py lifespan** - `0db0641` (feat)
+
+_TDD task had RED + GREEN commits_
+
+## Files Created/Modified
+- `backend/app/services/config_snapshot_subscriber.py` - NATS subscriber with dedup, encryption, metrics
+- `backend/tests/test_config_snapshot_subscriber.py` - 6 unit tests for all handler paths
+- `backend/app/main.py` - Lifespan wiring for start/stop
+
+## Decisions Made
+- Trust poller-provided SHA256 hash (no recompute on backend) -- per project decision
+- Raw SQL for dedup SELECT and INSERT -- consistent with existing nats_subscriber.py pattern
+- OpenBao Transit service instantiated per-message with close() -- connection hygiene
+- config_text never appears in any log statement -- contains passwords and keys
+
+## Deviations from Plan
+
+None - plan executed exactly as written.
+
+## Issues Encountered
+
+None.
+
+## User Setup Required
+
+None - no external service configuration required.
+
+## Next Phase Readiness
+- Config snapshot subscriber ready to receive messages from Go poller
+- RouterConfigSnapshot rows will be available for diff engine (Phase 4)
+- Prometheus metrics exposed for monitoring ingestion rate and errors
+
+---
+*Phase: 03-snapshot-ingestion*
+*Completed: 2026-03-13*
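
The ingestion pattern recorded in the summary can be sketched in framework-free Python. That pattern is: dedup by SHA256, then encrypt, then INSERT, then ack. It also naks on a Transit failure and acks and discards malformed or orphan messages. This is an illustration under stated assumptions, not the code in `backend/app/services/config_snapshot_subscriber.py`. `FakeMsg`, `InMemoryStore`, `handle_snapshot`, the `encrypt` callable, and the JSON field names are all hypothetical. They stand in for the JetStream message, the raw-SQL dedup SELECT and INSERT, and the OpenBao Transit client.

```python
import asyncio
import json
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Optional


@dataclass
class FakeMsg:
    """Hypothetical stand-in for a JetStream message: raw bytes plus ack/nak."""
    data: bytes
    acked: bool = False
    naked: bool = False

    async def ack(self) -> None:
        self.acked = True

    async def nak(self) -> None:
        self.naked = True


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for the devices table and router_config_snapshots."""
    devices: set = field(default_factory=set)
    rows: list = field(default_factory=list)

    def latest_hash(self, device_id: str) -> Optional[str]:
        # Rows are appended in arrival order, so the last match is the newest.
        for row in reversed(self.rows):
            if row["device_id"] == device_id:
                return row["sha256_hash"]
        return None


async def handle_snapshot(
    msg: FakeMsg,
    store: InMemoryStore,
    encrypt: Callable[[str], Awaitable[str]],
) -> str:
    """Dedup by SHA256 -> encrypt -> INSERT -> ack. Returns an outcome label."""
    try:
        payload = json.loads(msg.data)
        device_id = payload["device_id"]
        sha256_hash = payload["sha256_hash"]  # trusted as sent by the poller
        row = {
            "device_id": device_id,
            "tenant_id": payload["tenant_id"],
            "sha256_hash": sha256_hash,
            "collected_at": payload["collected_at"],
        }
        config_text = payload["config_text"]
    except (ValueError, KeyError, TypeError):
        await msg.ack()  # malformed: ack and discard so it is never redelivered
        return "malformed"

    if device_id not in store.devices:
        await msg.ack()  # orphan device: ack and discard
        return "orphan"

    if store.latest_hash(device_id) == sha256_hash:
        await msg.ack()  # same hash as the most recent snapshot: no new row
        return "duplicate"

    try:
        row["config_text"] = await encrypt(config_text)
    except Exception:
        await msg.nak()  # encryption failed: request redelivery, never store plaintext
        return "encrypt_failed"

    store.rows.append(row)  # stands in for the INSERT
    await msg.ack()
    return "stored"


async def demo() -> list:
    store = InMemoryStore(devices={"dev-1"})

    async def fake_encrypt(plaintext: str) -> str:
        return "enc:" + str(len(plaintext))  # placeholder, not real Transit output

    body = json.dumps({
        "device_id": "dev-1", "tenant_id": "t-1", "sha256_hash": "abc",
        "config_text": "example router config",
        "collected_at": "2026-03-13T02:44:01Z",
    }).encode()
    return [
        await handle_snapshot(FakeMsg(body), store, fake_encrypt),
        await handle_snapshot(FakeMsg(body), store, fake_encrypt),
        await handle_snapshot(FakeMsg(b"not json"), store, fake_encrypt),
    ]


if __name__ == "__main__":
    print(asyncio.run(demo()))  # ['stored', 'duplicate', 'malformed']
```

Acking malformed and orphan messages stops endless redelivery of input that can never succeed. A `nak` on an encryption error leaves the message for JetStream to retry, so plaintext is never written as a fallback.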