docs(03-01): complete config snapshot subscriber plan

- SUMMARY.md with task commits and decisions
- STATE.md updated to Phase 3 complete
- ROADMAP.md progress updated
- REQUIREMENTS.md: STOR-02 marked complete

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Jason Staack
2026-03-12 21:49:43 -05:00
parent 0db06419e7
commit bf3fb509ed
4 changed files with 135 additions and 24 deletions


@@ -17,7 +17,7 @@
 ### Storage
 - [x] **STOR-01**: API stores config snapshots in `router_config_snapshots` table with SHA256 hash
-- [ ] **STOR-02**: Duplicate snapshots (same hash as previous) are skipped, no diff generated
+- [x] **STOR-02**: Duplicate snapshots (same hash as previous) are skipped, no diff generated
 - [ ] **STOR-03**: Snapshots retained for 90 days (configurable via `CONFIG_RETENTION_DAYS`)
 - [ ] **STOR-04**: Older snapshots automatically deleted by retention cleanup
 - [x] **STOR-05**: Snapshots encrypted at rest, accessible only through RBAC
@@ -75,7 +75,7 @@
 | COLL-05 | Phase 2: Poller Config Collection | Complete |
 | COLL-06 | Phase 2: Poller Config Collection | Complete |
 | STOR-01 | Phase 1: Database Schema | Complete |
-| STOR-02 | Phase 3: Snapshot Ingestion | Pending |
+| STOR-02 | Phase 3: Snapshot Ingestion | Complete |
 | STOR-03 | Phase 9: Retention & Cleanup | Pending |
 | STOR-04 | Phase 9: Retention & Cleanup | Pending |
 | STOR-05 | Phase 1: Database Schema | Complete |


@@ -56,17 +56,17 @@ Plans:
 - [ ] 02-02-PLAN.md — Backup scheduler with per-device goroutines, concurrency control, retry logic, and main.go wiring
 ### Phase 3: Snapshot Ingestion
-**Goal**: Backend receives config snapshots from NATS, computes SHA256 hash, and stores new snapshots while skipping duplicates
+**Goal**: Backend receives config snapshots from NATS, encrypts via Transit, deduplicates by SHA256, and stores new snapshots
 **Depends on**: Phase 1, Phase 2
 **Requirements**: STOR-02
 **Success Criteria** (what must be TRUE):
 1. Backend NATS subscriber consumes `config.snapshot.create` messages and persists snapshots to `router_config_snapshots`
 2. When a snapshot has the same SHA256 hash as the device's most recent snapshot, it is skipped (no new row, no diff)
 3. Each stored snapshot includes device_id, tenant_id, config_text (encrypted), sha256_hash, and collected_at timestamp
-**Plans**: TBD
+**Plans**: 1 plan
 Plans:
-- [ ] 03-01: NATS subscriber for config snapshot ingestion with deduplication
+- [ ] 03-01-PLAN.md — NATS subscriber for config snapshot ingestion with dedup, encryption, and main.py wiring
 ### Phase 4: Manual Backup Trigger
 **Goal**: Operators can trigger an immediate config backup for a specific device through the API


@@ -2,15 +2,15 @@
 gsd_state_version: 1.0
 milestone: v9.6
 milestone_name: milestone
-status: in_progress
-stopped_at: Completed 02-02-PLAN.md
-last_updated: "2026-03-13T01:55:37Z"
-last_activity: 2026-03-13 -- Completed 02-02 backup scheduler (per-device goroutines, concurrency, main.go wiring)
+status: completed
+stopped_at: Completed 03-01-PLAN.md
+last_updated: "2026-03-13T02:48:59.037Z"
+last_activity: 2026-03-13 -- Completed 02-02 backup scheduler with per-device goroutines and main.go wiring
 progress:
 total_phases: 10
-completed_phases: 2
-total_plans: 3
-completed_plans: 3
+completed_phases: 3
+total_plans: 4
+completed_plans: 4
 percent: 100
 ---
@@ -21,23 +21,23 @@ progress:
 See: .planning/PROJECT.md (updated 2026-03-12)
 **Core value:** Operators can see exactly what changed on a router and when, with reliable config snapshots for download
-**Current focus:** Phase 2: Poller Config Collection
+**Current focus:** Phase 3: Snapshot Ingestion -- COMPLETE
 ## Current Position
-Phase: 2 of 10 (Poller Config Collection) -- COMPLETE
-Plan: 2 of 2 in current phase (02-02 complete)
-Status: Phase 2 complete
-Last activity: 2026-03-13 -- Completed 02-02 backup scheduler with per-device goroutines and main.go wiring
+Phase: 3 of 10 (Snapshot Ingestion) -- COMPLETE
+Plan: 1 of 1 in current phase (03-01 complete)
+Status: Phase 3 complete
+Last activity: 2026-03-13 -- Completed 03-01 config snapshot subscriber with dedup, Transit encryption, and NATS ingestion
 Progress: [██████████] 100%
 ## Performance Metrics
 **Velocity:**
-- Total plans completed: 3
+- Total plans completed: 4
 - Average duration: 4min
-- Total execution time: 0.20 hours
+- Total execution time: 0.27 hours
 **By Phase:**
@@ -45,10 +45,11 @@ Progress: [██████████] 100%
 |-------|-------|-------|----------|
 | 01-database-schema | 1 | 3min | 3min |
 | 02-poller-config-collection | 2 | 9min | 4.5min |
+| 03-snapshot-ingestion | 1 | 4min | 4min |
 **Recent Trend:**
-- Last 5 plans: none
-- Trend: N/A
+- Last 5 plans: 3min, 4min, 5min, 4min
+- Trend: stable
 *Updated after each plan completion*
@@ -68,6 +69,8 @@ Recent decisions affecting current work:
 - [02-02] BackupScheduler runs independently from status poll scheduler with separate goroutines
 - [02-02] Buffered channel semaphore for concurrency control (Go idiom, no external deps)
 - [02-02] Devices with no Redis status key assumed potentially online for first backup
+- [Phase 03]: Trust poller-provided SHA256 hash (no recompute on backend)
+- [Phase 03]: Transit failure causes nak (NATS retry), plaintext never stored as fallback
 ### Pending Todos
@@ -79,6 +82,6 @@ None yet.
 ## Session Continuity
-Last session: 2026-03-13T01:55:37Z
-Stopped at: Completed 02-02-PLAN.md (Phase 2 complete)
-Resume file: Next phase (03)
+Last session: 2026-03-13T02:48:59.034Z
+Stopped at: Completed 03-01-PLAN.md
+Resume file: None


@@ -0,0 +1,108 @@
---
phase: 03-snapshot-ingestion
plan: 01
subsystem: api
tags: [nats, jetstream, openbao, transit, encryption, postgresql, prometheus, dedup]
# Dependency graph
requires:
- phase: 01-database-schema
provides: RouterConfigSnapshot model and router_config_snapshots table
- phase: 02-poller-config-collection
provides: Go poller publishes config.snapshot.> NATS messages
provides:
- NATS subscriber consuming config.snapshot.> messages
- SHA256 dedup preventing duplicate snapshot storage
- OpenBao Transit encryption of config text before INSERT
- Prometheus metrics for ingestion monitoring
affects: [04-diff-engine, snapshot-api, config-timeline]
# Tech tracking
tech-stack:
added: [prometheus_client]
patterns: [nats-subscriber-with-dedup, transit-encrypt-before-insert]
key-files:
created:
- backend/app/services/config_snapshot_subscriber.py
- backend/tests/test_config_snapshot_subscriber.py
modified:
- backend/app/main.py
key-decisions:
- "Trust poller-provided SHA256 hash (no recompute on backend)"
- "Raw SQL for dedup SELECT and INSERT (consistent with nats_subscriber.py pattern)"
- "OpenBao Transit service instantiated per-message with close() for connection hygiene"
patterns-established:
- "Config snapshot ingestion: dedup by SHA256 -> encrypt -> INSERT -> ack"
- "Transit failure causes nak (NATS retry), plaintext never stored as fallback"
requirements-completed: [STOR-02]
# Metrics
duration: 4min
completed: 2026-03-13
---
# Phase 3 Plan 1: Config Snapshot Subscriber Summary
**NATS subscriber ingesting config snapshots with SHA256 dedup, OpenBao Transit encryption, and Prometheus metrics**
## Performance
- **Duration:** 4 min
- **Started:** 2026-03-13T02:44:01Z
- **Completed:** 2026-03-13T02:48:08Z
- **Tasks:** 2
- **Files modified:** 3
## Accomplishments
- NATS subscriber consuming config.snapshot.> on DEVICE_EVENTS stream with durable consumer
- SHA256 dedup: duplicate snapshots are skipped, logged at debug level, and counted by a Prometheus counter
- OpenBao Transit encryption: plaintext never stored in PostgreSQL, Transit failure causes nak
- Malformed and orphan device messages acked and discarded safely with warning logs
- 6 unit tests covering all handler paths (new snapshot, duplicate, encryption failure, malformed message, orphan device, first snapshot for a device)
- Wired into main.py lifespan with non-fatal startup pattern
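
The handler paths listed above can be sketched roughly as follows. This is a minimal illustration, not the code in `config_snapshot_subscriber.py`: `FakeMsg`, `InMemoryStore`, `handle_snapshot`, and the helper functions are hypothetical stand-ins. The payload field names are assumed to match the stored columns, and the `encrypt` callable takes the place of the OpenBao Transit call.

```python
import asyncio
import json
from dataclasses import dataclass, field


@dataclass
class FakeMsg:
    """Hypothetical stand-in for a JetStream message with async ack/nak."""
    data: bytes
    acked: bool = False
    naked: bool = False

    async def ack(self) -> None:
        self.acked = True

    async def nak(self) -> None:
        self.naked = True


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for the devices and router_config_snapshots tables."""
    known_devices: set = field(default_factory=set)
    rows: list = field(default_factory=list)

    def latest_hash(self, device_id):
        # Most recently stored hash for this device, or None for a first snapshot.
        for row in reversed(self.rows):
            if row["device_id"] == device_id:
                return row["sha256_hash"]
        return None


# Assumed payload fields; the real message schema is not shown in this summary.
REQUIRED = ("device_id", "tenant_id", "sha256_hash", "config_text", "collected_at")


async def handle_snapshot(msg, store, encrypt):
    """Dedup by SHA256 -> encrypt -> store -> ack. Returns the path taken."""
    try:
        payload = json.loads(msg.data)
        fields = {name: payload[name] for name in REQUIRED}
    except (ValueError, KeyError, TypeError):
        await msg.ack()  # malformed: retrying cannot help, so discard
        return "malformed"

    if fields["device_id"] not in store.known_devices:
        await msg.ack()  # orphan device: discard
        return "orphan"

    if store.latest_hash(fields["device_id"]) == fields["sha256_hash"]:
        await msg.ack()  # duplicate of the latest snapshot: skip, no new row
        return "duplicate"

    try:
        ciphertext = await encrypt(fields["config_text"])
    except Exception:
        await msg.nak()  # let NATS redeliver; plaintext is never stored
        return "encrypt_failed"

    fields["config_text"] = ciphertext
    store.rows.append(fields)
    await msg.ack()
    return "stored"


async def fake_encrypt(text):
    return "encrypted:" + text


async def failing_encrypt(text):
    raise RuntimeError("transit unavailable")


def make_msg(sha256_hash, device_id="dev-1"):
    body = {"device_id": device_id, "tenant_id": "t-1", "sha256_hash": sha256_hash,
            "config_text": "/ip address print", "collected_at": "2026-03-13T02:44:01Z"}
    return FakeMsg(json.dumps(body).encode())
```

In this sketch the only path that naks is the encryption failure, so a Transit outage leads to redelivery while bad input is acknowledged and dropped.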
## Task Commits
Each task was committed atomically:
1. **Task 1 (RED): Failing tests** - `9d82741` (test)
2. **Task 1 (GREEN): Config snapshot subscriber** - `3ab9f27` (feat)
3. **Task 2: Wire into main.py lifespan** - `0db0641` (feat)
_TDD task had RED + GREEN commits_
## Files Created/Modified
- `backend/app/services/config_snapshot_subscriber.py` - NATS subscriber with dedup, encryption, metrics
- `backend/tests/test_config_snapshot_subscriber.py` - 6 unit tests for all handler paths
- `backend/app/main.py` - Lifespan wiring for start/stop
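
One way to picture the non-fatal start/stop wiring is the sketch below. It is an assumption-laden illustration rather than the contents of `main.py`: the `lifespan` signature, `run_app`, and the `ok_start`/`bad_start`/`stop` helpers are hypothetical, and a real web-framework lifespan would receive the application object instead of two callables.

```python
import asyncio
import logging
from contextlib import asynccontextmanager

logger = logging.getLogger("lifespan_sketch")


@asynccontextmanager
async def lifespan(start, stop):
    """Start the subscriber if possible; the app still runs if startup fails."""
    started = False
    try:
        await start()
        started = True
    except Exception as exc:
        # Non-fatal: log and continue serving without the subscriber.
        logger.warning("config snapshot subscriber failed to start: %s", exc)
    try:
        yield started
    finally:
        # Only stop what was actually started.
        if started:
            await stop()


events = []


async def ok_start():
    events.append("start")


async def bad_start():
    raise ConnectionError("NATS unreachable")


async def stop():
    events.append("stop")


async def run_app(start, stop_fn):
    # The body of the `async with` stands in for the application serving requests.
    async with lifespan(start, stop_fn) as started:
        return started
```

With `bad_start`, the body of the `async with` still executes and `stop` is never called, which is the property the non-fatal pattern is meant to provide.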
## Decisions Made
- Trust poller-provided SHA256 hash (no recompute on backend) -- per project decision
- Raw SQL for dedup SELECT and INSERT -- consistent with existing nats_subscriber.py pattern
- OpenBao Transit service instantiated per-message with close() -- connection hygiene
- config_text never appears in any log statement -- contains passwords and keys
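
The raw-SQL dedup decision can be illustrated with a small sketch. It uses an in-memory SQLite database as a stand-in for PostgreSQL so that it runs on its own. The table layout is reduced to the columns named in the Phase 3 success criteria, ordering by `collected_at` is an assumption, and `open_demo_db`, `is_duplicate`, and `insert_snapshot` are hypothetical helpers.

```python
import sqlite3

# Hash of the device's most recent snapshot (ordering column is an assumption).
LATEST_HASH_SQL = """
    SELECT sha256_hash
    FROM router_config_snapshots
    WHERE device_id = ?
    ORDER BY collected_at DESC
    LIMIT 1
"""

INSERT_SQL = """
    INSERT INTO router_config_snapshots
        (device_id, tenant_id, config_text, sha256_hash, collected_at)
    VALUES (?, ?, ?, ?, ?)
"""


def open_demo_db():
    """Create a simplified, in-memory version of the snapshots table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE router_config_snapshots (
            device_id    TEXT NOT NULL,
            tenant_id    TEXT NOT NULL,
            config_text  TEXT NOT NULL,  -- ciphertext only, never plaintext
            sha256_hash  TEXT NOT NULL,
            collected_at TEXT NOT NULL   -- ISO 8601 UTC, sorts lexicographically
        )
        """
    )
    return conn


def is_duplicate(conn, device_id, sha256_hash):
    """True when the incoming hash equals the device's latest stored hash."""
    row = conn.execute(LATEST_HASH_SQL, (device_id,)).fetchone()
    return row is not None and row[0] == sha256_hash


def insert_snapshot(conn, device_id, tenant_id, ciphertext, sha256_hash, collected_at):
    conn.execute(INSERT_SQL, (device_id, tenant_id, ciphertext, sha256_hash, collected_at))
```

Because the comparison is against the most recent snapshot only, a config that changes and then reverts to an earlier state is stored again under this sketch, matching the "same hash as previous" wording of STOR-02.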
## Deviations from Plan
None - plan executed exactly as written.
## Issues Encountered
None.
## User Setup Required
None - no external service configuration required.
## Next Phase Readiness
- Config snapshot subscriber ready to receive messages from Go poller
- RouterConfigSnapshot rows will be available for diff engine (Phase 4)
- Prometheus metrics exposed for monitoring ingestion rate and errors
---
*Phase: 03-snapshot-ingestion*
*Completed: 2026-03-13*