Commit Graph

119 Commits

Author SHA1 Message Date
Jason Staack
970501e453 feat: implement Remote WinBox worker, API, frontend integration, OpenBao persistence, and supporting docs 2026-03-14 09:05:14 -05:00
Jason Staack
7af08276ea chore: remove .planning from tracking (already in .gitignore)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 06:55:28 -05:00
Jason Staack
ed3ad8eb17 chore: update about page to v9.6 and Dockerfile to Go 1.25
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 06:54:08 -05:00
Jason Staack
45bdbedfb0 docs(10-01): complete config backup audit events plan
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 23:47:30 -05:00
Jason Staack
fb91fed5b9 test(10-01): add tests verifying audit events for config backup operations
- Test config_snapshot_created event on new snapshot
- Test config_snapshot_skipped_duplicate event on dedup match
- Test config_diff_generated event after diff stored
- Test config_backup_manual_trigger event on manual trigger success

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 23:45:58 -05:00
Jason Staack
1a1ceb2cb1 feat(10-01): add audit event logging to config backup operations
- config_snapshot_created event after successful snapshot INSERT
- config_snapshot_skipped_duplicate event on dedup match
- config_diff_generated event after diff INSERT
- config_backup_manual_trigger event on manual trigger success
- All log_action calls wrapped in try/except for safety

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 23:44:00 -05:00
Jason Staack
50211d1853 docs(09-01): complete retention cleanup plan
- Create 09-01-SUMMARY.md with execution results
- Update STATE.md with phase 9 position and decisions
- Update ROADMAP.md with phase 9 progress
- Mark STOR-03 and STOR-04 requirements complete

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 23:35:37 -05:00
Jason Staack
4d62bc9499 feat(09-01): wire retention scheduler into application lifespan
- Import start/stop_retention_scheduler in lifespan
- Start scheduler after config snapshot subscriber (non-fatal pattern)
- Stop scheduler during shutdown alongside other cleanup

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 23:34:03 -05:00
Jason Staack
a9f7a45a9b feat(09-01): implement retention cleanup service with configurable retention period
- Add CONFIG_RETENTION_DAYS setting (default 90) to config.py
- Create retention_service.py with cleanup_expired_snapshots (parameterized SQL via make_interval)
- APScheduler IntervalTrigger runs cleanup every 24h with 1h jitter
- Prometheus counter and histogram for observability
- CASCADE FKs handle diff/change deletion automatically
- All 4 unit tests pass

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 23:33:27 -05:00
Jason Staack
00bdde9975 test(09-01): add failing tests for retention cleanup service
- Test cleanup deletes expired snapshots
- Test snapshots within retention window are kept
- Test deleted count is returned
- Test empty table handled gracefully

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 23:32:20 -05:00
Jason Staack
be41add4e9 feat(08-02): add snapshot download button to config history timeline
- Add SnapshotResponse interface and getSnapshot API method
- Add deviceName prop to ConfigHistorySection
- Add download handler that fetches snapshot and triggers .rsc file download
- Add Download icon button on each timeline entry with stopPropagation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 23:23:55 -05:00
Jason Staack
8a64596d8b docs(08-01): complete diff viewer plan
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 23:22:22 -05:00
Jason Staack
2cf426fa63 feat(08-01): wire diff viewer into config history timeline
- Add click handlers to timeline entries to open diff viewer
- Render DiffViewer inline above timeline when snapshot selected
- Add hover state and cursor-pointer to timeline entries

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 23:20:52 -05:00
Jason Staack
dda00fbd23 feat(08-01): add diff viewer component and API client
- Add DiffResponse interface and getDiff method to configHistoryApi
- Create DiffViewer component with unified diff rendering
- Green highlighting for added lines, red for removed lines
- Blue styling for hunk headers, loading skeleton, error state

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 23:20:24 -05:00
Jason Staack
bd09590c01 docs(07-01): complete config history UI timeline plan
- SUMMARY.md with plan execution results
- STATE.md updated to phase 7 complete
- ROADMAP.md and REQUIREMENTS.md updated

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 23:13:36 -05:00
Jason Staack
36861fffea feat(07-01): wire ConfigHistorySection into device detail page
- Import and render ConfigHistorySection below Interface Utilization
- Configuration history now visible on device overview tab

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 23:12:16 -05:00
Jason Staack
6bd24517ba feat(07-01): add config history API client and timeline component
- Add ConfigChangeEntry interface and configHistoryApi.list() to api.ts
- Create ConfigHistorySection with timeline, loading skeleton, and empty state
- Poll every 60s via TanStack Query refetchInterval

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 23:11:46 -05:00
Jason Staack
e8bf994e7d docs(06-02): complete snapshot view and diff retrieval plan
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 23:04:48 -05:00
Jason Staack
af7007df13 feat(06-02): add snapshot view and diff retrieval endpoints
- GET /config/{snapshot_id} returns decrypted full config with RBAC
- GET /config/{snapshot_id}/diff returns unified diff text with RBAC
- 404 for missing snapshots/diffs, 500 for Transit decrypt failure
- Both endpoints enforce viewer+ role and config:read scope

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 23:03:32 -05:00
Jason Staack
83cd661efc feat(06-02): add get_snapshot and get_snapshot_diff service functions
- get_snapshot queries snapshot by id/device/tenant, decrypts via Transit
- get_snapshot_diff queries diff by new_snapshot_id with device/tenant filter
- Both return None for missing data (404-safe)
- 4 new tests with mocked Transit and DB sessions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 23:02:58 -05:00
Jason Staack
7c0aa1b359 docs(06-01): complete config history timeline plan
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 23:01:12 -05:00
Jason Staack
5c56344d74 feat(06-01): add config-history endpoint with RBAC and main.py registration
- GET /api/tenants/{tid}/devices/{did}/config-history endpoint
- Viewer+ RBAC with config:read scope
- Pagination via limit/offset query params (defaults 50/0)
- Router registered in main.py

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:59:37 -05:00
Jason Staack
f7d5aec4ec feat(06-01): add config history service with TDD tests
- Service queries router_config_changes JOIN router_config_diffs for timeline
- Returns paginated entries with component, summary, timestamp, diff metadata
- ORDER BY created_at DESC with limit/offset pagination
- 4 tests covering formatting, empty results, pagination, and ordering

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:58:51 -05:00
Jason Staack
3416cc90a5 docs(05-02): complete structured change parser plan
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:38:16 -05:00
Jason Staack
122b5917f4 feat(05-02): wire change parser into diff service with RETURNING id
- Diff INSERT now uses RETURNING id to capture diff_id
- parse_diff_changes called after diff commit, results stored in router_config_changes
- Change parser errors are best-effort (logged, never block diff storage)
- Added tests for change storage and parser error resilience

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:37:09 -05:00
Jason Staack
b167831105 feat(05-02): implement config change parser for RouterOS diffs
- parse_diff_changes() extracts component, summary, raw_line from unified diffs
- RouterOS path detection converts /ip firewall filter to ip/firewall/filter
- Human-readable summaries: Added/Removed/Modified N component rules
- Fallback to system/general when no path headers found

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:35:48 -05:00
Jason Staack
7fddf35fc5 test(05-02): add failing tests for config change parser
- 6 tests covering component extraction, summaries, multi-section, removals, modifications, fallback, raw_line

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:35:19 -05:00
Jason Staack
4e083a9606 docs(05-01): complete config diff service plan
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:34:16 -05:00
Jason Staack
eb76343d04 feat(05-01): wire diff generation into snapshot subscriber
- Add RETURNING id to snapshot INSERT for new_snapshot_id capture
- Call generate_and_store_diff after successful commit (best-effort)
- Outer try/except safety net ensures snapshot ack never blocked by diff
- Update subscriber tests to mock diff service

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:32:40 -05:00
Jason Staack
72d0ae2856 feat(05-01): implement config diff service with Transit decrypt and difflib
- generate_and_store_diff decrypts old+new snapshots, produces unified diff
- Stores diff in router_config_diffs with line counts
- Best-effort: decrypt/DB errors logged, never raised
- Prometheus metrics: generated_total, errors_total, duration_seconds

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:31:28 -05:00
Jason Staack
79453fa115 test(05-01): add failing tests for config diff service
- 5 tests: diff generation, first snapshot skip, decrypt failure, line counts, empty diff skip

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:30:52 -05:00
Jason Staack
db5bb3fa96 docs(04-01): complete manual backup trigger plan
- Summary with 12 tests (6 Go, 6 Python), all passing
- STATE.md updated: Phase 4 complete, decisions logged
- ROADMAP.md updated: Phase 4 plan progress
- REQUIREMENTS.md: COLL-04 marked complete

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:12:33 -05:00
Jason Staack
00f0a8b507 feat(04-01): add config snapshot trigger endpoint with NATS request-reply
- POST /tenants/{tid}/devices/{did}/config-snapshot/trigger endpoint
- Requires operator role, rate limited 10/minute
- Returns 201 success, 404 device not found, 409 lock held, 502 failure, 504 timeout
- Reuses NATS connection from routeros_proxy module
- 6 tests covering all response paths including connection errors

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:10:25 -05:00
Jason Staack
0e664150e7 test(04-01): add failing tests for config snapshot trigger endpoint
- Test success returns 201 with sha256_hash
- Test NATS timeout returns 504
- Test poller failure returns 502
- Test device not found returns 404
- Test lock contention returns 409

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:08:13 -05:00
Jason Staack
0851eced36 feat(04-01): implement BackupResponder with extracted CollectAndPublish
- Create BackupResponder for NATS request-reply on config.backup.trigger
- Extract public CollectAndPublish from BackupScheduler returning sha256 hash
- Define BackupExecutor/BackupLocker/DeviceGetter interfaces for testability
- Create RedisBackupLocker adapter wrapping redislock.Client
- Wire BackupResponder into main.go lifecycle
- All 6 tests pass with in-process NATS server

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:07:35 -05:00
Jason Staack
9e102fda20 test(04-01): add failing tests for BackupResponder
- Test subscribe registers subscription
- Test valid request returns success with sha256_hash
- Test lock held returns locked status
- Test invalid JSON returns error
- Test Stop unsubscribes cleanly
- Test device not found returns failed status

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:04:44 -05:00
Jason Staack
bf3fb509ed docs(03-01): complete config snapshot subscriber plan
- SUMMARY.md with task commits and decisions
- STATE.md updated to Phase 3 complete
- ROADMAP.md progress updated
- REQUIREMENTS.md: STOR-02 marked complete

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 21:49:43 -05:00
Jason Staack
0db06419e7 feat(03-01): wire config snapshot subscriber into main.py lifespan
- Start config_snapshot_subscriber in lifespan startup (non-fatal)
- Stop config_snapshot_subscriber in lifespan shutdown
- Placed after push_rollback_subscriber (near config-related subscribers)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 21:47:51 -05:00
Jason Staack
3ab9f27d49 feat(03-01): implement config snapshot subscriber with dedup and encryption
- NATS subscriber for config.snapshot.> on DEVICE_EVENTS stream
- Dedup by SHA256 hash against latest snapshot per device
- OpenBao Transit encryption before INSERT (plaintext never stored)
- Malformed/orphan messages acked and discarded safely
- Transit failure causes nak for NATS retry
- Prometheus metrics: ingested, dedup_skipped, errors, duration
- All 6 unit tests pass

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 21:47:07 -05:00
Jason Staack
9d8274158a test(03-01): add failing tests for config snapshot subscriber
- 6 tests: new snapshot stored, duplicate skipped, encrypt failure naks,
  malformed acked, orphan device acked, first-snapshot stored
- Tests mock NATS msg, AdminAsyncSessionLocal, OpenBaoTransitService

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 21:45:41 -05:00
Jason Staack
d456fe58e9 docs(02-02): complete backup scheduler plan
- SUMMARY.md with execution metrics and decisions
- STATE.md updated: Phase 2 complete, 3 plans done
- ROADMAP.md updated: Phase 2 marked complete
- REQUIREMENTS.md: COLL-03, COLL-05 marked complete

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 20:57:47 -05:00
Jason Staack
d34817a36c feat(02-02): wire BackupScheduler into main.go lifecycle
- Create BackupScheduler with all dependencies injected
- Run as goroutine parallel to status poll scheduler
- Shares same context for graceful shutdown via SIGINT/SIGTERM
- Startup logged with interval, max_concurrent, command_timeout

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 20:55:06 -05:00
Jason Staack
2653a32d6f feat(02-02): implement BackupScheduler with per-device goroutines and concurrency control
- BackupScheduler manages per-device backup goroutines independently from status poll
- First backup uses 30-300s random jitter delay to spread load
- Concurrency limited by buffered channel semaphore (configurable max)
- Per-device Redis lock prevents duplicate backups across pods
- Auth failures and host key mismatches block retries with clear warnings
- Transient errors use 5m/15m/1h exponential backoff with cap
- Offline devices skipped via Redis status key check
- TOFU fingerprint stored on first successful SSH connection
- Config output validated, normalized, hashed, published to NATS
- SSHHostKeyUpdater interface added to interfaces.go
- All 12 backup unit tests pass

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 20:54:23 -05:00
Jason Staack
a884b0945d test(02-02): add failing tests for BackupScheduler
- Jitter range, backoff sequence, shouldRetry blocking logic
- Online-only gating via Redis, concurrency semaphore behavior
- Reconciliation start/stop device lifecycle

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 20:52:29 -05:00
Jason Staack
7ff3178b84 docs(02-01): complete config backup primitives plan
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 20:50:27 -05:00
Jason Staack
4ae39d2cb3 feat(02-01): add config backup env vars, NATS event, device SSH fields, migration, metrics
- Config: CONFIG_BACKUP_INTERVAL (21600s), CONFIG_BACKUP_MAX_CONCURRENT (10), CONFIG_BACKUP_COMMAND_TIMEOUT (60s)
- NATS: ConfigSnapshotEvent type, PublishConfigSnapshot method, config.snapshot.> stream subject
- Device: SSHPort/SSHHostKeyFingerprint fields, UpdateSSHHostKey method, updated queries/scans
- Migration 028: ssh_port, ssh_host_key_fingerprint, timestamp columns with poller_user grants
- Metrics: ConfigBackupTotal (counter), ConfigBackupDuration (histogram), ConfigBackupActive (gauge)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 20:48:12 -05:00
Jason Staack
f1abb75cab feat(02-01): add SSH executor with TOFU host key verification and config normalizer
- SSH RunCommand with typed error classification (auth, hostkey, timeout, connection refused, truncated)
- TOFU host key callback: accept-on-first-connect, verify-on-subsequent, reject-on-mismatch
- NormalizeConfig strips timestamps, normalizes line endings, trims whitespace, collapses blanks
- HashConfig returns 64-char lowercase hex SHA256 of normalized config
- 22 unit tests covering all error kinds, TOFU flows, normalization edge cases, idempotency

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 20:46:04 -05:00
Jason Staack
33f888a6e2 docs(02): create phase plan for poller config collection
Two plans covering SSH executor, config normalization, NATS publishing,
backup scheduler, and main.go wiring for periodic RouterOS config backup.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 20:39:47 -05:00
Jason Staack
a7a17a5ecd feat(01-01): add Alembic migration 027 for config snapshot tables with RLS
- Create router_config_snapshots table with Transit ciphertext storage
- Create router_config_diffs table with snapshot pair FK references
- Create router_config_changes table for parsed semantic changes
- Add RLS tenant isolation (ENABLE + FORCE + USING + WITH CHECK) on all 3
- Add GRANT SELECT/INSERT/DELETE to app_user on all 3
- Add performance indexes: device+collected_at, device+hash, snapshot pair, diff_id

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 20:04:18 -05:00
Jason Staack
8fe275e6f3 feat(01-01): add RouterConfigSnapshot/Diff/Change ORM models and tests
- Add RouterConfigSnapshot model with Transit ciphertext config_text
  and SHA-256 plaintext hash for deduplication
- Add RouterConfigDiff model for unified diffs between snapshots
- Add RouterConfigChange model for parsed semantic changes
- Export all three from app.models barrel file
- Add unit tests for importability, table names, columns, and types

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 20:03:43 -05:00