- Import start/stop_retention_scheduler in lifespan
- Start scheduler after config snapshot subscriber (non-fatal pattern)
- Stop scheduler during shutdown alongside other cleanup
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add CONFIG_RETENTION_DAYS setting (default 90) to config.py
- Create retention_service.py with cleanup_expired_snapshots (parameterized SQL via make_interval)
- APScheduler IntervalTrigger runs cleanup every 24h with 1h jitter
- Prometheus counter and histogram for observability
- CASCADE FKs handle diff/change deletion automatically
- All 4 unit tests pass
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Test cleanup deletes expired snapshots
- Test snapshots within retention window are kept
- Test deleted count is returned
- Test empty table handled gracefully
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- GET /config/{snapshot_id} returns decrypted full config with RBAC
- GET /config/{snapshot_id}/diff returns unified diff text with RBAC
- 404 for missing snapshots/diffs, 500 for Transit decrypt failure
- Both endpoints enforce viewer+ role and config:read scope
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- get_snapshot queries snapshot by id/device/tenant, decrypts via Transit
- get_snapshot_diff queries diff by new_snapshot_id with device/tenant filter
- Both return None for missing data (404-safe)
- 4 new tests with mocked Transit and DB sessions
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- GET /api/tenants/{tid}/devices/{did}/config-history endpoint
- Viewer+ RBAC with config:read scope
- Pagination via limit/offset query params (defaults 50/0)
- Router registered in main.py
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Service queries router_config_changes JOIN router_config_diffs for timeline
- Returns paginated entries with component, summary, timestamp, diff metadata
- ORDER BY created_at DESC with limit/offset pagination
- 4 tests covering formatting, empty results, pagination, and ordering
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Diff INSERT now uses RETURNING id to capture diff_id
- parse_diff_changes called after diff commit, results stored in router_config_changes
- Change parser errors are best-effort (logged, never block diff storage)
- Added tests for change storage and parser error resilience
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- parse_diff_changes() extracts component, summary, raw_line from unified diffs
- RouterOS path detection converts /ip firewall filter to ip/firewall/filter
- Human-readable summaries: Added/Removed/Modified N component rules
- Fallback to system/general when no path headers found
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add RETURNING id to snapshot INSERT for new_snapshot_id capture
- Call generate_and_store_diff after successful commit (best-effort)
- Outer try/except safety net ensures snapshot ack never blocked by diff
- Update subscriber tests to mock diff service
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Test success returns 201 with sha256_hash
- Test NATS timeout returns 504
- Test poller failure returns 502
- Test device not found returns 404
- Test lock contention returns 409
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Start config_snapshot_subscriber in lifespan startup (non-fatal)
- Stop config_snapshot_subscriber in lifespan shutdown
- Placed after push_rollback_subscriber (near config-related subscribers)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- NATS subscriber for config.snapshot.> on DEVICE_EVENTS stream
- Dedup by SHA256 hash against latest snapshot per device
- OpenBao Transit encryption before INSERT (plaintext never stored)
- Malformed/orphan messages acked and discarded safely
- Transit failure causes nak for NATS retry
- Prometheus metrics: ingested, dedup_skipped, errors, duration
- All 6 unit tests pass
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Create router_config_snapshots table with Transit ciphertext storage
- Create router_config_diffs table with snapshot pair FK references
- Create router_config_changes table for parsed semantic changes
- Add RLS tenant isolation (ENABLE + FORCE + USING + WITH CHECK) on all 3
- Add GRANT SELECT/INSERT/DELETE to app_user on all 3
- Add performance indexes: device+collected_at, device+hash, snapshot pair, diff_id
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add RouterConfigSnapshot model with Transit ciphertext config_text
and SHA-256 plaintext hash for deduplication
- Add RouterConfigDiff model for unified diffs between snapshots
- Add RouterConfigChange model for parsed semantic changes
- Export all three from app.models barrel file
- Add unit tests for importability, table names, columns, and types
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Bind tunnel listeners to 0.0.0.0 instead of 127.0.0.1 so tunnels
are reachable through reverse proxies and container networks
- Reduce port range to 49000-49004 (5 concurrent tunnels)
- Derive WinBox URI host from request Host header instead of
hardcoding 127.0.0.1, enabling use behind reverse proxies
- Add README security warning about default encryption keys
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Poller publishes session end events via JetStream when SSH sessions
close (normal disconnect or idle timeout). Backend subscribes with a
durable consumer and writes ssh_session_end audit log entries with
duration.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Gap 1: Add tenant ID verification after device lookup in SSH relay handleSSH,
closing cross-tenant token reuse vulnerability
- Gap 2: Add X-Forwarded-For fallback (last entry) when X-Real-IP is absent in
SSH relay source IP extraction; import strings package
- Gap 3: Add @limiter.limit("10/minute") to POST /winbox-session and POST
/ssh-session using existing slowapi pattern from app.middleware.rate_limit
- Gap 4: Add TODO comment in open_ssh_session explaining that SSH session count
enforcement is at the poller level; no NATS subject exists yet for API-side
pre-check
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implements four operator-gated endpoints under /api/tenants/{tenant_id}/devices/{device_id}/:
- POST /winbox-session: opens a WinBox tunnel via NATS request-reply to poller
- POST /ssh-session: mints a single-use Redis token (120s TTL) for WebSocket SSH relay
- DELETE /winbox-session/{tunnel_id}: idempotently closes a WinBox tunnel
- GET /sessions: lists active WinBox tunnels via NATS tunnel.status.list
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Two bugs fixed:
1. audit_service.py: log_action() inserted into audit_logs using the
caller's DB session but never committed. Any router that called
db.commit() before log_action() (firmware, devices, config_editor,
alerts, certificates) had its audit rows silently rolled back when
the request session closed.
Fix: log_action now opens its own AdminAsyncSessionLocal and self-
commits, making audit persistence independent of the caller's
transaction. The 'db' parameter is kept for backward compat but
unused. Affects 5 routers (firmware, devices, config_editor,
alerts, certificates).
2. docker-compose.override.yml: /data/firmware-cache had no volume
mount so the directory didn't exist in the container, causing
firmware downloads to fail with Permission denied.
Fix: bind-mount docker-data/firmware-cache:/data/firmware-cache
so firmware images survive container restarts.
Three bugs fixed:
1. Phase 30 (auth.ts): After SRP login the encrypted_key_set was returned
from the server but the vault key and RSA private key were never unwrapped
with the AUK. keyStore.getVaultKey() was always null, causing Tier 1
config-backup diffs to crash with a TypeError.
Fix: unwrap vault key and private key using crypto.subtle.unwrapKey after
successful SRP verification. Non-fatal: warns to console if decryption
fails so login always succeeds.
2. Token refresh (auth.py): The /refresh endpoint required refresh_token in
the request body, but the frontend never stored or sent it. After the 15-
minute access token TTL, all authenticated API calls would fail silently
because the interceptor sent an empty body and received 422 (not 401),
so the retry loop never fired.
Fix: login/srpVerify now set an httpOnly refresh_token cookie scoped to
/api/auth/refresh. The refresh endpoint now accepts the token from either
cookie (preferred) or body (legacy). Logout clears both cookies.
RefreshRequest.refresh_token is now Optional to allow empty-body calls.
3. Silent token rotation: the /refresh endpoint now also rotates the refresh
token cookie on each use (issues a fresh token), reducing the window for
stolen refresh token replay.
execute_cli was passing the full CLI string (e.g. '/ping address=8.8.8.8
count=4') as a single command to the Go poller. go-routeros expects the
command path and args separately. Now splits into command + prefixed args.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CurrentUser object uses user_id attribute, not id. Caused AttributeError
on PUT /api/settings/smtp.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When use_tls=false, the old logic set start_tls=true for any port != 25,
which broke plain SMTP servers like Mailpit. Now:
- Port 465: implicit TLS
- use_tls=true on other ports: STARTTLS
- use_tls=false: plain SMTP (no TLS)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>