- Diff INSERT now uses RETURNING id to capture diff_id
- parse_diff_changes called after diff commit, results stored in router_config_changes
- Change parser errors are best-effort (logged, never block diff storage)
- Added tests for change storage and parser error resilience
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- parse_diff_changes() extracts component, summary, raw_line from unified diffs
- RouterOS path detection converts /ip firewall filter to ip/firewall/filter
- Human-readable summaries: Added/Removed/Modified N component rules
- Fallback to system/general when no path headers found
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add RETURNING id to snapshot INSERT for new_snapshot_id capture
- Call generate_and_store_diff after successful commit (best-effort)
- Outer try/except safety net ensures snapshot ack never blocked by diff
- Update subscriber tests to mock diff service
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Test success returns 201 with sha256_hash
- Test NATS timeout returns 504
- Test poller failure returns 502
- Test device not found returns 404
- Test lock contention returns 409
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Create BackupResponder for NATS request-reply on config.backup.trigger
- Extract public CollectAndPublish from BackupScheduler returning sha256 hash
- Define BackupExecutor/BackupLocker/DeviceGetter interfaces for testability
- Create RedisBackupLocker adapter wrapping redislock.Client
- Wire BackupResponder into main.go lifecycle
- All 6 tests pass with in-process NATS server
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Test subscribe registers subscription
- Test valid request returns success with sha256_hash
- Test lock held returns locked status
- Test invalid JSON returns error
- Test Stop unsubscribes cleanly
- Test device not found returns failed status
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Start config_snapshot_subscriber in lifespan startup (non-fatal)
- Stop config_snapshot_subscriber in lifespan shutdown
- Placed after push_rollback_subscriber (near config-related subscribers)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- NATS subscriber for config.snapshot.> on DEVICE_EVENTS stream
- Dedup by SHA256 hash against latest snapshot per device
- OpenBao Transit encryption before INSERT (plaintext never stored)
- Malformed/orphan messages acked and discarded safely
- Transit failure causes nak for NATS retry
- Prometheus metrics: ingested, dedup_skipped, errors, duration
- All 6 unit tests pass
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Create BackupScheduler with all dependencies injected
- Run as goroutine parallel to status poll scheduler
- Shares same context for graceful shutdown via SIGINT/SIGTERM
- Startup logged with interval, max_concurrent, command_timeout
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- BackupScheduler manages per-device backup goroutines independently from status poll
- First backup uses 30-300s random jitter delay to spread load
- Concurrency limited by buffered channel semaphore (configurable max)
- Per-device Redis lock prevents duplicate backups across pods
- Auth failures and host key mismatches block retries with clear warnings
- Transient errors use 5m/15m/1h exponential backoff with cap
- Offline devices skipped via Redis status key check
- TOFU fingerprint stored on first successful SSH connection
- Config output validated, normalized, hashed, published to NATS
- SSHHostKeyUpdater interface added to interfaces.go
- All 12 backup unit tests pass
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Two plans covering SSH executor, config normalization, NATS publishing,
backup scheduler, and main.go wiring for periodic RouterOS config backup.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Create router_config_snapshots table with Transit ciphertext storage
- Create router_config_diffs table with snapshot pair FK references
- Create router_config_changes table for parsed semantic changes
- Add RLS tenant isolation (ENABLE + FORCE + USING + WITH CHECK) on all 3
- Add GRANT SELECT/INSERT/DELETE to app_user on all 3
- Add performance indexes: device+collected_at, device+hash, snapshot pair, diff_id
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add RouterConfigSnapshot model with Transit ciphertext config_text
and SHA-256 plaintext hash for deduplication
- Add RouterConfigDiff model for unified diffs between snapshots
- Add RouterConfigChange model for parsed semantic changes
- Export all three from app.models barrel file
- Add unit tests for importability, table names, columns, and types
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Bind tunnel listeners to 0.0.0.0 instead of 127.0.0.1 so tunnels
are reachable through reverse proxies and container networks
- Reduce port range to 49000-49004 (5 concurrent tunnels)
- Derive WinBox URI host from request Host header instead of
hardcoding 127.0.0.1, enabling use behind reverse proxies
- Add README security warning about default encryption keys
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
WinBox tunnels, SSH terminal, NATS request-reply architecture,
session management, security notes, and updated port tables.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Poller publishes session end events via JetStream when SSH sessions
close (normal disconnect or idle timeout). Backend subscribes with a
durable consumer and writes ssh_session_end audit log entries with
duration.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Gap 1: Add tenant ID verification after device lookup in SSH relay handleSSH,
closing cross-tenant token reuse vulnerability
- Gap 2: Add X-Forwarded-For fallback (last entry) when X-Real-IP is absent in
SSH relay source IP extraction; import strings package
- Gap 3: Add @limiter.limit("10/minute") to POST /winbox-session and POST
/ssh-session using existing slowapi pattern from app.middleware.rate_limit
- Gap 4: Add TODO comment in open_ssh_session explaining that SSH session count
enforcement is at the poller level; no NATS subject exists yet for API-side
pre-check
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add WebSocket upgrade map to nginx and proxy /ws/ssh to poller:8080
- Update CSP connect-src to allow ws: and wss: for terminal connections
- Add tunnel port range 49000-49100, SSH relay env vars, ulimits, and healthcheck to poller in both override and prod compose files
- Increase poller memory limit to 512M in prod for tunnel/SSH overhead
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements four operator-gated endpoints under /api/tenants/{tenant_id}/devices/{device_id}/:
- POST /winbox-session: opens a WinBox tunnel via NATS request-reply to poller
- POST /ssh-session: mints a single-use Redis token (120s TTL) for WebSocket SSH relay
- DELETE /winbox-session/{tunnel_id}: idempotently closes a WinBox tunnel
- GET /sessions: lists active WinBox tunnels via NATS tunnel.status.list
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add TunnelManager, TunnelResponder, SSH relay server, and SSH relay HTTP
server to the poller startup sequence with env-configurable port ranges,
idle timeouts, and session limits. Extends graceful shutdown to cover the
HTTP server (5s context), tunnel manager, and SSH relay server via defer.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implements the SSH relay server (Task 2.1) that validates single-use
Redis tokens via GETDEL, dials SSH to the target device with PTY,
and bridges WebSocket binary/text frames to SSH stdin/stdout/stderr
with idle timeout and per-user/per-device session limits.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implements Manager which orchestrates WinBox tunnel lifecycle: open,
close, idle cleanup, and status queries. Uses PortPool and Tunnel from
Tasks 1.2/1.3. DeviceStore and CredentialCache wired in for Task 1.5.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implements Tunnel type that listens on a local port, accepts WinBox client
connections, dials the remote RouterOS device, and proxies traffic
bidirectionally. Uses activityReader to atomically update LastActive on
each read for idle timeout detection. Per-connection contexts derive from
the tunnel context so Close() terminates all connections cleanly.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements PortPool with mutex-protected allocation, bind verification
to skip ports already in use by the OS, and release-for-reuse semantics.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>