57 Commits

Author SHA1 Message Date
Jason Staack
c3ae48eb0c feat(15-02): add trend detection and alert evaluation scheduled tasks
- Create trend_detector.py: hourly 7d vs 14d signal comparison per active link
- Create alert_evaluator_site.py: 5-min evaluation of 4 rule types with hysteresis
- Wire both tasks into lifespan with non-fatal startup and cancel on shutdown

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 07:16:06 -05:00
Jason Staack
d4cf36b200 feat(15-01): add site alert rules/events migration, models, schemas, and config
- Create Alembic migration 035 with site_alert_rules and site_alert_events tables, RLS policies, and GRANT
- Add SiteAlertRule/SiteAlertEvent ORM models with enums for rule_type, severity, state
- Add Pydantic schemas for rule/event CRUD and signal history points
- Add SIGNAL_DEGRADATION_THRESHOLD_DB, ALERT_EVALUATION_INTERVAL_SECONDS, TREND_DETECTION_INTERVAL_SECONDS to Settings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 07:16:05 -05:00
Jason Staack
430cab98a8 feat(14-01): add site_id device filter, wireless data endpoints, and frontend API clients
- Add site_id and sector_id query parameters to devices list endpoint
- Add get_device_registrations and get_device_rf_stats to link_service
- Add RegistrationResponse, RFStatsResponse schemas to link.py
- Add /registrations and /rf-stats endpoints to links router
- Add sectorsApi frontend client (list, create, update, delete, assignDevice)
- Add wirelessApi frontend client (links, registrations, RF stats, unknown clients)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 06:42:08 -05:00
Jason Staack
ea5afe3408 feat(14-01): add sector CRUD backend with migration, model, service, and router
- Create sectors table migration (034) with RLS and devices.sector_id FK
- Add Sector ORM model with site_id and tenant_id foreign keys
- Add SectorCreate/Update/Response/ListResponse Pydantic schemas
- Implement sector_service with CRUD and device assignment functions
- Add sectors router with GET/POST/PUT/DELETE and device sector assignment
- Register sectors router in main.py
- Add sector_id and sector_name to Device model and DeviceResponse

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 06:40:44 -05:00
Jason Staack
0434d31030 feat(13-03): add link service, schemas, router, and wire subscribers into lifespan
- LinkResponse/UnknownClientResponse Pydantic schemas with from_attributes
- Link service with get_links, get_device_links, get_site_links, get_unknown_clients
- Unknown clients query uses DISTINCT ON for latest registration per MAC
- 4 REST endpoints: tenant links, device links, site links, unknown clients
- Interface and link discovery subscribers wired into FastAPI lifespan start/stop
- Links router registered at /api prefix

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 06:12:06 -05:00
Jason Staack
3209a7d9be feat(13-03): add interface and link discovery NATS subscribers
- Interface subscriber consumes device.interfaces.> from DEVICE_EVENTS, upserts device_interfaces table
- Link discovery subscriber consumes wireless.registrations.> with separate durable consumer
- MAC resolution against device_interfaces for AP-CPE link discovery
- State machine: active (signal >= -80dBm), degraded (< -80), down (3 missed), stale (24h)
- missed_polls resets to 0 on any observation, enabling link revival

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 06:10:17 -05:00
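The link state rules listed in commit 3209a7d9be can be sketched roughly as below. This is a minimal illustration, not the repository's code: the thresholds (-80 dBm, 3 missed polls, 24h) come from the commit message, while the function names, the `unseen_for` parameter, and the precedence of stale over down are assumptions.

```python
from datetime import timedelta
from enum import Enum


class LinkState(str, Enum):
    ACTIVE = "active"
    DEGRADED = "degraded"
    DOWN = "down"
    STALE = "stale"


SIGNAL_THRESHOLD_DBM = -80          # from the commit message
MISSED_POLLS_DOWN = 3               # from the commit message
STALE_AFTER = timedelta(hours=24)   # from the commit message


def on_observation(signal_dbm: int) -> tuple[LinkState, int]:
    """Link was seen in this poll: classify by signal and reset missed_polls to 0."""
    if signal_dbm >= SIGNAL_THRESHOLD_DBM:
        return LinkState.ACTIVE, 0
    return LinkState.DEGRADED, 0


def on_missed_poll(
    state: LinkState, missed_polls: int, unseen_for: timedelta
) -> tuple[LinkState, int]:
    """Link was not seen in this poll (hypothetical helper).

    Assumption: stale takes precedence over down once the link has been
    unseen for 24h; the commit message does not spell out the ordering.
    """
    missed_polls += 1
    if unseen_for >= STALE_AFTER:
        return LinkState.STALE, missed_polls
    if missed_polls >= MISSED_POLLS_DOWN:
        return LinkState.DOWN, missed_polls
    return state, missed_polls
```

Because `on_observation` always returns a counter of 0, a link that was down or stale returns to active or degraded on its next sighting, which is the "link revival" behaviour the commit mentions.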
Jason Staack
a71df2af29 feat(13-02): add wireless_links table migration, ORM model, register both models
- Migration 033 creates wireless_links with state machine, missed_polls, RLS
- WirelessLink model with LinkState enum (discovered/active/degraded/down/stale)
- Register DeviceInterface, WirelessLink, LinkState in models __init__

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 06:02:14 -05:00
Jason Staack
7147b15e13 feat(13-02): add device_interfaces table migration and ORM model
- Migration 032 creates device_interfaces with RLS, MAC index, unique(device_id, name)
- DeviceInterface SQLAlchemy model with all columns and device relationship

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 06:01:22 -05:00
Jason Staack
390c4c1297 feat(12-02): add NATS subscriber for wireless registrations and wire into lifespan
- wireless_registration_subscriber.py: consumes wireless.registrations.> from WIRELESS_REGISTRATIONS stream
- Inserts per-client rows into wireless_registrations hypertable
- Inserts RF monitor data into rf_monitor_stats hypertable
- Uses AdminAsyncSessionLocal to bypass RLS for cross-tenant writes
- Durable consumer: api-wireless-reg-consumer with retry logic
- Wired into FastAPI lifespan with non-fatal startup and graceful shutdown

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 05:37:12 -05:00
Jason Staack
ddb2b3e43a feat(11-03): add site_id and site_name to DeviceResponse
- Add site_id (Optional[UUID]) and site_name (Optional[str]) to backend DeviceResponse schema
- Include site fields in _build_device_response helper
- Add selectinload(Device.site) to _device_with_relations for eager loading
- Add site_id and site_name to frontend DeviceResponse interface

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 21:50:57 -05:00
Jason Staack
7afd918e2f feat(11-01): create site service, router, and wire into app
- Add site_service with CRUD, health rollup, device assignment functions
- Add sites router with 8 endpoints (CRUD + assign/unassign/bulk-assign)
- RBAC: viewer for reads, operator for writes, tenant_admin for delete
- Wire sites_router into main.py with /api prefix
- Health rollup computes device_count, online_count, online_percent per site

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 21:38:54 -05:00
Jason Staack
f7e678532c feat(11-01): create sites table migration, model, and schemas
- Add migration 030 with sites table, RLS policy, and device site_id FK
- Add Site SQLAlchemy model with tenant isolation
- Add site_id nullable FK and relationship to Device model
- Add sites relationship to Tenant model
- Register Site in models __init__.py
- Add SiteCreate, SiteUpdate, SiteResponse, SiteListResponse schemas

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 21:37:08 -05:00
Jason Staack
6713a8cf5b feat(audit): make device names clickable in audit log
Add device_id to the audit log API response and frontend type, then
use DeviceLink to make device hostnames navigable in AuditLogTable.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 11:16:21 -05:00
Jason Staack
1800330545 feat: expand config editor menu tree and add WiFi wave2 template
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-16 18:27:38 -05:00
Jason Staack
091c19c434 fix: remove unreachable kms_service import in notification_service
kms_service.py does not exist and Transit encryption was never
implemented for SMTP passwords, making the decrypt_transit code path
unreachable. Remove it entirely and leave only the Fernet fallback.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 23:15:39 -05:00
Jason Staack
14ff8a54ca fix: add logging to silent error handlers, check maintenance windows for online events
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 23:09:30 -05:00
Jason Staack
7a563fecd2 fix: resolve ruff lint and formatting issues
Remove unused timedelta import from test_wireless_api.py and
auto-format metrics.py to pass ruff format check.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 20:09:14 -05:00
Jason Staack
8bffe3b4d0 feat: add wireless-issues API endpoints for dashboard
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 20:03:36 -05:00
Jason Staack
7ef849550c feat: seed default wireless alert rules on tenant creation
2026-03-15 20:02:00 -05:00
Jason Staack
ac2a09e2bd fix(ci): fix alembic DB import and golangci-lint version
- Move Base to app/models/base.py so alembic env.py can import it
  without triggering engine creation (which connects to hardcoded DB)
- Update all 13 models to import Base from app.models.base
- Pin golangci-lint to latest (supports Go 1.25)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 22:58:39 -05:00
Jason Staack
06a41ca9bf fix(lint): resolve all ruff lint errors
Add ruff config to exclude alembic E402, SQLAlchemy F821, and pre-existing
E501 line-length issues. Auto-fix 69 unused imports and 2 f-strings without
placeholders. Manually fix 8 unused variables. Apply ruff format to 127 files.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-14 22:17:50 -05:00
Jason Staack
2ad0367c91 fix(vpn): backport VPN fixes from production debugging
- Fix _commit_and_sync infinite recursion
- Use admin session for subnet_index allocation (bypass RLS)
- Auto-set VPN endpoint from CORS_ORIGINS hostname
- Remove server address field from VPN setup UI
- Add DELETE endpoint and button for VPN config removal
- Add wg-reload watcher for reliable config hot-reload via wg syncconf
- Add wg_status.json writer for live peer handshake status in UI
- Per-tenant SNAT for poller-to-device routing through VPN
- Restrict VPN→eth0 forwarding to Docker networks only (block exit node abuse)
- Use 10.10.0.0/16 allowed-address in RouterOS commands
- Fix structlog event= conflict (use audit=True)
- Export backup_scheduler proxy for firmware/upgrade imports
2026-03-14 20:59:14 -05:00
Jason Staack
b5f9bf14df fix(vpn): commit before sync_wireguard_config to ensure data visibility
sync_wireguard_config opens its own AdminAsyncSessionLocal connection
which cannot see uncommitted data from the caller's transaction. Add
_commit_and_sync helper that commits first, then regenerates wg0.conf.

Also removes the unused db parameter from sync_wireguard_config.
2026-03-14 16:42:17 -05:00
Jason Staack
b4a7494016 feat(vpn): update API error handling for subnet exhaustion and IP validation
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 16:36:46 -05:00
Jason Staack
17d9d3e00f feat(vpn): regenerate wg0.conf on tenant deletion
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 16:31:33 -05:00
Jason Staack
5e70890d76 feat(vpn): refactor setup_vpn and sync_wireguard_config for multi-tenant isolation
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 16:30:13 -05:00
Jason Staack
93fe935edf feat(vpn): add global server key helpers, subnet allocation, and allowed-IPs validation
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 16:27:35 -05:00
Jason Staack
593323d277 feat(vpn): add subnet_index column and global server keypair migration
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 16:25:09 -05:00
Jason Staack
9b060c5fdf refactor: rename database from mikrotik to tod in backend code
2026-03-14 10:57:20 -05:00
Jason Staack
970501e453 feat: implement Remote WinBox worker, API, frontend integration, OpenBao persistence, and supporting docs
2026-03-14 09:05:14 -05:00
Jason Staack
1a1ceb2cb1 feat(10-01): add audit event logging to config backup operations
- config_snapshot_created event after successful snapshot INSERT
- config_snapshot_skipped_duplicate event on dedup match
- config_diff_generated event after diff INSERT
- config_backup_manual_trigger event on manual trigger success
- All log_action calls wrapped in try/except for safety

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 23:44:00 -05:00
Jason Staack
4d62bc9499 feat(09-01): wire retention scheduler into application lifespan
- Import start/stop_retention_scheduler in lifespan
- Start scheduler after config snapshot subscriber (non-fatal pattern)
- Stop scheduler during shutdown alongside other cleanup

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 23:34:03 -05:00
Jason Staack
a9f7a45a9b feat(09-01): implement retention cleanup service with configurable retention period
- Add CONFIG_RETENTION_DAYS setting (default 90) to config.py
- Create retention_service.py with cleanup_expired_snapshots (parameterized SQL via make_interval)
- APScheduler IntervalTrigger runs cleanup every 24h with 1h jitter
- Prometheus counter and histogram for observability
- CASCADE FKs handle diff/change deletion automatically
- All 4 unit tests pass

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 23:33:27 -05:00
Jason Staack
af7007df13 feat(06-02): add snapshot view and diff retrieval endpoints
- GET /config/{snapshot_id} returns decrypted full config with RBAC
- GET /config/{snapshot_id}/diff returns unified diff text with RBAC
- 404 for missing snapshots/diffs, 500 for Transit decrypt failure
- Both endpoints enforce viewer+ role and config:read scope

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 23:03:32 -05:00
Jason Staack
83cd661efc feat(06-02): add get_snapshot and get_snapshot_diff service functions
- get_snapshot queries snapshot by id/device/tenant, decrypts via Transit
- get_snapshot_diff queries diff by new_snapshot_id with device/tenant filter
- Both return None for missing data (404-safe)
- 4 new tests with mocked Transit and DB sessions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 23:02:58 -05:00
Jason Staack
5c56344d74 feat(06-01): add config-history endpoint with RBAC and main.py registration
- GET /api/tenants/{tid}/devices/{did}/config-history endpoint
- Viewer+ RBAC with config:read scope
- Pagination via limit/offset query params (defaults 50/0)
- Router registered in main.py

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:59:37 -05:00
Jason Staack
f7d5aec4ec feat(06-01): add config history service with TDD tests
- Service queries router_config_changes JOIN router_config_diffs for timeline
- Returns paginated entries with component, summary, timestamp, diff metadata
- ORDER BY created_at DESC with limit/offset pagination
- 4 tests covering formatting, empty results, pagination, and ordering

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:58:51 -05:00
Jason Staack
122b5917f4 feat(05-02): wire change parser into diff service with RETURNING id
- Diff INSERT now uses RETURNING id to capture diff_id
- parse_diff_changes called after diff commit, results stored in router_config_changes
- Change parser errors are best-effort (logged, never block diff storage)
- Added tests for change storage and parser error resilience

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:37:09 -05:00
Jason Staack
b167831105 feat(05-02): implement config change parser for RouterOS diffs
- parse_diff_changes() extracts component, summary, raw_line from unified diffs
- RouterOS path detection converts /ip firewall filter to ip/firewall/filter
- Human-readable summaries: Added/Removed/Modified N component rules
- Fallback to system/general when no path headers found

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:35:48 -05:00
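The path conversion mentioned in commit b167831105 (`/ip firewall filter` to `ip/firewall/filter`) might look something like the sketch below. The function name is hypothetical, and mapping an empty header to the `system/general` fallback is an assumption; the actual `parse_diff_changes()` implementation is not shown in this log.

```python
def routeros_path_to_component(header: str) -> str:
    """Convert a RouterOS path header into a slash-separated component name.

    Hypothetical helper: an empty header falls back to "system/general",
    mirroring the fallback named in the commit message.
    """
    parts = header.strip().lstrip("/").split()
    return "/".join(parts) if parts else "system/general"
```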
Jason Staack
eb76343d04 feat(05-01): wire diff generation into snapshot subscriber
- Add RETURNING id to snapshot INSERT for new_snapshot_id capture
- Call generate_and_store_diff after successful commit (best-effort)
- Outer try/except safety net ensures snapshot ack never blocked by diff
- Update subscriber tests to mock diff service

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:32:40 -05:00
Jason Staack
72d0ae2856 feat(05-01): implement config diff service with Transit decrypt and difflib
- generate_and_store_diff decrypts old+new snapshots, produces unified diff
- Stores diff in router_config_diffs with line counts
- Best-effort: decrypt/DB errors logged, never raised
- Prometheus metrics: generated_total, errors_total, duration_seconds

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:31:28 -05:00
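Commit 72d0ae2856 describes producing a unified diff with difflib and storing line counts. A minimal sketch of that step using only the standard library follows; the function name and return shape are assumptions, and the Transit decryption and database write are left out.

```python
import difflib


def unified_diff_with_counts(old: str, new: str) -> tuple[str, int, int]:
    """Return (diff_text, lines_added, lines_removed) for two config texts."""
    lines = list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile="old",
            tofile="new",
            lineterm="",
        )
    )
    # The first two lines are the "---"/"+++" file headers; skip them so
    # they are not counted as removed/added content.
    body = lines[2:]
    added = sum(1 for line in body if line.startswith("+"))
    removed = sum(1 for line in body if line.startswith("-"))
    return "\n".join(lines), added, removed
```

Identical inputs yield an empty diff and zero counts, which a caller could use to skip the INSERT entirely.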
Jason Staack
00f0a8b507 feat(04-01): add config snapshot trigger endpoint with NATS request-reply
- POST /tenants/{tid}/devices/{did}/config-snapshot/trigger endpoint
- Requires operator role, rate limited 10/minute
- Returns 201 success, 404 device not found, 409 lock held, 502 failure, 504 timeout
- Reuses NATS connection from routeros_proxy module
- 6 tests covering all response paths including connection errors

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:10:25 -05:00
Jason Staack
0db06419e7 feat(03-01): wire config snapshot subscriber into main.py lifespan
- Start config_snapshot_subscriber in lifespan startup (non-fatal)
- Stop config_snapshot_subscriber in lifespan shutdown
- Placed after push_rollback_subscriber (near config-related subscribers)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 21:47:51 -05:00
Jason Staack
3ab9f27d49 feat(03-01): implement config snapshot subscriber with dedup and encryption
- NATS subscriber for config.snapshot.> on DEVICE_EVENTS stream
- Dedup by SHA256 hash against latest snapshot per device
- OpenBao Transit encryption before INSERT (plaintext never stored)
- Malformed/orphan messages acked and discarded safely
- Transit failure causes nak for NATS retry
- Prometheus metrics: ingested, dedup_skipped, errors, duration
- All 6 unit tests pass

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 21:47:07 -05:00
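The deduplication step in commit 3ab9f27d49 (SHA256 hash compared against the latest snapshot per device) can be illustrated as below. This is a sketch under assumptions: the helper names are hypothetical, and `latest_hash` stands in for whatever the subscriber reads from the most recent snapshot row.

```python
import hashlib
from typing import Optional


def config_hash(config_text: str) -> str:
    """SHA-256 hex digest of the plaintext config."""
    return hashlib.sha256(config_text.encode("utf-8")).hexdigest()


def is_duplicate(config_text: str, latest_hash: Optional[str]) -> bool:
    """True when the incoming config matches the device's latest snapshot.

    latest_hash is None when the device has no snapshot yet (assumption).
    """
    return latest_hash is not None and config_hash(config_text) == latest_hash
```

Hashing the plaintext before encryption lets the comparison run without decrypting the stored snapshot.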
Jason Staack
8fe275e6f3 feat(01-01): add RouterConfigSnapshot/Diff/Change ORM models and tests
- Add RouterConfigSnapshot model with Transit ciphertext config_text
  and SHA-256 plaintext hash for deduplication
- Add RouterConfigDiff model for unified diffs between snapshots
- Add RouterConfigChange model for parsed semantic changes
- Export all three from app.models barrel file
- Add unit tests for importability, table names, columns, and types

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 20:03:43 -05:00
Jason Staack
c2eea6847f fix: WinBox tunnel bind address, port range, and proxy support
- Bind tunnel listeners to 0.0.0.0 instead of 127.0.0.1 so tunnels
  are reachable through reverse proxies and container networks
- Reduce port range to 49000-49004 (5 concurrent tunnels)
- Derive WinBox URI host from request Host header instead of
  hardcoding 127.0.0.1, enabling use behind reverse proxies
- Add README security warning about default encryption keys

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 19:03:53 -05:00
Jason Staack
acf1790bed feat: add audit.session.end NATS pipeline for SSH session tracking
Poller publishes session end events via JetStream when SSH sessions
close (normal disconnect or idle timeout). Backend subscribes with a
durable consumer and writes ssh_session_end audit log entries with
duration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 16:07:10 -05:00
Jason Staack
7aaaeaa1d1 fix: address spec compliance gaps - tenant check, XFF fallback, rate limiting
- Gap 1: Add tenant ID verification after device lookup in SSH relay handleSSH,
  closing cross-tenant token reuse vulnerability
- Gap 2: Add X-Forwarded-For fallback (last entry) when X-Real-IP is absent in
  SSH relay source IP extraction; import strings package
- Gap 3: Add @limiter.limit("10/minute") to POST /winbox-session and POST
  /ssh-session using existing slowapi pattern from app.middleware.rate_limit
- Gap 4: Add TODO comment in open_ssh_session explaining that SSH session count
  enforcement is at the poller level; no NATS subject exists yet for API-side
  pre-check

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 15:51:14 -05:00
Jason Staack
4860fad643 feat(api): add remote access endpoints for WinBox tunnels and SSH sessions
Implements four operator-gated endpoints under /api/tenants/{tenant_id}/devices/{device_id}/:
- POST /winbox-session: opens a WinBox tunnel via NATS request-reply to poller
- POST /ssh-session: mints a single-use Redis token (120s TTL) for WebSocket SSH relay
- DELETE /winbox-session/{tunnel_id}: idempotently closes a WinBox tunnel
- GET /sessions: lists active WinBox tunnels via NATS tunnel.status.list

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 15:39:24 -05:00
Jason Staack
63fa45ffdd feat(api): add remote access pydantic schemas
2026-03-12 15:36:36 -05:00