38 Commits

Author SHA1 Message Date
Jason Staack
a231b18d69 feat(17-03): bulk add endpoint and service with credential profile support
- POST /tenants/{tenant_id}/devices/bulk endpoint with rate limiting
- bulk_add_with_profile service validates profile ownership and type compatibility
- Duplicate IP check prevents adding same IP twice in one tenant
- TCP reachability check for RouterOS devices, skipped for SNMP (UDP)
- Per-device result reporting with partial success support
- Device model updated with device_type, snmp_port, snmp_version, snmp_profile_id columns
- Audit logging for bulk add operations

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 18:59:24 -05:00
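
The per-device reporting described in a231b18d69 can be illustrated with a minimal sketch. It assumes a hypothetical `probe` callable for the TCP reachability check and hypothetical reason strings (`duplicate_ip`, `unreachable`); the real `bulk_add_with_profile` service is not shown in the log and may differ.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set, Tuple


@dataclass
class BulkResult:
    ip: str
    ok: bool
    reason: str = ""


def needs_tcp_check(device_type: str) -> bool:
    # SNMP runs over UDP, so a TCP connect probe tells us nothing about it.
    return device_type != "snmp"


def bulk_add(
    existing_ips: Set[str],
    requested: Iterable[Tuple[str, str]],
    probe: Callable[[str], bool],
) -> List[BulkResult]:
    """Return one result per requested device; failures do not abort the batch."""
    seen = set(existing_ips)  # IPs already in the tenant plus those accepted so far
    results: List[BulkResult] = []
    for ip, device_type in requested:
        if ip in seen:
            results.append(BulkResult(ip, False, "duplicate_ip"))
            continue
        if needs_tcp_check(device_type) and not probe(ip):
            results.append(BulkResult(ip, False, "unreachable"))
            continue
        seen.add(ip)
        results.append(BulkResult(ip, True))
    return results
```

Returning a result per device, rather than raising on the first failure, is what makes partial success reportable to the caller.
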
Jason Staack
7354708df2 feat(17-01): add credential profile service, router, device assignment
- Service with CRUD + Transit encryption for all new credential writes
- Router with 6 endpoints under /tenants/{tenant_id}/credential-profiles
- Delete returns HTTP 409 with device_count when devices reference profile
- Registered credential_profiles_router in main.py
- DeviceUpdate schema accepts optional credential_profile_id
- update_device validates profile belongs to tenant before assigning

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 18:54:02 -05:00
Jason Staack
390df0531d feat(17-02): add snmp_custom handler and NAK safety net to metrics subscriber
- Add _insert_snmp_custom_metrics handler for custom SNMP OID events
- Insert all 9 columns into snmp_metrics hypertable
- Change unknown metric types from ACK to NAK for redelivery safety
- Prevents permanent data loss during deployment ordering mismatches

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 18:51:02 -05:00
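
The ACK-to-NAK change in 390df0531d can be sketched as a dispatch function that returns the acknowledgement decision. The `handlers` mapping and the `"ack"`/`"nak"` return strings are assumptions for illustration; the actual subscriber presumably calls the message's ack/nak methods directly.

```python
from typing import Callable, Dict


def handle_metric(event: dict, handlers: Dict[str, Callable[[dict], None]]) -> str:
    """Process a metric event and return the acknowledgement decision."""
    handler = handlers.get(event.get("type"))
    if handler is None:
        # Unknown type: ask the broker to redeliver instead of dropping the
        # message, so a consumer deployed later can still process it.
        return "nak"
    handler(event)
    return "ack"
```

With the older ACK behaviour, an event published by a newer producer before the API was upgraded would have been acknowledged and lost.
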
Jason Staack
222b7c2b25 fix(sse): use ordered consumers to prevent stale consumer accumulation
SSE connections previously created regular push consumers without durable
names. When browsers disconnected uncleanly or the API restarted, these
orphaned consumers persisted on the NATS server and continued draining
messages — each restart added more, eventually saturating the API at
100% CPU.

Switched to ordered_consumer=True which:
- Creates ephemeral consumers with no server-side ack state
- Auto-cleans on disconnect (no orphans)
- Still delivers new messages in real-time for SSE

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 18:11:49 -05:00
Jason Staack
6a5829e0ff style: ruff format 10 python files
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 13:49:59 -05:00
Jason Staack
9d6b68760f fix(lint): remove unused imports and extraneous f-string prefix
Ruff auto-fix: unused Optional imports in sectors router and link
schemas, unused Site import in device service, unused datetime
imports in trend detector, unused text import in site service,
and f-string without placeholders in signal history service.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 13:45:47 -05:00
Jason Staack
8eb8c0a8fa fix(15): correct SQL column names in trend detector and alert evaluator
- Replace `collected_at` with `time` (actual hypertable column) in 5 queries
- Remove non-existent `rule_type` column from site_alert_events INSERTs
- Fix trend dedup query to use `rule_id IS NULL` instead of `rule_type`

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 07:33:05 -05:00
Jason Staack
124a72582b feat(15-01): add signal history and site alert services, routers, and main.py wiring
- Create signal_history_service with TimescaleDB time_bucket queries for 24h/7d/30d ranges
- Create site_alert_service with full CRUD for rules, events list/resolve, and active count
- Create signal_history router with GET endpoint for time-bucketed signal data
- Create site_alerts router with CRUD endpoints for rules and event management
- Wire both routers into main.py with /api prefix

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 07:18:02 -05:00
Jason Staack
c3ae48eb0c feat(15-02): add trend detection and alert evaluation scheduled tasks
- Create trend_detector.py: hourly 7d vs 14d signal comparison per active link
- Create alert_evaluator_site.py: 5-min evaluation of 4 rule types with hysteresis
- Wire both tasks into lifespan with non-fatal startup and cancel on shutdown

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 07:16:06 -05:00
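
The log for c3ae48eb0c mentions hysteresis but not its parameters. As a generic illustration only, a rule that fires when a value drops below a threshold and clears only after it recovers past the threshold plus a margin might look like the following; `threshold` and `margin` are hypothetical names and values.

```python
def next_alert_state(active: bool, value: float, threshold: float, margin: float) -> bool:
    """Return whether the alert should be active after seeing `value`."""
    if not active:
        # Fire only when the value drops below the threshold.
        return value < threshold
    # Once active, stay active until the value recovers past threshold + margin.
    return value < threshold + margin
```

The gap between the firing and clearing points keeps a value that hovers around the threshold from opening and resolving an alert on every evaluation.
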
Jason Staack
430cab98a8 feat(14-01): add site_id device filter, wireless data endpoints, and frontend API clients
- Add site_id and sector_id query parameters to devices list endpoint
- Add get_device_registrations and get_device_rf_stats to link_service
- Add RegistrationResponse, RFStatsResponse schemas to link.py
- Add /registrations and /rf-stats endpoints to links router
- Add sectorsApi frontend client (list, create, update, delete, assignDevice)
- Add wirelessApi frontend client (links, registrations, RF stats, unknown clients)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 06:42:08 -05:00
Jason Staack
ea5afe3408 feat(14-01): add sector CRUD backend with migration, model, service, and router
- Create sectors table migration (034) with RLS and devices.sector_id FK
- Add Sector ORM model with site_id and tenant_id foreign keys
- Add SectorCreate/Update/Response/ListResponse Pydantic schemas
- Implement sector_service with CRUD and device assignment functions
- Add sectors router with GET/POST/PUT/DELETE and device sector assignment
- Register sectors router in main.py
- Add sector_id and sector_name to Device model and DeviceResponse

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 06:40:44 -05:00
Jason Staack
0434d31030 feat(13-03): add link service, schemas, router, and wire subscribers into lifespan
- LinkResponse/UnknownClientResponse Pydantic schemas with from_attributes
- Link service with get_links, get_device_links, get_site_links, get_unknown_clients
- Unknown clients query uses DISTINCT ON for latest registration per MAC
- 4 REST endpoints: tenant links, device links, site links, unknown clients
- Interface and link discovery subscribers wired into FastAPI lifespan start/stop
- Links router registered at /api prefix

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 06:12:06 -05:00
Jason Staack
3209a7d9be feat(13-03): add interface and link discovery NATS subscribers
- Interface subscriber consumes device.interfaces.> from DEVICE_EVENTS, upserts device_interfaces table
- Link discovery subscriber consumes wireless.registrations.> with separate durable consumer
- MAC resolution against device_interfaces for AP-CPE link discovery
- State machine: active (signal >= -80dBm), degraded (< -80), down (3 missed), stale (24h)
- missed_polls resets to 0 on any observation, enabling link revival

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 06:10:17 -05:00
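
The link state machine listed in 3209a7d9be can be sketched as two transitions, one per observation and one per missed poll. The `Link` class and function names are hypothetical, and two details are assumptions: "stale" is measured from the last observation, and it takes precedence over "down".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Link:
    state: str = "active"
    missed_polls: int = 0
    last_seen: Optional[datetime] = None


def observe(link: Link, signal_dbm: float, now: datetime) -> None:
    # Any observation resets the miss counter, so a down or stale link can revive.
    link.missed_polls = 0
    link.last_seen = now
    link.state = "active" if signal_dbm >= -80 else "degraded"


def miss(link: Link, now: datetime) -> None:
    link.missed_polls += 1
    if link.last_seen is not None and now - link.last_seen >= timedelta(hours=24):
        link.state = "stale"
    elif link.missed_polls >= 3:
        link.state = "down"
```
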
Jason Staack
390c4c1297 feat(12-02): add NATS subscriber for wireless registrations and wire into lifespan
- wireless_registration_subscriber.py: consumes wireless.registrations.> from WIRELESS_REGISTRATIONS stream
- Inserts per-client rows into wireless_registrations hypertable
- Inserts RF monitor data into rf_monitor_stats hypertable
- Uses AdminAsyncSessionLocal to bypass RLS for cross-tenant writes
- Durable consumer: api-wireless-reg-consumer with retry logic
- Wired into FastAPI lifespan with non-fatal startup and graceful shutdown

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 05:37:12 -05:00
Jason Staack
ddb2b3e43a feat(11-03): add site_id and site_name to DeviceResponse
- Add site_id (Optional[UUID]) and site_name (Optional[str]) to backend DeviceResponse schema
- Include site fields in _build_device_response helper
- Add selectinload(Device.site) to _device_with_relations for eager loading
- Add site_id and site_name to frontend DeviceResponse interface

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 21:50:57 -05:00
Jason Staack
7afd918e2f feat(11-01): create site service, router, and wire into app
- Add site_service with CRUD, health rollup, device assignment functions
- Add sites router with 8 endpoints (CRUD + assign/unassign/bulk-assign)
- RBAC: viewer for reads, operator for writes, tenant_admin for delete
- Wire sites_router into main.py with /api prefix
- Health rollup computes device_count, online_count, online_percent per site

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 21:38:54 -05:00
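
The health rollup fields named in 7afd918e2f can be computed as in the sketch below. It assumes each device exposes a status string equal to `"online"` when reachable, rounds to one decimal place, and reports 0.0 for an empty site; none of these conventions are confirmed by the log.

```python
from typing import Dict, Iterable, Union


def health_rollup(statuses: Iterable[str]) -> Dict[str, Union[int, float]]:
    """Summarise device statuses for one site."""
    statuses = list(statuses)
    device_count = len(statuses)
    online_count = sum(1 for s in statuses if s == "online")
    # Guard against division by zero for sites with no devices assigned.
    online_percent = round(100.0 * online_count / device_count, 1) if device_count else 0.0
    return {
        "device_count": device_count,
        "online_count": online_count,
        "online_percent": online_percent,
    }
```
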
Jason Staack
091c19c434 fix: remove unreachable kms_service import in notification_service
kms_service.py does not exist and Transit encryption was never
implemented for SMTP passwords, making the decrypt_transit code path
unreachable. Remove it entirely and leave only the Fernet fallback.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 23:15:39 -05:00
Jason Staack
14ff8a54ca fix: add logging to silent error handlers, check maintenance windows for online events
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 23:09:30 -05:00
Jason Staack
06a41ca9bf fix(lint): resolve all ruff lint errors
Add ruff config to exclude alembic E402, SQLAlchemy F821, and pre-existing
E501 line-length issues. Auto-fix 69 unused imports and 2 f-strings without
placeholders. Manually fix 8 unused variables. Apply ruff format to 127 files.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-14 22:17:50 -05:00
Jason Staack
2ad0367c91 fix(vpn): backport VPN fixes from production debugging
- Fix _commit_and_sync infinite recursion
- Use admin session for subnet_index allocation (bypass RLS)
- Auto-set VPN endpoint from CORS_ORIGINS hostname
- Remove server address field from VPN setup UI
- Add DELETE endpoint and button for VPN config removal
- Add wg-reload watcher for reliable config hot-reload via wg syncconf
- Add wg_status.json writer for live peer handshake status in UI
- Per-tenant SNAT for poller-to-device routing through VPN
- Restrict VPN→eth0 forwarding to Docker networks only (block exit node abuse)
- Use 10.10.0.0/16 allowed-address in RouterOS commands
- Fix structlog event= conflict (use audit=True)
- Export backup_scheduler proxy for firmware/upgrade imports
2026-03-14 20:59:14 -05:00
Jason Staack
b5f9bf14df fix(vpn): commit before sync_wireguard_config to ensure data visibility
sync_wireguard_config opens its own AdminAsyncSessionLocal connection
which cannot see uncommitted data from the caller's transaction. Add
_commit_and_sync helper that commits first, then regenerates wg0.conf.

Also removes the unused db parameter from sync_wireguard_config.
2026-03-14 16:42:17 -05:00
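
The visibility problem fixed in b5f9bf14df (a second connection cannot see the caller's uncommitted rows) can be reproduced in miniature. This sketch uses the standard-library `sqlite3` module as a stand-in for the project's PostgreSQL sessions; the `peers` table and function names are hypothetical.

```python
import os
import sqlite3
import tempfile


def visible_rows(path: str) -> int:
    """Count rows as seen by a fresh, independent connection."""
    reader = sqlite3.connect(path)
    try:
        return reader.execute("SELECT COUNT(*) FROM peers").fetchone()[0]
    finally:
        reader.close()


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        writer = sqlite3.connect(path)
        writer.execute("CREATE TABLE peers (name TEXT)")
        writer.commit()
        # The INSERT opens a transaction that stays open until commit().
        writer.execute("INSERT INTO peers VALUES ('tenant-a')")
        before = visible_rows(path)  # second connection, row not committed yet
        writer.commit()
        after = visible_rows(path)  # same query after the commit
        writer.close()
        return before, after
```

A commit-then-sync helper such as `_commit_and_sync` follows the same ordering: make the data durable first, then let the independent session read it.
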
Jason Staack
5e70890d76 feat(vpn): refactor setup_vpn and sync_wireguard_config for multi-tenant isolation
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 16:30:13 -05:00
Jason Staack
93fe935edf feat(vpn): add global server key helpers, subnet allocation, and allowed-IPs validation
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 16:27:35 -05:00
Jason Staack
970501e453 feat: implement Remote WinBox worker, API, frontend integration, OpenBao persistence, and supporting docs
2026-03-14 09:05:14 -05:00
Jason Staack
1a1ceb2cb1 feat(10-01): add audit event logging to config backup operations
- config_snapshot_created event after successful snapshot INSERT
- config_snapshot_skipped_duplicate event on dedup match
- config_diff_generated event after diff INSERT
- config_backup_manual_trigger event on manual trigger success
- All log_action calls wrapped in try/except for safety

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 23:44:00 -05:00
Jason Staack
a9f7a45a9b feat(09-01): implement retention cleanup service with configurable retention period
- Add CONFIG_RETENTION_DAYS setting (default 90) to config.py
- Create retention_service.py with cleanup_expired_snapshots (parameterized SQL via make_interval)
- APScheduler IntervalTrigger runs cleanup every 24h with 1h jitter
- Prometheus counter and histogram for observability
- CASCADE FKs handle diff/change deletion automatically
- All 4 unit tests pass

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 23:33:27 -05:00
Jason Staack
83cd661efc feat(06-02): add get_snapshot and get_snapshot_diff service functions
- get_snapshot queries snapshot by id/device/tenant, decrypts via Transit
- get_snapshot_diff queries diff by new_snapshot_id with device/tenant filter
- Both return None for missing data (404-safe)
- 4 new tests with mocked Transit and DB sessions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 23:02:58 -05:00
Jason Staack
f7d5aec4ec feat(06-01): add config history service with TDD tests
- Service queries router_config_changes JOIN router_config_diffs for timeline
- Returns paginated entries with component, summary, timestamp, diff metadata
- ORDER BY created_at DESC with limit/offset pagination
- 4 tests covering formatting, empty results, pagination, and ordering

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:58:51 -05:00
Jason Staack
122b5917f4 feat(05-02): wire change parser into diff service with RETURNING id
- Diff INSERT now uses RETURNING id to capture diff_id
- parse_diff_changes called after diff commit, results stored in router_config_changes
- Change parser errors are best-effort (logged, never block diff storage)
- Added tests for change storage and parser error resilience

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:37:09 -05:00
Jason Staack
b167831105 feat(05-02): implement config change parser for RouterOS diffs
- parse_diff_changes() extracts component, summary, raw_line from unified diffs
- RouterOS path detection converts /ip firewall filter to ip/firewall/filter
- Human-readable summaries: Added/Removed/Modified N component rules
- Fallback to system/general when no path headers found

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:35:48 -05:00
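
The path detection described in b167831105 can be sketched as follows. The function names are hypothetical, and the sketch assumes RouterOS section headers appear in the diff as lines beginning with `/` after the one-character diff marker; the real `parse_diff_changes()` also produces summaries and raw lines, which are omitted here.

```python
def normalize_path(header: str) -> str:
    """Convert '/ip firewall filter' to 'ip/firewall/filter'."""
    return "/".join(header.strip().lstrip("/").split())


def component_from_diff(diff_text: str) -> str:
    """Return the first RouterOS path found in a unified diff, else a fallback."""
    for line in diff_text.splitlines():
        # Drop the leading diff marker ('+', '-' or ' ') when present.
        body = line[1:] if line[:1] in ("+", "-", " ") else line
        if body.startswith("/"):
            return normalize_path(body)
    return "system/general"
```
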
Jason Staack
eb76343d04 feat(05-01): wire diff generation into snapshot subscriber
- Add RETURNING id to snapshot INSERT for new_snapshot_id capture
- Call generate_and_store_diff after successful commit (best-effort)
- Outer try/except safety net ensures snapshot ack never blocked by diff
- Update subscriber tests to mock diff service

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:32:40 -05:00
Jason Staack
72d0ae2856 feat(05-01): implement config diff service with Transit decrypt and difflib
- generate_and_store_diff decrypts old+new snapshots, produces unified diff
- Stores diff in router_config_diffs with line counts
- Best-effort: decrypt/DB errors logged, never raised
- Prometheus metrics: generated_total, errors_total, duration_seconds

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:31:28 -05:00
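
The diff-plus-line-counts step in 72d0ae2856 can be approximated with the standard-library `difflib`. This is a sketch on already-decrypted text; the function name and the `old`/`new` file labels are assumptions, and Transit decryption, storage, and metrics are left out.

```python
import difflib
from typing import Tuple


def diff_with_counts(old: str, new: str) -> Tuple[str, int, int]:
    """Return (unified diff text, lines added, lines removed)."""
    lines = list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile="old",
            tofile="new",
            lineterm="",
        )
    )
    # The first two lines are the '---'/'+++' file headers, not content changes.
    body = lines[2:]
    added = sum(1 for line in body if line.startswith("+"))
    removed = sum(1 for line in body if line.startswith("-"))
    return "\n".join(lines), added, removed
```
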
Jason Staack
3ab9f27d49 feat(03-01): implement config snapshot subscriber with dedup and encryption
- NATS subscriber for config.snapshot.> on DEVICE_EVENTS stream
- Dedup by SHA256 hash against latest snapshot per device
- OpenBao Transit encryption before INSERT (plaintext never stored)
- Malformed/orphan messages acked and discarded safely
- Transit failure causes nak for NATS retry
- Prometheus metrics: ingested, dedup_skipped, errors, duration
- All 6 unit tests pass

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 21:47:07 -05:00
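
The dedup rule in 3ab9f27d49 (compare a SHA256 of the incoming config against the latest stored snapshot for the device) reduces to a small check. The function names are hypothetical, and the sketch assumes the hash is taken over the UTF-8 encoded plaintext.

```python
import hashlib
from typing import Optional


def config_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def should_store(new_config: str, latest_hash: Optional[str]) -> bool:
    """Store only when the device has no snapshot yet or the content changed."""
    return config_hash(new_config) != latest_hash
```

Comparing only against the latest snapshot means a config that reverts to an older state is still recorded as a new snapshot.
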
Jason Staack
acf1790bed feat: add audit.session.end NATS pipeline for SSH session tracking
Poller publishes session end events via JetStream when SSH sessions
close (normal disconnect or idle timeout). Backend subscribes with a
durable consumer and writes ssh_session_end audit log entries with
duration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 16:07:10 -05:00
Cog
6b22741f54 fix: audit logs never persisted + firmware-cache permission denied
Two bugs fixed:

1. audit_service.py: log_action() inserted into audit_logs using the
   caller's DB session but never committed. Any router that called
   db.commit() before log_action() (firmware, devices, config_editor,
   alerts, certificates) had its audit rows silently rolled back when
   the request session closed.
   Fix: log_action now opens its own AdminAsyncSessionLocal and self-
   commits, making audit persistence independent of the caller's
   transaction. The 'db' parameter is kept for backward compat but
   unused. Affects 5 routers (firmware, devices, config_editor,
   alerts, certificates).

2. docker-compose.override.yml: /data/firmware-cache had no volume
   mount so the directory didn't exist in the container, causing
   firmware downloads to fail with Permission denied.
   Fix: bind-mount docker-data/firmware-cache:/data/firmware-cache
   so firmware images survive container restarts.
2026-03-12 14:05:40 -05:00
Jason Staack
837ab6f8fa fix(backend): parse CLI command string into RouterOS API command + args
execute_cli was passing the full CLI string (e.g. '/ping address=8.8.8.8
count=4') as a single command to the Go poller. go-routeros expects the
command path and args separately. Now splits into command + prefixed args.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 22:05:05 -05:00
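
The split described in 837ab6f8fa can be sketched as below. The function name is hypothetical, and the `=` prefix on each argument is an assumption based on the RouterOS API's `=name=value` attribute-word convention; the commit message only says the args are "prefixed".

```python
import shlex
from typing import List, Tuple


def split_cli(cli: str) -> Tuple[str, List[str]]:
    """Split '/ping address=8.8.8.8 count=4' into a command path and API args."""
    tokens = shlex.split(cli)
    if not tokens:
        raise ValueError("empty command")
    command = tokens[0]
    # Each remaining key=value token becomes an attribute word: '=key=value'.
    args = ["=" + token for token in tokens[1:]]
    return command, args
```

Using `shlex.split` rather than `str.split` keeps quoted values that contain spaces together as one argument.
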
Jason Staack
f7a53e60da fix: SMTP TLS logic was inverted — plain SMTP incorrectly used STARTTLS
When use_tls=false, the old logic set start_tls=true for any port != 25,
which broke plain SMTP servers like Mailpit. Now:
- Port 465: implicit TLS
- use_tls=true on other ports: STARTTLS
- use_tls=false: plain SMTP (no TLS)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 21:03:54 -05:00
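
The three cases listed in f7a53e60da map onto a small decision function. The function name and return strings are hypothetical, and the sketch assumes port 465 takes precedence over the `use_tls` flag, which the message implies by listing it first but does not state outright.

```python
def smtp_tls_mode(port: int, use_tls: bool) -> str:
    """Pick the transport security mode for an SMTP connection."""
    if port == 465:
        return "implicit"  # TLS from the first byte
    if use_tls:
        return "starttls"  # upgrade a plain connection after connecting
    return "plain"  # no TLS, e.g. a local Mailpit instance
```
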
Jason Staack
b840047e19 feat: The Other Dude v9.0.1 — full-featured email system
ci: add GitHub Pages deployment workflow for docs site

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 19:30:44 -05:00