22 Commits

Author SHA1 Message Date
Jason Staack
231154d28b fix(lint): format SNMP and credential profile files
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 18:42:28 -05:00
Jason Staack
b16a60dc7a fix(lint): remove unused textwrap import in migration 038
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 08:53:22 -05:00
Jason Staack
1888f850af fix(migration): grant app_user permissions on SNMP tables
The app_user role had no INSERT/UPDATE/DELETE on credential_profiles,
snmp_profiles, or snmp_metrics — causing 'permission denied' when
creating credential profiles or SNMP profiles from the UI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 07:59:35 -05:00
Jason Staack
2fff669cc3 fix(migration): use CAST instead of :: for jsonb type cast in seed data
SQLAlchemy's text() interprets ::jsonb as a named parameter binding.
Use CAST(:profile_data AS jsonb) to avoid the collision.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 23:27:52 -05:00
Jason Staack
37ed0242f4 feat(16-01): add devices SNMP columns and snmp_metrics hypertable
- devices table: device_type (default 'routeros'), snmp_port (default 161),
  snmp_version, snmp_profile_id FK -> snmp_profiles, credential_profile_id
  FK -> credential_profiles, with lock_timeout = 3s for safe ALTER
- snmp_metrics: hypertable with 90-day retention, composite index on
  (device_id, metric_name, time DESC), RLS with tenant isolation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 18:22:27 -05:00
Jason Staack
ad26335300 feat(16-01): add credential_profiles and snmp_profiles tables
- credential_profiles: UUID PK, tenant_id FK with CASCADE, credential_type,
  encrypted credential fields, unique(tenant_id, name), RLS, poller_user GRANT
- snmp_profiles: UUID PK, nullable tenant_id for system profiles, profile_data
  JSONB, partial unique indexes for tenant vs system name uniqueness, RLS with
  system profile visibility to all tenants, poller_user GRANT
- 6 system seed profiles: generic-snmp, network-switch, network-router,
  wireless-ap, ups-device, mikrotik-snmp with full OID collection definitions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 18:21:22 -05:00
Jason Staack
1042319a08 perf: fix API CPU saturation at 400+ devices
Root cause: stale NATS JetStream consumers accumulated across API
restarts, causing 13+ consumers to fight over messages in a single
Python async event loop (100% CPU).

Fixes:
- Add performance indexes on devices(tenant_id, hostname),
  devices(tenant_id, status), key_access_log(tenant_id, created_at)
  — drops devices seq_scans from 402k to 6 per interval
- Remove redundant ORDER BY t.name from fleet summary SQL
  (tenant name sort is client-side, was forcing a cross-table sort)
- Bump NATS memory limit from 128MB to 256MB (was at 118/128)
- Increase dev poll interval from 60s to 120s for 400+ device fleet

The stream purge + restart brought API CPU from 100% to 0.3%.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 18:06:40 -05:00
Jason Staack
413376e363 fix(db): add missing GRANT statements to v9.7 migrations
Migrations 030 (sites), 032 (device_interfaces), 033 (wireless_links),
and 034 (sectors) were missing GRANT statements for app_user and
poller_user. Without these, fresh deploys crash on site/sector CRUD
with permission denied errors. Also added poller_user SELECT grants
to migration 035 (site_alert_rules/events).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 17:46:23 -05:00
Jason Staack
6a5829e0ff style: ruff format 10 python files
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 13:49:59 -05:00
Jason Staack
d4cf36b200 feat(15-01): add site alert rules/events migration, models, schemas, and config
- Create Alembic migration 035 with site_alert_rules and site_alert_events tables, RLS policies, and GRANT
- Add SiteAlertRule/SiteAlertEvent ORM models with enums for rule_type, severity, state
- Add Pydantic schemas for rule/event CRUD and signal history points
- Add SIGNAL_DEGRADATION_THRESHOLD_DB, ALERT_EVALUATION_INTERVAL_SECONDS, TREND_DETECTION_INTERVAL_SECONDS to Settings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 07:16:05 -05:00
Jason Staack
ea5afe3408 feat(14-01): add sector CRUD backend with migration, model, service, and router
- Create sectors table migration (034) with RLS and devices.sector_id FK
- Add Sector ORM model with site_id and tenant_id foreign keys
- Add SectorCreate/Update/Response/ListResponse Pydantic schemas
- Implement sector_service with CRUD and device assignment functions
- Add sectors router with GET/POST/PUT/DELETE and device sector assignment
- Register sectors router in main.py
- Add sector_id and sector_name to Device model and DeviceResponse

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 06:40:44 -05:00
Jason Staack
a71df2af29 feat(13-02): add wireless_links table migration, ORM model, register both models
- Migration 033 creates wireless_links with state machine, missed_polls, RLS
- WirelessLink model with LinkState enum (discovered/active/degraded/down/stale)
- Register DeviceInterface, WirelessLink, LinkState in models __init__

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 06:02:14 -05:00
Jason Staack
7147b15e13 feat(13-02): add device_interfaces table migration and ORM model
- Migration 032 creates device_interfaces with RLS, MAC index, unique(device_id, name)
- DeviceInterface SQLAlchemy model with all columns and device relationship

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 06:01:22 -05:00
Jason Staack
d12e9e280b feat(12-02): create wireless_registrations and rf_monitor_stats hypertables
- wireless_registrations hypertable with per-client columns (mac, signal, rates, uptime)
- rf_monitor_stats hypertable for RF environment data (noise floor, channel width, tx power)
- RLS tenant_isolation with super_admin bypass on both tables
- Composite indexes: device+time, mac+time (for Phase 13 link discovery)
- 30-day retention policies on both hypertables
- GRANTs for app_user and poller_user

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 05:35:56 -05:00
Jason Staack
f7e678532c feat(11-01): create sites table migration, model, and schemas
- Add migration 030 with sites table, RLS policy, and device site_id FK
- Add Site SQLAlchemy model with tenant isolation
- Add site_id nullable FK and relationship to Device model
- Add sites relationship to Tenant model
- Register Site in models __init__.py
- Add SiteCreate, SiteUpdate, SiteResponse, SiteListResponse schemas

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 21:37:08 -05:00
Jason Staack
fe23459369 fix(ci): fix hardcoded DB name in migration and Go version compat
- migration 002: use current_database() instead of hardcoded 'tod'
- ci.yml: use Go 1.25 (required by nats-server dep), mark golangci-lint
  as continue-on-error until it supports Go 1.25
- go.mod: keep at 1.25.0 (nats-server v2.12.5 requires it)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 23:03:20 -05:00
Jason Staack
06a41ca9bf fix(lint): resolve all ruff lint errors
Add ruff config to exclude alembic E402, SQLAlchemy F821, and pre-existing
E501 line-length issues. Auto-fix 69 unused imports and 2 f-strings without
placeholders. Manually fix 8 unused variables. Apply ruff format to 127 files.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-14 22:17:50 -05:00
Jason Staack
593323d277 feat(vpn): add subnet_index column and global server keypair migration
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 16:25:09 -05:00
Jason Staack
9b060c5fdf refactor: rename database from mikrotik to tod in backend code 2026-03-14 10:57:20 -05:00
Jason Staack
4ae39d2cb3 feat(02-01): add config backup env vars, NATS event, device SSH fields, migration, metrics
- Config: CONFIG_BACKUP_INTERVAL (21600s), CONFIG_BACKUP_MAX_CONCURRENT (10), CONFIG_BACKUP_COMMAND_TIMEOUT (60s)
- NATS: ConfigSnapshotEvent type, PublishConfigSnapshot method, config.snapshot.> stream subject
- Device: SSHPort/SSHHostKeyFingerprint fields, UpdateSSHHostKey method, updated queries/scans
- Migration 028: ssh_port, ssh_host_key_fingerprint, timestamp columns with poller_user grants
- Metrics: ConfigBackupTotal (counter), ConfigBackupDuration (histogram), ConfigBackupActive (gauge)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 20:48:12 -05:00
Jason Staack
a7a17a5ecd feat(01-01): add Alembic migration 027 for config snapshot tables with RLS
- Create router_config_snapshots table with Transit ciphertext storage
- Create router_config_diffs table with snapshot pair FK references
- Create router_config_changes table for parsed semantic changes
- Add RLS tenant isolation (ENABLE + FORCE + USING + WITH CHECK) on all 3
- Add GRANT SELECT/INSERT/DELETE to app_user on all 3
- Add performance indexes: device+collected_at, device+hash, snapshot pair, diff_id

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 20:04:18 -05:00
Jason Staack
b840047e19 feat: The Other Dude v9.0.1 — full-featured email system
ci: add GitHub Pages deployment workflow for docs site

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 19:30:44 -05:00