The app_user role had no INSERT/UPDATE/DELETE on credential_profiles,
snmp_profiles, or snmp_metrics — causing 'permission denied' when
creating credential profiles or SNMP profiles from the UI.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
SQLAlchemy's text() interprets ::jsonb as a named parameter binding.
Use CAST(:profile_data AS jsonb) to avoid the collision.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- devices table: device_type (default 'routeros'), snmp_port (default 161),
snmp_version, snmp_profile_id FK -> snmp_profiles, credential_profile_id
FK -> credential_profiles, with lock_timeout = 3s for safe ALTER
- snmp_metrics: hypertable with 90-day retention, composite index on
(device_id, metric_name, time DESC), RLS with tenant isolation
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- credential_profiles: UUID PK, tenant_id FK with CASCADE, credential_type,
encrypted credential fields, unique(tenant_id, name), RLS, poller_user GRANT
- snmp_profiles: UUID PK, nullable tenant_id for system profiles, profile_data
JSONB, partial unique indexes for tenant vs system name uniqueness, RLS with
system profile visibility to all tenants, poller_user GRANT
- 6 system seed profiles: generic-snmp, network-switch, network-router,
wireless-ap, ups-device, mikrotik-snmp with full OID collection definitions
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Root cause: stale NATS JetStream consumers accumulated across API
restarts, causing 13+ consumers to fight over messages in a single
Python async event loop (100% CPU).
Fixes:
- Add performance indexes on devices(tenant_id, hostname),
devices(tenant_id, status), key_access_log(tenant_id, created_at)
— drops devices seq_scans from 402k to 6 per interval
- Remove redundant ORDER BY t.name from fleet summary SQL
(tenant name sort is client-side, was forcing a cross-table sort)
- Bump NATS memory limit from 128MB to 256MB (was at 118/128)
- Increase dev poll interval from 60s to 120s for 400+ device fleet
The stream purge + restart brought API CPU from 100% to 0.3%.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Migrations 030 (sites), 032 (device_interfaces), 033 (wireless_links),
and 034 (sectors) were missing GRANT statements for app_user and
poller_user. Without these, fresh deploys crash on site/sector CRUD
with permission denied errors. Also added poller_user SELECT grants
to migration 035 (site_alert_rules/events).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Create Alembic migration 035 with site_alert_rules and site_alert_events tables, RLS policies, and GRANT
- Add SiteAlertRule/SiteAlertEvent ORM models with enums for rule_type, severity, state
- Add Pydantic schemas for rule/event CRUD and signal history points
- Add SIGNAL_DEGRADATION_THRESHOLD_DB, ALERT_EVALUATION_INTERVAL_SECONDS, TREND_DETECTION_INTERVAL_SECONDS to Settings
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Create sectors table migration (034) with RLS and devices.sector_id FK
- Add Sector ORM model with site_id and tenant_id foreign keys
- Add SectorCreate/Update/Response/ListResponse Pydantic schemas
- Implement sector_service with CRUD and device assignment functions
- Add sectors router with GET/POST/PUT/DELETE and device sector assignment
- Register sectors router in main.py
- Add sector_id and sector_name to Device model and DeviceResponse
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Migration 033 creates wireless_links with state machine, missed_polls, RLS
- WirelessLink model with LinkState enum (discovered/active/degraded/down/stale)
- Register DeviceInterface, WirelessLink, LinkState in models __init__
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Migration 032 creates device_interfaces with RLS, MAC index, unique(device_id, name)
- DeviceInterface SQLAlchemy model with all columns and device relationship
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- wireless_registrations hypertable with per-client columns (mac, signal, rates, uptime)
- rf_monitor_stats hypertable for RF environment data (noise floor, channel width, tx power)
- RLS tenant_isolation with super_admin bypass on both tables
- Composite indexes: device+time, mac+time (for Phase 13 link discovery)
- 30-day retention policies on both hypertables
- GRANTs for app_user and poller_user
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add migration 030 with sites table, RLS policy, and device site_id FK
- Add Site SQLAlchemy model with tenant isolation
- Add site_id nullable FK and relationship to Device model
- Add sites relationship to Tenant model
- Register Site in models __init__.py
- Add SiteCreate, SiteUpdate, SiteResponse, SiteListResponse schemas
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- migration 002: use current_database() instead of hardcoded 'tod'
- ci.yml: use Go 1.25 (required by nats-server dep), mark golangci-lint
as continue-on-error until it supports Go 1.25
- go.mod: keep at 1.25.0 (nats-server v2.12.5 requires it)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Create router_config_snapshots table with Transit ciphertext storage
- Create router_config_diffs table with snapshot pair FK references
- Create router_config_changes table for parsed semantic changes
- Add RLS tenant isolation (ENABLE + FORCE + USING + WITH CHECK) on all 3
- Add GRANT SELECT/INSERT/DELETE to app_user on all 3
- Add performance indexes: device+collected_at, device+hash, snapshot pair, diff_id
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>