Commit Graph

10 Commits

Author SHA1 Message Date
Jason Staack
21f2934906 fix(map): revert to Leaflet + proxied OSM tiles, add CPE signal to popups
Reverted from MapLibre/PMTiles to Leaflet with nginx-proxied OSM raster
tiles — the MapLibre approach had unresolvable CSP and theme compat
issues. The proxy keeps all browser requests local (no third-party).

Also:
- Add CPE signal strength and parent AP name to fleet summary SQL
  and map popup cards (e.g. "Signal: -62 dBm to ap-shady-north")
- Add .dockerignore to exclude 8GB PMTiles and node_modules from
  Docker build context (was causing 10+ minute builds)
- Configure mailpit SMTP in dev compose

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 21:47:15 -05:00
Jason Staack
f0ddd98b93 feat(map): self-hosted PMTiles map tiles, remove alert toast spam
- Replace OpenStreetMap CDN with self-hosted Protomaps PMTiles
  (Wisconsin + Florida regional extracts, served from nginx)
- Add protomaps-leaflet for vector tile rendering in dark theme
- Update CSP to remove openstreetmap.org, add blob: for vector workers
- Add nginx location block for /tiles/ with byte range support
- Mount tiles directory as volume (not baked into image)
- Remove alert_fired/alert_resolved toast notifications that spammed
  "undefined" at fleet scale — dashboard still updates via query invalidation
- Add *.pmtiles to .gitignore (large binaries)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 18:30:08 -05:00
Jason Staack
1042319a08 perf: fix API CPU saturation at 400+ devices
Root cause: stale NATS JetStream consumers accumulated across API
restarts, causing 13+ consumers to fight over messages in a single
Python async event loop (100% CPU).

Fixes:
- Add performance indexes on devices(tenant_id, hostname),
  devices(tenant_id, status), key_access_log(tenant_id, created_at)
  — drops devices seq_scans from 402k to 6 per interval
- Remove redundant ORDER BY t.name from fleet summary SQL
  (tenant name sort is client-side, was forcing a cross-table sort)
- Bump NATS memory limit from 128MB to 256MB (was at 118/128)
- Increase dev poll interval from 60s to 120s for 400+ device fleet

The stream purge + restart brought API CPU from 100% to 0.3%.
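
To illustrate the index change listed above, here is a minimal sketch of how a composite index turns a tenant-scoped lookup from a full scan into an index search. It uses SQLite from the Python standard library purely as a stand-in, since the project's actual database and DDL are not shown in this commit. The index name idx_devices_tenant_status and the table layout are assumptions made for the example.

```python
import sqlite3


def plan_for(conn, sql, params=()):
    """Return SQLite's query plan for `sql` as one readable string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # Each row is (id, parent, notused, detail); only the detail text matters here.
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
# Hypothetical, trimmed-down devices table: only the columns named in the commit.
conn.execute(
    "CREATE TABLE devices ("
    "id INTEGER PRIMARY KEY, tenant_id TEXT, hostname TEXT, status TEXT)"
)

query = "SELECT id FROM devices WHERE tenant_id = ? AND status = ?"

# Without an index the planner has to scan the whole table.
before = plan_for(conn, query, ("t1", "online"))

# Composite index matching the devices(tenant_id, status) index from the commit.
# The index name is an assumption.
conn.execute(
    "CREATE INDEX idx_devices_tenant_status ON devices (tenant_id, status)"
)

# With the index in place the planner can search it instead of scanning.
after = plan_for(conn, query, ("t1", "online"))

print("before:", before)
print("after: ", after)
```

The exact plan wording differs between SQLite versions and will differ again on the production database, so treat the output as a demonstration of the idea rather than of the measured seq_scan figures.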

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 18:06:40 -05:00
Jason Staack
9123c6e6c0 refactor: rename database from mikrotik to tod in dev override
2026-03-14 09:59:25 -05:00
Jason Staack
970501e453 feat: implement Remote WinBox worker, API, frontend integration, OpenBao persistence, and supporting docs
2026-03-14 09:05:14 -05:00
Jason Staack
c2eea6847f fix: WinBox tunnel bind address, port range, and proxy support
- Bind tunnel listeners to 0.0.0.0 instead of 127.0.0.1 so tunnels
  are reachable through reverse proxies and container networks
- Reduce port range to 49000-49004 (5 concurrent tunnels)
- Derive WinBox URI host from request Host header instead of
  hardcoding 127.0.0.1, enabling use behind reverse proxies
- Add README security warning about default encryption keys
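
The Host-header change above can be sketched roughly as follows. This is not the project's code: the function names, the fallback value, and the plain host:port output are assumptions for illustration, and the real WinBox URI format is not shown in this commit. Only the 49000-49004 port range comes from the message itself.

```python
TUNNEL_PORT_MIN = 49000  # range taken from the commit message
TUNNEL_PORT_MAX = 49004


def host_from_header(host_header: str, fallback: str = "127.0.0.1") -> str:
    """Return the host part of an HTTP Host header, dropping any port.

    Falls back to `fallback` (an assumed default) when the header is empty
    or malformed.
    """
    value = (host_header or "").strip()
    if not value:
        return fallback
    if value.startswith("["):
        # Bracketed IPv6 literal such as "[::1]:8443".
        end = value.find("]")
        return value[: end + 1] if end != -1 else fallback
    # Plain hostname or IPv4 address, optionally followed by ":port".
    return value.rsplit(":", 1)[0] if ":" in value else value


def tunnel_address(host_header: str, port: int) -> str:
    """Build a hypothetical host:port tunnel address for the client."""
    if not TUNNEL_PORT_MIN <= port <= TUNNEL_PORT_MAX:
        raise ValueError(f"port {port} is outside the tunnel range")
    return f"{host_from_header(host_header)}:{port}"


print(tunnel_address("tod.example.com:8443", 49002))
```

A Host header is client-supplied, so a deployment behind a reverse proxy would normally want the proxy to set or validate it before a value like this is trusted.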

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 19:03:53 -05:00
Jason Staack
27f4403856 feat(infra): add nginx WebSocket proxy and SSH relay config to compose files
- Add WebSocket upgrade map to nginx and proxy /ws/ssh to poller:8080
- Update CSP connect-src to allow ws: and wss: for terminal connections
- Add tunnel port range 49000-49100, SSH relay env vars, ulimits, and healthcheck to poller in both override and prod compose files
- Increase poller memory limit to 512M in prod for tunnel/SSH overhead

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12 15:40:53 -05:00
Cog
6b22741f54 fix: audit logs never persisted + firmware-cache permission denied
Two bugs fixed:

1. audit_service.py: log_action() inserted into audit_logs using the
   caller's DB session but never committed. Any router that called
   db.commit() before log_action() (firmware, devices, config_editor,
   alerts, certificates) had its audit rows silently rolled back when
   the request session closed.
   Fix: log_action now opens its own AdminAsyncSessionLocal and self-
   commits, making audit persistence independent of the caller's
   transaction. The 'db' parameter is kept for backward compat but
   unused. Affects 5 routers (firmware, devices, config_editor,
   alerts, certificates).

2. docker-compose.override.yml: /data/firmware-cache had no volume
   mount so the directory didn't exist in the container, causing
   firmware downloads to fail with Permission denied.
   Fix: bind-mount docker-data/firmware-cache:/data/firmware-cache
   so firmware images survive container restarts.
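
The first fix can be illustrated with a small, self-contained sketch. It swaps in the standard-library sqlite3 module for the project's async session factory (AdminAsyncSessionLocal is not reproduced here), and the log_action signature, table names, and demo helper are assumptions. The point it shows is only the pattern: the audit writer opens its own connection and commits, so the row survives whatever the caller does with its own transaction.

```python
import os
import sqlite3
import tempfile


def log_action(db_path, action, db=None):
    """Write one audit row on a dedicated connection and commit it.

    `db` is accepted for backward compatibility and deliberately unused.
    """
    audit_conn = sqlite3.connect(db_path)
    try:
        audit_conn.execute(
            "INSERT INTO audit_logs (action) VALUES (?)", (action,)
        )
        audit_conn.commit()  # persistence no longer depends on the caller
    finally:
        audit_conn.close()


def demo():
    """Return the number of audit rows left after the caller rolls back."""
    fd, db_path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        setup = sqlite3.connect(db_path)
        setup.execute(
            "CREATE TABLE audit_logs (id INTEGER PRIMARY KEY, action TEXT)"
        )
        setup.execute("CREATE TABLE devices (id INTEGER PRIMARY KEY, name TEXT)")
        setup.commit()
        setup.close()

        # The caller commits its own work first, then logs, then its session
        # ends without another commit (modelled here as a rollback).
        caller = sqlite3.connect(db_path)
        caller.execute("INSERT INTO devices (name) VALUES ('router-1')")
        caller.commit()
        log_action(db_path, "device.create", db=caller)
        caller.rollback()
        caller.close()

        check = sqlite3.connect(db_path)
        count = check.execute("SELECT COUNT(*) FROM audit_logs").fetchone()[0]
        check.close()
        return count
    finally:
        os.remove(db_path)


print(demo())
```

One trade-off of this design is that an audit row can now be committed even if the caller's own work later fails, so the audit trail records attempts rather than guaranteed outcomes.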
2026-03-12 14:05:40 -05:00
Cog
58597ad4fd fix: CRLF/BOM line endings + restart policies + gitattributes
- poller/docker-entrypoint.sh: convert from CRLF+BOM to LF (UTF-8 no BOM)
  Windows saved the file with a UTF-8 BOM which made the Linux kernel
  reject the shebang with 'exec format error', crashing the poller.

- infrastructure/openbao/init.sh: same CRLF -> LF fix

- poller/Dockerfile: add sed to strip CRLF and BOM at image build time
  as a defensive measure for future Windows edits

- docker-compose.override.yml: add 'restart: on-failure' to api and poller
  so they recover from the postgres startup race (TimescaleDB restarts
  postgres after initdb, briefly causing connection refused on first boot)

- .gitattributes: enforce LF for all text/script/code files so git
  normalises line endings on checkout and prevents this class of bug
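
As a rough illustration of the normalisation described above, the sketch below strips a leading UTF-8 BOM and converts CRLF to LF. The commit does this with sed in the Dockerfile. The Python version here is a swapped-in equivalent for illustration, and the function name is an assumption.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def normalize_script(data: bytes) -> bytes:
    """Strip a leading UTF-8 BOM and convert CRLF line endings to LF."""
    if data.startswith(UTF8_BOM):
        # With a BOM in front, the file no longer begins with "#!", which is
        # what the kernel needs to recognise the shebang line.
        data = data[len(UTF8_BOM):]
    return data.replace(b"\r\n", b"\n")


windows_saved = UTF8_BOM + b"#!/bin/sh\r\necho started\r\n"
cleaned = normalize_script(windows_saved)
print(cleaned)
```

Working on bytes rather than decoded text keeps the function independent of the file's encoding beyond the BOM check itself.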
2026-03-12 14:05:40 -05:00
Jason Staack
b840047e19 feat: The Other Dude v9.0.1 — full-featured email system
ci: add GitHub Pages deployment workflow for docs site

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 19:30:44 -05:00