perf: fix API CPU saturation at 400+ devices

Root cause: stale NATS JetStream consumers accumulated across API
restarts, causing 13+ consumers to fight over messages in a single
Python async event loop (100% CPU).
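The accumulation above suggests a startup prune of consumers no live worker owns. A minimal sketch with nats-py, assuming a hypothetical stream name `EVENTS` and a durable name `api-worker` (both invented for illustration; the real names are not in this message):

```python
# Sketch: delete JetStream consumers that no active worker owns, so restarts
# don't leave stale durables competing for messages. Stream/durable names are
# hypothetical; requires nats-py and a reachable NATS server for prune_consumers.
import asyncio

def stale_consumers(names, active=("api-worker",)):
    """Return consumer names not in the active set (deletion candidates)."""
    return [n for n in names if n not in active]

async def prune_consumers(stream="EVENTS", url="nats://localhost:4222"):
    import nats  # imported here so the pure helper above works without nats-py
    nc = await nats.connect(url)
    try:
        js = nc.jetstream()
        infos = await js.consumers_info(stream)
        for name in stale_consumers([c.name for c in infos]):
            await js.delete_consumer(stream, name)
    finally:
        await nc.close()
```

Running this before the API subscribes would keep the consumer count at one per worker instead of growing across restarts.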

Fixes:
- Add performance indexes on devices(tenant_id, hostname),
  devices(tenant_id, status), key_access_log(tenant_id, created_at)
  — drops devices seq_scans from 402k to 6 per interval
- Remove redundant ORDER BY t.name from fleet summary SQL
  (tenant-name sorting already happens client-side; the ORDER BY was
  forcing an unnecessary cross-table sort)
- Bump NATS memory limit from 128MB to 256MB (was at 118/128)
- Increase dev poll interval from 60s to 120s for 400+ device fleet
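The index changes above might look like the following migration sketch (Postgres assumed from the seq_scan terminology; the index names are invented, only the table/column pairs come from this message):

```sql
-- Composite indexes matching the hot query predicates.
-- CONCURRENTLY (Postgres) avoids blocking writes during creation.
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_devices_tenant_hostname
    ON devices (tenant_id, hostname);
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_devices_tenant_status
    ON devices (tenant_id, status);
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_key_access_log_tenant_created
    ON key_access_log (tenant_id, created_at);
```

Leading each index with tenant_id matches queries that always filter by tenant first, which is what lets the planner replace the sequential scans.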

The stream purge + restart brought API CPU from 100% to 0.3%.
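The JetStream memory bump would correspond to a nats-server config stanza along these lines (the 128→256MB values are from this message; the file layout is an assumption, since the limit could also be set via deployment flags):

```
# nats-server.conf — JetStream storage limits
jetstream {
    max_memory_store: 256MB   # was 128MB; usage had reached 118MB
}
```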

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
commit 1042319a08
parent 413376e363
Author: Jason Staack
Date:   2026-03-19 18:06:40 -05:00

4 changed files with 54 additions and 3 deletions


@@ -80,7 +80,7 @@ services:
 CREDENTIAL_ENCRYPTION_KEY: ${CREDENTIAL_ENCRYPTION_KEY:?Set CREDENTIAL_ENCRYPTION_KEY in .env}
 OPENBAO_ADDR: http://openbao:8200
 OPENBAO_TOKEN: dev-openbao-token
-POLL_INTERVAL_SECONDS: 60
+POLL_INTERVAL_SECONDS: 120
 WIREGUARD_GATEWAY: wireguard
 TUNNEL_PORT_MIN: 49000
 TUNNEL_PORT_MAX: 49004