perf: fix API CPU saturation at 400+ devices

Root cause: stale NATS JetStream consumers accumulated across API
restarts, causing 13+ consumers to fight over messages in a single
Python async event loop (100% CPU).
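
A minimal sketch of how leftover consumers could be cleaned up at API
startup, assuming the nats-py client, whose JetStream manager exposes
`consumers_info(stream)` and `delete_consumer(stream, consumer)`. The
function name, the `keep` set, and the stream and consumer names in the
usage note are hypothetical; this commit does not show the actual cleanup
code.

```python
from typing import Iterable, List


async def delete_stale_consumers(jsm, stream: str, keep: Iterable[str]) -> List[str]:
    """Delete every consumer on `stream` whose name is not in `keep`.

    `jsm` is assumed to be a nats-py JetStream manager/context (or anything
    with the same two coroutine methods). Returns the names that were deleted.
    """
    keep_names = set(keep)
    removed: List[str] = []
    # consumers_info returns one info object per consumer on the stream;
    # each object carries the consumer name in its `name` attribute.
    for info in await jsm.consumers_info(stream):
        if info.name not in keep_names:
            await jsm.delete_consumer(stream, info.name)
            removed.append(info.name)
    return removed
```

With a live connection this might be called as
`await delete_stale_consumers(nc.jetstream(), "DEVICES", {"api-live"})`
before subscribing, so each restart leaves at most the intended consumers
behind.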

Fixes:
- Add performance indexes on devices(tenant_id, hostname),
  devices(tenant_id, status), key_access_log(tenant_id, created_at)
  — drops devices seq_scans from 402k to 6 per interval
- Remove redundant ORDER BY t.name from fleet summary SQL
  (tenant name sort is client-side, was forcing a cross-table sort)
- Bump NATS memory limit from 128MB to 256MB (usage was at 118MB of 128MB)
- Increase dev poll interval from 60s to 120s for 400+ device fleet
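
The index additions above can be sketched as follows. This is an
illustration only: it uses an in-memory SQLite database as a stand-in for
the production database, the index names are hypothetical, and the table
definitions contain just the columns named in this message.

```python
import sqlite3

# Hypothetical index names; the column lists come from the commit message.
INDEX_DDL = [
    "CREATE INDEX IF NOT EXISTS idx_devices_tenant_hostname "
    "ON devices (tenant_id, hostname)",
    "CREATE INDEX IF NOT EXISTS idx_devices_tenant_status "
    "ON devices (tenant_id, status)",
    "CREATE INDEX IF NOT EXISTS idx_key_access_log_tenant_created "
    "ON key_access_log (tenant_id, created_at)",
]


def make_demo_db() -> sqlite3.Connection:
    """Create minimal stand-in tables and apply the three indexes."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE devices ("
        "id INTEGER PRIMARY KEY, tenant_id TEXT, hostname TEXT, status TEXT)"
    )
    conn.execute(
        "CREATE TABLE key_access_log ("
        "id INTEGER PRIMARY KEY, tenant_id TEXT, created_at TEXT)"
    )
    for ddl in INDEX_DDL:
        conn.execute(ddl)
    return conn


def query_plan(conn: sqlite3.Connection, sql: str, params=()) -> str:
    """Return the planner's description of how it would run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the fourth column of each plan row.
    return " | ".join(row[3] for row in rows)
```

Calling `query_plan` on a per-tenant status filter shows whether the
planner searches an index instead of scanning the whole table, which is the
effect the seq_scan drop describes.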

The stream purge + restart brought API CPU from 100% to 0.3%.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Jason Staack

2026-03-19 18:06:40 -05:00

parent 413376e363

commit 1042319a08

4 changed files with 54 additions and 3 deletions

									
docker-compose.yml
									
@@ -73,7 +73,7 @@ services:
     deploy:
       resources:
         limits:
-          memory: 128M
+          memory: 256M
     networks:
       - tod
