docs: map existing codebase

Jason Staack
2026-03-12 19:33:26 -05:00
parent 5beede9502
commit 70126980a4
7 changed files with 2252 additions and 0 deletions

@@ -0,0 +1,245 @@
# External Integrations
**Analysis Date:** 2026-03-12
## APIs & External Services
**MikroTik RouterOS:**
- Binary API (TLS port 8729) - Device polling and command execution
- SDK/Client: go-routeros/v3 (Go poller)
- Protocol: Binary encoded commands, TLS mutual authentication
- Used in: `poller/cmd/poller/main.go`, `poller/internal/poller/`
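
The "binary encoded commands" mentioned above are length-prefixed words grouped into sentences. The following is a minimal Python sketch of that framing, for illustration only: the poller itself uses go-routeros/v3 in Go, and the function names `encode_length` and `encode_sentence` are hypothetical.

```python
def encode_length(n: int) -> bytes:
    """Encode a word length using the RouterOS API variable-length prefix."""
    if n < 0:
        raise ValueError("length must be non-negative")
    if n < 0x80:
        return n.to_bytes(1, "big")
    if n < 0x4000:
        return (n | 0x8000).to_bytes(2, "big")
    if n < 0x200000:
        return (n | 0xC00000).to_bytes(3, "big")
    if n < 0x10000000:
        return (n | 0xE0000000).to_bytes(4, "big")
    # Largest form: a 0xF0 marker byte followed by the length in four bytes.
    return b"\xF0" + n.to_bytes(4, "big")


def encode_sentence(words: list[str]) -> bytes:
    """Encode each word as length prefix + bytes, then end the sentence."""
    out = bytearray()
    for word in words:
        raw = word.encode("utf-8")
        out += encode_length(len(raw)) + raw
    out += b"\x00"  # a zero-length word terminates the sentence
    return bytes(out)
```

For example, `encode_sentence(["/system/resource/print"])` yields the bytes that would be written to the TLS socket for that command.
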
**SMTP (Transactional Email):**
- System email service (password reset, alerts, notifications)
- SDK/Client: aiosmtplib (async SMTP library)
- Configuration: `SMTP_HOST`, `SMTP_PORT`, `SMTP_USER`, `SMTP_PASSWORD`, `SMTP_USE_TLS`
- From address: `SMTP_FROM_ADDRESS`
- Implementation: `app/services/email_service.py`
- Supports TLS, STARTTLS, plain auth
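
As a rough sketch of how the SMTP variables listed above could be consumed, the snippet below loads them into a settings object and builds a message. The names `SmtpSettings`, `load_smtp_settings`, `build_message`, and `send_message` are hypothetical and not taken from `app/services/email_service.py`; mapping `SMTP_USE_TLS` to aiosmtplib's `use_tls` flag (implicit TLS) is also an assumption.

```python
import os
from dataclasses import dataclass
from email.message import EmailMessage
from typing import Mapping


@dataclass(frozen=True)
class SmtpSettings:
    host: str
    port: int
    user: str
    password: str
    use_tls: bool
    from_address: str


def load_smtp_settings(env: Mapping[str, str] = os.environ) -> SmtpSettings:
    """Read the SMTP_* variables; host and port are treated as required."""
    return SmtpSettings(
        host=env["SMTP_HOST"],
        port=int(env["SMTP_PORT"]),
        user=env.get("SMTP_USER", ""),
        password=env.get("SMTP_PASSWORD", ""),
        use_tls=env.get("SMTP_USE_TLS", "false").lower() in {"1", "true", "yes"},
        from_address=env.get("SMTP_FROM_ADDRESS", ""),
    )


def build_message(settings: SmtpSettings, to: str, subject: str, body: str) -> EmailMessage:
    """Build a plain-text message addressed from the configured sender."""
    msg = EmailMessage()
    msg["From"] = settings.from_address
    msg["To"] = to
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


async def send_message(settings: SmtpSettings, msg: EmailMessage) -> None:
    """Send via aiosmtplib (third-party), imported lazily so the rest runs without it."""
    import aiosmtplib

    await aiosmtplib.send(
        msg,
        hostname=settings.host,
        port=settings.port,
        username=settings.user or None,
        password=settings.password or None,
        use_tls=settings.use_tls,
    )
```
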
**WebSocket/SSH Tunneling:**
- Browser-based SSH terminal for remote device access
- SDK/Client: asyncssh (Python), xterm.js (frontend)
- Protocol: SSH protocol with port forwarding
- Implementation: `app/routers/remote_access.py`, `poller/internal/sshrelay/`
- Features: Session auditing, command logging to NATS
## Data Storage
**Databases:**
- PostgreSQL 17 (TimescaleDB extension in production)
- Async driver: asyncpg 0.30.0+ (Python backend)
- Sync driver: pgx/v5 (Go poller)
- ORM: SQLAlchemy 2.0+ async
- Migrations: Alembic 1.14.0+
- RLS: Row-Level Security policies for multi-tenant isolation
- Models: `app/models/` (17+ model files)
- Connection: `DATABASE_URL`, `APP_USER_DATABASE_URL`, `POLLER_DATABASE_URL`
- Admin role: postgres (migrations only)
- App role: app_user (enforces RLS)
- Poller role: poller_user (direct access, no RLS)
**File Storage:**
- Local filesystem only - No cloud storage integration
- Git store (bare repos): `/data/git-store` or `./git-store` (RWX PVC in production)
  - Implementation: `app/services/git_store.py`
  - Purpose: Version control for device configurations (one repo per tenant)
- Firmware cache: `/data/firmware-cache`
  - Purpose: Downloaded RouterOS firmware images
  - Service: `app/services/firmware_service.py`
- WireGuard config: `/data/wireguard`
  - Purpose: VPN peer and configuration management
**Caching:**
- Redis 7+
  - Async driver: redis 5.0.0+ (Python)
  - Sync driver: redis/go-redis/v9 (Go)
  - Use cases:
    - Session storage for SRP auth flows: `app/routers/auth.py` (key: `srp:session:{session_id}`)
    - Distributed locks: poller uses `bsm/redislock` to prevent duplicate polls across replicas
  - Connection: `REDIS_URL`
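
The two Redis use cases can be sketched as follows. This is an illustration in Python of the key format and of the `SET ... NX PX` pattern that lock libraries are commonly built on; it is not the poller's Go code. The names `try_acquire_lock` and `InMemoryRedis`, and the `poll:lock:...` key in the usage note, are hypothetical.

```python
import secrets
from typing import Optional


def srp_session_key(session_id: str) -> str:
    """Build the Redis key for an in-flight SRP handshake."""
    return f"srp:session:{session_id}"


def try_acquire_lock(client, name: str, ttl_ms: int) -> Optional[str]:
    """Try to take a lock; return the owner token on success, else None.

    `client` is anything with a redis-py style
    `set(name, value, nx=..., px=...)` method.
    """
    token = secrets.token_hex(16)
    # NX: only set if the key does not exist; PX: expire after ttl_ms.
    if client.set(name, token, nx=True, px=ttl_ms):
        return token
    return None


class InMemoryRedis:
    """Tiny stand-in for demos: supports SET with NX, ignores the TTL."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def set(self, name, value, nx=False, px=None):
        if nx and name in self._data:
            return None
        self._data[name] = value
        return True
```

With a shared client, a second replica calling `try_acquire_lock(client, "poll:lock:device-1", 30_000)` while the first still holds the key gets `None` and can skip that poll.
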
## Authentication & Identity
**Auth Provider:**
- Custom SRP-6a implementation (zero-knowledge auth)
- Flow: Registration stores an SRP-6a verifier and salt, so no plaintext password is stored
- Implementation: `app/services/srp_service.py`, `app/routers/auth.py`
- JWT tokens: HS256 signed with `JWT_SECRET_KEY`
- Token storage: httpOnly cookies (frontend sends via credentials)
- Refresh: 15-minute access tokens, 7-day refresh tokens
- Fallback: Legacy bcrypt password support during upgrade phase
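
To make the token lifetimes concrete, here is a standard-library sketch of HS256 signing and verification with a 15-minute access TTL and a 7-day refresh TTL. It is not the project's implementation (which may well use a JWT library); `sign_token` and `verify_token` are hypothetical names.

```python
import base64
import hashlib
import hmac
import json
import time

ACCESS_TTL_SECONDS = 15 * 60
REFRESH_TTL_SECONDS = 7 * 24 * 60 * 60


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: bytes) -> bytes:
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def sign_token(claims: dict, secret: bytes, ttl_seconds: int, now=None) -> str:
    """Return a compact JWT with iat/exp claims added."""
    issued = int(time.time()) if now is None else now
    payload = {**claims, "iat": issued, "exp": issued + ttl_seconds}
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    return f"{signing_input}.{_b64url(_sign(signing_input, secret))}"


def verify_token(token: str, secret: bytes, now=None) -> dict:
    """Check algorithm, signature, and expiry; return the claims or raise ValueError."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, sig_b64 = parts
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = _sign(f"{header_b64}.{payload_b64}", secret)
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    payload = json.loads(_b64url_decode(payload_b64))
    current = int(time.time()) if now is None else now
    if payload["exp"] <= current:
        raise ValueError("token expired")
    return payload
```

An access token would be issued with `sign_token(claims, secret, ACCESS_TTL_SECONDS)` and a refresh token with `REFRESH_TTL_SECONDS`; the secret corresponds to `JWT_SECRET_KEY`.
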
**User Roles:**
- Four role levels with RBAC:
  - super_admin - Cross-tenant access, user/billing management
  - admin - Full tenant management (invite users, config push, firmware)
  - operator - Limited: config push, monitoring, alerts
  - viewer - Read-only: dashboard, reports, audit logs
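
One simple way to model these four levels is an ordered enum with a minimum-role check. This sketch assumes the roles form a strict hierarchy (each level includes everything below it), which the list above suggests but does not state; `Role`, `parse_role`, and `is_allowed` are hypothetical names.

```python
from enum import IntEnum


class Role(IntEnum):
    # Higher value means broader access (assumed strict hierarchy).
    VIEWER = 1
    OPERATOR = 2
    ADMIN = 3
    SUPER_ADMIN = 4


def parse_role(name: str) -> Role:
    """Map a stored role name such as 'super_admin' to a Role; KeyError if unknown."""
    return Role[name.upper()]


def is_allowed(user_role: Role, required: Role) -> bool:
    """True when the user's role is at least the required level."""
    return user_role >= required
```

For instance, a config-push endpoint could require `Role.OPERATOR`, so operators, admins, and super admins pass while viewers are rejected.
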
**Credential Encryption:**
- Per-tenant envelope encryption via OpenBao Transit
- Service: `app/services/openbao_service.py`
- Cipher: AES-256-GCM via OpenBao Transit engine
- Key naming: `tenant_{uuid}` (created on tenant creation)
- Fallback: Legacy Fernet decryption for credentials created before Transit migration
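
The per-tenant key naming and the legacy fallback can be sketched like this. Several details are assumptions rather than facts from `app/services/openbao_service.py`: that the Transit engine is mounted at the default `transit/` path, that the key name uses the hyphenated UUID form, and that Transit ciphertexts carry the `vault:v<N>:` prefix used by the upstream Transit engine. All function names are hypothetical, and no network call is made.

```python
import base64
import json
import uuid


def transit_key_name(tenant_id: uuid.UUID) -> str:
    """Per-tenant Transit key name, following the tenant_{uuid} convention."""
    return f"tenant_{tenant_id}"


def build_encrypt_request(addr: str, tenant_id: uuid.UUID, plaintext: bytes) -> tuple[str, bytes]:
    """Return (url, json_body) for a Transit encrypt call.

    Transit expects the plaintext base64-encoded inside the JSON body.
    """
    url = f"{addr.rstrip('/')}/v1/transit/encrypt/{transit_key_name(tenant_id)}"
    body = json.dumps(
        {"plaintext": base64.b64encode(plaintext).decode("ascii")}
    ).encode("utf-8")
    return url, body


def is_transit_ciphertext(stored: str) -> bool:
    """Decide whether a stored credential should go to Transit or to the legacy Fernet path."""
    return stored.startswith("vault:v")
```

A decrypt routine would branch on `is_transit_ciphertext(value)` and fall back to Fernet with `CREDENTIAL_ENCRYPTION_KEY` for older rows.
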
## Monitoring & Observability
**Error Tracking:**
- Not integrated - No Sentry, DataDog, or equivalent
- Local structured logging only
**Logs:**
- Structured logging via structlog (Python backend)
  - Format: JSON (production), human-readable (dev)
  - Configuration: `app/logging_config.py`
  - Log level: Configurable via `LOG_LEVEL` env var
- Structured logging via slog (Go poller)
  - Format: JSON with service name and instance hostname
  - Configuration: `poller/cmd/poller/main.go`
**Metrics:**
- Prometheus metrics export
- Library: prometheus-fastapi-instrumentator 7.0.0+
- Setup: `app/observability.py`
- Endpoint: Exposed metrics in Prometheus text format
- Not scraped by default - requires external Prometheus instance
**OpenTelemetry:**
- Minimal OTEL instrumentation in Go poller
- SDK: `go.opentelemetry.io/otel` 1.39.0+
- Not actively used in Python backend
## CI/CD & Deployment
**Hosting:**
- Self-hosted (Docker Compose for local, Kubernetes for production)
- No cloud provider dependency
- Reverse proxy: Caddy (per user notes)
**CI Pipeline:**
- GitHub Actions (`.github/workflows/`)
- Not fully analyzed - check workflows for details
**Containers:**
- Docker multi-stage builds for all three services
- Images: `api` (FastAPI), `poller` (Go binary), `frontend` (Vite SPA)
- Profiles: `full` (all services), `mail-testing` (adds Mailpit)
## Environment Configuration
**Required env vars:**
- `DATABASE_URL` - PostgreSQL admin connection
- `SYNC_DATABASE_URL` - Alembic migrations connection
- `APP_USER_DATABASE_URL` - App-scoped RLS connection
- `POLLER_DATABASE_URL` - Poller service connection
- `REDIS_URL` - Redis connection
- `NATS_URL` - NATS JetStream connection
- `JWT_SECRET_KEY` - HS256 signing key (MUST be unique in production)
- `CREDENTIAL_ENCRYPTION_KEY` - Base64-encoded 32-byte AES key
- `OPENBAO_ADDR` - OpenBao server address
- `OPENBAO_TOKEN` - OpenBao authentication token
- `CORS_ORIGINS` - Frontend origins (comma-separated)
- `SMTP_HOST`, `SMTP_PORT` - Email server
- `FIRST_ADMIN_EMAIL`, `FIRST_ADMIN_PASSWORD` - Bootstrap account (dev only)
**Secrets location:**
- `.env` file (git-ignored) - Development
- Environment variables in production (Kubernetes secrets, docker compose .env)
- OpenBao - Holds the Transit encryption keys; the application stores only key references, never key material
**Security defaults validation:**
- `app/config.py` rejects known-insecure values in non-dev environments:
  - `JWT_SECRET_KEY` hard-coded defaults
  - `CREDENTIAL_ENCRYPTION_KEY` hard-coded defaults
  - `OPENBAO_TOKEN` hard-coded defaults
- Startup fails with a clear error message if a production deployment uses dev secrets
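
The startup check described above might look roughly like the sketch below. The placeholder values in `KNOWN_INSECURE`, the `"dev"` environment name, and the function names are all hypothetical; the real list and logic live in `app/config.py`.

```python
from typing import Mapping

# Hypothetical placeholder defaults, for illustration only.
KNOWN_INSECURE = {
    "JWT_SECRET_KEY": {"change-me", "dev-secret"},
    "CREDENTIAL_ENCRYPTION_KEY": {"change-me"},
    "OPENBAO_TOKEN": {"dev-token"},
}


def insecure_settings(env: Mapping[str, str], environment: str) -> list[str]:
    """Return the names of settings still set to a known dev default."""
    if environment == "dev":
        return []  # dev defaults are tolerated locally
    return sorted(
        name for name, bad_values in KNOWN_INSECURE.items()
        if env.get(name) in bad_values
    )


def validate_or_exit(env: Mapping[str, str], environment: str) -> None:
    """Abort startup with a readable message when dev secrets leak into non-dev."""
    bad = insecure_settings(env, environment)
    if bad:
        raise SystemExit(
            f"Refusing to start in {environment!r}: insecure default for {', '.join(bad)}"
        )
```

Calling `validate_or_exit(os.environ, environment)` early in application startup keeps a misconfigured deployment from ever serving traffic.
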
## Webhooks & Callbacks
**Incoming:**
- None detected - No external webhook subscriptions
**Outgoing:**
- Slack notifications - Alert firing/resolution (planned/partial implementation)
  - Router: `app/routers/alerts.py`
  - Implementation status: Check alert evaluation service
- Email notifications - Alert notifications, password reset
  - Service: `app/services/email_service.py`
- Custom webhooks - Extensible via notification service
  - Service: `app/services/notification_service.py`
## NATS JetStream Event Bus
**Message Bus:**
- NATS 2.0+ with JetStream persistence
- Python client: nats-py 2.7.0+
- Go client: nats.go 1.38.0+
- Connection: `NATS_URL`
**Event Topics (Go/Python publishers → Python subscribers):**
- `device.status.>` - Device online/offline status from Go poller
  - Subscriber: `app/services/nats_subscriber.py`
  - Payload: device_id, tenant_id, status, routeros_version, board_name, uptime
  - Usage: Real-time device fleet updates
- `firmware.progress.{tenant_id}.{device_id}` - Firmware upgrade progress
  - Subscriber: `app/services/firmware_subscriber.py`
  - Publisher: Firmware upgrade service
  - Payload: stage (downloading, verifying, upgrading), progress %, message
  - Usage: Live firmware upgrade tracking (SSE to frontend)
- `config.push.{tenant_id}.{device_id}` - Configuration push progress
  - Subscriber: `app/services/push_rollback_subscriber.py`
  - Publisher: `app/services/restore_service.py`
  - Payload: phase (pre-validate, backup, push, commit), status, errors
  - Usage: Live config deployment tracking with rollback support
- `alert.fired.{tenant_id}`, `alert.resolved.{tenant_id}` - Alert events
  - Subscriber: `app/services/sse_manager.py`
  - Publisher: `app/services/alert_evaluator.py`
  - Payload: alert_id, device_id, rule_name, condition, value, timestamp
  - Usage: Real-time alert notifications (SSE to frontend)
- `audit.session.end` - SSH session audit events
  - Subscriber: `app/services/session_audit_subscriber.py`
  - Publisher: Go SSH relay (`poller/internal/sshrelay/`)
  - Payload: session_id, user_id, device_id, start_time, end_time, command_log
  - Usage: Session auditing and compliance logging
- `config.change.{tenant_id}.{device_id}` - Device config change detection
  - Subscriber: `app/services/config_change_subscriber.py`
  - Payload: device_id, change_type, affected_subsystems, timestamp
  - Usage: Track unapproved config changes
- `metrics.sample.{tenant_id}.{device_id}` - Real-time CPU/memory/traffic samples
  - Subscriber: `app/services/metrics_subscriber.py`
  - Publisher: Go poller
  - Payload: timestamp, cpu_percent, memory_percent, disk_percent, interfaces{name, rx_bytes, tx_bytes}
  - Usage: Live metric streaming (SSE to frontend)
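
The subject patterns above rely on NATS token semantics: subjects are dot-separated, `*` matches exactly one token, and a trailing `>` matches one or more remaining tokens. A small Python sketch of building and matching such subjects follows; `build_subject` and `subject_matches` are hypothetical helpers, not functions from nats-py.

```python
def build_subject(prefix: str, tenant_id: str, device_id: str) -> str:
    """Build e.g. 'metrics.sample.<tenant>.<device>', rejecting unsafe tokens."""
    for token in (tenant_id, device_id):
        if not token or any(ch in token for ch in ".*> \t"):
            raise ValueError(f"invalid subject token: {token!r}")
    return f"{prefix}.{tenant_id}.{device_id}"


def subject_matches(pattern: str, subject: str) -> bool:
    """Return True if `subject` matches a NATS-style wildcard `pattern`."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' is only valid as the last token and needs at least one token left.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)
```

Under these rules a subscriber on `device.status.>` receives `device.status.t1.d1` but not the bare `device.status`.
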
**Server-Sent Events (SSE):**
- Frontend subscribes to per-tenant SSE streams
- Endpoint: `GET /api/sse/subscribe?tenant_id={tenant_id}`
- Connection: Long-lived, persistent HTTP stream
- Implementation: `app/routers/sse.py`, `app/services/sse_manager.py`
- Payload format: SSE (text/event-stream)
- Events forwarded from NATS to frontend browser in real-time
- Used for: firmware progress, alerts, config push status, metrics
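
Each forwarded event reaches the browser as a `text/event-stream` frame: optional `id:` and `event:` fields, one or more `data:` lines, and a blank line as terminator. A minimal sketch of that framing is shown here; `format_sse` is a hypothetical helper, and using the NATS subject family as the SSE event name is an assumption.

```python
import json
from typing import Optional


def format_sse(event: str, data: dict, event_id: Optional[str] = None) -> str:
    """Serialize one event as an SSE frame with a JSON payload."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    lines.append(f"event: {event}")
    payload = json.dumps(data, separators=(",", ":"))
    # json.dumps escapes newlines, so this normally yields a single data line.
    lines.extend(f"data: {line}" for line in payload.splitlines())
    return "\n".join(lines) + "\n\n"  # blank line ends the frame
```

On the frontend, an `EventSource` listener registered for the same event name would receive the JSON string in `event.data`.
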
## Git Integration
**Version Control:**
- Bare git repositories stored per-tenant
- Library: pygit2 1.14.0+
- Location: `{GIT_STORE_PATH}/tenant_{tenant_id}/`
- Purpose: Store device configuration history
- Commits created on: successful config push, manual save
- Restore: One-click revert to any previous commit
- Implementation: `app/services/git_store.py`
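
The per-tenant repository location can be derived with a small helper like the one below. It assumes tenant IDs are UUIDs (suggested by the `tenant_{uuid}` key naming elsewhere in this document) and validates them before touching the filesystem; `tenant_repo_path` is a hypothetical name, not a function from `app/services/git_store.py`.

```python
import uuid
from pathlib import Path


def tenant_repo_path(git_store_path: str, tenant_id: str) -> Path:
    """Return {GIT_STORE_PATH}/tenant_{tenant_id} for a valid UUID tenant ID.

    Parsing the ID as a UUID rejects values such as '../other' that could
    otherwise escape the store directory.
    """
    canonical = uuid.UUID(tenant_id)  # raises ValueError for non-UUID input
    return Path(git_store_path) / f"tenant_{canonical}"
```

The resulting path is what a library such as pygit2 would then open or initialize as a bare repository.
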
---
*Integration audit: 2026-03-12*