docs: map existing codebase
245
.planning/codebase/INTEGRATIONS.md
Normal file
@@ -0,0 +1,245 @@
# External Integrations

**Analysis Date:** 2026-03-12

## APIs & External Services

**MikroTik RouterOS:**
- Binary API (TLS port 8729) - Device polling and command execution
  - SDK/Client: go-routeros/v3 (Go poller)
  - Protocol: Binary encoded commands, TLS mutual authentication
  - Used in: `poller/cmd/poller/main.go`, `poller/internal/poller/`

**SMTP (Transactional Email):**
- System email service (password reset, alerts, notifications)
  - SDK/Client: aiosmtplib (async SMTP library)
  - Configuration: `SMTP_HOST`, `SMTP_PORT`, `SMTP_USER`, `SMTP_PASSWORD`, `SMTP_USE_TLS`
  - From address: `SMTP_FROM_ADDRESS`
  - Implementation: `app/services/email_service.py`
  - Supports TLS, STARTTLS, plain auth

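
As a rough illustration of how the SMTP variables above might feed an outgoing message, here is a minimal standard-library sketch. The helper names `build_reset_email` and `smtp_settings`, the fallback values, and the message wording are assumptions for illustration; they are not taken from `app/services/email_service.py`.

```python
import os
from email.message import EmailMessage


def build_reset_email(to_address: str, reset_link: str) -> EmailMessage:
    """Build a plain-text password reset message (hypothetical helper)."""
    message = EmailMessage()
    # Fallback sender is an assumption; the real default is not documented.
    message["From"] = os.environ.get("SMTP_FROM_ADDRESS", "noreply@example.com")
    message["To"] = to_address
    message["Subject"] = "Password reset"
    message.set_content(f"Use this link to reset your password:\n{reset_link}\n")
    return message


def smtp_settings() -> dict:
    """Collect connection settings from the variables listed above."""
    return {
        "hostname": os.environ.get("SMTP_HOST", "localhost"),
        "port": int(os.environ.get("SMTP_PORT", "1025")),  # assumed dev default
        "username": os.environ.get("SMTP_USER") or None,
        "password": os.environ.get("SMTP_PASSWORD") or None,
        "use_tls": os.environ.get("SMTP_USE_TLS", "false").lower() == "true",
    }
```

With aiosmtplib installed, these settings could plausibly be passed along as `await aiosmtplib.send(message, **smtp_settings())`; check the keyword names against the installed aiosmtplib version before relying on that.
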
**WebSocket/SSH Tunneling:**
- Browser-based SSH terminal for remote device access
  - SDK/Client: asyncssh (Python), xterm.js (frontend)
  - Protocol: SSH with port forwarding
  - Implementation: `app/routers/remote_access.py`, `poller/internal/sshrelay/`
  - Features: Session auditing, command logging to NATS

## Data Storage

**Databases:**
- PostgreSQL 17 (TimescaleDB extension in production)
  - Async driver: asyncpg 0.30.0+ (Python backend)
  - Sync driver: pgx/v5 (Go poller)
  - ORM: SQLAlchemy 2.0+ async
  - Migrations: Alembic 1.14.0+
  - RLS: Row-Level Security policies for multi-tenant isolation
  - Models: `app/models/` (17+ model files)
  - Connection: `DATABASE_URL`, `APP_USER_DATABASE_URL`, `POLLER_DATABASE_URL`
  - Admin role: postgres (migrations only)
  - App role: app_user (enforces RLS)
  - Poller role: poller_user (direct access, no RLS)

**File Storage:**
- Local filesystem only - No cloud storage integration
- Git store (bare repos): `/data/git-store` or `./git-store` (RWX PVC in production)
  - Implementation: `app/services/git_store.py`
  - Purpose: Version control for device configurations (one repo per tenant)
- Firmware cache: `/data/firmware-cache`
  - Purpose: Downloaded RouterOS firmware images
  - Service: `app/services/firmware_service.py`
- WireGuard config: `/data/wireguard`
  - Purpose: VPN peer and configuration management

**Caching:**
- Redis 7+
  - Async driver: redis 5.0.0+ (Python)
  - Sync driver: redis/go-redis/v9 (Go)
  - Use cases:
    - Session storage for SRP auth flows: `app/routers/auth.py` (key: `srp:session:{session_id}`)
    - Distributed locks: poller uses `bsm/redislock` to prevent duplicate polls across replicas
  - Connection: `REDIS_URL`

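
The two Redis use cases above can be sketched without a running server. The class below is a toy in-memory stand-in that mimics the `SET key value NX PX ttl` pattern that Redis-based locks generally rely on; it is not the `bsm/redislock` API. The lock key `poll:lock:dev1` used in the usage note is hypothetical; only the `srp:session:{session_id}` key format comes from the notes above.

```python
import time
import uuid
from typing import Dict, Optional, Tuple


def srp_session_key(session_id: str) -> str:
    """Key format documented for SRP sessions."""
    return f"srp:session:{session_id}"


class InMemoryLockStore:
    """Toy stand-in for Redis, illustrating lock-with-expiry semantics."""

    def __init__(self) -> None:
        # key -> (owner token, expiry time)
        self._data: Dict[str, Tuple[str, float]] = {}

    def acquire(self, key: str, ttl_seconds: float,
                now: Optional[float] = None) -> Optional[str]:
        now = time.monotonic() if now is None else now
        current = self._data.get(key)
        if current is not None and current[1] > now:
            return None  # another holder owns an unexpired lock
        token = uuid.uuid4().hex
        self._data[key] = (token, now + ttl_seconds)
        return token

    def release(self, key: str, token: str) -> bool:
        current = self._data.get(key)
        if current is None or current[0] != token:
            return False  # never release a lock held by someone else
        del self._data[key]
        return True
```

A replica would call `acquire("poll:lock:dev1", 30)` before polling and skip the device when `None` comes back. The owner token is what prevents a slow replica from releasing a lock that has already expired and been taken over by another replica.
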
## Authentication & Identity

**Auth Provider:**
- Custom SRP-6a implementation (zero-knowledge auth)
  - Flow: registration stores an SRP-6a salt and verifier → no plaintext password stored
  - Implementation: `app/services/srp_service.py`, `app/routers/auth.py`
  - JWT tokens: HS256 signed with `JWT_SECRET_KEY`
  - Token storage: httpOnly cookies (frontend sends via credentials)
  - Refresh: 15-minute access tokens, 7-day refresh tokens
  - Fallback: Legacy bcrypt password support during upgrade phase

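
To make the token settings above concrete, here is a minimal standard-library sketch of HS256 signing and verification with the documented lifetimes. It is an illustration of the JWT format only; the project may well use a JWT library instead, and the function names and claim set (`sub`, `iat`, `exp`) are assumptions.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

ACCESS_TTL_SECONDS = 15 * 60            # 15-minute access tokens
REFRESH_TTL_SECONDS = 7 * 24 * 60 * 60  # 7-day refresh tokens


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: str) -> bytes:
    return hmac.new(secret.encode(), signing_input.encode("ascii"),
                    hashlib.sha256).digest()


def issue_token(subject: str, secret: str, ttl_seconds: int,
                now: Optional[int] = None) -> str:
    now = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": subject, "iat": now, "exp": now + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    return f"{signing_input}.{_b64url(_sign(signing_input, secret))}"


def verify_token(token: str, secret: str,
                 now: Optional[int] = None) -> Optional[dict]:
    """Return the payload if the signature is valid and the token is unexpired.

    The header's `alg` field is not inspected; HS256 is always assumed.
    """
    now = int(time.time()) if now is None else now
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
    except ValueError:
        return None
    expected = _sign(f"{header_b64}.{payload_b64}", secret)
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        return None
    payload = json.loads(_b64url_decode(payload_b64))
    return payload if payload["exp"] > now else None
```

An access token would be issued with `issue_token(user_id, secret, ACCESS_TTL_SECONDS)` and a refresh token with `REFRESH_TTL_SECONDS`; setting them as httpOnly cookies is left to the web framework.
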
**User Roles:**
- Four role levels with RBAC:
  - super_admin - Cross-tenant access, user/billing management
  - admin - Full tenant management (invite users, config push, firmware)
  - operator - Limited: config push, monitoring, alerts
  - viewer - Read-only: dashboard, reports, audit logs

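
One simple way to model the four levels is an ordered enum with a "minimum role" check. This sketch assumes the roles are strictly hierarchical (each level includes everything below it), which the list above suggests but does not state; `Role` and `has_role` are hypothetical names.

```python
from enum import IntEnum


class Role(IntEnum):
    # Higher value means broader access (assumed strict hierarchy).
    VIEWER = 1
    OPERATOR = 2
    ADMIN = 3
    SUPER_ADMIN = 4


def has_role(user_role: str, required: Role) -> bool:
    """Return True if the named role is at or above the required level."""
    try:
        return Role[user_role.upper()] >= required
    except KeyError:
        return False  # unknown role names never pass a check
```

For example, a config-push endpoint could require `Role.OPERATOR`, which admits operator, admin, and super_admin but rejects viewer.
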
**Credential Encryption:**
- Per-tenant envelope encryption via OpenBao Transit
  - Service: `app/services/openbao_service.py`
  - Cipher: AES-256-GCM via OpenBao Transit engine
  - Key naming: `tenant_{uuid}` (created on tenant creation)
  - Fallback: Legacy Fernet decryption for credentials created before Transit migration

## Monitoring & Observability

**Error Tracking:**
- Not integrated - No Sentry, DataDog, or equivalent
- Local structured logging only

**Logs:**
- Structured logging via structlog (Python backend)
  - Format: JSON (production), human-readable (dev)
  - Configuration: `app/logging_config.py`
  - Log level: Configurable via `LOG_LEVEL` env var
- Structured logging via slog (Go poller)
  - Format: JSON with service name and instance hostname
  - Configuration: `poller/cmd/poller/main.go`

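
The JSON log shape described above (event plus service name and instance hostname, level taken from `LOG_LEVEL`) can be approximated with the standard library alone. This is a stand-in sketch, not the structlog or slog configuration used by the project; the field names and the `configure_logging` helper are assumptions.

```python
import json
import logging
import os
import socket


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def __init__(self, service: str) -> None:
        super().__init__()
        self.service = service
        self.hostname = socket.gethostname()

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname.lower(),
            "event": record.getMessage(),
            "logger": record.name,
            "service": self.service,
            "instance": self.hostname,
        })


def configure_logging(service: str) -> logging.Logger:
    """Attach a JSON handler and apply LOG_LEVEL (defaults to INFO)."""
    level_name = os.environ.get("LOG_LEVEL", "INFO").upper()
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter(service))
    logger = logging.getLogger(service)
    logger.handlers = [handler]
    logger.setLevel(getattr(logging, level_name, logging.INFO))
    logger.propagate = False
    return logger
```

Calling `configure_logging("api").info("device %s online", "r1")` writes a single JSON line to stderr, which is the form log shippers usually expect in production.
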
**Metrics:**
- Prometheus metrics export
  - Library: prometheus-fastapi-instrumentator 7.0.0+
  - Setup: `app/observability.py`
  - Endpoint: Exposed metrics in Prometheus text format
  - Not scraped by default - requires external Prometheus instance

**OpenTelemetry:**
- Minimal OTEL instrumentation in Go poller
  - SDK: `go.opentelemetry.io/otel` 1.39.0+
- Not actively used in Python backend

## CI/CD & Deployment

**Hosting:**
- Self-hosted (Docker Compose for local, Kubernetes for production)
- No cloud provider dependency
- Reverse proxy: Caddy (per user memory notes; not confirmed from the codebase)

**CI Pipeline:**
- GitHub Actions (`.github/workflows/`)
  - Not fully analyzed - check workflows for details

**Containers:**
- Docker multi-stage builds for all three services
  - Images: `api` (FastAPI), `poller` (Go binary), `frontend` (Vite SPA)
  - Profiles: `full` (all services), `mail-testing` (adds Mailpit)

## Environment Configuration

**Required env vars:**
- `DATABASE_URL` - PostgreSQL admin connection
- `SYNC_DATABASE_URL` - Alembic migrations connection
- `APP_USER_DATABASE_URL` - App-scoped RLS connection
- `POLLER_DATABASE_URL` - Poller service connection
- `REDIS_URL` - Redis connection
- `NATS_URL` - NATS JetStream connection
- `JWT_SECRET_KEY` - HS256 signing key (MUST be unique in production)
- `CREDENTIAL_ENCRYPTION_KEY` - Base64-encoded 32-byte AES key
- `OPENBAO_ADDR` - OpenBao server address
- `OPENBAO_TOKEN` - OpenBao authentication token
- `CORS_ORIGINS` - Frontend origins (comma-separated)
- `SMTP_HOST`, `SMTP_PORT` - Email server
- `FIRST_ADMIN_EMAIL`, `FIRST_ADMIN_PASSWORD` - Bootstrap account (dev only)

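
A small pre-flight check can catch missing variables before any service starts. The sketch below uses the names from the list above (leaving out the dev-only bootstrap account); `missing_vars` and `parse_cors_origins` are hypothetical helpers, not functions from `app/config.py`.

```python
import os
from typing import List, Mapping

REQUIRED_VARS = (
    "DATABASE_URL",
    "SYNC_DATABASE_URL",
    "APP_USER_DATABASE_URL",
    "POLLER_DATABASE_URL",
    "REDIS_URL",
    "NATS_URL",
    "JWT_SECRET_KEY",
    "CREDENTIAL_ENCRYPTION_KEY",
    "OPENBAO_ADDR",
    "OPENBAO_TOKEN",
    "CORS_ORIGINS",
    "SMTP_HOST",
    "SMTP_PORT",
)


def missing_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def parse_cors_origins(raw: str) -> List[str]:
    """Split the comma-separated CORS_ORIGINS value, dropping blanks."""
    return [origin.strip() for origin in raw.split(",") if origin.strip()]
```

Running `missing_vars()` at startup and exiting when the list is non-empty gives one clear error instead of a connection failure deep inside a service.
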
**Secrets location:**
- `.env` file (git-ignored) - Development
- Environment variables in production (Kubernetes secrets, docker compose .env)
- OpenBao - Holds the Transit encryption keys; the application stores only key references, never key material

**Security defaults validation:**
- `app/config.py` rejects known-insecure values in non-dev environments:
  - `JWT_SECRET_KEY` hard-coded defaults
  - `CREDENTIAL_ENCRYPTION_KEY` hard-coded defaults
  - `OPENBAO_TOKEN` hard-coded defaults
- Fails startup with clear error message if production uses dev secrets

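
The startup guard described above might look roughly like the following. The placeholder values in `KNOWN_INSECURE_DEFAULTS`, the `"dev"` environment name, and the `validate_secrets` function are all assumptions for illustration; the actual defaults and checks live in `app/config.py`.

```python
from typing import Dict, Set

# Hypothetical placeholder values; the real hard-coded defaults are not
# listed in this document.
KNOWN_INSECURE_DEFAULTS: Dict[str, Set[str]] = {
    "JWT_SECRET_KEY": {"change-me-in-production"},
    "CREDENTIAL_ENCRYPTION_KEY": {"dev-only-encryption-key"},
    "OPENBAO_TOKEN": {"dev-root-token"},
}


def validate_secrets(environment: str, settings: Dict[str, str]) -> None:
    """Raise RuntimeError if a non-dev environment uses a known dev secret."""
    if environment == "dev":
        return  # dev may keep the convenient defaults
    offenders = sorted(
        name
        for name, insecure_values in KNOWN_INSECURE_DEFAULTS.items()
        if settings.get(name) in insecure_values
    )
    if offenders:
        raise RuntimeError(
            "Refusing to start: insecure default value for "
            + ", ".join(offenders)
        )
```

Raising during configuration loading means the process never binds a port while it is still using a publicly known signing key or token.
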
## Webhooks & Callbacks

**Incoming:**
- None detected - No external webhook subscriptions

**Outgoing:**
- Slack notifications - Alert firing/resolution (planned/partial implementation)
  - Router: `app/routers/alerts.py`
  - Implementation status: Check alert evaluation service
- Email notifications - Alert notifications, password reset
  - Service: `app/services/email_service.py`
- Custom webhooks - Extensible via notification service
  - Service: `app/services/notification_service.py`

## NATS JetStream Event Bus

**Message Bus:**
- NATS 2.0+ with JetStream persistence (JetStream itself requires NATS server 2.2 or later)
  - Python client: nats-py 2.7.0+
  - Go client: nats.go 1.38.0+
  - Connection: `NATS_URL`

**Event Topics (Go/Python publishers → Python subscribers):**
- `device.status.>` - Device online/offline status from Go poller
  - Subscriber: `app/services/nats_subscriber.py`
  - Payload: device_id, tenant_id, status, routeros_version, board_name, uptime
  - Usage: Real-time device fleet updates

- `firmware.progress.{tenant_id}.{device_id}` - Firmware upgrade progress
  - Subscriber: `app/services/firmware_subscriber.py`
  - Publisher: Firmware upgrade service
  - Payload: stage (downloading, verifying, upgrading), progress %, message
  - Usage: Live firmware upgrade tracking (SSE to frontend)

- `config.push.{tenant_id}.{device_id}` - Configuration push progress
  - Subscriber: `app/services/push_rollback_subscriber.py`
  - Publisher: `app/services/restore_service.py`
  - Payload: phase (pre-validate, backup, push, commit), status, errors
  - Usage: Live config deployment tracking with rollback support

- `alert.fired.{tenant_id}`, `alert.resolved.{tenant_id}` - Alert events
  - Subscriber: `app/services/sse_manager.py`
  - Publisher: `app/services/alert_evaluator.py`
  - Payload: alert_id, device_id, rule_name, condition, value, timestamp
  - Usage: Real-time alert notifications (SSE to frontend)

- `audit.session.end` - SSH session audit events
  - Subscriber: `app/services/session_audit_subscriber.py`
  - Publisher: Go SSH relay (`poller/internal/sshrelay/`)
  - Payload: session_id, user_id, device_id, start_time, end_time, command_log
  - Usage: Session auditing and compliance logging

- `config.change.{tenant_id}.{device_id}` - Device config change detection
  - Subscriber: `app/services/config_change_subscriber.py`
  - Payload: device_id, change_type, affected_subsystems, timestamp
  - Usage: Track unapproved config changes

- `metrics.sample.{tenant_id}.{device_id}` - Real-time CPU/memory/traffic samples
  - Subscriber: `app/services/metrics_subscriber.py`
  - Publisher: Go poller
  - Payload: timestamp, cpu_percent, memory_percent, disk_percent, interfaces{name, rx_bytes, tx_bytes}
  - Usage: Live metric streaming (SSE to frontend)

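
The subjects above rely on NATS wildcard rules: `*` matches exactly one dot-separated token, and `>` (only valid as the last token) matches one or more trailing tokens. The sketch below reimplements that matching in plain Python to show which concrete subjects a subscription would receive. The helper names are hypothetical, and the concrete `device.status.t1.d1` subject in the usage note is an assumed example, since the tokens after `device.status.` are not documented here.

```python
def firmware_progress_subject(tenant_id: str, device_id: str) -> str:
    """Build the per-device firmware progress subject listed above."""
    return f"firmware.progress.{tenant_id}.{device_id}"


def subject_matches(pattern: str, subject: str) -> bool:
    """Return True if `subject` matches a NATS-style subscription pattern."""
    pattern_tokens = pattern.split(".")
    subject_tokens = subject.split(".")
    for index, token in enumerate(pattern_tokens):
        if token == ">":
            # '>' must be the final token and needs at least one token to match.
            return (index == len(pattern_tokens) - 1
                    and len(subject_tokens) > index)
        if index >= len(subject_tokens):
            return False  # subject ran out of tokens
        if token != "*" and token != subject_tokens[index]:
            return False  # literal token mismatch
    return len(subject_tokens) == len(pattern_tokens)
```

Under these rules a subscription to `device.status.>` would receive `device.status.t1.d1` but not the bare `device.status`, and `firmware.progress.*.*` receives updates for every tenant and device.
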
**Server-Sent Events (SSE):**
- Frontend subscribes to per-tenant SSE streams
  - Endpoint: `GET /api/sse/subscribe?tenant_id={tenant_id}`
  - Connection: Long-lived HTTP stream
  - Implementation: `app/routers/sse.py`, `app/services/sse_manager.py`
  - Payload format: SSE (text/event-stream)
- Events are forwarded from NATS to the browser in real time
- Used for: firmware progress, alerts, config push status, metrics

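
For reference, a `text/event-stream` frame is a few `field: value` lines terminated by a blank line. The sketch below shows how a NATS payload might be wrapped before being written to the stream; `format_sse` and the event name are hypothetical, not taken from `app/services/sse_manager.py`.

```python
import json
from typing import Optional


def format_sse(event: str, data: dict, event_id: Optional[str] = None) -> str:
    """Serialize one event in the text/event-stream wire format."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    lines.append(f"event: {event}")
    # json.dumps without indent yields a single line, but splitting keeps the
    # frame valid even if a multi-line payload is ever passed in.
    for line in json.dumps(data).splitlines():
        lines.append(f"data: {line}")
    return "\n".join(lines) + "\n\n"  # blank line ends the event
```

In the browser, an `EventSource` listener registered for the same event name would receive the JSON string in `event.data`.
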
## Git Integration

**Version Control:**
- Bare git repositories stored per-tenant
  - Library: pygit2 1.14.0+
  - Location: `{GIT_STORE_PATH}/tenant_{tenant_id}/`
  - Purpose: Store device configuration history
  - Commits created on: successful config push, manual save
  - Restore: One-click revert to any previous commit
  - Implementation: `app/services/git_store.py`

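
Because the repository location is derived from the tenant ID, it is worth validating that ID before building a filesystem path. The sketch below assumes tenant IDs are UUIDs (suggested by the `tenant_{uuid}` key naming elsewhere in this document, but not confirmed for the git store); `tenant_repo_path` is a hypothetical helper.

```python
import uuid
from pathlib import Path


def tenant_repo_path(git_store_path: str, tenant_id: str) -> Path:
    """Resolve `{GIT_STORE_PATH}/tenant_{tenant_id}` for a validated tenant ID.

    uuid.UUID raises ValueError for anything that is not a UUID, which also
    rejects path-traversal input such as "../other".
    """
    tenant_uuid = uuid.UUID(tenant_id)
    return Path(git_store_path) / f"tenant_{tenant_uuid}"
```

The resulting path could then be handed to pygit2, for example `pygit2.init_repository(str(path), bare=True)` when a tenant is created.
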
---

*Integration audit: 2026-03-12*