docs: update all documentation for v9.7.0
- CONFIGURATION.md: fix database name (mikrotik → tod), add 5 missing env vars, update NATS memory to 256MB
- API.md: add 8 missing endpoint groups (sites, sectors, wireless links, signal history, site alerts, config backups, remote access, winbox)
- ARCHITECTURE.md: update subscriber count from 3 to 10, add v9.7 components (sites, sectors, link discovery, signal trending, site alerts), add background service loops, update router count to 33
- USER-GUIDE.md: add tower/site management, wireless links, signal history, site alerts, and fleet map documentation
- README.md: add v9.7 features to feature list
- DEPLOYMENT.md: add winbox-worker, openbao, wireguard to service list
- SECURITY.md: add WinBox session security details

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
147
docs/API.md
@@ -45,7 +45,12 @@ All API routes are mounted under the `/api` prefix.
| Device Tags | `/api/device-tags/*` | Tag-based device labeling |
| Metrics | `/api/metrics/*` | TimescaleDB device metrics (CPU, memory, traffic, wireless) |
| Wireless Issues | `/api/fleet/wireless-issues`, `/api/tenants/{id}/fleet/wireless-issues` | APs with degraded signal, CCQ, or dropped clients |
| Config Backups | `/api/config-backups/*` | Automated RouterOS config backup history |
| Sites | `/api/tenants/{id}/sites/*` | Site CRUD, device-to-site assignment |
| Sectors | `/api/tenants/{id}/sites/{sid}/sectors/*` | Sector CRUD, device sector assignment |
| Wireless Links | `/api/tenants/{id}/links`, `/api/tenants/{id}/devices/{did}/links` | Link listing, RF stats, registrations |
| Signal History | `/api/tenants/{id}/devices/{did}/signal-history` | Per-client signal strength trending |
| Site Alerts | `/api/tenants/{id}/sites/{sid}/alert-rules/*`, `/api/tenants/{id}/alert-events/*` | Site-scoped alert rules and events |
| Config Backups | `/api/tenants/{id}/devices/{did}/config/*` | Config backup timeline, restore, schedules |
| Config Editor | `/api/config-editor/*` | Live RouterOS config browsing and editing |
| Firmware | `/api/firmware/*` | RouterOS firmware version management and upgrades |
| Alerts | `/api/alerts/*` | Alert rule CRUD, alert history |
@@ -59,6 +64,8 @@ All API routes are mounted under the `/api` prefix.
| Reports | `/api/reports/*` | PDF report generation (Jinja2 + WeasyPrint) |
| API Keys | `/api/api-keys/*` | API key CRUD |
| Maintenance Windows | `/api/maintenance-windows/*` | Scheduled maintenance window management |
| Remote Access | `/api/tenants/{id}/devices/{did}/*-session` | SSH terminal and WinBox tunnel sessions |
| WinBox Remote | `/api/tenants/{id}/devices/{did}/winbox-remote-sessions/*` | Browser-based WinBox sessions (Xpra) |
| VPN | `/api/vpn/*` | WireGuard VPN tunnel management |
| Certificates | `/api/certificates/*` | Internal CA and device certificate management |
| Transparency | `/api/transparency/*` | KMS access event dashboard |
@@ -113,6 +120,144 @@ Endpoints enforce role-based access control. The four roles in descending privil
| `operator` | Tenant | Device operations, config changes |
| `viewer` | Tenant | Read-only access |

## Sites

Manage tower/site locations and assign devices to them.

| Method | Endpoint | RBAC | Description |
|--------|----------|------|-------------|
| `GET` | `/api/tenants/{tenant_id}/sites` | viewer | List all sites with health rollup |
| `GET` | `/api/tenants/{tenant_id}/sites/{site_id}` | viewer | Get a single site with health rollup |
| `POST` | `/api/tenants/{tenant_id}/sites` | operator | Create a site |
| `PUT` | `/api/tenants/{tenant_id}/sites/{site_id}` | operator | Update a site |
| `DELETE` | `/api/tenants/{tenant_id}/sites/{site_id}` | admin | Delete a site |
| `POST` | `/api/tenants/{tenant_id}/sites/{site_id}/devices/{device_id}` | operator | Assign a device to a site |
| `DELETE` | `/api/tenants/{tenant_id}/sites/{site_id}/devices/{device_id}` | operator | Remove a device from a site |
| `POST` | `/api/tenants/{tenant_id}/sites/{site_id}/devices/bulk-assign` | operator | Bulk-assign devices to a site |

## Sectors

Manage radio sectors within a site and assign devices to them.

| Method | Endpoint | RBAC | Description |
|--------|----------|------|-------------|
| `GET` | `/api/tenants/{tenant_id}/sites/{site_id}/sectors` | viewer | List sectors for a site with device counts |
| `POST` | `/api/tenants/{tenant_id}/sites/{site_id}/sectors` | operator | Create a sector |
| `PUT` | `/api/tenants/{tenant_id}/sites/{site_id}/sectors/{sector_id}` | operator | Update a sector |
| `DELETE` | `/api/tenants/{tenant_id}/sites/{site_id}/sectors/{sector_id}` | admin | Delete a sector |
| `PUT` | `/api/tenants/{tenant_id}/devices/{device_id}/sector` | operator | Set or clear a device's sector assignment |

## Wireless Links

Read-only endpoints for wireless link topology, RF stats, and registrations.

| Method | Endpoint | RBAC | Description |
|--------|----------|------|-------------|
| `GET` | `/api/tenants/{tenant_id}/links` | viewer | List all wireless links (optional `state` and `device_id` query filters) |
| `GET` | `/api/tenants/{tenant_id}/devices/{device_id}/links` | viewer | List links where the device is AP or CPE |
| `GET` | `/api/tenants/{tenant_id}/sites/{site_id}/links` | viewer | List links where either side belongs to the site |
| `GET` | `/api/tenants/{tenant_id}/devices/{device_id}/registrations` | viewer | Latest wireless registration data per MAC |
| `GET` | `/api/tenants/{tenant_id}/devices/{device_id}/rf-stats` | viewer | Latest RF monitor stats per interface |
| `GET` | `/api/tenants/{tenant_id}/devices/{device_id}/unknown-clients` | viewer | Wireless clients whose MAC doesn't match any known device |

## Signal History

Time-bucketed signal strength trending for wireless clients.

| Method | Endpoint | RBAC | Description |
|--------|----------|------|-------------|
| `GET` | `/api/tenants/{tenant_id}/devices/{device_id}/signal-history` | viewer | Get signal history for a client MAC |

Query parameters:

- `mac_address` (required) -- client MAC address
- `range` -- time range: `24h`, `7d`, or `30d` (default `7d`)
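As an illustration of the query parameters above, the following Python sketch builds a signal-history request URL. The helper name `signal_history_url`, the base URL, and the sample IDs are hypothetical; only the path, the `mac_address` and `range` parameters, and the allowed range values come from this section.

```python
from urllib.parse import urlencode

# Allowed values for the `range` query parameter, as listed above.
ALLOWED_RANGES = ("24h", "7d", "30d")


def signal_history_url(base_url, tenant_id, device_id, mac_address, time_range="7d"):
    """Build a signal-history URL (hypothetical helper, not part of TOD)."""
    if not mac_address:
        raise ValueError("mac_address is required")
    if time_range not in ALLOWED_RANGES:
        raise ValueError(f"range must be one of {ALLOWED_RANGES}")
    path = f"/api/tenants/{tenant_id}/devices/{device_id}/signal-history"
    # urlencode percent-encodes the colons in the MAC address.
    query = urlencode({"mac_address": mac_address, "range": time_range})
    return f"{base_url.rstrip('/')}{path}?{query}"
```

Validating `range` on the client side is a convenience only; the API remains the authority on which values it accepts.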

## Site Alerts

Site-scoped alert rules and alert events.

### Alert Rules

| Method | Endpoint | RBAC | Description |
|--------|----------|------|-------------|
| `GET` | `/api/tenants/{tenant_id}/sites/{site_id}/alert-rules` | viewer | List alert rules (optional `sector_id` filter) |
| `GET` | `/api/tenants/{tenant_id}/sites/{site_id}/alert-rules/{rule_id}` | viewer | Get a single alert rule |
| `POST` | `/api/tenants/{tenant_id}/sites/{site_id}/alert-rules` | operator | Create an alert rule |
| `PUT` | `/api/tenants/{tenant_id}/sites/{site_id}/alert-rules/{rule_id}` | operator | Update an alert rule |
| `DELETE` | `/api/tenants/{tenant_id}/sites/{site_id}/alert-rules/{rule_id}` | operator | Delete an alert rule |

### Alert Events

| Method | Endpoint | RBAC | Description |
|--------|----------|------|-------------|
| `GET` | `/api/tenants/{tenant_id}/sites/{site_id}/alert-events` | viewer | List alert events (optional `state` filter, `limit` up to 200) |
| `POST` | `/api/tenants/{tenant_id}/alert-events/{event_id}/resolve` | operator | Resolve an active alert event |
| `GET` | `/api/tenants/{tenant_id}/alert-events/count` | viewer | Active alert event count (notification badge) |

## Config Backups

Device config backup timeline, restore, and schedule management. All routes are scoped under `/api/tenants/{tenant_id}/devices/{device_id}/config/`.

### Backup Timeline

| Method | Endpoint | RBAC | Description |
|--------|----------|------|-------------|
| `GET` | `.../config/backups` | viewer | List backup timeline for a device (newest first) |
| `POST` | `.../config/backups` | operator | Trigger a manual config backup |
| `POST` | `.../config/checkpoint` | operator | Create a checkpoint (named restore point) |
| `GET` | `.../config/backups/{commit_sha}/export` | viewer | Download export.rsc text for a backup version |
| `GET` | `.../config/backups/{commit_sha}/binary` | viewer | Download backup.bin for a backup version |

### Restore

| Method | Endpoint | RBAC | Description |
|--------|----------|------|-------------|
| `POST` | `.../config/preview-restore` | operator | Preview impact analysis before restoring a config version |
| `POST` | `.../config/restore` | operator | Restore a config version (two-phase push with panic-revert) |
| `POST` | `.../config/emergency-rollback` | operator | Rollback to most recent pre-push backup |

### Schedules

| Method | Endpoint | RBAC | Description |
|--------|----------|------|-------------|
| `GET` | `.../config/schedules` | viewer | Get effective backup schedule (device override or tenant default) |
| `PUT` | `.../config/schedules` | operator | Create or update device-specific schedule override |

### Config Snapshot

| Method | Endpoint | RBAC | Description |
|--------|----------|------|-------------|
| `POST` | `.../config-snapshot/trigger` | operator | Trigger immediate config snapshot via the Go poller (NATS) |

## Remote Access

SSH terminal and WinBox tunnel sessions. All routes are scoped under `/api/tenants/{tenant_id}/devices/{device_id}/`. Requires operator role or above.

| Method | Endpoint | RBAC | Description |
|--------|----------|------|-------------|
| `POST` | `.../winbox-session` | operator | Open a WinBox tunnel (returns tunnel_id, host, port, winbox:// URI) |
| `DELETE` | `.../winbox-session/{tunnel_id}` | operator | Close a WinBox tunnel (idempotent) |
| `POST` | `.../ssh-session` | operator | Create a single-use SSH WebSocket session token (120s TTL) |
| `GET` | `.../sessions` | operator | List active WinBox tunnels and remote sessions for a device |

The SSH session token authorises a subsequent WebSocket connection at `/ws/ssh?token=<token>`.
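A minimal sketch of turning the token into a WebSocket URL follows. It assumes the WebSocket endpoint is served from the same host as the API and that `https` maps to `wss`; neither is stated in this section, and `ssh_websocket_url` is a hypothetical helper name. Because the token is single-use with a 120s TTL, a client would connect promptly after requesting it.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def ssh_websocket_url(api_base_url, token):
    """Derive the /ws/ssh WebSocket URL from an HTTP(S) base URL (hypothetical helper)."""
    parts = urlsplit(api_base_url)
    # Assumption: TLS deployments use wss, plain HTTP deployments use ws.
    scheme = "wss" if parts.scheme == "https" else "ws"
    query = urlencode({"token": token})
    return urlunsplit((scheme, parts.netloc, "/ws/ssh", query, ""))
```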

## WinBox Remote (Browser)

Xpra-based in-browser WinBox sessions. All routes are scoped under `/api/tenants/{tenant_id}/devices/{device_id}/winbox-remote-sessions/`. Requires operator role or above.

| Method | Endpoint | RBAC | Description |
|--------|----------|------|-------------|
| `POST` | `.../winbox-remote-sessions` | operator | Create a browser WinBox session |
| `GET` | `.../winbox-remote-sessions` | operator | List active sessions for a device |
| `GET` | `.../winbox-remote-sessions/{session_id}` | operator | Get session status |
| `DELETE` | `.../winbox-remote-sessions/{session_id}` | operator | Terminate a session (idempotent) |
| `GET` | `.../winbox-remote-sessions/{session_id}/xpra/{path}` | operator | Proxy Xpra HTML5 client files |
| `WS` | `.../winbox-remote-sessions/{session_id}/ws` | operator | WebSocket proxy (browser to Xpra worker) |

Session creation returns a `websocket_path` for the Xpra WebSocket connection. Sessions enforce idle timeout (default 600s) and max lifetime (default 7200s).
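The two limits can be read as independent checks: a session ends when either one is exceeded. The sketch below illustrates that reading with plain timestamps in seconds. The function name and arguments are hypothetical, and the actual enforcement logic is not shown in this diff.

```python
def session_expired(created_at, last_activity_at, now, idle_timeout=600, max_lifetime=7200):
    """Return True if a session has exceeded its idle timeout or max lifetime.

    All times are seconds (for example, values from time.time()).
    Defaults mirror the documented 600s idle timeout and 7200s max lifetime.
    """
    idle_for = now - last_activity_at
    alive_for = now - created_at
    return idle_for >= idle_timeout or alive_for >= max_lifetime
```

Under this reading, a session that stays busy is still terminated once it reaches the max lifetime.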

## Multi-Tenancy

Tenant isolation is enforced at the database level via PostgreSQL Row-Level Security (RLS). The `app_user` database role automatically filters all queries by the authenticated user's `tenant_id`. Super admins operate outside tenant scope.
@@ -44,10 +44,24 @@ TOD (The Other Dude) is a containerized MSP fleet management platform for MikroT
- `admin_engine` (superuser) -- used only for auth/bootstrap and NATS subscribers that need cross-tenant access
- `app_engine` (non-superuser `app_user` role) -- used for all device/data routes, enforces RLS
- **Authentication**: JWT tokens (15min access, 7d refresh), SRP-6a zero-knowledge proof, RBAC (super_admin, admin, operator, viewer)
- **NATS subscribers**: Three independent subscribers for device status, metrics, and firmware events. Non-fatal startup -- API serves requests even if NATS is unavailable
- **Background services**: APScheduler for nightly config backups and daily firmware version checks
- **NATS subscribers**: Ten independent subscribers, each on its own NATS connection. Non-fatal startup -- API serves requests even if NATS is unavailable:
  - `nats_subscriber` -- device status events
  - `metrics_subscriber` -- device metrics (CPU, memory, interface counters)
  - `firmware_subscriber` -- firmware version events
  - `session_audit_subscriber` -- SSH session auditing
  - `config_change_subscriber` -- event-driven config backups
  - `push_rollback_subscriber` -- config push rollback and alerting
  - `config_snapshot_subscriber` -- config snapshot ingestion (Go poller -> PostgreSQL via Transit encryption)
  - `wireless_registration_subscriber` -- per-client wireless registration data
  - `interface_subscriber` -- device interface MAC resolution for link discovery
  - `link_discovery_subscriber` -- wireless link state machine (MAC-based AP/CPE pairing)
- **Background services**:
  - APScheduler: nightly config backups, daily firmware version checks, retention cleanup (24h cycle)
  - WinBox session reconciliation loop (60s cycle) -- detects orphaned sessions and cleans up Redis + tunnels
  - Signal trend detection loop (hourly) -- identifies sustained signal degradation across wireless clients
  - Site alert evaluation loop (5-minute cycle) -- evaluates geographic-scoped alert rules with hysteresis
- **OpenBao integration**: Provisions per-tenant Transit encryption keys on startup, dual-read fallback if OpenBao is unavailable
- **Startup sequence**: Configure logging -> Run Alembic migrations -> Bootstrap first admin -> Start NATS subscribers -> Ensure SSE streams -> Start schedulers -> Provision OpenBao keys
- **Startup sequence**: Configure logging -> Run Alembic migrations -> Bootstrap first admin -> Start NATS subscribers (10) -> Ensure SSE streams -> Start schedulers -> Provision OpenBao keys -> Recover stale push operations -> Start background loops (reconciliation, trend detection, site alerts)
- **API documentation**: OpenAPI docs at `/docs` and `/redoc` (dev environment only)
- **Health endpoints**: `/health` (liveness), `/health/ready` (readiness -- checks PostgreSQL, Redis, NATS)
- **Middleware stack** (LIFO order): RequestID -> SecurityHeaders -> RateLimiting -> CORS -> Route handler
@@ -55,7 +69,7 @@ TOD (The Other Dude) is a containerized MSP fleet management platform for MikroT

#### API Routers

The backend exposes 25 route groups under the `/api` prefix:
The backend exposes 33 route groups under the `/api` prefix:

| Router | Purpose |
|--------|---------|
@@ -84,6 +98,14 @@ The backend exposes 25 route groups under the `/api` prefix:
| `certificates` | Internal CA and device TLS certificates |
| `settings` | System settings (SMTP configuration, super_admin only) |
| `transparency` | KMS access event dashboard |
| `remote_access` | SSH remote access sessions |
| `winbox_remote` | WinBox browser-based remote sessions |
| `sites` | Site management (hierarchical device organization) |
| `sectors` | Sector definitions within sites (antenna/coverage zones) |
| `links` | Wireless link discovery and state tracking |
| `signal_history` | Per-client signal strength history and trends |
| `site_alerts` | Geographic-scoped alert rules and events |
| `config` | Config push operations (two-phase with panic revert) |

### Go Poller

@@ -135,7 +157,7 @@ The backend exposes 25 route groups under the `/api` prefix:
- **Durable consumers**: Ensure no message loss during API restarts
- **Monitoring port**: 8222
- **Data volume**: `./docker-data/nats`
- **Memory limit**: 128MB
- **Memory limit**: 256MB

### OpenBao (HashiCorp Vault fork)

@@ -245,6 +267,48 @@ Browser API PostgreSQL
- `poller_user` bypasses RLS intentionally (needs cross-tenant device access for polling)
- Tenant isolation is enforced at the database level, not the application level -- even a compromised API cannot leak cross-tenant data through `app_user` connections

## Sites & Sectors

The site management subsystem provides hierarchical device organization for tower-based wireless deployments.

- **Sites**: Named geographic locations (towers, POPs, huts) with optional latitude/longitude coordinates
- **Sectors**: Coverage zones within a site, representing individual antenna faces or radio segments. Each sector belongs to exactly one site and can have one or more devices assigned
- **Device assignment**: Devices are assigned to sectors, inheriting site membership. A device belongs to at most one sector at a time
- **Site health**: Aggregate health status is derived from the devices within a site's sectors -- if any device is down, the site status reflects it

## Wireless Link Discovery

MAC-based automatic detection of AP-to-CPE wireless links.

- **Interface subscriber**: Ingests device interface data from NATS, building a MAC-to-device lookup table
- **Wireless registration subscriber**: Processes per-client wireless registration events, capturing connected MACs and signal data
- **Link discovery subscriber**: Correlates AP registration tables with CPE interface MACs to identify links between managed devices
- **State machine**: Each discovered link transitions through states based on signal quality and reachability:
  - `discovered` -- initial detection, not yet confirmed
  - `active` -- confirmed bidirectional link with acceptable signal
  - `degraded` -- signal below threshold or intermittent connectivity
  - `down` -- link lost (device unreachable or deregistered)
  - `stale` -- no update received within the retention window
- **Automatic pairing**: When an AP's registration table contains a MAC belonging to a managed CPE, a link record is created without manual configuration
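The pairing step described above can be sketched as a lookup of each registered client MAC in the MAC-to-device table. This is an illustrative in-memory version only: the function name, argument shapes, and device IDs are hypothetical, and the real subscribers work from NATS events and the database rather than plain dictionaries.

```python
def discover_links(mac_to_device, ap_registrations):
    """Pair APs with managed CPEs by MAC address (illustrative sketch).

    mac_to_device: dict mapping interface MAC -> device id (managed devices).
    ap_registrations: dict mapping AP device id -> iterable of client MACs
        seen in that AP's registration table.
    Returns a sorted list of (ap_id, cpe_id) tuples.
    """
    # Normalize case so "aa:bb:..." and "AA:BB:..." match.
    lookup = {mac.upper(): device_id for mac, device_id in mac_to_device.items()}
    links = set()
    for ap_id, client_macs in ap_registrations.items():
        for mac in client_macs:
            cpe_id = lookup.get(mac.upper())
            # Unknown MACs are skipped; a device is never linked to itself.
            if cpe_id is not None and cpe_id != ap_id:
                links.add((ap_id, cpe_id))
    return sorted(links)
```

MACs that do not resolve to a managed device produce no link here, which corresponds to the clients reported by the `unknown-clients` endpoint.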

## Signal History & Trend Detection

Per-client signal strength tracking with automatic degradation alerting.

- **Signal history**: Records signal strength samples for each wireless client over time, stored in TimescaleDB for efficient time-range queries
- **Trend detection loop** (hourly): Analyzes recent signal history to identify sustained degradation. When a client's signal drops below threshold for a configurable window, the system creates a site alert event with rule type `signal_degradation`. Auto-resolves when signal recovers
- **Retention**: Signal history samples are subject to the same retention cleanup as other time-series data

## Site Alert Rules

Geographic-scoped alerting distinct from per-device alerts.

- **Rule types**: Configurable rules scoped to a site (e.g., "alert when more than N devices are down at site X", signal degradation thresholds)
- **Evaluation loop** (5-minute cycle): Evaluates all enabled site alert rules against current data
- **Hysteresis**: Rules require consecutive hits (default 2) before confirming an alert, preventing flapping from transient conditions
- **Event lifecycle**: Alert events are created when rules trigger and auto-resolved when conditions clear. Manual resolution is also supported
- **Separation from device alerts**: Site alerts operate independently from the per-device alert system, allowing operators to set geographic thresholds without duplicating device-level rules
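The hysteresis behaviour above can be illustrated with a small counter that confirms an alert only after a run of consecutive hits. The class below is a sketch, not the evaluator's actual code: its name and attributes are hypothetical, and it assumes an alert clears on the first evaluation where the condition no longer holds.

```python
class HysteresisCounter:
    """Confirm an alert only after N consecutive hits (illustrative sketch)."""

    def __init__(self, required_hits=2):
        self.required_hits = required_hits  # documented default is 2
        self.consecutive_hits = 0
        self.active = False

    def observe(self, condition_met):
        """Record one evaluation cycle and return whether the alert is active."""
        if condition_met:
            self.consecutive_hits += 1
            if self.consecutive_hits >= self.required_hits:
                self.active = True
        else:
            # Assumption: a single clear evaluation resets the run and resolves the alert.
            self.consecutive_hits = 0
            self.active = False
        return self.active
```

With a 5-minute evaluation cycle and the default of two hits, a condition would need to hold across two consecutive evaluations before an alert event is confirmed.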

## Security Layers

| Layer | Mechanism | Purpose |
@@ -285,7 +349,7 @@ backend/ FastAPI Python backend
config.py Pydantic Settings configuration
database.py SQLAlchemy engines (admin + app_user)
models/ SQLAlchemy ORM models
routers/ FastAPI route handlers (25 modules)
routers/ FastAPI route handlers (33 modules)
services/ Business logic, NATS subscribers, schedulers
middleware/ Rate limiting, request ID, security headers
frontend/ React TypeScript frontend
@@ -332,6 +396,6 @@ docker compose build frontend
| Go Poller | 512MB |
| OpenBao | 256MB |
| Redis | 128MB |
| NATS | 128MB |
| NATS | 256MB |
| WireGuard | 128MB |
| Frontend (nginx) | 64MB |
@@ -29,11 +29,12 @@ TOD uses Pydantic Settings for configuration. All values can be set via environm

| Variable | Default | Description |
|----------|---------|-------------|
| `DATABASE_URL` | `postgresql+asyncpg://postgres:postgres@localhost:5432/mikrotik` | Admin (superuser) async database URL. Used for migrations and bootstrap operations. |
| `SYNC_DATABASE_URL` | `postgresql+psycopg2://postgres:postgres@localhost:5432/mikrotik` | Synchronous database URL used by Alembic migrations only. |
| `APP_USER_DATABASE_URL` | `postgresql+asyncpg://app_user:app_password@localhost:5432/mikrotik` | Non-superuser async database URL. Enforces PostgreSQL RLS for tenant isolation. |
| `DATABASE_URL` | `postgresql+asyncpg://postgres:postgres@localhost:5432/tod` | Admin (superuser) async database URL. Used for migrations and bootstrap operations. |
| `SYNC_DATABASE_URL` | `postgresql+psycopg2://postgres:postgres@localhost:5432/tod` | Synchronous database URL used by Alembic migrations only. |
| `APP_USER_DATABASE_URL` | `postgresql+asyncpg://app_user:app_password@localhost:5432/tod` | Non-superuser async database URL. Enforces PostgreSQL RLS for tenant isolation. |
| `DB_POOL_SIZE` | `20` | App user connection pool size |
| `DB_MAX_OVERFLOW` | `40` | App user pool max overflow connections |
| `DB_POOL_RECYCLE` | `1847` | Connection pool recycle time in seconds |
| `DB_ADMIN_POOL_SIZE` | `10` | Admin connection pool size |
| `DB_ADMIN_MAX_OVERFLOW` | `20` | Admin pool max overflow connections |

@@ -82,6 +83,20 @@ OpenBao is the key management service used to encrypt device credentials on a pe
| `FIRMWARE_CACHE_DIR` | `/data/firmware-cache` | Path to firmware download cache (PVC mount in production) |
| `FIRMWARE_CHECK_INTERVAL_HOURS` | `24` | Hours between automatic RouterOS version checks |

### Signal Trending & Site Alerting

| Variable | Default | Description |
|----------|---------|-------------|
| `SIGNAL_DEGRADATION_THRESHOLD_DB` | `5` | Signal degradation threshold in dB for trend detection |
| `ALERT_EVALUATION_INTERVAL_SECONDS` | `300` | How often site alert rules are evaluated |
| `TREND_DETECTION_INTERVAL_SECONDS` | `3600` | How often signal trending analysis runs |
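To show how these three variables and their defaults fit together, here is a dependency-free sketch that reads them from the environment. TOD itself uses Pydantic Settings; this standard-library version is a stand-in, and the `TrendSettings` class, the `load_trend_settings` function, and the field types are assumptions.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class TrendSettings:
    """Hypothetical container for the variables in the table above."""
    signal_degradation_threshold_db: float
    alert_evaluation_interval_seconds: int
    trend_detection_interval_seconds: int


def load_trend_settings(env=None):
    """Read settings from a mapping (default: os.environ), using the documented defaults."""
    env = os.environ if env is None else env
    return TrendSettings(
        signal_degradation_threshold_db=float(env.get("SIGNAL_DEGRADATION_THRESHOLD_DB", "5")),
        alert_evaluation_interval_seconds=int(env.get("ALERT_EVALUATION_INTERVAL_SECONDS", "300")),
        trend_detection_interval_seconds=int(env.get("TREND_DETECTION_INTERVAL_SECONDS", "3600")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the defaults easy to check in isolation.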

### Retention

| Variable | Default | Description |
|----------|---------|-------------|
| `CONFIG_RETENTION_DAYS` | `90` | How long config snapshots are retained |

### Storage Paths

| Variable | Default | Description |
@@ -141,7 +156,7 @@ All containers have enforced memory limits to prevent OOM on the host:
|---------|-------------|
| PostgreSQL | 512 MB |
| Redis | 128 MB |
| NATS | 128 MB |
| NATS | 256 MB |
| API | 512 MB |
| Poller | 256 MB |
| Frontend | 64 MB |
@@ -12,6 +12,9 @@ TOD (The Other Dude) is a containerized fleet management platform for RouterOS d
- **PostgreSQL + TimescaleDB** -- Primary database with time-series extensions
- **Redis** -- Distributed locking and rate limiting
- **NATS JetStream** -- Message bus for device events
- **OpenBao** -- Secrets management (Transit encryption for credentials, config backups, audit logs)
- **WireGuard** -- VPN gateway for isolated device networks
- **WinBox Worker** -- Xpra-based container for browser WinBox sessions (runs on linux/amd64, 1GB memory limit)

## Prerequisites

@@ -159,6 +162,9 @@ Container memory limits are enforced in `docker-compose.prod.yml` to prevent OOM
| API | 512MB |
| Poller | 512MB |
| Frontend | 64MB |
| OpenBao | 256MB |
| WireGuard | 128MB |
| WinBox Worker | 1GB |

Adjust under `deploy.resources.limits.memory` in `docker-compose.prod.yml`.

@@ -238,6 +244,7 @@ The Helm chart deploys:
| Frontend | Deployment | React SPA (nginx) |
| Poller | Deployment | Go device poller |
| WireGuard | Deployment | VPN gateway |
| WinBox Worker | Deployment | Browser-based WinBox sessions (Xpra) |

### Configuration

@@ -25,7 +25,11 @@ The Other Dude is a self-hosted, multi-tenant platform (one installation serves
- **Dashboard** -- At-a-glance fleet health with device counts, uptime sparklines, status breakdowns per organization, and an "APs Needing Attention" card highlighting wireless issues.
- **Device Management** -- Detailed device pages with system info, interfaces, routes, firewall rules, DHCP leases, and real-time resource metrics.
- **Fleet Table** -- Virtual-scrolled table that handles hundreds of devices without breaking a sweat.
- **Device Map** -- Geographic view of device locations.
- **Tower & Site Management** -- Organize devices by physical location. Sites represent towers or equipment rooms; sectors subdivide them by antenna direction with azimuth bearings. Health grid shows per-device CPU, memory, and uptime at a glance.
- **Wireless Link Discovery** -- Automatic AP-to-CPE link detection with real-time signal strength, CCQ, TX/RX rates, and a five-state health model (discovered, active, degraded, down, stale).
- **Signal History & Trend Detection** -- Per-client signal history charts with min/avg/max trends over 24-hour, 7-day, and 30-day windows. Color-banded thresholds highlight degradation at a glance.
- **Site-Level Alert Rules** -- Threshold-based alerts scoped to sites and sectors: device offline percentage, sector signal average, client drop detection, and signal degradation.
- **Fleet Map** -- Geographic map with status-colored markers and automatic clustering. Cluster colors reflect aggregate device health across a region.
- **Subnet Scanner** -- Discover new RouterOS devices on your network and onboard them in clicks.

### Configuration
@@ -96,6 +96,7 @@ TOD includes on-demand WinBox tunnels and browser-based SSH terminals for device
- **Audit trail:** Tunnel open/close events and SSH session start/end events are recorded in the immutable audit log with device ID, user ID, source IP, and timestamp.
- **WinBox tunnel binding:** TCP proxies for WinBox connections are bound to `127.0.0.1` only. Tunnels are never exposed on `0.0.0.0` and cannot be reached from outside the host without explicit port forwarding.
- **Idle-timeout cleanup:** Inactive tunnels are closed automatically after `TUNNEL_IDLE_TIMEOUT` seconds (default 300). SSH sessions time out after `SSH_IDLE_TIMEOUT` seconds (default 900). Resources are reclaimed immediately on disconnect.
- **WinBox Browser sessions:** WinBox sessions use single-use session IDs stored in Redis with a short TTL. The browser connects via a WebSocket proxy -- never directly to the device. Sessions follow a strict lifecycle (`creating` -> `active` -> `grace` -> `terminated`) with automatic cleanup at each stage. Device credentials are decrypted server-side via the OpenBao Transit engine and are never sent to the browser. Session creation is rate-limited to 3 requests per 5 minutes per user.
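The per-user limit of 3 session creations per 5 minutes could be enforced in several ways. The sketch below uses an in-memory sliding window purely for illustration: the class name is hypothetical, and the algorithm and storage TOD actually uses for this limit are not described here.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_requests per user within window_seconds (illustrative sketch)."""

    def __init__(self, max_requests=3, window_seconds=300):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._events = {}  # user id -> deque of accepted request timestamps

    def allow(self, user_id, now):
        """Return True and record the request if the user is under the limit."""
        events = self._events.setdefault(user_id, deque())
        # Drop timestamps that have aged out of the window.
        while events and now - events[0] >= self.window_seconds:
            events.popleft()
        if len(events) >= self.max_requests:
            return False
        events.append(now)
        return True
```

In a multi-process deployment the counters would need shared storage; the in-memory dictionary here only works within a single process.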

## Network Security

@@ -36,7 +36,9 @@ TOD uses a collapsible sidebar with four sections. Press `[` to toggle the sideb
|------|-------------|
| **Dashboard** | Overview of your fleet with device status cards, active alerts, metrics sparklines, and "APs Needing Attention" wireless health card. The landing page after login. |
| **Devices** | Fleet table with search, sort, and filter. Click any device row to open its detail page. |
| **Map** | Geographic map view of device locations. |
| **Sites** | Tower and site management -- organize devices by physical location with sectors, health monitoring, wireless links, and site-scoped alerts. |
| **Wireless Links** | Fleet-wide view of all discovered AP-to-CPE wireless connections with signal, CCQ, TX/RX rates, and link state. |
| **Map** | Geographic fleet map with status-colored markers and automatic clustering. Devices with coordinates appear on the map; clusters reflect aggregate health (green = all online, red = all offline, amber = mixed). |
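The cluster color rule in the Map row (green = all online, red = all offline, amber = mixed) can be expressed as a short function. This Python sketch only illustrates the rule; the function name is hypothetical and the map itself is rendered by the frontend.

```python
def cluster_color(device_online_flags):
    """Map the online/offline flags of a cluster's devices to a marker color."""
    flags = list(device_online_flags)
    if not flags:
        raise ValueError("a cluster must contain at least one device")
    if all(flags):
        return "green"  # every device online
    if not any(flags):
        return "red"    # every device offline
    return "amber"      # mixed
```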
|
||||
|
||||
### Manage
|
||||
|
||||
@@ -236,6 +238,138 @@ TOD supports dark and light modes:
|
||||
|
||||
---
|
||||
|
||||
## Tower & Site Management
|
||||
|
||||
Sites represent physical locations in your network -- towers, rooftops, equipment rooms, or any place where you deploy devices. Sectors let you subdivide a site by antenna direction. Together they give you a structured view of your wireless infrastructure.
|
||||
|
||||
### Creating a Site
|
||||
|
||||
1. Navigate to **Fleet > Sites** in the sidebar.
|
||||
2. Click **New Site**.
|
||||
3. Fill in the site details:
|
||||
- **Name** (required) -- a descriptive label for the location (e.g., "North Ridge Tower").
|
||||
- **Address** -- street address or landmark description.
|
||||
- **Latitude / Longitude** -- GPS coordinates. Devices at this site inherit these coordinates on the fleet map.
|
||||
- **Elevation** -- tower or rooftop height in meters.
|
||||
- **Notes** -- free-text field for internal reference.
|
||||
4. Click **Create Site**.
|
||||
|
||||
The Sites list shows all sites with search filtering. Click any site to open its detail page.

### Site Detail Page

The site detail page shows a summary header with device count, online count, online percentage, and active alert count. Four tabs provide deeper views:

| Tab | Description |
|-----|-------------|
| **Health Grid** | Card grid of every device assigned to the site showing live CPU, memory, and uptime. Cards are color-coded by status (green = online, red = offline). Click any card to open the device detail page. |
| **Sectors** | Sector-based view of devices and their connected CPE clients. Shows per-sector aggregate stats (client count, average signal, link count). |
| **Links** | Table of all wireless links at the site, grouped by AP, with signal strength, CCQ, TX/RX rates, link state, and expandable signal history charts. |
| **Alerts** | Site-scoped alert rules and alert event history. Create and manage rules that apply to this specific site or sector. |

### Creating Sectors

Sectors organize access points within a site by antenna direction (e.g., "North 0-120" or "South Sector"). To create a sector:

1. Open a site detail page and switch to the **Sectors** tab.
2. Click **Add Sector**.
3. Enter:
   - **Name** (required) -- a label for the sector direction (e.g., "North Sector").
   - **Azimuth** -- compass bearing in degrees (0-360) representing the antenna direction. 0 is north, 90 is east, 180 is south, 270 is west.
   - **Description** -- optional notes about the sector.
4. Click **Create Sector**.

Each sector section is collapsible and shows a header with device count, connected client count, average signal strength, and link count. Devices within a sector are listed with their connected CPEs and link states inline.
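
To make the azimuth convention concrete, the sketch below converts a bearing in degrees to the nearest of eight compass points, using the same reference directions as the Azimuth field (0 is north, 90 is east, and so on). The function name and the eight-point resolution are illustrative choices; TOD stores the raw degree value.

```python
COMPASS_POINTS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def azimuth_to_compass(azimuth_deg: float) -> str:
    """Map a bearing in degrees (0-360) to the nearest 8-point compass label."""
    if not 0 <= azimuth_deg <= 360:
        raise ValueError("azimuth must be between 0 and 360 degrees")
    # Each compass point covers a 45-degree slice centered on its bearing,
    # so shift by half a slice (22.5) before dividing.
    index = int((azimuth_deg % 360 + 22.5) // 45) % 8
    return COMPASS_POINTS[index]
```

A helper like this can be handy when choosing sector names such as "North Sector" from survey data that only lists bearings.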

### Assigning Devices to Sites and Sectors

Devices are assigned to a site from the device detail page or from the Sites section. Once assigned, you can further assign a device to a specific sector:

1. Open the site detail page and switch to the **Sectors** tab.
2. Each device row has a sector assignment dropdown on the right.
3. Select a sector from the dropdown to assign the device, or select **Unassigned** to remove the sector assignment.

Devices that belong to a site but have no sector assignment appear in the **Unassigned** section at the bottom of the Sectors tab.

---

## Wireless Links

TOD automatically discovers wireless connections between access points (APs) and customer premises equipment (CPEs) in your fleet. When the poller detects a registration table entry on an AP that matches a CPE device in your fleet, it creates a wireless link record.
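
The matching step can be pictured with a short sketch. It assumes that registration entries are matched to fleet devices by MAC address and that each entry is a dict with `mac_address` and `signal_dbm` keys; the guide does not specify the matching key or the record layout, so treat every name here as hypothetical.

```python
from typing import Dict, List


def normalize_mac(mac: str) -> str:
    """Normalize a MAC address to upper case with colon separators."""
    return mac.strip().replace("-", ":").upper()


def discover_links(ap_id: str, registrations: List[dict], fleet_by_mac: Dict[str, str]) -> List[dict]:
    """Return a link record for every registration entry that matches a fleet device.

    fleet_by_mac maps a device MAC address to a device ID (assumed lookup structure).
    """
    index = {normalize_mac(mac): device_id for mac, device_id in fleet_by_mac.items()}
    links = []
    for entry in registrations:
        cpe_id = index.get(normalize_mac(entry["mac_address"]))
        if cpe_id is None:
            continue  # Client is not a managed device, so no link record is created.
        links.append({
            "ap_id": ap_id,
            "cpe_id": cpe_id,
            "signal_dbm": entry.get("signal_dbm"),
            "state": "discovered",
        })
    return links
```

Clients that are not in your fleet are skipped, which is why unmanaged subscriber radios do not show up as wireless links.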

### Link States

Each wireless link has a state that reflects its current health:

| State | Meaning |
|-------|---------|
| **Discovered** | A new AP-CPE connection has been detected for the first time. |
| **Active** | The link is up with recent poll data confirming connectivity. |
| **Degraded** | The link is connected but signal or quality metrics have dropped below healthy thresholds. |
| **Down** | The link has not been seen in recent polls -- the CPE is likely disconnected. |
| **Stale** | The link has not been seen for an extended period. The connection may no longer exist. |

Link states transition automatically based on poll results and missed-poll counters.
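
One way to think about these transitions is as a small state machine driven by each poll. The sketch below is illustrative only: the threshold constants (`DEGRADED_SIGNAL_DBM`, `DOWN_AFTER_MISSED`, `STALE_AFTER_MISSED`) are invented values, and the real poller may weigh additional quality metrics such as CCQ.

```python
from typing import Optional, Tuple

# Hypothetical thresholds, chosen for illustration only.
DEGRADED_SIGNAL_DBM = -80
DOWN_AFTER_MISSED = 3
STALE_AFTER_MISSED = 20


def next_link_state(current: str, seen: bool, signal_dbm: Optional[float],
                    missed_polls: int) -> Tuple[str, int]:
    """Return the new (state, missed_polls) pair after one poll cycle."""
    if seen:
        # Any successful sighting resets the missed-poll counter.
        if signal_dbm is not None and signal_dbm < DEGRADED_SIGNAL_DBM:
            return "degraded", 0
        return "active", 0
    missed = missed_polls + 1
    if missed >= STALE_AFTER_MISSED:
        return "stale", missed
    if missed >= DOWN_AFTER_MISSED:
        return "down", missed
    # Too few misses to change state yet; keep the current one.
    return current, missed
```

Under these assumptions a single missed poll does not mark a link as down, which avoids flapping when one poll cycle times out.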

### Viewing Wireless Links

There are two ways to view wireless links:

- **Fleet-wide**: Navigate to **Fleet > Wireless Links** in the sidebar. This shows all discovered links across your organization, filterable by state (active, degraded, down, stale).
- **Per-site**: Open a site detail page and switch to the **Links** tab. This shows only the links associated with devices assigned to that site.

Both views group links by AP device. Each CPE row shows signal strength (dBm), CCQ percentage, TX/RX data rates, link state, and time since last seen.

### Signal History

Click any CPE row in the wireless links table to expand an inline signal history chart. The chart shows signal strength over time with three lines:

- **Average signal** (solid blue) -- the primary trend line.
- **Min / Max signal** (dashed) -- the range boundaries.

The background is color-banded: green for strong signal (above -65 dBm), yellow for moderate (-65 to -80 dBm), and red for weak (below -80 dBm).
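
The band boundaries and the three chart lines can be expressed in a few lines of code. This is a sketch of the logic described above, with hypothetical function names; the band limits come from this guide, while the way TOD buckets samples internally is not documented here.

```python
from statistics import mean
from typing import Dict, List


def signal_band(signal_dbm: float) -> str:
    """Classify a signal reading into the chart's background bands."""
    if signal_dbm > -65:
        return "green"   # strong: above -65 dBm
    if signal_dbm >= -80:
        return "yellow"  # moderate: -65 to -80 dBm inclusive
    return "red"         # weak: below -80 dBm


def summarize_bucket(samples: List[float]) -> Dict[str, float]:
    """Reduce the samples in one time bucket to the three plotted values."""
    if not samples:
        raise ValueError("a bucket needs at least one sample")
    return {"avg": mean(samples), "min": min(samples), "max": max(samples)}
```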

Use the time range selector in the chart header to switch between **24h**, **7d**, and **30d** views. This helps you spot intermittent degradation, seasonal patterns, or gradual signal drift that might not be obvious from a single snapshot.

---

## Site Alerts

Site alert rules let you define thresholds scoped to an entire site or a specific sector, rather than individual devices. This is useful for detecting systemic issues across a tower location.

### Creating a Site Alert Rule

1. Open the site detail page and switch to the **Alerts** tab.
2. Click **Add Alert Rule**.
3. Configure the rule:
   - **Rule type** -- choose from:
     - *Device Offline Percent* -- fires when the percentage of offline devices at the site exceeds the threshold.
     - *Device Offline Count* -- fires when the number of offline devices at the site reaches the threshold.
     - *Sector Signal Average* -- fires when the average signal strength across a sector drops below the threshold.
     - *Sector Client Drop* -- fires when the number of connected clients in a sector drops by more than the threshold.
     - *Signal Degradation* -- fires when an individual link's signal degrades past the threshold.
   - **Scope** -- apply the rule to the entire site or narrow it to a specific sector.
   - **Threshold** -- the numeric value and unit that triggers the alert.
   - **Severity** -- warning or critical.
4. Click **Create Rule**.

Alert events appear in the site's Alerts tab with timestamps, severity, the triggering message, and consecutive hit count. Active alerts can be resolved manually by operators.
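
As a worked example of how a rule and its consecutive hit count relate, the sketch below evaluates a *Device Offline Percent* rule for one evaluation cycle. The function names and the status strings are assumptions; only the "exceeds the threshold" comparison and the idea of a consecutive hit counter come from this guide.

```python
from typing import List, Tuple


def offline_percent(statuses: List[str]) -> float:
    """Percentage of devices at the site that are not online."""
    if not statuses:
        return 0.0
    offline = sum(1 for status in statuses if status != "online")
    return 100.0 * offline / len(statuses)


def evaluate_offline_percent_rule(statuses: List[str], threshold_pct: float,
                                  previous_hits: int) -> Tuple[bool, int]:
    """Return (firing, consecutive_hits) for one evaluation cycle."""
    firing = offline_percent(statuses) > threshold_pct
    # The counter grows while the condition holds and resets when it clears.
    hits = previous_hits + 1 if firing else 0
    return firing, hits
```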

---

## Fleet Map

The fleet map provides a geographic view of all devices that have coordinates assigned (either directly on the device or inherited from their site).

- Navigate to **Fleet > Map** in the sidebar.
- Devices appear as color-coded markers: **green** for online, **red** for offline.
- When devices are geographically close, they automatically cluster into numbered circles. Cluster color reflects aggregate health: green if all devices in the cluster are online, red if all are offline, and amber if mixed.
- Click a cluster to zoom in and see individual markers. Click a device marker to see its status summary and link to its detail page.
- Super admins can filter the map by organization using the dropdown in the toolbar.
- The map auto-fits to show all mapped devices when loaded. The toolbar shows how many of your devices have coordinates assigned.
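
The cluster coloring rule described above reduces to a simple check over the statuses of the devices in a cluster. The function below is a sketch with a hypothetical name and status strings; it is not taken from the TOD frontend.

```python
from typing import List


def cluster_color(statuses: List[str]) -> str:
    """Pick a cluster color from the statuses of the devices it contains."""
    if not statuses:
        raise ValueError("a cluster contains at least one device")
    online = sum(1 for status in statuses if status == "online")
    if online == len(statuses):
        return "green"  # every device is online
    if online == 0:
        return "red"    # every device is offline
    return "amber"      # mixed health
```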

---

## Tips

- Use the **command palette** (`Cmd+K`) for the fastest way to navigate. It searches pages, devices, and actions.