diff --git a/docs/API.md b/docs/API.md index a1e04f6..8fcb072 100644 --- a/docs/API.md +++ b/docs/API.md @@ -45,7 +45,12 @@ All API routes are mounted under the `/api` prefix. | Device Tags | `/api/device-tags/*` | Tag-based device labeling | | Metrics | `/api/metrics/*` | TimescaleDB device metrics (CPU, memory, traffic, wireless) | | Wireless Issues | `/api/fleet/wireless-issues`, `/api/tenants/{id}/fleet/wireless-issues` | APs with degraded signal, CCQ, or dropped clients | -| Config Backups | `/api/config-backups/*` | Automated RouterOS config backup history | +| Sites | `/api/tenants/{id}/sites/*` | Site CRUD, device-to-site assignment | +| Sectors | `/api/tenants/{id}/sites/{sid}/sectors/*` | Sector CRUD, device sector assignment | +| Wireless Links | `/api/tenants/{id}/links`, `/api/tenants/{id}/devices/{did}/links` | Link listing, RF stats, registrations | +| Signal History | `/api/tenants/{id}/devices/{did}/signal-history` | Per-client signal strength trending | +| Site Alerts | `/api/tenants/{id}/sites/{sid}/alert-rules/*`, `/api/tenants/{id}/alert-events/*` | Site-scoped alert rules and events | +| Config Backups | `/api/tenants/{id}/devices/{did}/config/*` | Config backup timeline, restore, schedules | | Config Editor | `/api/config-editor/*` | Live RouterOS config browsing and editing | | Firmware | `/api/firmware/*` | RouterOS firmware version management and upgrades | | Alerts | `/api/alerts/*` | Alert rule CRUD, alert history | @@ -59,6 +64,8 @@ All API routes are mounted under the `/api` prefix. 
| Reports | `/api/reports/*` | PDF report generation (Jinja2 + WeasyPrint) | | API Keys | `/api/api-keys/*` | API key CRUD | | Maintenance Windows | `/api/maintenance-windows/*` | Scheduled maintenance window management | +| Remote Access | `/api/tenants/{id}/devices/{did}/*-session` | SSH terminal and WinBox tunnel sessions | +| WinBox Remote | `/api/tenants/{id}/devices/{did}/winbox-remote-sessions/*` | Browser-based WinBox sessions (Xpra) | | VPN | `/api/vpn/*` | WireGuard VPN tunnel management | | Certificates | `/api/certificates/*` | Internal CA and device certificate management | | Transparency | `/api/transparency/*` | KMS access event dashboard | @@ -113,6 +120,144 @@ Endpoints enforce role-based access control. The four roles in descending privil | `operator` | Tenant | Device operations, config changes | | `viewer` | Tenant | Read-only access | +## Sites + +Manage tower/site locations and assign devices to them. + +| Method | Endpoint | RBAC | Description | +|--------|----------|------|-------------| +| `GET` | `/api/tenants/{tenant_id}/sites` | viewer | List all sites with health rollup | +| `GET` | `/api/tenants/{tenant_id}/sites/{site_id}` | viewer | Get a single site with health rollup | +| `POST` | `/api/tenants/{tenant_id}/sites` | operator | Create a site | +| `PUT` | `/api/tenants/{tenant_id}/sites/{site_id}` | operator | Update a site | +| `DELETE` | `/api/tenants/{tenant_id}/sites/{site_id}` | admin | Delete a site | +| `POST` | `/api/tenants/{tenant_id}/sites/{site_id}/devices/{device_id}` | operator | Assign a device to a site | +| `DELETE` | `/api/tenants/{tenant_id}/sites/{site_id}/devices/{device_id}` | operator | Remove a device from a site | +| `POST` | `/api/tenants/{tenant_id}/sites/{site_id}/devices/bulk-assign` | operator | Bulk-assign devices to a site | + +## Sectors + +Manage radio sectors within a site and assign devices to them. 
+ +| Method | Endpoint | RBAC | Description | +|--------|----------|------|-------------| +| `GET` | `/api/tenants/{tenant_id}/sites/{site_id}/sectors` | viewer | List sectors for a site with device counts | +| `POST` | `/api/tenants/{tenant_id}/sites/{site_id}/sectors` | operator | Create a sector | +| `PUT` | `/api/tenants/{tenant_id}/sites/{site_id}/sectors/{sector_id}` | operator | Update a sector | +| `DELETE` | `/api/tenants/{tenant_id}/sites/{site_id}/sectors/{sector_id}` | admin | Delete a sector | +| `PUT` | `/api/tenants/{tenant_id}/devices/{device_id}/sector` | operator | Set or clear a device's sector assignment | + +## Wireless Links + +Read-only endpoints for wireless link topology, RF stats, and registrations. + +| Method | Endpoint | RBAC | Description | +|--------|----------|------|-------------| +| `GET` | `/api/tenants/{tenant_id}/links` | viewer | List all wireless links (optional `state` and `device_id` query filters) | +| `GET` | `/api/tenants/{tenant_id}/devices/{device_id}/links` | viewer | List links where the device is AP or CPE | +| `GET` | `/api/tenants/{tenant_id}/sites/{site_id}/links` | viewer | List links where either side belongs to the site | +| `GET` | `/api/tenants/{tenant_id}/devices/{device_id}/registrations` | viewer | Latest wireless registration data per MAC | +| `GET` | `/api/tenants/{tenant_id}/devices/{device_id}/rf-stats` | viewer | Latest RF monitor stats per interface | +| `GET` | `/api/tenants/{tenant_id}/devices/{device_id}/unknown-clients` | viewer | Wireless clients whose MAC doesn't match any known device | + +## Signal History + +Time-bucketed signal strength trending for wireless clients. 
+ +| Method | Endpoint | RBAC | Description | +|--------|----------|------|-------------| +| `GET` | `/api/tenants/{tenant_id}/devices/{device_id}/signal-history` | viewer | Get signal history for a client MAC | + +Query parameters: + +- `mac_address` (required) -- client MAC address +- `range` -- time range: `24h`, `7d`, or `30d` (default `7d`) + +## Site Alerts + +Site-scoped alert rules and alert events. + +### Alert Rules + +| Method | Endpoint | RBAC | Description | +|--------|----------|------|-------------| +| `GET` | `/api/tenants/{tenant_id}/sites/{site_id}/alert-rules` | viewer | List alert rules (optional `sector_id` filter) | +| `GET` | `/api/tenants/{tenant_id}/sites/{site_id}/alert-rules/{rule_id}` | viewer | Get a single alert rule | +| `POST` | `/api/tenants/{tenant_id}/sites/{site_id}/alert-rules` | operator | Create an alert rule | +| `PUT` | `/api/tenants/{tenant_id}/sites/{site_id}/alert-rules/{rule_id}` | operator | Update an alert rule | +| `DELETE` | `/api/tenants/{tenant_id}/sites/{site_id}/alert-rules/{rule_id}` | operator | Delete an alert rule | + +### Alert Events + +| Method | Endpoint | RBAC | Description | +|--------|----------|------|-------------| +| `GET` | `/api/tenants/{tenant_id}/sites/{site_id}/alert-events` | viewer | List alert events (optional `state` filter, `limit` up to 200) | +| `POST` | `/api/tenants/{tenant_id}/alert-events/{event_id}/resolve` | operator | Resolve an active alert event | +| `GET` | `/api/tenants/{tenant_id}/alert-events/count` | viewer | Active alert event count (notification badge) | + +## Config Backups + +Device config backup timeline, restore, and schedule management. All routes are scoped under `/api/tenants/{tenant_id}/devices/{device_id}/config/`. 
+ +### Backup Timeline + +| Method | Endpoint | RBAC | Description | +|--------|----------|------|-------------| +| `GET` | `.../config/backups` | viewer | List backup timeline for a device (newest first) | +| `POST` | `.../config/backups` | operator | Trigger a manual config backup | +| `POST` | `.../config/checkpoint` | operator | Create a checkpoint (named restore point) | +| `GET` | `.../config/backups/{commit_sha}/export` | viewer | Download export.rsc text for a backup version | +| `GET` | `.../config/backups/{commit_sha}/binary` | viewer | Download backup.bin for a backup version | + +### Restore + +| Method | Endpoint | RBAC | Description | +|--------|----------|------|-------------| +| `POST` | `.../config/preview-restore` | operator | Preview impact analysis before restoring a config version | +| `POST` | `.../config/restore` | operator | Restore a config version (two-phase push with panic-revert) | +| `POST` | `.../config/emergency-rollback` | operator | Roll back to the most recent pre-push backup | + +### Schedules + +| Method | Endpoint | RBAC | Description | +|--------|----------|------|-------------| +| `GET` | `.../config/schedules` | viewer | Get effective backup schedule (device override or tenant default) | +| `PUT` | `.../config/schedules` | operator | Create or update device-specific schedule override | + +### Config Snapshot + +| Method | Endpoint | RBAC | Description | +|--------|----------|------|-------------| +| `POST` | `.../config-snapshot/trigger` | operator | Trigger immediate config snapshot via the Go poller (NATS) | + +## Remote Access + +SSH terminal and WinBox tunnel sessions. All routes are scoped under `/api/tenants/{tenant_id}/devices/{device_id}/`. Requires operator role or above. 
+ +| Method | Endpoint | RBAC | Description | +|--------|----------|------|-------------| +| `POST` | `.../winbox-session` | operator | Open a WinBox tunnel (returns tunnel_id, host, port, winbox:// URI) | +| `DELETE` | `.../winbox-session/{tunnel_id}` | operator | Close a WinBox tunnel (idempotent) | +| `POST` | `.../ssh-session` | operator | Create a single-use SSH WebSocket session token (120s TTL) | +| `GET` | `.../sessions` | operator | List active WinBox tunnels and remote sessions for a device | + +The SSH session token authorises a subsequent WebSocket connection at `/ws/ssh?token=`. + +## WinBox Remote (Browser) + +Xpra-based in-browser WinBox sessions. All routes are scoped under `/api/tenants/{tenant_id}/devices/{device_id}/winbox-remote-sessions/`. Requires operator role or above. + +| Method | Endpoint | RBAC | Description | +|--------|----------|------|-------------| +| `POST` | `.../winbox-remote-sessions` | operator | Create a browser WinBox session | +| `GET` | `.../winbox-remote-sessions` | operator | List active sessions for a device | +| `GET` | `.../winbox-remote-sessions/{session_id}` | operator | Get session status | +| `DELETE` | `.../winbox-remote-sessions/{session_id}` | operator | Terminate a session (idempotent) | +| `GET` | `.../winbox-remote-sessions/{session_id}/xpra/{path}` | operator | Proxy Xpra HTML5 client files | +| `WS` | `.../winbox-remote-sessions/{session_id}/ws` | operator | WebSocket proxy (browser to Xpra worker) | + +Session creation returns a `websocket_path` for the Xpra WebSocket connection. Sessions enforce idle timeout (default 600s) and max lifetime (default 7200s). + ## Multi-Tenancy Tenant isolation is enforced at the database level via PostgreSQL Row-Level Security (RLS). The `app_user` database role automatically filters all queries by the authenticated user's `tenant_id`. Super admins operate outside tenant scope. 
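As a usage sketch for the Signal History endpoint in the `docs/API.md` changes above, the following helper builds a request URL from the documented path and query parameters (`mac_address` required, `range` one of `24h`, `7d`, `30d`, defaulting to `7d`). The function name, the `base_url` argument, and the example host are assumptions for illustration and are not part of TOD; authentication is deliberately left out.

```python
from urllib.parse import urlencode

# Time ranges listed in the Signal History section of the API docs.
ALLOWED_RANGES = ("24h", "7d", "30d")


def signal_history_url(base_url, tenant_id, device_id, mac_address, time_range="7d"):
    """Build the signal-history URL for one wireless client (hypothetical helper)."""
    if not mac_address:
        raise ValueError("mac_address is required")
    if time_range not in ALLOWED_RANGES:
        raise ValueError(f"range must be one of {ALLOWED_RANGES}")
    path = f"/api/tenants/{tenant_id}/devices/{device_id}/signal-history"
    # urlencode percent-encodes the colons in the MAC address.
    query = urlencode({"mac_address": mac_address, "range": time_range})
    return f"{base_url.rstrip('/')}{path}?{query}"


if __name__ == "__main__":
    # Placeholder host and IDs; substitute real values for your deployment.
    print(signal_history_url("https://tod.example.com", "t1", "d1", "AA:BB:CC:DD:EE:FF"))
```

Validating `range` on the client side mirrors the three documented values, so a typo fails before any request is sent.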
diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md index adde00c..b9138af 100644 --- a/docs/ARCHITECTURE.md +++ b/docs/ARCHITECTURE.md @@ -44,10 +44,24 @@ TOD (The Other Dude) is a containerized MSP fleet management platform for MikroT - `admin_engine` (superuser) -- used only for auth/bootstrap and NATS subscribers that need cross-tenant access - `app_engine` (non-superuser `app_user` role) -- used for all device/data routes, enforces RLS - **Authentication**: JWT tokens (15min access, 7d refresh), SRP-6a zero-knowledge proof, RBAC (super_admin, admin, operator, viewer) -- **NATS subscribers**: Three independent subscribers for device status, metrics, and firmware events. Non-fatal startup -- API serves requests even if NATS is unavailable -- **Background services**: APScheduler for nightly config backups and daily firmware version checks +- **NATS subscribers**: Ten independent subscribers, each on its own NATS connection. Non-fatal startup -- API serves requests even if NATS is unavailable: + - `nats_subscriber` -- device status events + - `metrics_subscriber` -- device metrics (CPU, memory, interface counters) + - `firmware_subscriber` -- firmware version events + - `session_audit_subscriber` -- SSH session auditing + - `config_change_subscriber` -- event-driven config backups + - `push_rollback_subscriber` -- config push rollback and alerting + - `config_snapshot_subscriber` -- config snapshot ingestion (Go poller -> PostgreSQL via Transit encryption) + - `wireless_registration_subscriber` -- per-client wireless registration data + - `interface_subscriber` -- device interface MAC resolution for link discovery + - `link_discovery_subscriber` -- wireless link state machine (MAC-based AP/CPE pairing) +- **Background services**: + - APScheduler: nightly config backups, daily firmware version checks, retention cleanup (24h cycle) + - WinBox session reconciliation loop (60s cycle) -- detects orphaned sessions and cleans up Redis + tunnels + - Signal trend 
detection loop (hourly) -- identifies sustained signal degradation across wireless clients + - Site alert evaluation loop (5-minute cycle) -- evaluates geographic-scoped alert rules with hysteresis - **OpenBao integration**: Provisions per-tenant Transit encryption keys on startup, dual-read fallback if OpenBao is unavailable -- **Startup sequence**: Configure logging -> Run Alembic migrations -> Bootstrap first admin -> Start NATS subscribers -> Ensure SSE streams -> Start schedulers -> Provision OpenBao keys +- **Startup sequence**: Configure logging -> Run Alembic migrations -> Bootstrap first admin -> Start NATS subscribers (10) -> Ensure SSE streams -> Start schedulers -> Provision OpenBao keys -> Recover stale push operations -> Start background loops (reconciliation, trend detection, site alerts) - **API documentation**: OpenAPI docs at `/docs` and `/redoc` (dev environment only) - **Health endpoints**: `/health` (liveness), `/health/ready` (readiness -- checks PostgreSQL, Redis, NATS) - **Middleware stack** (LIFO order): RequestID -> SecurityHeaders -> RateLimiting -> CORS -> Route handler @@ -55,7 +69,7 @@ TOD (The Other Dude) is a containerized MSP fleet management platform for MikroT #### API Routers -The backend exposes 25 route groups under the `/api` prefix: +The backend exposes 33 route groups under the `/api` prefix: | Router | Purpose | |--------|---------| @@ -84,6 +98,14 @@ The backend exposes 25 route groups under the `/api` prefix: | `certificates` | Internal CA and device TLS certificates | | `settings` | System settings (SMTP configuration, super_admin only) | | `transparency` | KMS access event dashboard | +| `remote_access` | SSH remote access sessions | +| `winbox_remote` | WinBox browser-based remote sessions | +| `sites` | Site management (hierarchical device organization) | +| `sectors` | Sector definitions within sites (antenna/coverage zones) | +| `links` | Wireless link discovery and state tracking | +| `signal_history` | Per-client 
signal strength history and trends | +| `site_alerts` | Geographic-scoped alert rules and events | +| `config` | Config push operations (two-phase with panic revert) | ### Go Poller @@ -135,7 +157,7 @@ The backend exposes 25 route groups under the `/api` prefix: - **Durable consumers**: Ensure no message loss during API restarts - **Monitoring port**: 8222 - **Data volume**: `./docker-data/nats` -- **Memory limit**: 128MB +- **Memory limit**: 256MB ### OpenBao (HashiCorp Vault fork) @@ -245,6 +267,48 @@ Browser API PostgreSQL - `poller_user` bypasses RLS intentionally (needs cross-tenant device access for polling) - Tenant isolation is enforced at the database level, not the application level -- even a compromised API cannot leak cross-tenant data through `app_user` connections +## Sites & Sectors + +The site management subsystem provides hierarchical device organization for tower-based wireless deployments. + +- **Sites**: Named geographic locations (towers, POPs, huts) with optional latitude/longitude coordinates +- **Sectors**: Coverage zones within a site, representing individual antenna faces or radio segments. Each sector belongs to exactly one site and can have one or more devices assigned +- **Device assignment**: Devices are assigned to sectors, inheriting site membership. A device belongs to at most one sector at a time +- **Site health**: Aggregate health status is derived from the devices within a site's sectors -- if any device is down, the site status reflects it + +## Wireless Link Discovery + +MAC-based automatic detection of AP-to-CPE wireless links. 
+ +- **Interface subscriber**: Ingests device interface data from NATS, building a MAC-to-device lookup table +- **Wireless registration subscriber**: Processes per-client wireless registration events, capturing connected MACs and signal data +- **Link discovery subscriber**: Correlates AP registration tables with CPE interface MACs to identify links between managed devices +- **State machine**: Each discovered link transitions through states based on signal quality and reachability: + - `discovered` -- initial detection, not yet confirmed + - `active` -- confirmed bidirectional link with acceptable signal + - `degraded` -- signal below threshold or intermittent connectivity + - `down` -- link lost (device unreachable or deregistered) + - `stale` -- no update received within the retention window +- **Automatic pairing**: When an AP's registration table contains a MAC belonging to a managed CPE, a link record is created without manual configuration + +## Signal History & Trend Detection + +Per-client signal strength tracking with automatic degradation alerting. + +- **Signal history**: Records signal strength samples for each wireless client over time, stored in TimescaleDB for efficient time-range queries +- **Trend detection loop** (hourly): Analyzes recent signal history to identify sustained degradation. When a client's signal drops below threshold for a configurable window, the system creates a site alert event with rule type `signal_degradation`. Auto-resolves when signal recovers +- **Retention**: Signal history samples are subject to the same retention cleanup as other time-series data + +## Site Alert Rules + +Geographic-scoped alerting distinct from per-device alerts. 
+ +- **Rule types**: Configurable rules scoped to a site (e.g., "alert when more than N devices are down at site X", signal degradation thresholds) +- **Evaluation loop** (5-minute cycle): Evaluates all enabled site alert rules against current data +- **Hysteresis**: Rules require consecutive hits (default 2) before confirming an alert, preventing flapping from transient conditions +- **Event lifecycle**: Alert events are created when rules trigger and auto-resolved when conditions clear. Manual resolution is also supported +- **Separation from device alerts**: Site alerts operate independently from the per-device alert system, allowing operators to set geographic thresholds without duplicating device-level rules + ## Security Layers | Layer | Mechanism | Purpose | @@ -285,7 +349,7 @@ backend/ FastAPI Python backend config.py Pydantic Settings configuration database.py SQLAlchemy engines (admin + app_user) models/ SQLAlchemy ORM models - routers/ FastAPI route handlers (25 modules) + routers/ FastAPI route handlers (33 modules) services/ Business logic, NATS subscribers, schedulers middleware/ Rate limiting, request ID, security headers frontend/ React TypeScript frontend @@ -332,6 +396,6 @@ docker compose build frontend | Go Poller | 512MB | | OpenBao | 256MB | | Redis | 128MB | -| NATS | 128MB | +| NATS | 256MB | | WireGuard | 128MB | | Frontend (nginx) | 64MB | diff --git a/docs/CONFIGURATION.md b/docs/CONFIGURATION.md index 2d7bd55..84f5496 100644 --- a/docs/CONFIGURATION.md +++ b/docs/CONFIGURATION.md @@ -29,11 +29,12 @@ TOD uses Pydantic Settings for configuration. All values can be set via environm | Variable | Default | Description | |----------|---------|-------------| -| `DATABASE_URL` | `postgresql+asyncpg://postgres:postgres@localhost:5432/mikrotik` | Admin (superuser) async database URL. Used for migrations and bootstrap operations. 
| -| `SYNC_DATABASE_URL` | `postgresql+psycopg2://postgres:postgres@localhost:5432/mikrotik` | Synchronous database URL used by Alembic migrations only. | -| `APP_USER_DATABASE_URL` | `postgresql+asyncpg://app_user:app_password@localhost:5432/mikrotik` | Non-superuser async database URL. Enforces PostgreSQL RLS for tenant isolation. | +| `DATABASE_URL` | `postgresql+asyncpg://postgres:postgres@localhost:5432/tod` | Admin (superuser) async database URL. Used for migrations and bootstrap operations. | +| `SYNC_DATABASE_URL` | `postgresql+psycopg2://postgres:postgres@localhost:5432/tod` | Synchronous database URL used by Alembic migrations only. | +| `APP_USER_DATABASE_URL` | `postgresql+asyncpg://app_user:app_password@localhost:5432/tod` | Non-superuser async database URL. Enforces PostgreSQL RLS for tenant isolation. | | `DB_POOL_SIZE` | `20` | App user connection pool size | | `DB_MAX_OVERFLOW` | `40` | App user pool max overflow connections | +| `DB_POOL_RECYCLE` | `1847` | Connection pool recycle time in seconds | | `DB_ADMIN_POOL_SIZE` | `10` | Admin connection pool size | | `DB_ADMIN_MAX_OVERFLOW` | `20` | Admin pool max overflow connections | @@ -82,6 +83,20 @@ OpenBao is the key management service used to encrypt device credentials on a pe | `FIRMWARE_CACHE_DIR` | `/data/firmware-cache` | Path to firmware download cache (PVC mount in production) | | `FIRMWARE_CHECK_INTERVAL_HOURS` | `24` | Hours between automatic RouterOS version checks | +### Signal Trending & Site Alerting + +| Variable | Default | Description | +|----------|---------|-------------| +| `SIGNAL_DEGRADATION_THRESHOLD_DB` | `5` | Signal degradation threshold in dB for trend detection | +| `ALERT_EVALUATION_INTERVAL_SECONDS` | `300` | How often site alert rules are evaluated | +| `TREND_DETECTION_INTERVAL_SECONDS` | `3600` | How often signal trending analysis runs | + +### Retention + +| Variable | Default | Description | +|----------|---------|-------------| +| `CONFIG_RETENTION_DAYS` | `90` | 
How long config snapshots are retained | + ### Storage Paths | Variable | Default | Description | @@ -141,7 +156,7 @@ All containers have enforced memory limits to prevent OOM on the host: |---------|-------------| | PostgreSQL | 512 MB | | Redis | 128 MB | -| NATS | 128 MB | +| NATS | 256 MB | | API | 512 MB | | Poller | 256 MB | | Frontend | 64 MB | diff --git a/docs/DEPLOYMENT.md b/docs/DEPLOYMENT.md index cf16b0a..e1241c6 100644 --- a/docs/DEPLOYMENT.md +++ b/docs/DEPLOYMENT.md @@ -12,6 +12,9 @@ TOD (The Other Dude) is a containerized fleet management platform for RouterOS d - **PostgreSQL + TimescaleDB** -- Primary database with time-series extensions - **Redis** -- Distributed locking and rate limiting - **NATS JetStream** -- Message bus for device events +- **OpenBao** -- Secrets management (Transit encryption for credentials, config backups, audit logs) +- **WireGuard** -- VPN gateway for isolated device networks +- **WinBox Worker** -- Xpra-based container for browser WinBox sessions (runs on linux/amd64, 1GB memory limit) ## Prerequisites @@ -159,6 +162,9 @@ Container memory limits are enforced in `docker-compose.prod.yml` to prevent OOM | API | 512MB | | Poller | 512MB | | Frontend | 64MB | +| OpenBao | 256MB | +| WireGuard | 128MB | +| WinBox Worker | 1GB | Adjust under `deploy.resources.limits.memory` in `docker-compose.prod.yml`. 
@@ -238,6 +244,7 @@ The Helm chart deploys: | Frontend | Deployment | React SPA (nginx) | | Poller | Deployment | Go device poller | | WireGuard | Deployment | VPN gateway | +| WinBox Worker | Deployment | Browser-based WinBox sessions (Xpra) | ### Configuration diff --git a/docs/README.md b/docs/README.md index 05e6ee7..02f1e40 100644 --- a/docs/README.md +++ b/docs/README.md @@ -25,7 +25,11 @@ The Other Dude is a self-hosted, multi-tenant platform (one installation serves - **Dashboard** -- At-a-glance fleet health with device counts, uptime sparklines, status breakdowns per organization, and an "APs Needing Attention" card highlighting wireless issues. - **Device Management** -- Detailed device pages with system info, interfaces, routes, firewall rules, DHCP leases, and real-time resource metrics. - **Fleet Table** -- Virtual-scrolled table that handles hundreds of devices without breaking a sweat. -- **Device Map** -- Geographic view of device locations. +- **Tower & Site Management** -- Organize devices by physical location. Sites represent towers or equipment rooms; sectors subdivide them by antenna direction with azimuth bearings. Health grid shows per-device CPU, memory, and uptime at a glance. +- **Wireless Link Discovery** -- Automatic AP-to-CPE link detection with real-time signal strength, CCQ, TX/RX rates, and a five-state health model (discovered, active, degraded, down, stale). +- **Signal History & Trend Detection** -- Per-client signal history charts with min/avg/max trends over 24-hour, 7-day, and 30-day windows. Color-banded thresholds highlight degradation at a glance. +- **Site-Level Alert Rules** -- Threshold-based alerts scoped to sites and sectors: device offline percentage, sector signal average, client drop detection, and signal degradation. +- **Fleet Map** -- Geographic map with status-colored markers and automatic clustering. Cluster colors reflect aggregate device health across a region. 
- **Subnet Scanner** -- Discover new RouterOS devices on your network and onboard them in clicks. ### Configuration diff --git a/docs/SECURITY.md b/docs/SECURITY.md index ef3a5b8..30b33fa 100644 --- a/docs/SECURITY.md +++ b/docs/SECURITY.md @@ -96,6 +96,7 @@ TOD includes on-demand WinBox tunnels and browser-based SSH terminals for device - **Audit trail:** Tunnel open/close events and SSH session start/end events are recorded in the immutable audit log with device ID, user ID, source IP, and timestamp. - **WinBox tunnel binding:** TCP proxies for WinBox connections are bound to `127.0.0.1` only. Tunnels are never exposed on `0.0.0.0` and cannot be reached from outside the host without explicit port forwarding. - **Idle-timeout cleanup:** Inactive tunnels are closed automatically after `TUNNEL_IDLE_TIMEOUT` seconds (default 300). SSH sessions time out after `SSH_IDLE_TIMEOUT` seconds (default 900). Resources are reclaimed immediately on disconnect. +- **WinBox Browser sessions:** WinBox sessions use single-use session IDs stored in Redis with a short TTL. The browser connects via a WebSocket proxy -- never directly to the device. Sessions follow a strict lifecycle (`creating` -> `active` -> `grace` -> `terminated`) with automatic cleanup at each stage. Device credentials are decrypted server-side via the OpenBao Transit engine and are never sent to the browser. Session creation is rate-limited to 3 requests per 5 minutes per user. ## Network Security diff --git a/docs/USER-GUIDE.md b/docs/USER-GUIDE.md index 432bf2f..19a43c1 100644 --- a/docs/USER-GUIDE.md +++ b/docs/USER-GUIDE.md @@ -36,7 +36,9 @@ TOD uses a collapsible sidebar with four sections. Press `[` to toggle the sideb |------|-------------| | **Dashboard** | Overview of your fleet with device status cards, active alerts, metrics sparklines, and "APs Needing Attention" wireless health card. The landing page after login. | | **Devices** | Fleet table with search, sort, and filter. 
Click any device row to open its detail page. | -| **Map** | Geographic map view of device locations. | +| **Sites** | Tower and site management -- organize devices by physical location with sectors, health monitoring, wireless links, and site-scoped alerts. | +| **Wireless Links** | Fleet-wide view of all discovered AP-to-CPE wireless connections with signal, CCQ, TX/RX rates, and link state. | +| **Map** | Geographic fleet map with status-colored markers and automatic clustering. Devices with coordinates appear on the map; clusters reflect aggregate health (green = all online, red = all offline, amber = mixed). | ### Manage @@ -236,6 +238,138 @@ TOD supports dark and light modes: --- +## Tower & Site Management + +Sites represent physical locations in your network -- towers, rooftops, equipment rooms, or any place where you deploy devices. Sectors let you subdivide a site by antenna direction. Together they give you a structured view of your wireless infrastructure. + +### Creating a Site + +1. Navigate to **Fleet > Sites** in the sidebar. +2. Click **New Site**. +3. Fill in the site details: + - **Name** (required) -- a descriptive label for the location (e.g., "North Ridge Tower"). + - **Address** -- street address or landmark description. + - **Latitude / Longitude** -- GPS coordinates. Devices at this site inherit these coordinates on the fleet map. + - **Elevation** -- tower or rooftop height in meters. + - **Notes** -- free-text field for internal reference. +4. Click **Create Site**. + +The Sites list shows all sites with search filtering. Click any site to open its detail page. + +### Site Detail Page + +The site detail page shows a summary header with device count, online count, online percentage, and active alert count. Four tabs provide deeper views: + +| Tab | Description | +|-----|-------------| +| **Health Grid** | Card grid of every device assigned to the site showing live CPU, memory, and uptime. 
Cards are color-coded by status (green = online, red = offline). Click any card to open the device detail page. | +| **Sectors** | Sector-based view of devices and their connected CPE clients. Shows per-sector aggregate stats (client count, average signal, link count). | +| **Links** | Table of all wireless links at the site, grouped by AP, with signal strength, CCQ, TX/RX rates, link state, and expandable signal history charts. | +| **Alerts** | Site-scoped alert rules and alert event history. Create and manage rules that apply to this specific site or sector. | + +### Creating Sectors + +Sectors organize access points within a site by antenna direction (e.g., "North 0-120" or "South Sector"). To create a sector: + +1. Open a site detail page and switch to the **Sectors** tab. +2. Click **Add Sector**. +3. Enter: + - **Name** (required) -- a label for the sector direction (e.g., "North Sector"). + - **Azimuth** -- compass bearing in degrees (0-360) representing the antenna direction. 0 is north, 90 is east, 180 is south, 270 is west. + - **Description** -- optional notes about the sector. +4. Click **Create Sector**. + +Each sector section is collapsible and shows a header with device count, connected client count, average signal strength, and link count. Devices within a sector are listed with their connected CPEs and link states inline. + +### Assigning Devices to Sites and Sectors + +Devices are assigned to a site from the device detail page or from the Sites section. Once assigned, you can further assign a device to a specific sector: + +1. Open the site detail page and switch to the **Sectors** tab. +2. Each device row has a sector assignment dropdown on the right. +3. Select a sector from the dropdown to assign the device, or select **Unassigned** to remove the sector assignment. + +Devices that belong to a site but have no sector assignment appear in the **Unassigned** section at the bottom of the Sectors tab. 
+ +--- + +## Wireless Links + +TOD automatically discovers wireless connections between access points (APs) and customer premises equipment (CPEs) in your fleet. When the poller detects a registration table entry on an AP that matches a CPE device in your fleet, it creates a wireless link record. + +### Link States + +Each wireless link has a state that reflects its current health: + +| State | Meaning | +|-------|---------| +| **Discovered** | A new AP-CPE connection has been detected for the first time. | +| **Active** | The link is up with recent poll data confirming connectivity. | +| **Degraded** | The link is connected but signal or quality metrics have dropped below healthy thresholds. | +| **Down** | The link has not been seen in recent polls -- the CPE is likely disconnected. | +| **Stale** | The link has not been seen for an extended period. The connection may no longer exist. | + +Link states transition automatically based on poll results and missed-poll counters. + +### Viewing Wireless Links + +There are two ways to view wireless links: + +- **Fleet-wide**: Navigate to **Fleet > Wireless Links** in the sidebar. This shows all discovered links across your organization, filterable by state (active, degraded, down, stale). +- **Per-site**: Open a site detail page and switch to the **Links** tab. This shows only the links associated with devices assigned to that site. + +Both views group links by AP device. Each CPE row shows signal strength (dBm), CCQ percentage, TX/RX data rates, link state, and time since last seen. + +### Signal History + +Click any CPE row in the wireless links table to expand an inline signal history chart. The chart shows signal strength over time with three lines: + +- **Average signal** (solid blue) -- the primary trend line. +- **Min / Max signal** (dashed) -- the range boundaries. + +The background is color-banded: green for strong signal (above -65 dBm), yellow for moderate (-65 to -80 dBm), and red for weak (below -80 dBm). 
+ +Use the time range selector in the chart header to switch between **24h**, **7d**, and **30d** views. This helps you spot intermittent degradation, seasonal patterns, or gradual signal drift that might not be obvious from a single snapshot. + +--- + +## Site Alerts + +Site alert rules let you define thresholds scoped to an entire site or a specific sector, rather than individual devices. This is useful for detecting systemic issues across a tower location. + +### Creating a Site Alert Rule + +1. Open the site detail page and switch to the **Alerts** tab. +2. Click **Add Alert Rule**. +3. Configure the rule: + - **Rule type** -- choose from: + - *Device Offline Percent* -- fires when the percentage of offline devices at the site exceeds the threshold. + - *Device Offline Count* -- fires when a specific number of devices go offline. + - *Sector Signal Average* -- fires when the average signal strength across a sector drops below the threshold. + - *Sector Client Drop* -- fires when the number of connected clients in a sector drops by more than the threshold. + - *Signal Degradation* -- fires when individual link signal degrades past a threshold. + - **Scope** -- apply the rule to the entire site or narrow it to a specific sector. + - **Threshold** -- the numeric value and unit that triggers the alert. + - **Severity** -- warning or critical. +4. Click **Create Rule**. + +Alert events appear in the site's Alerts tab with timestamps, severity, the triggering message, and consecutive hit count. Active alerts can be resolved manually by operators. + +--- + +## Fleet Map + +The fleet map provides a geographic view of all devices that have coordinates assigned (either directly on the device or inherited from their site). + +- Navigate to **Fleet > Map** in the sidebar. +- Devices appear as color-coded markers: **green** for online, **red** for offline. +- When devices are geographically close, they automatically cluster into numbered circles. 
Cluster color reflects aggregate health: green if all devices in the cluster are online, red if all are offline, and amber if mixed. +- Click a cluster to zoom in and see individual markers. Click a device marker to see its status summary and link to its detail page. +- Super admins can filter the map by organization using the dropdown in the toolbar. +- The map auto-fits to show all mapped devices when loaded. The toolbar shows how many of your devices have coordinates assigned. + +--- + ## Tips - Use the **command palette** (`Cmd+K`) for the fastest way to navigate. It searches pages, devices, and actions.
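The cluster coloring rule described in the Fleet Map section above (green if every device is online, red if every device is offline, amber if mixed) can be sketched as a small pure function. This is an illustration only: the function name and the boolean-flag input are assumptions, and the actual frontend implementation is not shown in this diff.

```python
def cluster_color(online_flags):
    """Return the marker color for a map cluster (hypothetical helper).

    online_flags: iterable of booleans, True for each online device.
    """
    flags = list(online_flags)
    if not flags:
        # An empty cluster has no aggregate health to report.
        raise ValueError("a cluster must contain at least one device")
    if all(flags):
        return "green"  # every device online
    if not any(flags):
        return "red"    # every device offline
    return "amber"      # mixed online and offline


if __name__ == "__main__":
    print(cluster_color([True, True, False]))
```

A single-device cluster falls out of the same rule: it is green when the device is online and red when it is offline, matching the per-device marker colors.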