From a4e1c787447f43e2d2abf592d7a2c7ea0b31ecdf Mon Sep 17 00:00:00 2001
From: Jason Staack
Date: Thu, 12 Mar 2026 15:47:03 -0500
Subject: [PATCH] docs: update documentation for v9.5 remote access feature

Add tunnel manager, SSH relay, new env vars, security model, and Remote
Access key feature entry across ARCHITECTURE, DEPLOYMENT, SECURITY,
CONFIGURATION, and README.

Co-Authored-By: Claude Sonnet 4.6
---
 README.md             |  1 +
 docs/ARCHITECTURE.md  | 11 +++++++++--
 docs/CONFIGURATION.md | 13 +++++++++++++
 docs/DEPLOYMENT.md    | 12 +++++++++++-
 docs/SECURITY.md      | 10 ++++++++++
 5 files changed, 44 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 055269a..625dcbb 100644
--- a/README.md
+++ b/README.md
@@ -20,6 +20,7 @@ real-time monitoring, and zero-knowledge security, all self-hosted on your infra
 - **Zero-Knowledge Security** -- 1Password-style architecture. SRP-6a authentication (server never sees your password), per-tenant envelope encryption via Transit KMS, Emergency Kit export.
 - **Multi-Tenant with PostgreSQL RLS** -- Full organization isolation enforced at the database layer. Four roles: super_admin, admin, operator, viewer.
 - **Internal Certificate Authority** -- Issue and deploy TLS certificates to RouterOS devices via SFTP. Three-tier TLS fallback for maximum compatibility.
+- **Remote Access** -- WinBox TCP tunnels and browser-based SSH terminal for managing devices behind NAT. One-click connection through the WireGuard VPN overlay.
 - **WireGuard VPN Onboarding** -- Create device + VPN peer in one transaction. Generates ready-to-paste RouterOS commands for devices behind NAT.
 - **PDF Reports** -- Fleet summary, device detail, security audit, and performance reports generated server-side.
 - **Command Palette UX** -- Cmd+K quick navigation, keyboard shortcuts, dark/light mode, smooth page transitions, and skeleton loaders throughout.
diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md
index 5423794..b4fcf5c 100644
--- a/docs/ARCHITECTURE.md
+++ b/docs/ARCHITECTURE.md
@@ -97,7 +97,9 @@ The backend exposes 25 route groups under the `/api` prefix:
 - **Output**: Publishes poll results to NATS JetStream; the API's NATS subscribers process and persist them
 - **Database access**: Uses `poller_user` role which bypasses RLS (needs cross-tenant device access)
 - **VPN routing**: Adds static route to WireGuard gateway for reaching remote devices
-- **Memory limit**: 256MB
+- **Tunnel manager**: On-demand TCP proxy for WinBox connections; allocates ports from a configurable range (default 49000–49100), bound to localhost only, with idle-timeout cleanup
+- **SSH relay**: WebSocket-to-SSH bridge serving browser-based terminal sessions; listens on port 8080 by default (`SSH_RELAY_PORT`) and enforces global, per-user, and per-device session limits
+- **Memory limit**: 512MB
 
 ## Infrastructure Services
 
@@ -271,6 +273,8 @@ All services communicate over a single Docker bridge network (`tod`). External p
 | NATS Monitor | 8222 | 8222 | HTTP |
 | OpenBao | 8200 | 8200 | HTTP |
 | WireGuard | 51820 | 51820 | UDP |
+| Poller SSH Relay | 8080 | 8080 | HTTP/WebSocket |
+| Poller WinBox Tunnels | 49000–49100 | 49000–49100 | TCP (localhost only) |
 
 ## File Structure
 
@@ -292,6 +296,9 @@ frontend/             React TypeScript frontend
 poller/               Go microservice for device polling
   main.go             Entry point
   Dockerfile          Multi-stage build
+  internal/
+    tunnel/           WinBox TCP proxy and port pool manager
+    sshrelay/         WebSocket-to-SSH bridge for browser terminals
 infrastructure/       Deployment configuration
   docker/             Dockerfiles for api, frontend
   helm/               Kubernetes Helm charts
@@ -322,7 +329,7 @@ docker compose build frontend
 |---------|-------|
 | PostgreSQL | 512MB |
 | API | 512MB |
-| Go Poller | 256MB |
+| Go Poller | 512MB |
 | OpenBao | 256MB |
 | Redis | 128MB |
 | NATS | 128MB |
diff --git a/docs/CONFIGURATION.md b/docs/CONFIGURATION.md
index 19c59a6..a5a37cb 100644
--- a/docs/CONFIGURATION.md
+++ b/docs/CONFIGURATION.md
@@ -87,6 +87,19 @@ TOD uses Pydantic Settings for configuration. All values can be set via environm
 | `GIT_STORE_PATH` | `./git-store` | Path to bare git repos for config backup history (one repo per tenant). In production: `/data/git-store` on a ReadWriteMany PVC. |
 | `WIREGUARD_CONFIG_PATH` | `/data/wireguard` | Shared volume path for WireGuard configuration files |
 
+### Remote Access (Go Poller)
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `TUNNEL_PORT_MIN` | `49000` | Start of WinBox tunnel port range |
+| `TUNNEL_PORT_MAX` | `49100` | End of WinBox tunnel port range |
+| `TUNNEL_IDLE_TIMEOUT` | `300` | WinBox tunnel idle timeout (seconds) |
+| `SSH_RELAY_PORT` | `8080` | SSH relay HTTP server port |
+| `SSH_IDLE_TIMEOUT` | `900` | SSH session idle timeout (seconds) |
+| `SSH_MAX_SESSIONS` | `200` | Maximum concurrent SSH sessions |
+| `SSH_MAX_PER_USER` | `10` | Maximum SSH sessions per user |
+| `SSH_MAX_PER_DEVICE` | `20` | Maximum SSH sessions per device |
+
 ### Bootstrap
 
 | Variable | Default | Description |
diff --git a/docs/DEPLOYMENT.md b/docs/DEPLOYMENT.md
index 3a9668b..8c1839e 100644
--- a/docs/DEPLOYMENT.md
+++ b/docs/DEPLOYMENT.md
@@ -119,6 +119,14 @@ Log in with the `FIRST_ADMIN_EMAIL` and `FIRST_ADMIN_PASSWORD` credentials set i
 | `CIRCUIT_BREAKER_MAX_BACKOFF_SECONDS` | `900` | Maximum backoff (15 min) |
 | `LOG_LEVEL` | `info` | Logging verbosity (`debug`/`info`/`warn`/`error`) |
 | `CORS_ORIGINS` | `http://localhost:3000` | Comma-separated CORS origins |
+| `TUNNEL_PORT_MIN` | `49000` | Start of WinBox tunnel port range |
+| `TUNNEL_PORT_MAX` | `49100` | End of WinBox tunnel port range |
+| `TUNNEL_IDLE_TIMEOUT` | `300` | WinBox tunnel idle timeout (seconds) |
+| `SSH_RELAY_PORT` | `8080` | SSH relay HTTP server port |
+| `SSH_IDLE_TIMEOUT` | `900` | SSH session idle timeout (seconds) |
+| `SSH_MAX_SESSIONS` | `200` | Maximum concurrent SSH sessions |
+| `SSH_MAX_PER_USER` | `10` | Maximum SSH sessions per user |
+| `SSH_MAX_PER_DEVICE` | `20` | Maximum SSH sessions per device |
 
 ### Security Notes
 
@@ -149,11 +157,13 @@ Container memory limits are enforced in `docker-compose.prod.yml` to prevent OOM
 | Redis | 128MB |
 | NATS | 128MB |
 | API | 512MB |
-| Poller | 256MB |
+| Poller | 512MB |
 | Frontend | 64MB |
 
 Adjust under `deploy.resources.limits.memory` in `docker-compose.prod.yml`.
 
+> **Note:** The WinBox tunnel port range (`TUNNEL_PORT_MIN`–`TUNNEL_PORT_MAX`, default 49000–49100) must be mapped in the poller container's port bindings: add `"49000-49100:49000-49100"` to the poller service's `ports` list in your compose file. By default Docker publishes mapped ports on all host interfaces; prefix the mapping with a host IP (for example `"127.0.0.1:49000-49100:49000-49100"`) if the tunnels should stay reachable from the host only, in line with the localhost-only model in SECURITY.md. The SSH relay port (`SSH_RELAY_PORT`, default 8080) likewise needs a port mapping if it is accessed directly.
+
 ## API Documentation
 
 The backend serves interactive API documentation at:
diff --git a/docs/SECURITY.md b/docs/SECURITY.md
index ea01f30..721ca9f 100644
--- a/docs/SECURITY.md
+++ b/docs/SECURITY.md
@@ -85,6 +85,16 @@ TOD includes a per-tenant Internal Certificate Authority for managing TLS certif
 - **Key protection:** CA private keys are encrypted with AES-256-GCM before database storage. PEM key material is never logged or exposed via API responses.
 - **Certificate rotation and revocation:** Supported via the certificate lifecycle state machine.
 
+## Remote Access Security
+
+TOD v9.5 adds on-demand WinBox tunnels and browser-based SSH terminals for devices behind NAT.
+
+- **Single-use session tokens:** SSH sessions are initiated with a short-lived token stored in Redis with a 120-second TTL. The token is redeemed with `GETDEL`, which reads and deletes it in one atomic command, so it is consumed on first use and cannot be replayed.
+- **RBAC enforcement:** Opening a tunnel or starting an SSH session requires the `operator` role or higher. `viewer` accounts have no access to remote access features.
+- **Audit trail:** Tunnel open/close events and SSH session start/end events are recorded in the immutable audit log with device ID, user ID, source IP, and timestamp.
+- **WinBox tunnel binding:** TCP proxies for WinBox connections are bound to `127.0.0.1` only. Tunnels are never exposed on `0.0.0.0` and cannot be reached from outside the host without explicit port forwarding.
+- **Idle-timeout cleanup:** Inactive tunnels are closed automatically after `TUNNEL_IDLE_TIMEOUT` seconds (default 300). SSH sessions time out after `SSH_IDLE_TIMEOUT` seconds (default 900). Resources are reclaimed immediately on disconnect.
+
 ## Network Security
 
 - **RouterOS communication:** All device communication uses the RouterOS binary API over TLS (port 8729). InsecureSkipVerify is enabled by default because RouterOS devices typically use self-signed certificates.