Remote Access Design — WinBox Tunnels + SSH Terminal (v9.5)
Overview
Add remote WinBox and SSH terminal access to TOD. Users connect to RouterOS devices behind NAT through the TOD controller without direct network access to the router.
- WinBox: TCP tunnel through the poller container. The user's native WinBox app connects to `127.0.0.1:<port>`.
- SSH Terminal: Browser-based xterm.js terminal. WebSocket to the poller, which bridges to an SSH PTY on the router.
Device Type Scope
- WinBox tunnels: RouterOS devices only (WinBox is MikroTik-specific, port 8291)
- SSH terminal: All device types that support SSH (RouterOS and future `linuxrtr` devices)
- The frontend should show/hide the "Open WinBox" button based on device type. The "SSH Terminal" button renders for all SSH-capable device types.
System Architecture
┌─────────────────────────────────┐
│ User's Machine │
│ │
│ Browser (TOD UI) │
│ ├─ xterm.js SSH terminal │
│ └─ "Open WinBox" button │
│ │
│ WinBox app │
│ └─ connects 127.0.0.1:491xx │
└──────────┬──────────┬───────────┘
│ │
WebSocket TCP (WinBox)
/ws/ssh/ 127.0.0.1:49000-49100
│ │
┌────────────────────────────────────┼──────────┼────────────────┐
│ Docker Network: tod │ │ │
│ │ │ │
│ ┌──────────────┐ │ │ │
│ │ nginx │──────────────────┘ │ │
│ │ port 3000 │ (proxy /ws/ssh → poller) │ │
│ │ │ (proxy /api → api) │ │
│ └──────┬───────┘ │ │
│ │ │ │
│ ┌──────▼───────┐ NATS ┌───────────────▼──────────┐ │
│ │ API │◄───────────►│ Poller │ │
│ │ FastAPI │ │ Go │ │
│ │ │ │ ├─ tunnel manager │ │
│ │ - RBAC │ session │ │ (TCP proxy :49000+) │ │
│ │ - audit log │ tokens │ ├─ SSH relay │ │
│ │ - session │ (Redis) │ │ (WebSocket ↔ PTY) │ │
│ │ tokens │ │ ├─ device poller │ │
│ └──────────────┘ │ └─ cmd responder │ │
│ └───────────────┬───────────┘ │
│ │ │
│ ┌───────────────▼───────────┐ │
│ │ WireGuard │ │
│ │ 10.10.0.1/24 │ │
│ │ port 51820/udp │ │
│ └───────────────┬───────────┘ │
└───────────────────────────────────────────────┼────────────────┘
│
┌─────────────────────┼──────────────┐
│ │ │
RouterOS RouterOS RouterOS
(direct IP) (VPN peer) (VPN peer)
:8291 :22 10.10.0.x 10.10.0.y
:8291 :22 :8291 :22
Key data paths:
- WinBox: Browser click → API (auth+audit) → NATS → Poller allocates port → Docker maps 127.0.0.1:491xx → Poller TCP proxy → WireGuard → Router:8291
- SSH: Browser click → API (auth+audit+token) → Browser opens WebSocket → nginx → Poller validates token → SSH+PTY → Router:22
- Auth boundary: API handles all RBAC and audit logging. Poller validates single-use session tokens but never does primary auth.
RBAC
Roles allowed for remote access: operator, admin, super_admin.
viewer role receives 403 Forbidden. The API is the enforcement point; frontend hides buttons for viewers but does not rely on that for security.
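The operator+ rule reduces to a rank comparison. A minimal Python sketch (helper names are illustrative; the real enforcement point is the API's auth dependency):

```python
# Hypothetical helper illustrating the operator+ gate; the production check
# lives in the FastAPI dependency layer, not in a standalone function.
ROLE_RANK = {"viewer": 0, "operator": 1, "admin": 2, "super_admin": 3}

def can_use_remote_access(role: str) -> bool:
    """True only for operator, admin, super_admin; unknown roles are denied."""
    return ROLE_RANK.get(role, -1) >= ROLE_RANK["operator"]
```

Denying unknown roles by default (the `-1` fallback) keeps the check fail-closed.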
Every remote access operation produces an audit log entry:
- Fields: `user_id`, `tenant_id`, `device_id`, `session_type`, `source_ip`, `timestamp`
- SSH sessions additionally log `start_time` and `end_time`
Poller: Tunnel Manager
New package: poller/internal/tunnel/
Data Structures
type TunnelManager struct {
mu sync.Mutex
tunnels map[string]*Tunnel // keyed by tunnel ID (uuid)
portPool *PortPool // tracks available ports 49000-49100
idleTime time.Duration // 5 minutes
deviceStore *store.DeviceStore // DB lookup for device connection details
credCache *vault.CredentialCache
}
type Tunnel struct {
ID string
DeviceID string
TenantID string
UserID string
LocalPort int
RemoteAddr string // router IP:8291
CreatedAt time.Time
LastActive int64 // atomic, unix nanoseconds
listener net.Listener
cancel context.CancelFunc
conns sync.WaitGroup
activeConns int64 // atomic counter
}
LastActive Concurrency
LastActive is stored as an int64 (unix nanoseconds) and accessed only with atomic operations:
- Write: `atomic.StoreInt64(&t.LastActive, time.Now().UnixNano())`
- Read: `time.Since(time.Unix(0, atomic.LoadInt64(&t.LastActive)))`
Port Pool
type PortPool struct {
mu sync.Mutex
ports []bool // true = in use
base int // 49000
}
- `Allocate()` returns the next free port, or an error if the pool is exhausted
- `Release()` marks a port as free
- Before allocation, attempt a bind to verify the port is actually free (handles stale Docker mappings after a restart)
- All operations are protected by the mutex
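The pool logic, including the pre-allocation bind check, can be sketched in Python (the real implementation is the Go `PortPool` above; names here are illustrative):

```python
import socket
import threading

class PortPool:
    """Sketch of the poller's port pool: mutex-guarded bitmap over 49000-49100."""
    def __init__(self, base: int = 49000, size: int = 101):
        self.lock = threading.Lock()
        self.base = base
        self.in_use = [False] * size  # True = allocated

    def allocate(self) -> int:
        with self.lock:
            for i, used in enumerate(self.in_use):
                port = self.base + i
                if not used and self._bindable(port):
                    self.in_use[i] = True
                    return port
            raise RuntimeError("port range exhausted")

    def release(self, port: int) -> None:
        with self.lock:
            self.in_use[port - self.base] = False

    @staticmethod
    def _bindable(port: int) -> bool:
        # Bind check guards against stale Docker mappings after a restart.
        with socket.socket() as s:
            try:
                s.bind(("127.0.0.1", port))
                return True
            except OSError:
                return False
```

A port returned by `allocate()` is skipped until `release()` is called, even if a later bind check would succeed.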
Tunnel Lifecycle
- A NATS message arrives on `tunnel.open`
- The manager looks up the device via `DeviceStore.GetDevice(deviceID)` to obtain encrypted credentials and connection details (same pattern as `CmdResponder`)
- Decrypts device credentials via the credential cache
- Allocates a port from the pool (verifies the bind succeeds)
- Starts a TCP listener on `127.0.0.1:<port>` (never `0.0.0.0`)
- Returns the allocated port via NATS reply
- For each incoming TCP connection:
  - `t.conns.Add(1)`, increment `activeConns`
  - Dial `router_ip:8291` through WireGuard (10s timeout)
  - If the dial fails: close the client connection, decrement the counter, do not update LastActive
  - Bidirectional proxy with context cancellation (see below)
  - On exit: decrement `activeConns`, `t.conns.Done()`
- A background goroutine checks every 30s:
  - If idle > 5 minutes AND `activeConns == 0`: close the tunnel
  - Never close a tunnel while WinBox has an active socket
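The close decision in the background sweep reduces to one predicate (sketch; thresholds taken from this design):

```python
def should_close_tunnel(idle_seconds: float, active_conns: int,
                        idle_limit: float = 300.0) -> bool:
    """A tunnel closes only when idle past the limit AND no WinBox
    socket is currently open -- both conditions, never just one."""
    return idle_seconds > idle_limit and active_conns == 0
```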
TCP Proxy (per connection)
func (t *Tunnel) handleConn(tunnelCtx context.Context, clientConn net.Conn) {
defer t.conns.Done()
defer atomic.AddInt64(&t.activeConns, -1)
routerConn, err := net.DialTimeout("tcp", t.RemoteAddr, 10*time.Second)
if err != nil {
clientConn.Close()
return
}
ctx, cancel := context.WithCancel(tunnelCtx) // derived from tunnel context for shutdown propagation
defer cancel() // ensure context cleanup on all exit paths
go func() {
io.Copy(routerConn, newActivityReader(clientConn, &t.LastActive))
cancel()
}()
go func() {
io.Copy(clientConn, newActivityReader(routerConn, &t.LastActive))
cancel()
}()
<-ctx.Done()
clientConn.Close()
routerConn.Close()
}
activityReader wraps io.Reader and calls atomic.StoreInt64 on every Read().
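A Python sketch of the same wrapper idea (the real `activityReader` is Go; here a one-element list stands in for the atomic int64):

```python
import io
import time

class ActivityReader:
    """Wrap a reader and stamp last-activity time on every successful read."""
    def __init__(self, inner, stamp):
        self.inner = inner
        self.stamp = stamp  # mutable cell standing in for &t.LastActive

    def read(self, n=-1):
        data = self.inner.read(n)
        if data:
            self.stamp[0] = time.time_ns()
        return data
```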
Tunnel Shutdown Order
func (t *Tunnel) Close() {
t.listener.Close() // 1. stop accepting new connections
t.cancel() // 2. cancel context
t.conns.Wait() // 3. wait for active connections
// 4. release port (done by manager)
// 5. delete from manager map (done by manager)
}
NATS Subjects
- `tunnel.open` — Request: `{device_id, tenant_id, user_id, target_port}` → Reply: `{tunnel_id, local_port}`
- `tunnel.close` — Request: `{tunnel_id}` → Reply: `{ok}`
- `tunnel.status` — Request: `{tunnel_id}` → Reply: `{active, local_port, connected_clients, idle_seconds}`
- `tunnel.status.list` — Request: `{device_id}` → Reply: list of active tunnels
Logging
Structured JSON logs for: tunnel creation, port allocation, client connection, client disconnect, idle timeout, tunnel close. Fields: tunnel_id, device_id, tenant_id, local_port, remote_addr.
Poller: SSH Relay
New package: poller/internal/sshrelay/
Data Structures
type Server struct {
redis *redis.Client
credCache *vault.CredentialCache
deviceStore *store.DeviceStore
sessions map[string]*Session
mu sync.Mutex
idleTime time.Duration // 15 minutes
maxSessions int // 200
maxPerUser int // 10
maxPerDevice int // 20
}
type Session struct {
ID string // uuid
DeviceID string
TenantID string
UserID string
SourceIP string
StartTime time.Time
LastActive int64 // atomic, unix nanoseconds
sshClient *ssh.Client
sshSession *ssh.Session
ptyCols int
ptyRows int
cancel context.CancelFunc
}
HTTP Server
Runs on port 8080 inside the container (configurable via SSH_RELAY_PORT). Not exposed to host — only accessible through nginx on Docker network.
Endpoints:
- `/ws/ssh?token=<token>` — WebSocket upgrade for SSH terminal
- `/healthz` — Health check (returns `{"status":"ok"}`)
Connection Flow
- Browser opens `ws://host/ws/ssh?token=<session_token>`
- nginx proxies to poller `:8080/ws/ssh`
- Poller validates the single-use token via Redis `GETDEL`
- Token must contain: `device_id`, `tenant_id`, `user_id`, `source_ip`, `cols`, `rows`, `created_at`
- Verify `tenant_id` matches the device's tenant
- Check session limits (200 total, 10 per user, 20 per device) — reject with a close frame if exceeded
- Upgrade to WebSocket with hardening:
  - `SetReadLimit(1 << 20)` (1 MB)
  - Read deadline management
  - Ping/pong keepalive
  - Origin validation
- Decrypt device credentials via the credential cache
- SSH dial to the router (port 22, password auth, `InsecureIgnoreHostKey`)
  - Log the host key fingerprint on first connect
  - If the dial fails: close the WebSocket with an error message, clean up
- Open an SSH session, request a PTY (`xterm-256color`, initial cols/rows from the token)
- Obtain stdin, stdout, stderr pipes
- Start the shell
- Bridge WebSocket ↔ SSH PTY
WebSocket Message Protocol
- Binary frames: Terminal data — forwarded directly to/from SSH PTY
- Text frames: JSON control messages
{"type": "resize", "cols": 120, "rows": 40}
{"type": "ping"}
Resize validation: cols > 0 && cols <= 500 && rows > 0 && rows <= 200. Reject invalid values.
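The bounds check is small enough to state exactly (Python sketch of the relay-side validation):

```python
def valid_resize(cols: int, rows: int) -> bool:
    """Mirror of the relay's resize bounds: reject zero, negative,
    and oversized dimensions."""
    return 0 < cols <= 500 and 0 < rows <= 200
```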
Bridge Function
// bridge pumps data between the browser WebSocket and the SSH PTY until
// either side closes or the context is cancelled. Parameter types are a
// sketch assuming a coder/websocket-style API (Read/Write take a context).
func bridge(
	ctx context.Context, cancel context.CancelFunc,
	wsConn *websocket.Conn, sshSession *ssh.Session,
	stdin io.WriteCloser, stdout, stderr io.Reader,
	lastActive *int64,
) {
	// WebSocket → SSH stdin
	go func() {
		defer cancel()
		for {
			msgType, data, err := wsConn.Read(ctx)
			if err != nil {
				return
			}
			atomic.StoreInt64(lastActive, time.Now().UnixNano())
			if msgType == websocket.MessageText {
				var ctrl ControlMsg
				if json.Unmarshal(data, &ctrl) != nil {
					continue
				}
				if ctrl.Type == "resize" {
					// validate bounds before resizing the PTY
					if ctrl.Cols > 0 && ctrl.Cols <= 500 && ctrl.Rows > 0 && ctrl.Rows <= 200 {
						sshSession.WindowChange(ctrl.Rows, ctrl.Cols)
					}
				}
				continue
			}
			stdin.Write(data)
		}
	}()
	// SSH stdout → WebSocket
	go func() {
		defer cancel()
		buf := make([]byte, 4096)
		for {
			n, err := stdout.Read(buf)
			if err != nil {
				return
			}
			atomic.StoreInt64(lastActive, time.Now().UnixNano())
			wsConn.Write(ctx, websocket.MessageBinary, buf[:n])
		}
	}()
	// SSH stderr → WebSocket (merged into the same binary stream)
	go func() {
		defer cancel() // stderr EOF also triggers cleanup
		io.Copy(wsWriterAdapter(wsConn), stderr)
	}()
	<-ctx.Done()
}
Session Cleanup Order
- Cancel context (triggers bridge shutdown)
- Close WebSocket
- Close SSH session
- Close SSH client
- Remove session from server map (under mutex)
- Publish audit event via NATS: `audit.session.end` with payload `{session_id, user_id, tenant_id, device_id, start_time, end_time, source_ip, reason}`
Audit End-Time Pipeline
The API subscribes to the NATS subject audit.session.end (durable consumer, same pattern as existing NATS subscribers in backend/app/services/nats_subscribers.py). When a message arrives, the subscriber calls log_action("ssh_session_end", ...) with the session details including end_time and duration. This uses the existing self-committing audit service — no new persistence mechanism needed.
Idle Timeout
Per-session goroutine, every 30s:
idle := time.Since(time.Unix(0, atomic.LoadInt64(&sess.LastActive)))
if idle > s.idleTime { // 15 minutes
    sess.cancel()
}
Source IP
Extracted from the X-Real-IP header (set by nginx from $remote_addr); if absent, fall back to the last X-Forwarded-For entry (the one appended closest to nginx), then to r.RemoteAddr. Using X-Real-IP as primary avoids trusting client-spoofed X-Forwarded-For entries.
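The precedence order can be sketched as a small helper (illustrative; the real header handling lives in the Go relay):

```python
def client_ip(headers: dict, remote_addr: str) -> str:
    """X-Real-IP (set by nginx) first, then the last X-Forwarded-For
    entry, then the socket peer address."""
    real_ip = headers.get("X-Real-IP", "").strip()
    if real_ip:
        return real_ip
    xff = headers.get("X-Forwarded-For", "")
    if xff:
        return xff.split(",")[-1].strip()
    return remote_addr
```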
Logging
Structured JSON logs for: session start, session end (with duration and reason: disconnect/idle/error). Fields: session_id, device_id, tenant_id, user_id, source_ip.
API: Remote Access Endpoints
New router: backend/app/routers/remote_access.py
WinBox Tunnel
POST /api/tenants/{tenant_id}/devices/{device_id}/winbox-session
RBAC: operator+
Flow:
- Validate JWT, require operator+
- Verify the device exists, belongs to the tenant, and is active (not disabled/deleted)
- Return 404 both when the device is not found and on tenant mismatch (never leak cross-tenant existence); 403 is reserved for insufficient role
- Extract the source IP from the `X-Real-IP` header (preferred, set by nginx), falling back to `request.client.host`
- Audit log: `log_action("winbox_tunnel_open", ...)`
- NATS request to `tunnel.open` (10s timeout)
- If timeout or error: return 503
- Validate the returned port is in range 49000–49100
- Response:
{
"tunnel_id": "uuid",
"host": "127.0.0.1",
"port": 49023,
"winbox_uri": "winbox://127.0.0.1:49023",
"idle_timeout_seconds": 300
}
host is always hardcoded to "127.0.0.1" — never overridden by poller response.
Rate limit: 10 requests/min per user.
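One way to realize the 10 requests/min limit is a per-user sliding window (a sketch; the production limiter may instead be backed by Redis):

```python
import time
from collections import defaultdict, deque

class RateLimiter:
    """Allow at most `limit` requests per `window` seconds per user."""
    def __init__(self, limit: int = 10, window: float = 60.0):
        self.limit, self.window = limit, window
        self.hits = defaultdict(deque)  # user_id -> timestamps

    def allow(self, user_id: str, now=None) -> bool:
        now = time.monotonic() if now is None else now
        q = self.hits[user_id]
        # drop timestamps that have aged out of the window
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True
```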
SSH Session Token
POST /api/tenants/{tenant_id}/devices/{device_id}/ssh-session
RBAC: operator+
Body: {"cols": 80, "rows": 24}
Flow:
- Validate JWT, require operator+
- Verify the device exists, belongs to the tenant, and is active
- Check session limits (10 per user, 20 per device) — return 429 if exceeded
- Audit log: `log_action("ssh_session_open", ...)`
- Generate token: `secrets.token_urlsafe(32)`
- Store in Redis with SETEX (atomic), 120s TTL. Key format: `ssh:token:<token_value>`. Value:
{
"device_id": "uuid",
"tenant_id": "uuid",
"user_id": "uuid",
"source_ip": "1.2.3.4",
"cols": 80,
"rows": 24,
"created_at": 1710288000
}
- Response:
{
"token": "...",
"websocket_url": "/ws/ssh?token=<token>",
"idle_timeout_seconds": 900
}
Rate limit: 10 requests/min per user.
Input validation: cols 1–500, rows 1–200.
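The SETEX/GETDEL single-use token flow can be sketched with an in-memory dict standing in for Redis (class and method names are illustrative):

```python
import secrets
import time

class TokenStore:
    """In-memory stand-in for the Redis token flow: SETEX on issue,
    GETDEL (here: dict.pop) on consume, so a token is single-use."""
    def __init__(self):
        self._store = {}

    def issue(self, payload: dict, ttl: int = 120) -> str:
        token = secrets.token_urlsafe(32)
        self._store[f"ssh:token:{token}"] = (payload, time.monotonic() + ttl)
        return token

    def consume(self, token: str):
        """Return the payload once; None if missing, reused, or expired."""
        entry = self._store.pop(f"ssh:token:{token}", None)  # pop ~ GETDEL
        if entry is None or time.monotonic() > entry[1]:
            return None
        return entry[0]
```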
Tunnel Close
DELETE /api/tenants/{tenant_id}/devices/{device_id}/winbox-session/{tunnel_id}
RBAC: operator+
Idempotent — returns 200 even if tunnel already closed. Audit log recorded.
Active Sessions
GET /api/tenants/{tenant_id}/devices/{device_id}/sessions
RBAC: operator+
NATS request to poller. If poller doesn't respond within 10s, return empty session lists (degrade gracefully).
Schemas
class WinboxSessionResponse(BaseModel):
tunnel_id: str
host: str = "127.0.0.1"
port: int
winbox_uri: str
idle_timeout_seconds: int = 300
class SSHSessionRequest(BaseModel):
cols: int = Field(default=80, gt=0, le=500)
rows: int = Field(default=24, gt=0, le=200)
class SSHSessionResponse(BaseModel):
token: str
websocket_url: str
idle_timeout_seconds: int = 900
Error Responses
- 403: insufficient role
- 404: device not found or tenant mismatch (404 on mismatch avoids leaking cross-tenant existence)
- 429: session or rate limit exceeded
- 503: poller unavailable or port range exhausted
Frontend: Remote Access UI
Dependencies
New: @xterm/xterm (v5+), @xterm/addon-fit, @xterm/addon-web-links. No other new dependencies.
Device Page
Remote access buttons render in the device header for operator+ roles:
┌──────────────────────────────────────────┐
│ site-branch-01 Online ● │
│ 10.10.0.5 RB4011 RouterOS 7.16 │
│ │
│ [ Open WinBox ] [ SSH Terminal ] │
│ │
└──────────────────────────────────────────┘
WinBox Button
States: idle, requesting, ready, closing, error.
On click:
- Mutation: `POST .../winbox-session`
- On success, display:
WinBox tunnel ready
Connect to: 127.0.0.1:49023
[ Copy Address ] [ Close Tunnel ]
Tunnel closes after 5 min of inactivity
- Attempt the deep link on Windows only (detect via `navigator.userAgent`): `window.open("winbox://127.0.0.1:49023")` must fire directly inside the click handler chain (no `setTimeout`) to avoid browser blocking. On macOS/Linux, skip the deep link attempt and rely on the copy-address fallback.
- Copy button with clipboard fallback for HTTP environments (textarea + `execCommand("copy")`)
execCommand("copy")) - Navigating away does not close the tunnel — backend idle timeout handles cleanup
- Close button disabled while DELETE request is in flight
SSH Terminal
Two phases:
Phase 1 — Token acquisition:
POST .../ssh-session { cols, rows }
→ { token, websocket_url }
Phase 2 — Terminal session:
const term = new Terminal({
cursorBlink: true,
fontFamily: 'Geist Mono, monospace',
fontSize: 14,
scrollback: 2000,
convertEol: true,
theme: darkMode ? darkTheme : lightTheme
})
const fitAddon = new FitAddon()
term.loadAddon(fitAddon)
term.open(containerRef)
// fit after font load
fitAddon.fit()
WebSocket scheme derived dynamically: location.protocol === "https:" ? "wss" : "ws"
Data flow:
- User keystroke → `term.onData` → `ws.send(binaryFrame)` → poller → SSH stdin
- Router output → SSH stdout → poller → `ws.onmessage` → `term.write(new Uint8Array(data))`
- Resize → `term.onResize` → throttled (75ms) → `ws.send(JSON.stringify({type:"resize", cols, rows}))`
WebSocket lifecycle:
- `onopen`: `term.write("Connecting to router...\r\n")`
- `onmessage`: binary → `term.write`, text → parse control message
- `onclose`: display "Session closed." in red, disable input, show Reconnect button
- `onerror`: display "Connection error." in red
- Abnormal close codes (1006, 1008, 1011) display appropriate messages
Reconnect: Always requests a new token. Never reuses WebSocket or token.
Cleanup on unmount:
useEffect(() => {
return () => {
term?.dispose()
ws?.close()
}
}, [])
Terminal UI:
┌──────────────────────────────────────────────────┐
│ SSH: site-branch-01 [ Disconnect ] │
├──────────────────────────────────────────────────┤
│ │
│ [admin@site-branch-01] > │
│ │
└──────────────────────────────────────────────────┘
SSH session active — idle timeout: 15 min
- Inline on device page by default, expandable to full viewport
- Auto-expand to full viewport on screens < 900px width
- Dark/light theme maps to existing Tailwind HSL tokens (no hardcoded hex)
- `tabindex=0` on the terminal container for keyboard focus
- Active session indicator when the sessions list returns data
API Client Extension
const remoteAccessApi = {
openWinbox: (tenantId: string, deviceId: string) =>
client.post<WinboxSessionResponse>(
`/tenants/${tenantId}/devices/${deviceId}/winbox-session`
),
closeWinbox: (tenantId: string, deviceId: string, tunnelId: string) =>
client.delete(
`/tenants/${tenantId}/devices/${deviceId}/winbox-session/${tunnelId}`
),
openSSH: (tenantId: string, deviceId: string, req: SSHSessionRequest) =>
client.post<SSHSessionResponse>(
`/tenants/${tenantId}/devices/${deviceId}/ssh-session`, req
),
getSessions: (tenantId: string, deviceId: string) =>
client.get<ActiveSessionsResponse>(
`/tenants/${tenantId}/devices/${deviceId}/sessions`
),
}
Infrastructure
nginx — WebSocket Proxy
Add to infrastructure/docker/nginx-spa.conf:
# WebSocket upgrade mapping (top-level, outside server block)
map $http_upgrade $connection_upgrade {
default upgrade;
'' close;
}
# Inside server block:
location /ws/ssh {
resolver 127.0.0.11 valid=10s ipv6=off;
set $poller_upstream http://poller:8080;
proxy_pass $poller_upstream;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection $connection_upgrade;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header Host $host;
proxy_read_timeout 1800s;
proxy_send_timeout 1800s;
proxy_buffering off;
proxy_request_buffering off;
proxy_busy_buffers_size 512k;
proxy_buffers 8 512k;
}
CSP: The existing connect-src 'self' should be sufficient for same-origin WebSocket connections in modern browsers (CSP self matches same-origin ws:// and wss://). For maximum compatibility across all environments, explicitly add ws: wss: to the connect-src directive. HTTPS-only deployments can restrict to just wss:.
Docker Compose
Poller service additions — apply to these specific files:
- `docker-compose.override.yml` (dev): ports, environment, ulimits, healthcheck
- `docker-compose.prod.yml` (production): ports, environment, ulimits, healthcheck, increased memory limit
- `docker-compose.staging.yml` (staging): same as prod
poller:
ports:
- "127.0.0.1:49000-49100:49000-49100"
ulimits:
nofile:
soft: 8192
hard: 8192
environment:
TUNNEL_PORT_MIN: 49000
TUNNEL_PORT_MAX: 49100
TUNNEL_IDLE_TIMEOUT: 300
SSH_RELAY_PORT: 8080
SSH_IDLE_TIMEOUT: 900
SSH_MAX_SESSIONS: 200
SSH_MAX_PER_USER: 10
SSH_MAX_PER_DEVICE: 20
healthcheck:
test: ["CMD-SHELL", "wget --spider -q http://localhost:8080/healthz || exit 1"]
interval: 30s
timeout: 3s
retries: 3
Production memory limit: Increase poller from 256MB to 384–512MB.
Redis dependency: Ensure depends_on: redis: condition: service_started.
Docker proxy note: The 101-port range mapping creates individual docker-proxy processes. For production, set "userland-proxy": false in /etc/docker/daemon.json to use iptables-based forwarding instead, which avoids spawning 101 proxy processes and improves startup time.
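For reference, the corresponding daemon configuration is the single key mentioned above (restart the Docker daemon after changing it):

```json
{
  "userland-proxy": false
}
```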
Poller HTTP Server
httpServer := &http.Server{
    Addr:    ":" + cfg.SSHRelayPort,
    Handler: sshrelay.NewServer(redisClient, credCache).Handler(),
}
go httpServer.ListenAndServe()

// On shutdown signal: graceful drain with a 5s deadline
shutdownCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
httpServer.Shutdown(shutdownCtx)
New Environment Variables
| Variable | Default | Description |
|---|---|---|
| `TUNNEL_PORT_MIN` | `49000` | Start of WinBox tunnel port range |
| `TUNNEL_PORT_MAX` | `49100` | End of WinBox tunnel port range |
| `TUNNEL_IDLE_TIMEOUT` | `300` | WinBox tunnel idle timeout (seconds) |
| `SSH_RELAY_PORT` | `8080` | Internal HTTP/WebSocket port for SSH relay |
| `SSH_IDLE_TIMEOUT` | `900` | SSH session idle timeout (seconds) |
| `SSH_MAX_SESSIONS` | `200` | Max concurrent SSH sessions per poller |
| `SSH_MAX_PER_USER` | `10` | Max concurrent SSH sessions per user |
| `SSH_MAX_PER_DEVICE` | `20` | Max concurrent SSH sessions per device |
Graceful Shutdown
When poller container shuts down:
- Stop accepting new tunnels and SSH sessions
- Close HTTP/WebSocket server (5s timeout)
- Gracefully terminate SSH sessions
- Close all tunnel listeners
- Wait for active connections
- Release tunnel ports
Testing Strategy
Unit Tests
Poller (Go):
- Port pool: allocation, release, reuse after close, concurrent access, exhaustion, bind failure retry
- Tunnel manager: lifecycle, idle detection with zero active connections, multiple concurrent connections on same tunnel, cleanup when listener creation fails
- TCP proxy: activity tracking (atomic), bidirectional shutdown, dial failure cleanup
- SSH relay: token validation (valid/expired/reused/wrong tenant), session limits, resize parsing and validation, malformed control messages, invalid JSON frames, binary frame size limits, resize flood protection, cleanup on SSH dial failure, cleanup on abrupt WebSocket close
Backend (Python):
- RBAC: viewer gets 403, operator gets 200
- Device validation: wrong tenant gets 404, disabled device rejected
- Token generation: stored in Redis with correct TTL
- Rate limiting: 11th request gets 429
- Session limits: exceed per-user/per-device limits gets 429
- Source IP extraction from X-Forwarded-For
- NATS timeout returns 503
- Redis unavailable during token storage
- Malformed request payloads rejected
Integration Tests
- Tunnel end-to-end: API → NATS → poller allocates port → verify listening on 127.0.0.1 → TCP connect → data forwarded to mock router
- SSH end-to-end: API issues token → WebSocket → poller validates → SSH to mock SSHD → verify keystroke round-trip and resize
- Token lifecycle: consumed on first use, second use rejected, expired token rejected
- Idle timeout: open tunnel, no traffic, verify closes after 5min; open SSH, no activity, verify closes after 15min
- Concurrent sessions: 10 SSH from same user succeeds, 11th rejected
- Tunnel stress: 50 concurrent tunnels, verify unique ports, verify cleanup
- SSH stress: many simultaneous WebSocket sessions, verify limits and stability
- Router unreachable: SSH dial fails, WebSocket closes with error, no zombie session
- Poller restart: sessions terminate, frontend shows disconnect, reconnect works
- Backward compatibility: existing polling, config push, NATS subjects unchanged
Security Tests
- Token replay: reuse consumed token → rejected
- Cross-tenant: user from tenant A accesses device from tenant B → rejected
- Malformed token: invalid base64, wrong length → rejected without panic
Resource Leak Detection
During integration testing, monitor: open file descriptors, goroutine count, memory usage. Verify SSH sessions and tunnels release all resources after closure.
Manual Testing
- WinBox tunnel to router behind WireGuard — full WinBox functionality
- SSH terminal — tab completion, arrow keys, command history, line wrapping after resize
- Deep link `winbox://` on Windows — auto-launch
- Copy address fallback on macOS/Linux
- Navigate away with open tunnel — stays open, closes on idle
- Poller restart — frontend handles disconnect, reconnect works
- Multiple SSH terminals to different devices simultaneously
- Dark/light mode terminal theme
- Chrome, Firefox, Safari — WebSocket stability, clipboard, deep link, resize
Observability Verification
Verify structured JSON logs exist with correct fields for: tunnel created/closed, port allocated, SSH session started/ended (with duration and reason), idle timeout events.
Rollout Sequence
- Deploy poller changes to staging (tunnel manager, SSH relay, HTTP server, NATS subjects)
- Deploy infrastructure changes (docker-compose ports, nginx WebSocket config, CSP, ulimits)
- Validate tunnels and SSH relay in staging
- Deploy API endpoints (remote access router, session tokens, audit logging, rate limiting)
- Deploy frontend (WinBox button, SSH terminal, API client)
- Update documentation (ARCHITECTURE, DEPLOYMENT, SECURITY, CONFIGURATION, README)
- Tag as v9.5 with release notes covering: WinBox remote access, browser SSH terminal, new env vars, port range requirement
Never deploy frontend before backend endpoints exist.
Out of Scope
- WinBox protocol reimplementation in browser
- SSH key authentication (password only, matching existing credential model)
- Session recording/playback
- File transfer through SSH terminal
- Multi-user shared terminal sessions