512 Commits

Author SHA1 Message Date
Jason Staack
a4e1c78744 docs: update documentation for v9.5 remote access feature
Add tunnel manager, SSH relay, new env vars, security model, and
Remote Access key feature entry across ARCHITECTURE, DEPLOYMENT,
SECURITY, CONFIGURATION, and README.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12 15:47:03 -05:00
Jason Staack
d2471278ab feat(frontend): integrate WinBox and SSH buttons into device page
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 15:45:14 -05:00
Jason Staack
b76fdb3240 feat(frontend): add SSH terminal component with xterm.js 2026-03-12 15:43:31 -05:00
Jason Staack
b3b2f87beb feat(frontend): add WinBox tunnel button component 2026-03-12 15:43:03 -05:00
Jason Staack
79afd2a1ad feat(frontend): add remote access API client methods 2026-03-12 15:42:42 -05:00
Jason Staack
e5a9758f58 chore(frontend): add xterm.js dependencies for SSH terminal 2026-03-12 15:42:29 -05:00
Jason Staack
27f4403856 feat(infra): add nginx WebSocket proxy and SSH relay config to compose files
- Add WebSocket upgrade map to nginx and proxy /ws/ssh to poller:8080
- Update CSP connect-src to allow ws: and wss: for terminal connections
- Add tunnel port range 49000-49100, SSH relay env vars, ulimits, and healthcheck to poller in both override and prod compose files
- Increase poller memory limit to 512M in prod for tunnel/SSH overhead

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12 15:40:53 -05:00
Jason Staack
4860fad643 feat(api): add remote access endpoints for WinBox tunnels and SSH sessions
Implements four operator-gated endpoints under /api/tenants/{tenant_id}/devices/{device_id}/:
- POST /winbox-session: opens a WinBox tunnel via NATS request-reply to poller
- POST /ssh-session: mints a single-use Redis token (120s TTL) for WebSocket SSH relay
- DELETE /winbox-session/{tunnel_id}: idempotently closes a WinBox tunnel
- GET /sessions: lists active WinBox tunnels via NATS tunnel.status.list

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 15:39:24 -05:00
Jason Staack
63fa45ffdd feat(api): add remote access pydantic schemas 2026-03-12 15:36:36 -05:00
Jason Staack
cb427272ed feat(poller): wire tunnel manager and SSH relay into main
Add TunnelManager, TunnelResponder, SSH relay server, and SSH relay HTTP
server to the poller startup sequence with env-configurable port ranges,
idle timeouts, and session limits. Extends graceful shutdown to cover the
HTTP server (5s context), tunnel manager, and SSH relay server via defer.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 15:35:55 -05:00
Jason Staack
c73466c5e0 feat(poller): add SSH relay server with WebSocket-to-PTY bridge
Implements the SSH relay server (Task 2.1) that validates single-use
Redis tokens via GETDEL, dials SSH to the target device with PTY,
and bridges WebSocket binary/text frames to SSH stdin/stdout/stderr
with idle timeout and per-user/per-device session limits.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 15:33:48 -05:00
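The single-use token flow in this commit (consume via GETDEL) and the 120s TTL from commit 4860fad643 can be sketched roughly as follows. This is a minimal in-memory stand-in, not the project's code: `SingleUseTokenStore`, `mint`, and `redeem` are hypothetical names, and a dict replaces Redis so the example runs without a server. With Redis, a `SET` with an expiry plus `GETDEL` would play the same roles.

```python
import secrets
import time


class SingleUseTokenStore:
    """In-memory illustration of mint-once / redeem-once tokens (hypothetical)."""

    def __init__(self, ttl_seconds: float = 120.0, clock=time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._tokens = {}  # token -> (expires_at, payload)

    def mint(self, payload: dict) -> str:
        # Unguessable token; the payload would carry user and device identifiers.
        token = secrets.token_urlsafe(32)
        self._tokens[token] = (self._clock() + self._ttl, payload)
        return token

    def redeem(self, token: str):
        # pop() reads and deletes in one step, mirroring GETDEL semantics.
        entry = self._tokens.pop(token, None)
        if entry is None:
            return None
        expires_at, payload = entry
        if self._clock() >= expires_at:
            return None  # expired tokens are rejected even on first use
        return payload
```

Because the read and the delete happen together, a replayed token finds nothing to redeem, which is the property the relay appears to rely on.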
Jason Staack
d3d3e36192 feat(poller): add NATS tunnel responder for WinBox tunnel management
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12 15:30:34 -05:00
Jason Staack
7a6ebdca89 feat(poller): add tunnel manager with idle cleanup and status tracking
Implements Manager which orchestrates WinBox tunnel lifecycle: open,
close, idle cleanup, and status queries. Uses PortPool and Tunnel from
Tasks 1.2/1.3. DeviceStore and CredentialCache wired in for Task 1.5.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 15:28:56 -05:00
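The lifecycle described above (open, close, idle cleanup, status queries) could look something like the sketch below. It is an assumption-laden illustration rather than the poller's Manager: `TunnelRegistry`, `TunnelRecord`, and every method name are hypothetical, and locking is left out for brevity.

```python
import time
from dataclasses import dataclass


@dataclass
class TunnelRecord:
    tunnel_id: str
    port: int
    last_active: float


class TunnelRegistry:
    """Hypothetical bookkeeping for tunnels; not thread-safe as written."""

    def __init__(self, idle_timeout: float, clock=time.monotonic) -> None:
        self._idle_timeout = idle_timeout
        self._clock = clock
        self._tunnels = {}  # tunnel_id -> TunnelRecord

    def open(self, tunnel_id: str, port: int) -> TunnelRecord:
        record = TunnelRecord(tunnel_id, port, self._clock())
        self._tunnels[tunnel_id] = record
        return record

    def touch(self, tunnel_id: str) -> None:
        # Called whenever traffic is seen on the tunnel.
        if tunnel_id in self._tunnels:
            self._tunnels[tunnel_id].last_active = self._clock()

    def close(self, tunnel_id: str) -> bool:
        # Returns False if the tunnel is already gone, so repeated closes are harmless.
        return self._tunnels.pop(tunnel_id, None) is not None

    def cleanup_idle(self) -> list:
        # Close every tunnel that has been quiet for longer than the timeout.
        now = self._clock()
        expired = [tid for tid, rec in self._tunnels.items()
                   if now - rec.last_active > self._idle_timeout]
        for tid in expired:
            self.close(tid)
        return expired

    def status(self) -> list:
        now = self._clock()
        return [{"tunnel_id": r.tunnel_id, "port": r.port,
                 "idle_seconds": now - r.last_active}
                for r in self._tunnels.values()]
```

A periodic task would call `cleanup_idle()` and release the ports of the returned tunnels back to the pool.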
Jason Staack
8105b995ff feat(poller): add TCP tunnel with bidirectional proxy and activity tracking
Implements Tunnel type that listens on a local port, accepts WinBox client
connections, dials the remote RouterOS device, and proxies traffic
bidirectionally. Uses activityReader to atomically update LastActive on
each read for idle timeout detection. Per-connection contexts derive from
the tunnel context so Close() terminates all connections cleanly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12 15:26:47 -05:00
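The activity-tracking idea (a reader wrapper that stamps a last-active time so idle tunnels can be detected) can be illustrated with a short Python sketch. The actual implementation is Go; `ActivityReader` and `idle_for` here are hypothetical names, and this version only counts reads that return data, which is an assumption.

```python
import io
import threading
import time


class ActivityReader:
    """Wraps a readable stream and records when data last flowed through it."""

    def __init__(self, source, clock=time.monotonic) -> None:
        self._source = source
        self._clock = clock
        self._lock = threading.Lock()
        self._last_active = clock()

    def read(self, size: int = -1) -> bytes:
        data = self._source.read(size)
        if data:
            # The lock keeps the timestamp update safe across proxy threads.
            with self._lock:
                self._last_active = self._clock()
        return data

    def idle_for(self) -> float:
        with self._lock:
            return self._clock() - self._last_active


# Usage: wrap one side of the proxy and poll idle_for() from a cleanup loop.
reader = ActivityReader(io.BytesIO(b"winbox payload"))
reader.read(6)
```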
Jason Staack
d885f9b4b6 feat(poller): add port pool for WinBox tunnel allocation
Implements PortPool with mutex-protected allocation, bind verification
to skip ports already in use by the OS, and release-for-reuse semantics.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12 15:25:01 -05:00
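A rough Python rendering of the three properties named above (mutex-protected allocation, bind verification, release for reuse) is shown below. The real PortPool is Go code in the poller; the method names, the `127.0.0.1` default, and the error type here are assumptions made for illustration.

```python
import socket
import threading


class PortPool:
    """Hands out ports from a fixed range, skipping ports the OS reports as busy."""

    def __init__(self, start: int, end: int, host: str = "127.0.0.1") -> None:
        self._host = host
        self._lock = threading.Lock()
        # False = free, True = handed out by this pool.
        self._in_use = {port: False for port in range(start, end + 1)}

    def _can_bind(self, port: int) -> bool:
        # Bind verification: probe the port, then close the probe socket at once.
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind((self._host, port))
            except OSError:
                return False
        return True

    def allocate(self) -> int:
        with self._lock:
            for port, taken in self._in_use.items():
                if not taken and self._can_bind(port):
                    self._in_use[port] = True
                    return port
        raise RuntimeError("no free ports in pool")

    def release(self, port: int) -> None:
        # Released ports become candidates for the next allocate() call.
        with self._lock:
            if port in self._in_use:
                self._in_use[port] = False
```

There is an unavoidable gap between the probe and the later real bind, so the caller still has to handle a bind failure when it opens the listener.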
Jason Staack
5f9410fa54 chore(poller): add websocket dependency for remote access 2026-03-12 15:23:48 -05:00
Jason Staack
c0304da2dd docs: add remote access (v9.5) implementation plan
Six-chunk TDD implementation plan for WinBox TCP tunnels and SSH terminal relay through the Go poller. Covers tunnel manager, SSH relay, API endpoints, infrastructure, frontend, and documentation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 15:20:04 -05:00
Jason Staack
d16a5c991f docs: add remote access design spec (WinBox tunnels + SSH terminal)
Comprehensive design for v9.5 remote access feature:
- WinBox TCP tunnel through poller with localhost port allocation
- Browser SSH terminal via xterm.js + WebSocket to poller SSH relay
- RBAC enforcement (operator+), audit logging, session tokens
- Infrastructure: nginx WebSocket proxy, Docker port range mapping

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 15:10:40 -05:00
Jason Staack
bb9176fb9c docs: update docs to reflect recent fixes and actual codebase state
- Fix Go version (1.23 → 1.24), router count (21 → 25), add settings router
- Document vault key decryption on login and refresh token cookie delivery
- Document audit log self-commit behavior for reliability
- Add firmware cache volume and nginx dynamic DNS resolver to deployment guide
- Fix placeholder clone URL to actual repository

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 14:13:57 -05:00
Cog
6b22741f54 fix: audit logs never persisted + firmware-cache permission denied
Two bugs fixed:

1. audit_service.py: log_action() inserted into audit_logs using the
   caller's DB session but never committed. Any router that called
   db.commit() before log_action() (firmware, devices, config_editor,
   alerts, certificates) had its audit rows silently rolled back when
   the request session closed.
   Fix: log_action now opens its own AdminAsyncSessionLocal and self-
   commits, making audit persistence independent of the caller's
   transaction. The 'db' parameter is kept for backward compat but
   unused. Affects 5 routers (firmware, devices, config_editor,
   alerts, certificates).

2. docker-compose.override.yml: /data/firmware-cache had no volume
   mount so the directory didn't exist in the container, causing
   firmware downloads to fail with Permission denied.
   Fix: bind-mount docker-data/firmware-cache:/data/firmware-cache
   so firmware images survive container restarts.
2026-03-12 14:05:40 -05:00
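The first bug and its fix can be reproduced in miniature with `sqlite3` from the standard library. This is only an analogy for the SQLAlchemy sessions in the real code: the table name matches the commit message, but the file path, function signatures, and connection handling are assumptions.

```python
import os
import sqlite3
import tempfile


def make_db() -> str:
    path = os.path.join(tempfile.mkdtemp(), "audit_demo.db")
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE audit_logs (action TEXT)")
    conn.commit()
    conn.close()
    return path


def log_action_shared(conn: sqlite3.Connection, action: str) -> None:
    # Old behaviour: insert on the caller's connection and never commit.
    conn.execute("INSERT INTO audit_logs (action) VALUES (?)", (action,))


def log_action_own_session(path: str, action: str) -> None:
    # New behaviour: open a dedicated connection and commit before returning.
    conn = sqlite3.connect(path)
    try:
        conn.execute("INSERT INTO audit_logs (action) VALUES (?)", (action,))
        conn.commit()
    finally:
        conn.close()


def count_rows(path: str) -> int:
    conn = sqlite3.connect(path)
    try:
        return conn.execute("SELECT COUNT(*) FROM audit_logs").fetchone()[0]
    finally:
        conn.close()
```

Calling `log_action_shared` after the caller has already committed leaves the row in an open transaction that is discarded when the connection closes, whereas `log_action_own_session` persists regardless of what the caller does next.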
Cog
21b8ce029f fix: nginx 502 after API container restart (dynamic DNS resolver)
Without a resolver directive, nginx resolves upstream hostnames once at
startup and caches the IP forever. When the API container restarts it gets
a new Docker-assigned IP, causing 502 Bad Gateway until nginx is reloaded.

Fix:
- Add 'resolver 127.0.0.11 valid=10s' (Docker embedded DNS)
- Use a variable in proxy_pass (assigned with a 'set' directive) so nginx
  resolves the upstream at request time through the resolver above
- Variable proxy_pass passes the full request URI as-is, so /api/...
  correctly maps to http://api:8000/api/... without double-pathing
2026-03-12 14:05:40 -05:00
Cog
58597ad4fd fix: CRLF/BOM line endings + restart policies + gitattributes
- poller/docker-entrypoint.sh: convert from CRLF+BOM to LF (UTF-8 no BOM)
  Windows saved the file with a UTF-8 BOM which made the Linux kernel
  reject the shebang with 'exec format error', crashing the poller.

- infrastructure/openbao/init.sh: same CRLF -> LF fix

- poller/Dockerfile: add sed to strip CRLF and BOM at image build time
  as a defensive measure for future Windows edits

- docker-compose.override.yml: add 'restart: on-failure' to api and poller
  so they recover from the postgres startup race (TimescaleDB restarts
  postgres after initdb, briefly causing connection refused on first boot)

- .gitattributes: enforce LF for all text/script/code files so git
  normalises line endings on checkout and prevents this class of bug
2026-03-12 14:05:40 -05:00
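The commit performs the cleanup with sed at image build time; the equivalent byte-level transformation, written in Python purely for illustration, is small enough to show in full. `normalize_script` is a hypothetical helper name.

```python
import codecs


def normalize_script(raw: bytes) -> bytes:
    """Strip a leading UTF-8 BOM and convert CRLF line endings to LF."""
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    return raw.replace(b"\r\n", b"\n")


# A script saved on Windows: the BOM sits before "#!", so the kernel
# no longer sees a shebang in the first two bytes of the file.
broken = codecs.BOM_UTF8 + b"#!/bin/sh\r\necho started\r\n"
fixed = normalize_script(broken)
```

After normalization the file begins with `#!` again, which is what the kernel checks when deciding how to execute it.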
Cog
57e754bb27 fix: implement vault key decryption on login + fix token refresh via cookie
Three bugs fixed:

1. Phase 30 (auth.ts): After SRP login the encrypted_key_set was returned
   from the server but the vault key and RSA private key were never unwrapped
   with the AUK. keyStore.getVaultKey() was always null, causing Tier 1
   config-backup diffs to crash with a TypeError.
   Fix: unwrap vault key and private key using crypto.subtle.unwrapKey after
   successful SRP verification. Non-fatal: warns to console if decryption
   fails so login always succeeds.

2. Token refresh (auth.py): The /refresh endpoint required refresh_token in
   the request body, but the frontend never stored or sent it. After the 15-
   minute access token TTL, all authenticated API calls would fail silently
   because the interceptor sent an empty body and received 422 (not 401),
   so the retry loop never fired.
   Fix: login/srpVerify now set an httpOnly refresh_token cookie scoped to
   /api/auth/refresh. The refresh endpoint now accepts the token from either
   cookie (preferred) or body (legacy). Logout clears both cookies.
   RefreshRequest.refresh_token is now Optional to allow empty-body calls.

3. Silent token rotation: the /refresh endpoint now also rotates the refresh
   token cookie on each use (issues a fresh token), reducing the window for
   stolen refresh token replay.
2026-03-12 14:05:40 -05:00
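The "cookie preferred, body as legacy fallback" rule from item 2 reduces to a small lookup. The sketch below is framework-free and assumes the cookie and body field are both named `refresh_token`, as the message suggests; `pick_refresh_token` itself is a hypothetical helper, not a function from the codebase.

```python
from typing import Mapping, Optional


def pick_refresh_token(
    cookies: Mapping[str, str],
    body: Optional[Mapping[str, str]],
) -> Optional[str]:
    """Return the refresh token from the cookie if present, else from the body."""
    token = cookies.get("refresh_token")
    if token:
        return token
    if body:
        # Legacy clients still send the token in the JSON body.
        return body.get("refresh_token") or None
    return None
```

Returning `None` lets the endpoint answer 401 rather than 422 when no token arrives, which is the status the commit message says the frontend's retry logic keys on.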
Jason Staack
d0548bec86 fix(crypto): use 27 base-30 chars for Secret Key to prevent data loss
The Secret Key encoder used 26 base-30 characters which can only
represent 30^26 ≈ 2^127.58 values. Since the key is 128 bits,
~25% of generated keys silently lost their high bits during
formatting, making the Emergency Kit key unable to reconstruct
the original bytes on a new browser.

Changed KEY_CHAR_LENGTH from 26 to 27 (30^27 > 2^128). Parser
accepts both old 26-char and new 27-char keys for backward
compatibility. Format: A3-XXXXXX-XXXXXX-XXXXXX-XXXXXX-XXX

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 14:04:24 -05:00
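The arithmetic in this message can be checked directly with exact integers. The constants come from the commit text; the function names are only for illustration.

```python
import math
from fractions import Fraction

KEY_BITS = 128
ALPHABET_SIZE = 30


def capacity_bits(char_length: int) -> float:
    """log2 of the number of values that char_length base-30 characters can hold."""
    return math.log2(ALPHABET_SIZE ** char_length)


def unrepresentable_fraction(char_length: int) -> Fraction:
    """Share of 128-bit keys that do not fit into char_length base-30 characters."""
    capacity = ALPHABET_SIZE ** char_length
    key_space = 2 ** KEY_BITS
    if capacity >= key_space:
        return Fraction(0)
    return 1 - Fraction(capacity, key_space)


for length in (26, 27):
    print(length, round(capacity_bits(length), 2),
          round(float(unrepresentable_fraction(length)), 3))
```

With 26 characters roughly a quarter of the key space cannot be represented, and with 27 characters the whole space fits, which matches the figures quoted above.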
Jason Staack
b954e0aa2a fix(docs): restore full quick start steps, fix architecture visibility
Restore original 6-step quick start with comments. Increase arch flow
box contrast (bg-deep background, stronger border) and arrow size.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 23:18:53 -05:00
Jason Staack
b280faadf8 docs: update landing page copy from review notes
- Hero: tighter 3-line intro focused on the problem
- What It Does: updated section label
- Safe Config: panic-revert language, fleet-wide templates
- Who This Is For: expanded audience descriptions
- Architecture: new section with vertical flow diagram
- Quick Start: simplified to 3 commands
- CTA: open source + self-hosted, closing tagline
- Slow gradient fill animation from 1.2s to 2s

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 23:16:38 -05:00
Jason Staack
44de8c42c4 fix(docs): extend gradient fill across full hero title
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 23:11:43 -05:00
Jason Staack
786d494670 fix(docs): fix gradient fill direction so it ends on gradient, not white
Gradient on left half of background, white on right. Animation sweeps
from white to gradient. Uses 'both' fill mode for correct state during
delay and after completion.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 23:10:50 -05:00
Jason Staack
dc6182bbd0 feat(docs): animate gradient fill across hero title on page load
The gradient sweeps left-to-right across "Centralized Management"
after a 0.3s delay, transitioning from plain text to the teal-burgundy
gradient over 1.2s.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 23:09:23 -05:00
Jason Staack
e2813116da fix(docs): stagger bullet throb so only one pulses at a time per list
Each list gets a dynamically generated keyframe where only 1/N of the
cycle is active. Bullets are staggered 0.8s apart so they take turns
pulsing in sequence, looping forever.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 23:08:11 -05:00
Jason Staack
a3630c03e6 fix(docs): make bullet throb loop continuously via CSS only
Replace scroll-triggered JS animation with infinite CSS keyframe loop
(2.4s cycle). Remove IntersectionObserver code.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 23:06:34 -05:00
Jason Staack
7126621e83 fix(docs): make bullet throb repeat on every scroll into view
Remove unobserve so bullets reset when scrolled out and throb again
on re-entry.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 23:05:22 -05:00
Jason Staack
040f0335c1 feat(docs): add bullet throb animation on scroll
Teal bullet dots pulse with a staggered throb when list items scroll
into view. Uses IntersectionObserver with 120ms stagger per item.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 23:04:11 -05:00
Jason Staack
ff09382e3a fix(docs): center content sections to match hero layout
Section labels, titles, descriptions, and closing statements are now
centered. Bullet lists remain left-aligned within their centered
container for readability. Fixes visual disconnect between centered
hero and left-justified content sections.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 23:00:34 -05:00
Jason Staack
0aa09260db docs: rewrite landing page with operator-focused copy
Replace marketing-heavy hero, feature cards, and architecture diagram
with straightforward copy aimed at real MikroTik operators. New sections:
What It Does, Safe Configuration, Built for Real Operators, Designed for
Scale, and Open Source CTA.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 22:58:21 -05:00
Jason Staack
5fbff03dea docs: simplify architecture diagram, remove tech stack badges
Replace verbose ASCII architecture in README with clean linear flow.
Remove tech stack badge grid from landing page.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 22:54:49 -05:00
Jason Staack
28870a61a3 docs: add early access / testing disclaimer
Banner on landing page, docs page, and GitHub README warning that the
software is in active development and not yet ready for production use.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 22:49:24 -05:00
Jason Staack
a3d2172797 fix(backend): change APP_BASE_URL default from port 5173 to 3000
Password reset email links pointed to Vite dev port instead of the
Docker frontend port.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 22:39:04 -05:00
Jason Staack
837ab6f8fa fix(backend): parse CLI command string into RouterOS API command + args
execute_cli was passing the full CLI string (e.g. '/ping address=8.8.8.8
count=4') as a single command to the Go poller. go-routeros expects the
command path and args separately. Now splits into command + prefixed args.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 22:05:05 -05:00
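A minimal sketch of the split described above follows. It assumes the simplest input shape, one path token followed by key=value pairs, and assumes the prefix is the `=` that RouterOS API attribute words use; multi-word paths such as `/ip address print` are deliberately not handled. `split_cli_command` is a hypothetical name.

```python
import shlex
from typing import List, Tuple


def split_cli_command(cli: str) -> Tuple[str, List[str]]:
    """Split '/ping address=8.8.8.8 count=4' into a command and '='-prefixed args."""
    tokens = shlex.split(cli)
    if not tokens:
        raise ValueError("empty command")
    command, params = tokens[0], tokens[1:]
    # Each key=value pair becomes an API attribute word: '=key=value'.
    args = ["=" + param for param in params]
    return command, args


command, args = split_cli_command("/ping address=8.8.8.8 count=4")
```

Using `shlex.split` rather than `str.split` keeps quoted values with spaces together as one argument.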
Jason Staack
394be01145 fix(frontend): default ping/traceroute target to 8.8.8.8
The target input showed "8.8.8.8" as placeholder text but the actual
value was empty. Clicking Ping/Traceroute silently returned because
the empty target guard fired. Users saw the placeholder and assumed
the tool was broken.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 21:57:10 -05:00
Jason Staack
a3cc35c4b7 fix(frontend): generate Emergency Kit PDF client-side with actual Secret Key
The server-generated PDF had a placeholder for the Secret Key that was
never filled in client-side, making the Emergency Kit useless. Users
who relied on it could not recover their Secret Key on new devices.

Now generates the PDF entirely client-side via browser print dialog,
with the real Secret Key embedded. No server round-trip, key never
leaves the browser.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 21:47:50 -05:00
Jason Staack
6c7dfe02f5 fix(frontend): show Secret Key field when IndexedDB key is stale
When a user logs in from a browser with an outdated Secret Key in
IndexedDB (e.g. after server rebuild/re-enrollment), the SRP handshake
fails with 401 but the Secret Key input field was never shown — leaving
the user stuck with no way to enter their current key.

Now detects stale-key 401s and prompts for manual Secret Key entry.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 21:30:09 -05:00
Jason Staack
32965857e7 fix: wire up empty-state Add Device button to open dialog
The FleetTable empty state navigated with ?add=true but the devices page
never read that param. Now it opens the AddDeviceForm when add=true is
in the search params.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 21:12:01 -05:00
Jason Staack
2605a97331 fix: use user.user_id instead of user.id in SMTP settings save
CurrentUser object uses user_id attribute, not id. Caused AttributeError
on PUT /api/settings/smtp.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 21:08:21 -05:00
Jason Staack
f7a53e60da fix: SMTP TLS logic was inverted — plain SMTP incorrectly used STARTTLS
When use_tls=false, the old logic set start_tls=true for any port != 25,
which broke plain SMTP servers like Mailpit. Now:
- Port 465: implicit TLS
- use_tls=true on other ports: STARTTLS
- use_tls=false: plain SMTP (no TLS)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 21:03:54 -05:00
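The three rules listed above map onto a small decision function. This sketch assumes port 465 takes precedence over the `use_tls` flag, which the message implies by listing it first but does not state outright; `smtp_tls_mode` and its return values are hypothetical.

```python
def smtp_tls_mode(port: int, use_tls: bool) -> str:
    """Return 'implicit_tls', 'starttls', or 'plain' for an SMTP connection."""
    if port == 465:
        return "implicit_tls"  # TLS from the first byte
    if use_tls:
        return "starttls"      # upgrade after connecting in plaintext
    return "plain"             # no TLS at all, e.g. a local Mailpit instance


# The case the old logic got wrong: TLS disabled on a port other than 25.
mode = smtp_tls_mode(1025, use_tls=False)
```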
Jason Staack
2461582cb3 fix: use relative paths for bind-mount volumes
Absolute paths (/Volumes/ssd01/mikrotik/docker-data/) are machine-specific
and won't work on any other system. Use ./docker-data/ so the repo works
wherever it's cloned.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 20:51:33 -05:00
Jason Staack
9510e56ced fix: use working dev defaults in .env.example
- POSTGRES_PASSWORD and DB URLs now match what docker-compose.override.yml
  and init-postgres.sql actually use (postgres/postgres, app_password)
- CREDENTIAL_ENCRYPTION_KEY is now valid base64 (32 bytes) so the API
  actually starts instead of crashing on the Pydantic validator
- JWT_SECRET_KEY is a dev-only value (insecure defaults check skips dev)
- Added quick-start comment block with login credentials

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 20:46:33 -05:00
Jason Staack
8949a36454 fix: bump Node to 20 in frontend Dockerfile for Vite 7 compatibility
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 20:39:51 -05:00
Jason Staack
d30df957e2 fix: rename network from 'mikrotik' to 'tod' to match overlay compose files
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 20:35:49 -05:00
staack
4cb887b8fa ci: add feature request issue template 2026-03-09 17:50:45 -05:00