Commit Graph

36 Commits

Author SHA1 Message Date
Jason Staack
d3084abbb9 test(16-02): verify new Device fields in integration test
- Assert DeviceType defaults to "routeros" via COALESCE
- Assert SNMPPort defaults to 161 via COALESCE
- Assert SNMPVersion, SNMPProfileID, CredentialProfileID are nil for
  existing RouterOS devices without profile links
- Assert ProfileEncryptedCredentials and ProfileEncryptedCredentialsTransit
  are nil when no credential profile is linked
- Update test schema with device_type, snmp_port, snmp_version,
  snmp_profile_id, credential_profile_id columns
- Add credential_profiles table to test schema for LEFT JOIN

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 18:26:53 -05:00
Jason Staack
c1eb9ca41a feat(16-02): extend Device struct and queries for SNMP and credential profiles
- Add 7 new fields to store.Device: DeviceType, SNMPPort, SNMPVersion,
  SNMPProfileID, CredentialProfileID, ProfileEncryptedCredentials,
  ProfileEncryptedCredentialsTransit
- Update FetchDevices query with LEFT JOIN credential_profiles and
  expanded WHERE clause (credential_profile_id IS NOT NULL)
- Update GetDevice query with same JOIN and new columns
- COALESCE defaults: device_type='routeros', snmp_port=161

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 18:26:03 -05:00
Jason Staack
70c3d8ac7a test(16-03): add failing tests for credential type parsers
- ParseRouterOSCredentials: typed, legacy no-type-field, SNMP rejection, edge cases
- ParseSNMPCredentials: v1, v2c, v3 auth_priv, RouterOS rejection, legacy rejection

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 18:25:59 -05:00
Jason Staack
2f079fd74f docs(poller): clarify RouterOS API protocol version in PollDevice comment
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 13:16:18 -05:00
Jason Staack
397a33abef feat(13-01): add DeviceInterfaceEvent publisher and wire into PollDevice
- DeviceInterfaceEvent type publishes to device.interfaces.{device_id}
- PublishDeviceInterfaces method follows existing publisher pattern
- DEVICE_EVENTS stream includes device.interfaces.> subject
- PollDevice collects interface info after traffic counters, before health
- Non-fatal errors with Prometheus metrics for publish success/failure

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 06:05:55 -05:00
Jason Staack
6939584428 feat(13-01): add InterfaceInfo collector with MAC lowercasing and tests
- InterfaceInfo struct for link discovery (name, mac, type, running)
- CollectInterfaceInfo runs /interface/print (version-agnostic)
- MAC addresses lowercased for consistent matching
- Entries without mac-address skipped (loopback, bridge)
- Preserved existing InterfaceStats traffic counter collector

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 06:04:50 -05:00
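The collector rules listed in this commit (lowercase the MAC, skip entries with no mac-address, parse running as a bool) can be sketched as below. This is a minimal illustration, not the committed code: the `interfaceInfo` field names, the `toInterfaceInfo` helper, and the `map[string]string` row shape are all assumptions standing in for whatever the RouterOS client actually returns.

```go
package main

import (
	"fmt"
	"strings"
)

// interfaceInfo mirrors the four fields named in the commit message.
// The Go field names and types are assumptions.
type interfaceInfo struct {
	Name    string
	MAC     string
	Type    string
	Running bool
}

// toInterfaceInfo is a hypothetical helper: it converts raw key/value rows
// (assumed shape of an /interface/print reply) into interfaceInfo values.
func toInterfaceInfo(rows []map[string]string) []interfaceInfo {
	out := make([]interfaceInfo, 0, len(rows))
	for _, row := range rows {
		// Lowercase so later MAC comparisons do not depend on case.
		mac := strings.ToLower(strings.TrimSpace(row["mac-address"]))
		if mac == "" {
			// Entries without a MAC (e.g. loopback) are skipped.
			continue
		}
		out = append(out, interfaceInfo{
			Name:    row["name"],
			MAC:     mac,
			Type:    row["type"],
			Running: row["running"] == "true",
		})
	}
	return out
}

func main() {
	rows := []map[string]string{
		{"name": "ether1", "mac-address": "AA:BB:CC:DD:EE:01", "type": "ether", "running": "true"},
		{"name": "lo", "type": "loopback", "running": "true"},
	}
	for _, info := range toInterfaceInfo(rows) {
		fmt.Printf("%+v\n", info)
	}
}
```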
Jason Staack
4b5bb949e9 test(13-01): add failing tests for InterfaceInfo collector
- InterfaceInfo struct field compilation test
- MAC address lowercasing test
- Running bool parsing test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 06:01:12 -05:00
Jason Staack
caa33ca8d7 feat(12-01): add RF monitor collector, WIRELESS_REGISTRATIONS stream, wire into poll cycle
- RFMonitorStats struct for per-interface RF data (noise floor, channel width, TX power)
- CollectRFMonitor with v6/v7 RouterOS version routing
- WIRELESS_REGISTRATIONS NATS stream with 30-day retention (separate from DEVICE_EVENTS)
- WirelessRegistrationEvent type and PublishWirelessRegistrations method
- Poll cycle collects per-client registrations and RF stats, publishes combined event

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 05:38:14 -05:00
Jason Staack
23d6b38a4d feat(12-01): add per-client wireless registration collector and signal parser
- RegistrationEntry struct for per-client wireless data (MAC, signal, CCQ, rates, distance)
- ParseSignalStrength handles all RouterOS format variations (-67, -67@5GHz, -67@HT40)
- CollectRegistrations with v6/v7 RouterOS version routing
- Unit tests for ParseSignalStrength covering 10 cases

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 05:36:08 -05:00
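For the three signal formats named in this commit (-67, -67@5GHz, -67@HT40), one plausible parsing approach is to keep the integer before any "@" suffix. The sketch below assumes exactly that; `parseSignalStrength` is a hypothetical stand-in, and the real ParseSignalStrength may handle further variations that the commit message does not list.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSignalStrength is a hypothetical stand-in for ParseSignalStrength.
// It returns the dBm value that precedes an optional "@..." suffix.
func parseSignalStrength(raw string) (int, error) {
	s := strings.TrimSpace(raw)
	// Drop the rate/band suffix, e.g. "@5GHz" or "@HT40".
	if i := strings.IndexByte(s, '@'); i >= 0 {
		s = s[:i]
	}
	if s == "" {
		return 0, fmt.Errorf("empty signal strength %q", raw)
	}
	v, err := strconv.Atoi(s)
	if err != nil {
		return 0, fmt.Errorf("parse signal strength %q: %w", raw, err)
	}
	return v, nil
}

func main() {
	for _, in := range []string{"-67", "-67@5GHz", "-67@HT40"} {
		v, err := parseSignalStrength(in)
		fmt.Println(in, v, err)
	}
}
```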
Jason Staack
05e5595c2b fix(poller): add 64MB cap on DEVICE_EVENTS NATS stream
Without a byte limit, the stream grows unbounded within its 24h
max_age window. At 101 devices polling every 60s, it hits 128MB
in ~10 hours and OOMs the NATS container.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 05:52:09 -05:00
Jason Staack
0adcb52efc fix: handle SSH bridge write errors in poller
Add error checking to all three write calls in bridge.go. A write
failure now terminates that goroutine's copy loop and triggers cancel,
which lets the other goroutines clean up naturally.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 23:15:22 -05:00
Jason Staack
f49f5f739b fix: remove dead code (toast stubs, unused Redis key, tunnel manager fields)
- Remove 7 no-op exported stubs from toast.tsx (ToastProvider, ToastViewport,
  Toast, ToastTitle, ToastDescription, ToastClose, useToasts) — nothing imports them
- Remove fwFailKey variable and its Set() call from worker.go — the
  firmware:check-failed Redis key was never read anywhere
- Remove unused deviceStore and credCache fields from tunnel.Manager struct
  and drop corresponding parameters from NewManager(); update call site in
  main.go and all test usages

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 23:12:56 -05:00
Jason Staack
83e59ed8d7 fix: write device status to Redis, check Set() errors, use cached version fallback
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 23:10:52 -05:00
Jason Staack
fe23459369 fix(ci): fix hardcoded DB name in migration and Go version compat
- migration 002: use current_database() instead of hardcoded 'tod'
- ci.yml: use Go 1.25 (required by nats-server dep), mark golangci-lint
  as continue-on-error until it supports Go 1.25
- go.mod: keep at 1.25.0 (nats-server v2.12.5 requires it)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 23:03:20 -05:00
Jason Staack
e19745c1ba fix(ci): resolve Go lint and test failures in poller
- Add .golangci.yml to configure golangci-lint (disables errcheck which
  fires excessively on idiomatic defer Close() patterns; suppresses SA1019
  and ST1000 staticcheck rules)
- Fix testutil devicesSchema missing columns: certificate_authorities table,
  encrypted_credentials_transit, tls_mode, ssh_port, ssh_host_key_fingerprint
  — all required by FetchDevices/GetDevice LEFT JOIN queries
- Remove dead collectHealthError function from device/health.go (unused)
- Fix S1009 staticcheck: remove redundant nil check before len() in vault/cache.go

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-14 22:22:53 -05:00
Jason Staack
970501e453 feat: implement Remote WinBox worker, API, frontend integration, OpenBao persistence, and supporting docs
2026-03-14 09:05:14 -05:00
Jason Staack
ed3ad8eb17 chore: update about page to v9.6 and Dockerfile to Go 1.25
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 06:54:08 -05:00
Jason Staack
0851eced36 feat(04-01): implement BackupResponder with extracted CollectAndPublish
- Create BackupResponder for NATS request-reply on config.backup.trigger
- Extract public CollectAndPublish from BackupScheduler returning sha256 hash
- Define BackupExecutor/BackupLocker/DeviceGetter interfaces for testability
- Create RedisBackupLocker adapter wrapping redislock.Client
- Wire BackupResponder into main.go lifecycle
- All 6 tests pass with in-process NATS server

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:07:35 -05:00
Jason Staack
9e102fda20 test(04-01): add failing tests for BackupResponder
- Test subscribe registers subscription
- Test valid request returns success with sha256_hash
- Test lock held returns locked status
- Test invalid JSON returns error
- Test Stop unsubscribes cleanly
- Test device not found returns failed status

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:04:44 -05:00
Jason Staack
d34817a36c feat(02-02): wire BackupScheduler into main.go lifecycle
- Create BackupScheduler with all dependencies injected
- Run as goroutine parallel to status poll scheduler
- Shares same context for graceful shutdown via SIGINT/SIGTERM
- Startup logged with interval, max_concurrent, command_timeout

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 20:55:06 -05:00
Jason Staack
2653a32d6f feat(02-02): implement BackupScheduler with per-device goroutines and concurrency control
- BackupScheduler manages per-device backup goroutines independently from status poll
- First backup uses 30-300s random jitter delay to spread load
- Concurrency limited by buffered channel semaphore (configurable max)
- Per-device Redis lock prevents duplicate backups across pods
- Auth failures and host key mismatches block retries with clear warnings
- Transient errors use 5m/15m/1h exponential backoff with cap
- Offline devices skipped via Redis status key check
- TOFU fingerprint stored on first successful SSH connection
- Config output validated, normalized, hashed, published to NATS
- SSHHostKeyUpdater interface added to interfaces.go
- All 12 backup unit tests pass

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 20:54:23 -05:00
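Three of the scheduling mechanisms named in this commit can be sketched together: the 30-300s first-run jitter, the 5m/15m/1h backoff ladder with a cap, and the buffered-channel semaphore. All function names here (`initialJitter`, `backoffFor`, `runLimited`) are hypothetical, and the real scheduler also takes Redis locks and checks device status, which this sketch leaves out.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
	"sync/atomic"
	"time"
)

// backoffSteps follows the 5m/15m/1h sequence from the commit message.
var backoffSteps = []time.Duration{5 * time.Minute, 15 * time.Minute, time.Hour}

// initialJitter returns a random delay in the inclusive range 30s..300s.
func initialJitter() time.Duration {
	return time.Duration(30+rand.Intn(271)) * time.Second
}

// backoffFor returns the wait after the given number of consecutive
// transient failures, capped at the last step.
func backoffFor(failures int) time.Duration {
	if failures <= 0 {
		return 0
	}
	if failures > len(backoffSteps) {
		failures = len(backoffSteps)
	}
	return backoffSteps[failures-1]
}

// runLimited runs jobs goroutines but lets at most maxConcurrent execute
// work at once, using a buffered channel as a semaphore. It returns the
// peak number of concurrently running jobs it observed.
func runLimited(jobs, maxConcurrent int, work func()) int32 {
	sem := make(chan struct{}, maxConcurrent)
	var wg sync.WaitGroup
	var active, peak int32
	for i := 0; i < jobs; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot (blocks when full)
			defer func() { <-sem }() // release the slot
			n := atomic.AddInt32(&active, 1)
			for {
				p := atomic.LoadInt32(&peak)
				if n <= p || atomic.CompareAndSwapInt32(&peak, p, n) {
					break
				}
			}
			work()
			atomic.AddInt32(&active, -1)
		}()
	}
	wg.Wait()
	return atomic.LoadInt32(&peak)
}

func main() {
	fmt.Println("first run in", initialJitter())
	fmt.Println(backoffFor(1), backoffFor(2), backoffFor(3), backoffFor(9))
	peak := runLimited(20, 3, func() { time.Sleep(time.Millisecond) })
	fmt.Println("peak concurrency:", peak)
}
```

The buffered channel keeps the limit configurable with no extra dependency; a weighted semaphore package would be an alternative if jobs had different costs.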
Jason Staack
a884b0945d test(02-02): add failing tests for BackupScheduler
- Jitter range, backoff sequence, shouldRetry blocking logic
- Online-only gating via Redis, concurrency semaphore behavior
- Reconciliation start/stop device lifecycle

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 20:52:29 -05:00
Jason Staack
4ae39d2cb3 feat(02-01): add config backup env vars, NATS event, device SSH fields, migration, metrics
- Config: CONFIG_BACKUP_INTERVAL (21600s), CONFIG_BACKUP_MAX_CONCURRENT (10), CONFIG_BACKUP_COMMAND_TIMEOUT (60s)
- NATS: ConfigSnapshotEvent type, PublishConfigSnapshot method, config.snapshot.> stream subject
- Device: SSHPort/SSHHostKeyFingerprint fields, UpdateSSHHostKey method, updated queries/scans
- Migration 028: ssh_port, ssh_host_key_fingerprint, timestamp columns with poller_user grants
- Metrics: ConfigBackupTotal (counter), ConfigBackupDuration (histogram), ConfigBackupActive (gauge)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 20:48:12 -05:00
Jason Staack
f1abb75cab feat(02-01): add SSH executor with TOFU host key verification and config normalizer
- SSH RunCommand with typed error classification (auth, hostkey, timeout, connection refused, truncated)
- TOFU host key callback: accept-on-first-connect, verify-on-subsequent, reject-on-mismatch
- NormalizeConfig strips timestamps, normalizes line endings, trims whitespace, collapses blanks
- HashConfig returns 64-char lowercase hex SHA256 of normalized config
- 22 unit tests covering all error kinds, TOFU flows, normalization edge cases, idempotency

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 20:46:04 -05:00
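The normalize-then-hash step described in this commit can be sketched as follows. This is an illustration under assumptions, not the committed NormalizeConfig: it treats only comment lines containing " by RouterOS" as timestamp headers, which is a guess at the rule, and the names `normalizeConfig` and `hashConfig` are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// normalizeConfig makes two exports of the same configuration compare equal:
// it unifies line endings, trims trailing whitespace, drops the timestamped
// header comment (assumed form: "# <date> by RouterOS <version>"), and
// collapses runs of blank lines.
func normalizeConfig(raw string) string {
	raw = strings.ReplaceAll(raw, "\r\n", "\n")
	raw = strings.ReplaceAll(raw, "\r", "\n")
	var out []string
	prevBlank := true // also drops leading blank lines
	for _, line := range strings.Split(raw, "\n") {
		line = strings.TrimRight(line, " \t")
		if strings.HasPrefix(line, "#") && strings.Contains(line, " by RouterOS") {
			continue
		}
		blank := line == ""
		if blank && prevBlank {
			continue
		}
		out = append(out, line)
		prevBlank = blank
	}
	return strings.Trim(strings.Join(out, "\n"), "\n") + "\n"
}

// hashConfig returns the lowercase hex SHA-256 of the normalized config,
// which is always 64 characters long.
func hashConfig(raw string) string {
	sum := sha256.Sum256([]byte(normalizeConfig(raw)))
	return hex.EncodeToString(sum[:])
}

func main() {
	a := "# 2026-03-12 20:46:04 by RouterOS 7.16\r\n/ip address  \r\n\r\n\r\nadd address=10.0.0.1/24\r\n"
	b := "# 2026-03-13 01:00:00 by RouterOS 7.16\n/ip address\n\nadd address=10.0.0.1/24\n"
	fmt.Println(hashConfig(a) == hashConfig(b))
	fmt.Println(len(hashConfig(a)))
}
```

Hashing the normalized text rather than the raw export means a backup taken at a different time, or saved with different line endings, does not register as a configuration change.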
Jason Staack
c2eea6847f fix: WinBox tunnel bind address, port range, and proxy support
- Bind tunnel listeners to 0.0.0.0 instead of 127.0.0.1 so tunnels
  are reachable through reverse proxies and container networks
- Reduce port range to 49000-49004 (5 concurrent tunnels)
- Derive WinBox URI host from request Host header instead of
  hardcoding 127.0.0.1, enabling use behind reverse proxies
- Add README security warning about default encryption keys

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 19:03:53 -05:00
Jason Staack
acf1790bed feat: add audit.session.end NATS pipeline for SSH session tracking
Poller publishes session end events via JetStream when SSH sessions
close (normal disconnect or idle timeout). Backend subscribes with a
durable consumer and writes ssh_session_end audit log entries with
duration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 16:07:10 -05:00
Jason Staack
7aaaeaa1d1 fix: address spec compliance gaps - tenant check, XFF fallback, rate limiting
- Gap 1: Add tenant ID verification after device lookup in SSH relay handleSSH,
  closing cross-tenant token reuse vulnerability
- Gap 2: Add X-Forwarded-For fallback (last entry) when X-Real-IP is absent in
  SSH relay source IP extraction; import strings package
- Gap 3: Add @limiter.limit("10/minute") to POST /winbox-session and POST
  /ssh-session using existing slowapi pattern from app.middleware.rate_limit
- Gap 4: Add TODO comment in open_ssh_session explaining that SSH session count
  enforcement is at the poller level; no NATS subject exists yet for API-side
  pre-check

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 15:51:14 -05:00
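Gap 2 above (prefer X-Real-IP, otherwise take the last X-Forwarded-For entry) can be sketched as below. The names `pickSourceIP` and `sourceIP` are hypothetical, and the final fallback to the connection's remote address is an assumption; the commit message does not say what the relay does when both headers are absent.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"strings"
)

// pickSourceIP applies the header precedence described in Gap 2.
func pickSourceIP(realIP, forwardedFor, remoteAddr string) string {
	if ip := strings.TrimSpace(realIP); ip != "" {
		return ip
	}
	if forwardedFor != "" {
		// The last entry is the one appended by the nearest proxy.
		parts := strings.Split(forwardedFor, ",")
		if last := strings.TrimSpace(parts[len(parts)-1]); last != "" {
			return last
		}
	}
	// Assumed fallback: the TCP peer address without its port.
	host, _, err := net.SplitHostPort(remoteAddr)
	if err != nil {
		return remoteAddr
	}
	return host
}

// sourceIP extracts the inputs from a request and delegates to pickSourceIP.
func sourceIP(r *http.Request) string {
	return pickSourceIP(r.Header.Get("X-Real-IP"), r.Header.Get("X-Forwarded-For"), r.RemoteAddr)
}

func main() {
	// The URL is a placeholder for illustration only.
	r, _ := http.NewRequest(http.MethodGet, "http://poller.example/ws/ssh", nil)
	r.RemoteAddr = "172.18.0.5:41234"
	r.Header.Set("X-Forwarded-For", "203.0.113.7, 10.0.0.2")
	fmt.Println(sourceIP(r))
}
```

Taking the last entry rather than the first matters because earlier entries are supplied by the client and can be forged, while the last one is written by the proxy closest to the relay.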
Jason Staack
cb427272ed feat(poller): wire tunnel manager and SSH relay into main
Add TunnelManager, TunnelResponder, SSH relay server, and SSH relay HTTP
server to the poller startup sequence with env-configurable port ranges,
idle timeouts, and session limits. Extends graceful shutdown to cover the
HTTP server (5s context), tunnel manager, and SSH relay server via defer.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 15:35:55 -05:00
Jason Staack
c73466c5e0 feat(poller): add SSH relay server with WebSocket-to-PTY bridge
Implements the SSH relay server (Task 2.1) that validates single-use
Redis tokens via GETDEL, dials SSH to the target device with PTY,
and bridges WebSocket binary/text frames to SSH stdin/stdout/stderr
with idle timeout and per-user/per-device session limits.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 15:33:48 -05:00
Jason Staack
d3d3e36192 feat(poller): add NATS tunnel responder for WinBox tunnel management
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12 15:30:34 -05:00
Jason Staack
7a6ebdca89 feat(poller): add tunnel manager with idle cleanup and status tracking
Implements Manager which orchestrates WinBox tunnel lifecycle: open,
close, idle cleanup, and status queries. Uses PortPool and Tunnel from
Tasks 1.2/1.3. DeviceStore and CredentialCache wired in for Task 1.5.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 15:28:56 -05:00
Jason Staack
8105b995ff feat(poller): add TCP tunnel with bidirectional proxy and activity tracking
Implements Tunnel type that listens on a local port, accepts WinBox client
connections, dials the remote RouterOS device, and proxies traffic
bidirectionally. Uses activityReader to atomically update LastActive on
each read for idle timeout detection. Per-connection contexts derive from
the tunnel context so Close() terminates all connections cleanly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12 15:26:47 -05:00
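The activityReader idea from this commit, a reader wrapper that atomically records when data last flowed, might look roughly like the sketch below. The struct layout, the use of `atomic.Int64` holding Unix nanoseconds, and the `isIdle` and `demoCopy` helpers are assumptions for illustration; this sketch only updates the timestamp on reads that return data.

```go
package main

import (
	"fmt"
	"io"
	"strings"
	"sync/atomic"
	"time"
)

// activityReader wraps a reader and stamps lastActive whenever bytes arrive.
type activityReader struct {
	r          io.Reader
	lastActive *atomic.Int64 // Unix nanoseconds of the most recent read
}

func (a *activityReader) Read(p []byte) (int, error) {
	n, err := a.r.Read(p)
	if n > 0 {
		a.lastActive.Store(time.Now().UnixNano())
	}
	return n, err
}

// isIdle reports whether no data has been read for longer than timeout.
func isIdle(lastActive *atomic.Int64, timeout time.Duration, now time.Time) bool {
	return now.Sub(time.Unix(0, lastActive.Load())) > timeout
}

// demoCopy pushes payload through an activityReader and reports how many
// bytes were copied and whether the timestamp was updated.
func demoCopy(payload string) (int64, bool) {
	var last atomic.Int64
	src := &activityReader{r: strings.NewReader(payload), lastActive: &last}
	n, _ := io.Copy(io.Discard, src)
	return n, last.Load() != 0
}

func main() {
	n, touched := demoCopy("winbox bytes")
	fmt.Println(n, touched)

	var last atomic.Int64
	last.Store(time.Now().Add(-10 * time.Minute).UnixNano())
	fmt.Println(isIdle(&last, 5*time.Minute, time.Now()))
}
```

Because the timestamp is a single atomic value, both proxy directions can share it and the idle-cleanup loop can read it without taking a lock.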
Jason Staack
d885f9b4b6 feat(poller): add port pool for WinBox tunnel allocation
Implements PortPool with mutex-protected allocation, bind verification
to skip ports already in use by the OS, and release-for-reuse semantics.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12 15:25:01 -05:00
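A port pool with the three properties listed here (mutex-protected allocation, a bind check that skips ports the OS already holds, and release for reuse) can be sketched as below. Type and method names are hypothetical, and the sketch probes on 127.0.0.1 for simplicity. Note that the probe listener is closed before the caller binds, so another process could still grab the port in between; this sketch does not address that window.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"strconv"
	"sync"
)

// portPool hands out ports from a fixed inclusive range.
type portPool struct {
	mu    sync.Mutex
	ports []int
	used  map[int]bool
}

func newPortPool(min, max int) *portPool {
	p := &portPool{used: make(map[int]bool)}
	for port := min; port <= max; port++ {
		p.ports = append(p.ports, port)
	}
	return p
}

// Allocate returns the first port that is neither reserved by the pool nor
// currently bound by another process.
func (p *portPool) Allocate() (int, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	for _, port := range p.ports {
		if p.used[port] {
			continue
		}
		// Bind verification: skip ports the OS will not let us listen on.
		ln, err := net.Listen("tcp", net.JoinHostPort("127.0.0.1", strconv.Itoa(port)))
		if err != nil {
			continue
		}
		ln.Close()
		p.used[port] = true
		return port, nil
	}
	return 0, errors.New("no free port in pool")
}

// Release returns a port to the pool so a later Allocate can reuse it.
func (p *portPool) Release(port int) {
	p.mu.Lock()
	defer p.mu.Unlock()
	delete(p.used, port)
}

func main() {
	pool := newPortPool(49000, 49004)
	port, err := pool.Allocate()
	fmt.Println(port, err)
	pool.Release(port)
}
```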
Jason Staack
5f9410fa54 chore(poller): add websocket dependency for remote access
2026-03-12 15:23:48 -05:00
Cog
58597ad4fd fix: CRLF/BOM line endings + restart policies + gitattributes
- poller/docker-entrypoint.sh: convert from CRLF+BOM to LF (UTF-8 no BOM)
  Windows saved the file with a UTF-8 BOM which made the Linux kernel
  reject the shebang with 'exec format error', crashing the poller.

- infrastructure/openbao/init.sh: same CRLF -> LF fix

- poller/Dockerfile: add sed to strip CRLF and BOM at image build time
  as a defensive measure for future Windows edits

- docker-compose.override.yml: add 'restart: on-failure' to api and poller
  so they recover from the postgres startup race (TimescaleDB restarts
  postgres after initdb, briefly causing connection refused on first boot)

- .gitattributes: enforce LF for all text/script/code files so git
  normalises line endings on checkout and prevents this class of bug
2026-03-12 14:05:40 -05:00
Jason Staack
b840047e19 feat: The Other Dude v9.0.1 — full-featured email system
ci: add GitHub Pages deployment workflow for docs site

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 19:30:44 -05:00