- Implement computeCounterDelta for Counter32/Counter64 with wraparound handling
- Sanity threshold discards deltas above 90% of the counter's max value (treated as a device reset)
- CounterCache uses Redis MGET/MSET pipelining for efficient state persistence
- Counter keys use "snmp:counter:{device_id}:{oid}" format with 600s TTL
- Add SNMPMetricsEvent and SNMPMetricEntry structs to bus package
- Add PublishSNMPMetrics publishing to "device.metrics.snmp_custom.{device_id}"
- Full test coverage: 10 counter tests including miniredis integration
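A minimal sketch of the wraparound and reset logic, assuming computeCounterDelta takes the previous and current raw values plus the counter width in bits. The signature and the counterLimit helper are hypothetical:

```go
package main

import "math"

// counterLimit returns the largest value a counter of the given width can hold.
// Hypothetical helper: only 32- and 64-bit widths are modelled.
func counterLimit(bits int) uint64 {
	if bits == 32 {
		return math.MaxUint32
	}
	return math.MaxUint64
}

// computeCounterDelta returns how far a counter advanced from prev to curr.
// A lower curr is treated as a single wraparound. ok is false when the delta
// exceeds 90% of the counter's range, which is read as a device reset
// (counter restarted near zero) rather than genuine traffic.
func computeCounterDelta(prev, curr uint64, bits int) (delta uint64, ok bool) {
	limit := counterLimit(bits)
	if curr >= prev {
		delta = curr - prev
	} else {
		// Count from prev up to limit, one more step to roll over to 0, then up to curr.
		delta = (limit - prev) + 1 + curr
	}
	if delta > limit/10*9 {
		return 0, false
	}
	return delta, true
}
```

A Counter32 reading that drops from 1000 to 10 would imply a wrap of almost 2^32, so it lands above the 90% threshold and is discarded instead of being reported as traffic.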
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Install gosnmp v1.43.2 as direct dependency
- Create snmp package with SNMPConfig, CompiledProfile, PollGroup types
- Implement BuildSNMPClient for v1, v2c, and v3 (all security levels)
- Map auth protocols (MD5, SHA, SHA224-512) and priv protocols (DES, AES128-256)
- MaxRepetitions set to 10 (not gosnmp default 50) for embedded devices
- Full test coverage: 9 tests covering all SNMP versions and protocol mappings
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add collectors map[string]Collector field to Scheduler struct
- Register RouterOSCollector for "routeros" inside NewScheduler
- Replace direct PollDevice call with collector dispatch by dev.DeviceType
- Default empty DeviceType to "routeros" for backward compatibility
- Log error and exit device loop for unknown device types
- Circuit breaker logic unchanged
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- GetRawCredentials resolves credentials in priority order: device transit, device legacy, profile transit, profile legacy
- Cache key includes source (device/profile) to prevent cross-source poisoning
- GetCredentials is now a backward-compatible wrapper calling GetRawCredentials + ParseRouterOSCredentials
- Add DecryptRaw to device package for raw byte decryption without JSON parsing
- Invalidate clears both parsed and raw cache entries
- All existing callers (PollDevice, CmdResponder, TunnelResponder, BackupResponder, SSHRelay) unchanged
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- SNMPCredential struct with v1/v2c/v3 field support
- ParseRouterOSCredentials handles both typed JSON and legacy JSON with no type field
- ParseSNMPCredentials handles snmp_v1, snmp_v2c, snmp_v3 types
- credentialEnvelope peeks at the type field without committing to a concrete credential struct
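A sketch of the envelope peek with encoding/json: decode only the type field first, then the full struct. The JSON field names on SNMPCredential are assumptions:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// credentialEnvelope decodes only "type" so a payload can be routed
// before choosing a concrete struct.
type credentialEnvelope struct {
	Type string `json:"type"`
}

// SNMPCredential mirrors the commit's struct; JSON field names are assumed.
type SNMPCredential struct {
	Type         string `json:"type"`
	Community    string `json:"community,omitempty"`     // v1/v2c
	Username     string `json:"username,omitempty"`      // v3
	AuthProtocol string `json:"auth_protocol,omitempty"` // v3
	AuthPassword string `json:"auth_password,omitempty"` // v3
	PrivProtocol string `json:"priv_protocol,omitempty"` // v3
	PrivPassword string `json:"priv_password,omitempty"` // v3
}

// ParseSNMPCredentials accepts snmp_v1, snmp_v2c and snmp_v3 payloads only.
func ParseSNMPCredentials(raw []byte) (SNMPCredential, error) {
	var env credentialEnvelope
	if err := json.Unmarshal(raw, &env); err != nil {
		return SNMPCredential{}, fmt.Errorf("peek credential type: %w", err)
	}
	switch env.Type {
	case "snmp_v1", "snmp_v2c", "snmp_v3":
		// recognised SNMP type, fall through to the full decode
	default:
		return SNMPCredential{}, fmt.Errorf("not an SNMP credential: type %q", env.Type)
	}
	var cred SNMPCredential
	if err := json.Unmarshal(raw, &cred); err != nil {
		return SNMPCredential{}, fmt.Errorf("decode %s credential: %w", env.Type, err)
	}
	return cred, nil
}
```

A legacy RouterOS payload has no type field, so env.Type decodes as the empty string and is rejected here; the RouterOS parser can use the same peek to treat an empty type as legacy.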
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Assert DeviceType defaults to "routeros" via COALESCE
- Assert SNMPPort defaults to 161 via COALESCE
- Assert SNMPVersion, SNMPProfileID, CredentialProfileID are nil for
existing RouterOS devices without profile links
- Assert ProfileEncryptedCredentials and ProfileEncryptedCredentialsTransit
are nil when no credential profile is linked
- Update test schema with device_type, snmp_port, snmp_version,
snmp_profile_id, credential_profile_id columns
- Add credential_profiles table to test schema for LEFT JOIN
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add 7 new fields to store.Device: DeviceType, SNMPPort, SNMPVersion,
SNMPProfileID, CredentialProfileID, ProfileEncryptedCredentials,
ProfileEncryptedCredentialsTransit
- Update FetchDevices query with LEFT JOIN credential_profiles and
expanded WHERE clause (credential_profile_id IS NOT NULL)
- Update GetDevice query with same JOIN and new columns
- COALESCE defaults: device_type='routeros', snmp_port=161
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- DeviceInterfaceEvent type publishes to device.interfaces.{device_id}
- PublishDeviceInterfaces method follows existing publisher pattern
- DEVICE_EVENTS stream includes device.interfaces.> subject
- PollDevice collects interface info after traffic counters, before health
- Interface collection and publish errors are non-fatal; Prometheus metrics count publish successes and failures
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- InterfaceInfo struct field compilation test
- MAC address lowercasing test
- Running bool parsing test
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- RFMonitorStats struct for per-interface RF data (noise floor, channel width, TX power)
- CollectRFMonitor with v6/v7 RouterOS version routing
- WIRELESS_REGISTRATIONS NATS stream with 30-day retention (separate from DEVICE_EVENTS)
- WirelessRegistrationEvent type and PublishWirelessRegistrations method
- Poll cycle collects per-client registrations and RF stats, publishes combined event
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- RegistrationEntry struct for per-client wireless data (MAC, signal, CCQ, rates, distance)
- ParseSignalStrength handles RouterOS format variations such as -67, -67@5GHz and -67@HT40
- CollectRegistrations with v6/v7 RouterOS version routing
- Unit tests for ParseSignalStrength covering 10 cases
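A minimal version of the parsing rule, assuming the function returns an int dBm value and an error (signature hypothetical): everything from the first @ onward is a band or channel-width suffix and is dropped.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ParseSignalStrength extracts the dBm value from strings like "-67",
// "-67@5GHz" or "-67@HT40". Surrounding whitespace is tolerated.
func ParseSignalStrength(raw string) (int, error) {
	s := strings.TrimSpace(raw)
	if i := strings.IndexByte(s, '@'); i >= 0 {
		s = s[:i] // drop the band / channel-width suffix
	}
	v, err := strconv.Atoi(s)
	if err != nil {
		return 0, fmt.Errorf("parse signal strength %q: %w", raw, err)
	}
	return v, nil
}
```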
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Without a byte limit, the stream grows unbounded within its 24h
max_age window. At 101 devices polling every 60s, it hits 128MB
in ~10 hours and OOMs the NATS container.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add error checking to all three write calls in bridge.go. A write
failure now terminates that goroutine's copy loop and triggers cancel,
which lets the other goroutines clean up naturally.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Remove 7 no-op exported stubs from toast.tsx (ToastProvider, ToastViewport,
Toast, ToastTitle, ToastDescription, ToastClose, useToasts) — nothing imports them
- Remove fwFailKey variable and its Set() call from worker.go — the
firmware:check-failed Redis key was never read anywhere
- Remove unused deviceStore and credCache fields from tunnel.Manager struct
and drop corresponding parameters from NewManager(); update call site in
main.go and all test usages
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- migration 002: use current_database() instead of hardcoded 'tod'
- ci.yml: use Go 1.25 (required by nats-server dep), mark golangci-lint
as continue-on-error until it supports Go 1.25
- go.mod: keep at 1.25.0 (nats-server v2.12.5 requires it)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Create BackupResponder for NATS request-reply on config.backup.trigger
- Extract a public CollectAndPublish method from BackupScheduler that returns the sha256 hash
- Define BackupExecutor/BackupLocker/DeviceGetter interfaces for testability
- Create RedisBackupLocker adapter wrapping redislock.Client
- Wire BackupResponder into main.go lifecycle
- All 6 tests pass with in-process NATS server
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Test subscribe registers subscription
- Test valid request returns success with sha256_hash
- Test lock held returns locked status
- Test invalid JSON returns error
- Test Stop unsubscribes cleanly
- Test device not found returns failed status
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Create BackupScheduler with all dependencies injected
- Run as a goroutine alongside the status poll scheduler
- Share the same context for graceful shutdown via SIGINT/SIGTERM
- Startup logged with interval, max_concurrent, command_timeout
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- BackupScheduler manages per-device backup goroutines independently from status poll
- First backup uses 30-300s random jitter delay to spread load
- Concurrency limited by buffered channel semaphore (configurable max)
- Per-device Redis lock prevents duplicate backups across pods
- Auth failures and host key mismatches block retries with clear warnings
- Transient errors use escalating backoff (5m, then 15m, then 1h), capped at 1h
- Offline devices skipped via Redis status key check
- TOFU fingerprint stored on first successful SSH connection
- Config output validated, normalized, hashed, published to NATS
- SSHHostKeyUpdater interface added to interfaces.go
- All 12 backup unit tests pass
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Bind tunnel listeners to 0.0.0.0 instead of 127.0.0.1 so tunnels
are reachable through reverse proxies and container networks
- Reduce port range to 49000-49004 (5 concurrent tunnels)
- Derive WinBox URI host from request Host header instead of
hardcoding 127.0.0.1, enabling use behind reverse proxies
- Add README security warning about default encryption keys
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Poller publishes session end events via JetStream when SSH sessions
close (normal disconnect or idle timeout). Backend subscribes with a
durable consumer and writes ssh_session_end audit log entries with
duration.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Gap 1: Add tenant ID verification after device lookup in SSH relay handleSSH,
closing cross-tenant token reuse vulnerability
- Gap 2: Add X-Forwarded-For fallback (last entry) when X-Real-IP is absent in
SSH relay source IP extraction; import strings package
- Gap 3: Add @limiter.limit("10/minute") to POST /winbox-session and POST
/ssh-session using existing slowapi pattern from app.middleware.rate_limit
- Gap 4: Add TODO comment in open_ssh_session explaining that SSH session count
enforcement is at the poller level; no NATS subject exists yet for API-side
pre-check
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add TunnelManager, TunnelResponder, SSH relay server, and SSH relay HTTP
server to the poller startup sequence with env-configurable port ranges,
idle timeouts, and session limits. Extends graceful shutdown to cover the
HTTP server (5s context), tunnel manager, and SSH relay server via defer.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implements the SSH relay server (Task 2.1) that validates single-use
Redis tokens via GETDEL, dials SSH to the target device with PTY,
and bridges WebSocket binary/text frames to SSH stdin/stdout/stderr
with idle timeout and per-user/per-device session limits.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implements Manager which orchestrates WinBox tunnel lifecycle: open,
close, idle cleanup, and status queries. Uses PortPool and Tunnel from
Tasks 1.2/1.3. DeviceStore and CredentialCache wired in for Task 1.5.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implements Tunnel type that listens on a local port, accepts WinBox client
connections, dials the remote RouterOS device, and proxies traffic
bidirectionally. Uses activityReader to atomically update LastActive on
each read for idle timeout detection. Per-connection contexts derive from
the tunnel context so Close() terminates all connections cleanly.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements PortPool with mutex-protected allocation, bind verification
to skip ports already in use by the OS, and release-for-reuse semantics.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- poller/docker-entrypoint.sh: convert from CRLF+BOM to LF (UTF-8 no BOM)
Windows saved the file with a UTF-8 BOM, which made the Linux kernel
reject the shebang with 'exec format error', crashing the poller.
- infrastructure/openbao/init.sh: same CRLF -> LF fix
- poller/Dockerfile: add sed to strip CRLF and BOM at image build time
as a defensive measure for future Windows edits
- docker-compose.override.yml: add 'restart: on-failure' to api and poller
so they recover from the postgres startup race (TimescaleDB restarts
postgres after initdb, briefly causing connection refused on first boot)
- .gitattributes: enforce LF for all text/script/code files so git
normalises line endings on checkout and prevents this class of bug