The WIRELESS_REGISTRATIONS stream had a 256MB MaxBytes cap inside a 256MB
container, so it was guaranteed to crash under load. ALERT_EVENTS and
OPERATION_EVENTS had no byte limit at all.
- Reduce WIRELESS_REGISTRATIONS MaxBytes from 256MB to 128MB
- Add 16MB MaxBytes cap to ALERT_EVENTS and OPERATION_EVENTS
- Bump NATS container memory limit from 256MB to 384MB
- Add restart: unless-stopped to NATS in base compose
- Bump version to 9.8.2
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Go: nil-safe profile cache in SNMPCollector, updated test assertion
- ESLint: fix conditional useQuery hook in SNMPMetricsSection
- ESLint: remove unused CREDENTIAL_TYPE_LABELS, ChevronDown/Right,
EmptyState import, advancedOpen state
- TypeScript: replace empty interface with type alias
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
SNMP devices added without a profile (e.g., via the simplified add flow)
were failing with "no SNMP profile assigned". They now fall back to the
generic-snmp profile, which collects standard MIB-II metrics.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Build tod-mib-parser in both poller and API Dockerfiles
- Bundle 16 standard MIBs (IF-MIB, HOST-RESOURCES, SNMPv2, etc.)
- Pass --search-path /app/mibs to parser so dependencies resolve
- Users no longer need to upload standard MIBs manually
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add gosmi v1.0.4 dependency for MIB parsing
- Create poller/cmd/mib-parser/ with main.go and tree.go
- CLI accepts MIB file path and optional --search-path
- Outputs JSON OID tree with oid, name, description, type, access, status, children
- Errors are written to stdout as JSON {"error":"..."} with exit code 0 so the Python backend can parse them
- Panic recovery wraps gosmi LoadModule for malformed MIBs
- Parent-child tree built from OID hierarchy with numeric sort
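
Why the numeric sort matters: a plain string sort would place arc 10 before
arc 2. A minimal sketch of an arc-by-arc comparison, assuming dotted-decimal
OID strings. parseOID and lessOID are hypothetical names for illustration,
not the functions in tree.go.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// parseOID splits a dotted-decimal OID into numeric arcs.
// Hypothetical helper: malformed arcs become 0 to keep the sketch short.
func parseOID(oid string) []uint64 {
	parts := strings.Split(strings.TrimPrefix(oid, "."), ".")
	arcs := make([]uint64, len(parts))
	for i, p := range parts {
		n, _ := strconv.ParseUint(p, 10, 64)
		arcs[i] = n
	}
	return arcs
}

// lessOID reports whether a sorts before b, comparing arcs as numbers.
// A shorter OID that is a prefix of a longer one (a parent) sorts first.
func lessOID(a, b string) bool {
	x, y := parseOID(a), parseOID(b)
	for i := 0; i < len(x) && i < len(y); i++ {
		if x[i] != y[i] {
			return x[i] < y[i]
		}
	}
	return len(x) < len(y)
}

func main() {
	oids := []string{"1.3.6.1.2.1.10", "1.3.6.1.2.1.2", "1.3.6.1.2.1"}
	sort.Slice(oids, func(i, j int) bool { return lessOID(oids[i], oids[j]) })
	fmt.Println(oids)
}
```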
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The SNMP collector was missing the per-device Redis lock that prevents
duplicate polls across pods. Rather than adding the lock to each
collector individually, lift it into runDeviceLoop so ALL collector
types (RouterOS and SNMP) are protected uniformly.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Additive field with omitempty tag for SNMP device identification
- Existing RouterOS events produce identical JSON (field not set)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- SNMPCollector implements poller.Collector interface
- Profile-driven OID collection: scalars via Get, tables via Walk/BulkWalk
- BulkWalk wrapped in withTimeout to prevent indefinite hangs
- SNMPv1 uses Walk; v2c/v3 use BulkWalk (MaxRepetitions=10)
- Safety valve: walkTable aborts at 10,000 PDUs per walk
- Counter delta computation via CounterCache for rate metrics
- Standard metrics routed to DeviceMetricsEvent (interfaces, health)
- Custom metrics routed to SNMPMetricsEvent (snmp_custom)
- DeviceStatusEvent published with online/offline status
- Each poll group collects independently (partial failure tolerant)
- Credential resolution via GetRawCredentials + ParseSNMPCredentials
- ifXTable Counter64 data supersedes ifTable Counter32 via PreferOver
- ssCpuIdle invert_percent transform for CPU load fallback
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- NATS request-reply on device.discover.snmp with queue group discover-workers
- Probes sysObjectID.0, sysDescr.0, sysName.0 via SNMP GET
- Credentials from request payload (not database), never logged
- 5-second timeout on both connect and GET operations
- Supports SNMP v1, v2c, and v3 with all auth/priv protocols
- Builds gosnmp client inline to avoid snmp->bus import cycle
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Subscribe/unsubscribe lifecycle
- Invalid JSON returns error response
- Missing ip_address returns descriptive error
- Response JSON field names match spec
- Invalid SNMP version rejected
- Default port 161 when zero
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Implement computeCounterDelta for Counter32/Counter64 with wraparound handling
- Sanity threshold discards deltas > 90% of max value (device reset detection)
- CounterCache uses Redis MGET/MSET pipelining for efficient state persistence
- Counter keys use "snmp:counter:{device_id}:{oid}" format with 600s TTL
- Add SNMPMetricsEvent and SNMPMetricEntry structs to bus package
- Add PublishSNMPMetrics publishing to "device.metrics.snmp_custom.{device_id}"
- Full test coverage: 10 counter tests including miniredis integration
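
For reference, the wraparound and sanity-threshold rules above can be
sketched as follows. This is an illustration under stated assumptions, not
the actual computeCounterDelta: the counterDelta name and signature are
hypothetical, a single wrap between samples is assumed, and values are
assumed to fit the stated counter width.

```go
package main

import (
	"fmt"
	"math"
)

// counterDelta returns cur-prev for a counter of the given width (32 or 64
// bits), treating cur < prev as a single wraparound. ok is false when the
// delta exceeds 90% of the counter's maximum, which is read as a device
// reset rather than real traffic.
func counterDelta(prev, cur uint64, bits int) (delta uint64, ok bool) {
	maxVal := uint64(math.MaxUint32)
	if bits == 64 {
		maxVal = math.MaxUint64
	}
	if cur >= prev {
		delta = cur - prev
	} else {
		// Wrapped: distance up to maxVal, one step to roll over to 0, then cur.
		delta = (maxVal - prev) + cur + 1
	}
	if delta > maxVal/10*9 {
		return 0, false
	}
	return delta, true
}

func main() {
	d, ok := counterDelta(4294967290, 10, 32) // Counter32 wrapped near its max
	fmt.Println(d, ok)
	d, ok = counterDelta(1000, 500, 32) // implausible jump, treated as a reset
	fmt.Println(d, ok)
}
```

A counter that drops slightly (1000 to 500) looks like a near-full wrap, so
its delta lands above the 90% threshold and the sample is discarded.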
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Install gosnmp v1.43.2 as direct dependency
- Create snmp package with SNMPConfig, CompiledProfile, PollGroup types
- Implement BuildSNMPClient for v1, v2c, and v3 (all security levels)
- Map auth protocols (MD5, SHA, SHA224-512) and priv protocols (DES, AES128-256)
- MaxRepetitions set to 10 (not gosnmp default 50) for embedded devices
- Full test coverage: 9 tests covering all SNMP versions and protocol mappings
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add collectors map[string]Collector field to Scheduler struct
- Register RouterOSCollector for "routeros" inside NewScheduler
- Replace direct PollDevice call with collector dispatch by dev.DeviceType
- Default empty DeviceType to "routeros" for backward compatibility
- Log error and exit device loop for unknown device types
- Circuit breaker logic unchanged
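
A rough sketch of the dispatch described above. The Collector method set,
the routerOSCollector type, and the pick helper are stand-ins invented for
illustration; the real poller.Collector interface is not shown in this log.

```go
package main

import (
	"errors"
	"fmt"
)

// Collector is a stand-in for poller.Collector (method set assumed).
type Collector interface {
	Collect(deviceID string) error
}

// routerOSCollector is a placeholder implementation for the sketch.
type routerOSCollector struct{}

func (routerOSCollector) Collect(deviceID string) error { return nil }

var errUnknownType = errors.New("unknown device type")

// pick selects a collector by device type. An empty type defaults to
// "routeros" so devices that predate the DeviceType field keep working.
func pick(collectors map[string]Collector, deviceType string) (Collector, error) {
	if deviceType == "" {
		deviceType = "routeros"
	}
	c, ok := collectors[deviceType]
	if !ok {
		return nil, fmt.Errorf("%w: %q", errUnknownType, deviceType)
	}
	return c, nil
}

func main() {
	collectors := map[string]Collector{"routeros": routerOSCollector{}}
	for _, t := range []string{"", "routeros", "bogus"} {
		_, err := pick(collectors, t)
		fmt.Printf("type=%q err=%v\n", t, err)
	}
}
```

On an error from pick, the caller would log and leave the device loop, as
the commit describes.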
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- GetRawCredentials resolves credentials: device transit, device legacy, profile transit, profile legacy
- Cache key includes source (device/profile) to prevent cross-source poisoning
- GetCredentials is now a backward-compatible wrapper calling GetRawCredentials + ParseRouterOSCredentials
- Add DecryptRaw to device package for raw byte decryption without JSON parsing
- Invalidate clears both parsed and raw cache entries
- All existing callers (PollDevice, CmdResponder, TunnelResponder, BackupResponder, SSHRelay) unchanged
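
One way to picture the resolution order and the source-scoped cache key,
assuming the four sources are tried in the order listed above. The device
struct, its two device-level field names, pickCiphertext, and the cacheKey
format are all hypothetical; only the Profile* field names appear elsewhere
in this log.

```go
package main

import "fmt"

// device holds the four possible ciphertext sources.
type device struct {
	ID                                 string
	EncryptedCredentialsTransit        []byte // hypothetical field name
	EncryptedCredentials               []byte // hypothetical field name
	ProfileEncryptedCredentialsTransit []byte
	ProfileEncryptedCredentials        []byte
}

// pickCiphertext returns the first non-empty blob in the order
// device transit, device legacy, profile transit, profile legacy,
// together with the source it came from.
func pickCiphertext(d device) (source string, blob []byte, ok bool) {
	switch {
	case len(d.EncryptedCredentialsTransit) > 0:
		return "device", d.EncryptedCredentialsTransit, true
	case len(d.EncryptedCredentials) > 0:
		return "device", d.EncryptedCredentials, true
	case len(d.ProfileEncryptedCredentialsTransit) > 0:
		return "profile", d.ProfileEncryptedCredentialsTransit, true
	case len(d.ProfileEncryptedCredentials) > 0:
		return "profile", d.ProfileEncryptedCredentials, true
	}
	return "", nil, false
}

// cacheKey includes the source so a device-level entry can never be served
// for a profile-level lookup, or the reverse. The key format is assumed.
func cacheKey(source, deviceID string) string {
	return "creds:" + source + ":" + deviceID
}

func main() {
	d := device{ID: "dev-1", ProfileEncryptedCredentials: []byte("ciphertext")}
	if src, _, ok := pickCiphertext(d); ok {
		fmt.Println(cacheKey(src, d.ID))
	}
}
```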
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- SNMPCredential struct with v1/v2c/v3 field support
- ParseRouterOSCredentials handles typed and legacy no-type-field JSON
- ParseSNMPCredentials handles snmp_v1, snmp_v2c, snmp_v3 types
- credentialEnvelope for type-agnostic type field peeking
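
The type peek can be sketched as below. The "type" JSON key, the peekType
helper, and the "routeros" fallback string are assumptions for illustration;
the log only states that credentials without a type field are treated as
legacy RouterOS credentials.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// credentialEnvelope decodes only the discriminator and ignores every other
// field, so it works for any credential shape.
type credentialEnvelope struct {
	Type string `json:"type"` // key name assumed
}

// peekType returns the credential type without parsing the full payload.
// A missing type field is reported as "routeros" (assumed legacy default).
func peekType(raw []byte) (string, error) {
	var env credentialEnvelope
	if err := json.Unmarshal(raw, &env); err != nil {
		return "", err
	}
	if env.Type == "" {
		return "routeros", nil
	}
	return env.Type, nil
}

func main() {
	for _, raw := range []string{
		`{"type":"snmp_v2c","community":"public"}`,
		`{"username":"admin","password":"secret"}`,
	} {
		t, err := peekType([]byte(raw))
		fmt.Println(t, err)
	}
}
```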
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Assert DeviceType defaults to "routeros" via COALESCE
- Assert SNMPPort defaults to 161 via COALESCE
- Assert SNMPVersion, SNMPProfileID, CredentialProfileID are nil for
existing RouterOS devices without profile links
- Assert ProfileEncryptedCredentials and ProfileEncryptedCredentialsTransit
are nil when no credential profile is linked
- Update test schema with device_type, snmp_port, snmp_version,
snmp_profile_id, credential_profile_id columns
- Add credential_profiles table to test schema for LEFT JOIN
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add 7 new fields to store.Device: DeviceType, SNMPPort, SNMPVersion,
SNMPProfileID, CredentialProfileID, ProfileEncryptedCredentials,
ProfileEncryptedCredentialsTransit
- Update FetchDevices query with LEFT JOIN credential_profiles and
expanded WHERE clause (credential_profile_id IS NOT NULL)
- Update GetDevice query with same JOIN and new columns
- COALESCE defaults: device_type='routeros', snmp_port=161
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- DeviceInterfaceEvent type publishes to device.interfaces.{device_id}
- PublishDeviceInterfaces method follows existing publisher pattern
- DEVICE_EVENTS stream includes device.interfaces.> subject
- PollDevice collects interface info after traffic counters, before health
- Non-fatal errors with Prometheus metrics for publish success/failure
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- InterfaceInfo struct field compilation test
- MAC address lowercasing test
- Running bool parsing test
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- RFMonitorStats struct for per-interface RF data (noise floor, channel width, TX power)
- CollectRFMonitor with v6/v7 RouterOS version routing
- WIRELESS_REGISTRATIONS NATS stream with 30-day retention (separate from DEVICE_EVENTS)
- WirelessRegistrationEvent type and PublishWirelessRegistrations method
- Poll cycle collects per-client registrations and RF stats, publishes combined event
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- RegistrationEntry struct for per-client wireless data (MAC, signal, CCQ, rates, distance)
- ParseSignalStrength handles all RouterOS format variations (-67, -67@5GHz, -67@HT40)
- CollectRegistrations with v6/v7 RouterOS version routing
- Unit tests for ParseSignalStrength covering 10 cases
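
A minimal sketch of the parsing idea, covering only the three formats listed
above. parseSignal is a hypothetical stand-in, not the actual
ParseSignalStrength, which may handle more variations.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSignal extracts the dBm value from strings such as "-67",
// "-67@5GHz", or "-67@HT40" by dropping everything from '@' onward.
func parseSignal(s string) (int, error) {
	s = strings.TrimSpace(s)
	if i := strings.IndexByte(s, '@'); i >= 0 {
		s = s[:i]
	}
	return strconv.Atoi(s)
}

func main() {
	for _, in := range []string{"-67", "-67@5GHz", "-67@HT40"} {
		v, err := parseSignal(in)
		fmt.Println(in, v, err)
	}
}
```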
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Without a byte limit, the stream grows unbounded within its 24h
max_age window. At 101 devices polling every 60s, it hits 128MB
in ~10 hours and OOMs the NATS container.
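
A back-of-the-envelope check of those numbers, assuming one message per
device per poll and reading 128MB as 128 MiB (neither is stated in this
commit). Under those assumptions the figures imply an average of roughly
2 KB per message.

```go
package main

import "fmt"

// messagesIn counts messages published over the given number of hours,
// assuming exactly one message per device per poll interval.
func messagesIn(devices, pollSeconds, hours int) int {
	return devices * (3600 / pollSeconds) * hours
}

// avgBytes is the average message size that fills capBytes with n messages.
func avgBytes(capBytes, n int) int {
	return capBytes / n
}

func main() {
	n := messagesIn(101, 60, 10)
	fmt.Println("messages:", n)
	fmt.Println("avg bytes per message:", avgBytes(128*1024*1024, n))
}
```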
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add error checking to all three write calls in bridge.go. A write
failure now terminates that goroutine's copy loop and triggers cancel,
which lets the other goroutines clean up naturally.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Remove 7 no-op exported stubs from toast.tsx (ToastProvider, ToastViewport,
Toast, ToastTitle, ToastDescription, ToastClose, useToasts) — nothing imports them
- Remove fwFailKey variable and its Set() call from worker.go — the
firmware:check-failed Redis key was never read anywhere
- Remove unused deviceStore and credCache fields from tunnel.Manager struct
and drop corresponding parameters from NewManager(); update call site in
main.go and all test usages
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- migration 002: use current_database() instead of hardcoded 'tod'
- ci.yml: use Go 1.25 (required by nats-server dep), mark golangci-lint
as continue-on-error until it supports Go 1.25
- go.mod: keep at 1.25.0 (nats-server v2.12.5 requires it)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Create BackupResponder for NATS request-reply on config.backup.trigger
- Extract public CollectAndPublish from BackupScheduler returning sha256 hash
- Define BackupExecutor/BackupLocker/DeviceGetter interfaces for testability
- Create RedisBackupLocker adapter wrapping redislock.Client
- Wire BackupResponder into main.go lifecycle
- All 6 tests pass with in-process NATS server
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Test subscribe registers subscription
- Test valid request returns success with sha256_hash
- Test lock held returns locked status
- Test invalid JSON returns error
- Test Stop unsubscribes cleanly
- Test device not found returns failed status
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Create BackupScheduler with all dependencies injected
- Run as goroutine parallel to status poll scheduler
- Shares same context for graceful shutdown via SIGINT/SIGTERM
- Startup logged with interval, max_concurrent, command_timeout
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- BackupScheduler manages per-device backup goroutines independently from status poll
- First backup uses 30-300s random jitter delay to spread load
- Concurrency limited by buffered channel semaphore (configurable max)
- Per-device Redis lock prevents duplicate backups across pods
- Auth failures and host key mismatches block retries with clear warnings
- Transient errors use 5m/15m/1h exponential backoff with cap
- Offline devices skipped via Redis status key check
- TOFU fingerprint stored on first successful SSH connection
- Config output validated, normalized, hashed, published to NATS
- SSHHostKeyUpdater interface added to interfaces.go
- All 12 backup unit tests pass
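
The 5m/15m/1h schedule can be read as a capped step table. A sketch under
that assumption; backoffSteps and backoffFor are hypothetical names, and the
real scheduler may track failures differently.

```go
package main

import (
	"fmt"
	"time"
)

// backoffSteps lists the delay after the 1st, 2nd, and 3rd consecutive
// transient failure. Later failures stay at the last step (the cap).
var backoffSteps = []time.Duration{5 * time.Minute, 15 * time.Minute, time.Hour}

// backoffFor returns the retry delay for a consecutive-failure count.
// Zero or negative counts mean no backoff.
func backoffFor(failures int) time.Duration {
	if failures < 1 {
		return 0
	}
	if failures > len(backoffSteps) {
		return backoffSteps[len(backoffSteps)-1]
	}
	return backoffSteps[failures-1]
}

func main() {
	for f := 0; f <= 5; f++ {
		fmt.Println(f, backoffFor(f))
	}
}
```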
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Bind tunnel listeners to 0.0.0.0 instead of 127.0.0.1 so tunnels
are reachable through reverse proxies and container networks
- Reduce port range to 49000-49004 (5 concurrent tunnels)
- Derive WinBox URI host from request Host header instead of
hardcoding 127.0.0.1, enabling use behind reverse proxies
- Add README security warning about default encryption keys
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Poller publishes session end events via JetStream when SSH sessions
close (normal disconnect or idle timeout). Backend subscribes with a
durable consumer and writes ssh_session_end audit log entries with
duration.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Gap 1: Add tenant ID verification after device lookup in SSH relay handleSSH,
closing cross-tenant token reuse vulnerability
- Gap 2: Add X-Forwarded-For fallback (last entry) when X-Real-IP is absent in
SSH relay source IP extraction; import strings package
- Gap 3: Add @limiter.limit("10/minute") to POST /winbox-session and POST
/ssh-session using existing slowapi pattern from app.middleware.rate_limit
- Gap 4: Add TODO comment in open_ssh_session explaining that SSH session count
enforcement is at the poller level; no NATS subject exists yet for API-side
pre-check
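
For Gap 2, the fallback order can be sketched as below. The function names
are hypothetical, and the final fallback to the connection's RemoteAddr is
an assumption not stated in this commit.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"strings"
)

// pickSourceIP prefers X-Real-IP, then the last X-Forwarded-For entry,
// then the host part of the remote address (assumed final fallback).
func pickSourceIP(realIP, forwardedFor, remoteAddr string) string {
	if ip := strings.TrimSpace(realIP); ip != "" {
		return ip
	}
	if forwardedFor != "" {
		parts := strings.Split(forwardedFor, ",")
		if last := strings.TrimSpace(parts[len(parts)-1]); last != "" {
			return last
		}
	}
	host, _, err := net.SplitHostPort(remoteAddr)
	if err != nil {
		return remoteAddr
	}
	return host
}

// sourceIP applies pickSourceIP to an incoming request. Header.Get returns
// only the first X-Forwarded-For header line if several are present.
func sourceIP(r *http.Request) string {
	return pickSourceIP(r.Header.Get("X-Real-IP"), r.Header.Get("X-Forwarded-For"), r.RemoteAddr)
}

func main() {
	r := &http.Request{Header: http.Header{}, RemoteAddr: "10.0.0.9:5555"}
	r.Header.Set("X-Forwarded-For", "203.0.113.7, 10.0.0.2")
	fmt.Println(sourceIP(r))
}
```

The last X-Forwarded-For entry is the one appended by the nearest proxy, so
it is harder for a client to spoof than the first entry.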
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add TunnelManager, TunnelResponder, SSH relay server, and SSH relay HTTP
server to the poller startup sequence with env-configurable port ranges,
idle timeouts, and session limits. Extends graceful shutdown to cover the
HTTP server (5s context), tunnel manager, and SSH relay server via defer.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implements the SSH relay server (Task 2.1) that validates single-use
Redis tokens via GETDEL, dials SSH to the target device with PTY,
and bridges WebSocket binary/text frames to SSH stdin/stdout/stderr
with idle timeout and per-user/per-device session limits.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>