Files
the-other-dude/.planning/ROADMAP.md
2026-03-12 23:47:30 -05:00

11 KiB

Roadmap: RouterOS Config Backup & Change Tracking (v9.6)

Overview

This roadmap delivers automated RouterOS configuration backup and change tracking as a new feature within the existing TOD platform. Work flows from database schema through the Go poller (collection), Python backend (storage, diffing, API), and React frontend (timeline, diff viewer, download). Each phase delivers a verifiable layer that the next phase builds on, culminating in a complete config history workflow with retention management and audit logging.

Phases

Phase Numbering:

  • Integer phases (1, 2, 3): Planned milestone work
  • Decimal phases (2.1, 2.2): Urgent insertions (marked with INSERTED)

Decimal phases appear between their surrounding integers in numeric order.

  • Phase 1: Database Schema - Config snapshot, diff, and change tables with encryption and RLS (completed 2026-03-13)
  • Phase 2: Poller Config Collection - SSH export, normalization, and NATS publishing from Go poller (completed 2026-03-13)
  • Phase 3: Snapshot Ingestion - Backend NATS subscriber stores snapshots with SHA256 deduplication
  • Phase 4: Manual Backup Trigger - API endpoint for on-demand config backup via poller (completed 2026-03-13)
  • Phase 5: Diff Engine - Unified diff generation and structured change parsing (completed 2026-03-13)
  • Phase 6: History API - REST endpoints for timeline, snapshot view, and diff retrieval with RBAC (completed 2026-03-13)
  • Phase 7: Config History UI - Timeline section on device page with change summaries (completed 2026-03-13)
  • Phase 8: Diff Viewer & Download - Unified diff display with syntax highlighting and .rsc download
  • Phase 9: Retention & Cleanup - 90-day retention policy with automatic snapshot deletion (completed 2026-03-13)
  • Phase 10: Audit & Observability - Audit event logging for all config backup operations (completed 2026-03-13)

Phase Details

Phase 1: Database Schema

Goal: Database tables exist to store config snapshots, diffs, and parsed changes with proper multi-tenant isolation and encryption Depends on: Nothing (first phase) Requirements: STOR-01, STOR-05 Success Criteria (what must be TRUE):

  1. Alembic migration creates router_config_snapshots, router_config_diffs, and router_config_changes tables
  2. All tables include tenant_id with RLS policies enforcing tenant isolation
  3. Snapshot config_text column is encrypted at rest (field-level encryption via existing credential pattern)
  4. SQLAlchemy models exist and can be imported by services Plans: 1 plan

Plans:

  • 01-01-PLAN.md — Alembic migration and SQLAlchemy models for config backup tables

Phase 2: Poller Config Collection

Goal: Go poller periodically connects to RouterOS devices via SSH, exports config, normalizes output, and publishes to NATS Depends on: Phase 1 Requirements: COLL-01, COLL-02, COLL-03, COLL-05, COLL-06 Success Criteria (what must be TRUE):

  1. Poller runs /export show-sensitive via SSH on each RouterOS device at a configurable interval (default 6h)
  2. Config output is normalized (timestamps stripped, whitespace trimmed, line endings unified) before publishing
  3. Poller publishes config snapshot payload to NATS subject config.snapshot.create with device_id and tenant_id
  4. Unreachable devices log a warning and are retried on the next interval without blocking other devices
  5. Interval is configurable via CONFIG_BACKUP_INTERVAL environment variable Plans: 2 plans

Plans:

  • 02-01-PLAN.md — SSH executor, config normalizer, env vars, NATS event type, device model extensions, Alembic migration
  • 02-02-PLAN.md — Backup scheduler with per-device goroutines, concurrency control, retry logic, and main.go wiring

Phase 3: Snapshot Ingestion

Goal: Backend receives config snapshots from NATS, encrypts via Transit, deduplicates by SHA256, and stores new snapshots Depends on: Phase 1, Phase 2 Requirements: STOR-02 Success Criteria (what must be TRUE):

  1. Backend NATS subscriber consumes config.snapshot.create messages and persists snapshots to router_config_snapshots
  2. When a snapshot has the same SHA256 hash as the device's most recent snapshot, it is skipped (no new row, no diff)
  3. Each stored snapshot includes device_id, tenant_id, config_text (encrypted), sha256_hash, and collected_at timestamp Plans: 1 plan

Plans:

  • 03-01-PLAN.md — NATS subscriber for config snapshot ingestion with dedup, encryption, and main.py wiring

Phase 4: Manual Backup Trigger

Goal: Operators can trigger an immediate config backup for a specific device through the API Depends on: Phase 2, Phase 3 Requirements: COLL-04 Success Criteria (what must be TRUE):

  1. POST /api/tenants/{tenant_id}/devices/{device_id}/backup triggers an immediate config collection for the specified device
  2. The triggered backup flows through the same collection and ingestion pipeline as scheduled backups
  3. Endpoint requires operator role or higher (viewers cannot trigger) Plans: 1 plan

Plans:

  • 04-01-PLAN.md — Go BackupResponder (NATS request-reply) + Python API trigger endpoint

Phase 5: Diff Engine

Goal: When a new (non-duplicate) snapshot is stored, the system generates a unified diff against the previous snapshot and parses structured changes Depends on: Phase 3 Requirements: DIFF-01, DIFF-02, DIFF-03, DIFF-04 Success Criteria (what must be TRUE):

  1. Unified diff is generated between consecutive snapshots when config content differs
  2. Diff is stored in router_config_diffs linking the two snapshot IDs
  3. Structured change parser extracts component name, human-readable summary, and raw diff line for each change
  4. Parsed changes are stored in router_config_changes as JSON-structured records Plans: 2 plans

Plans:

  • 05-01-PLAN.md — Unified diff generation service with Transit decrypt and subscriber integration
  • 05-02-PLAN.md — Structured change parser extracting components and summaries from diffs

Phase 6: History API

Goal: Frontend can query config change timeline, retrieve full snapshots, and view diffs through RBAC-protected endpoints Depends on: Phase 5 Requirements: API-01, API-02, API-03, API-04 Success Criteria (what must be TRUE):

  1. GET /api/tenants/{tid}/devices/{did}/config-history returns paginated change timeline with component, summary, and timestamp
  2. GET /api/tenants/{tid}/devices/{did}/config/{snapshot_id} returns full snapshot content
  3. GET /api/tenants/{tid}/devices/{did}/config/{snapshot_id}/diff returns unified diff text
  4. All endpoints enforce RBAC: viewer+ can read history, operator+ required for backup trigger
  5. Endpoints return proper 404 for nonexistent snapshots and 403 for unauthorized access Plans: 2 plans

Plans:

  • 06-01-PLAN.md — Config history timeline endpoint with service, router, and tests
  • 06-02-PLAN.md — Snapshot view and diff retrieval endpoints with Transit decrypt and RBAC

Phase 7: Config History UI

Goal: Device detail page displays a Configuration History section showing a timeline of config changes Depends on: Phase 6 Requirements: UI-01, UI-02 Success Criteria (what must be TRUE):

  1. Device detail page shows a "Configuration History" section below the Remote Access section
  2. Timeline displays change entries with component badge, summary text, and relative timestamp
  3. Timeline loads via TanStack Query and shows loading/empty states appropriately Plans: 1 plan

Plans:

  • 07-01-PLAN.md — API client, ConfigHistorySection component, and device detail page wiring

Phase 8: Diff Viewer & Download

Goal: Users can view unified diffs with syntax highlighting and download any snapshot as a .rsc file Depends on: Phase 7 Requirements: UI-03, UI-04 Success Criteria (what must be TRUE):

  1. Clicking a timeline entry opens a diff viewer showing unified diff with add (green) / remove (red) line highlighting
  2. User can download any snapshot as router-{device_name}-{timestamp}.rsc file
  3. Diff viewer handles large configs without performance degradation Plans: 2 plans

Plans:

  • 08-01-PLAN.md — Unified diff viewer component with syntax highlighting and clickable timeline entries
  • 08-02-PLAN.md — Snapshot download as .rsc file with download button on timeline entries

Phase 9: Retention & Cleanup

Goal: Snapshots older than the retention period are automatically cleaned up, keeping storage bounded Depends on: Phase 3 Requirements: STOR-03, STOR-04 Success Criteria (what must be TRUE):

  1. Snapshots older than 90 days (default) are automatically deleted along with their associated diffs and changes
  2. Retention period is configurable via CONFIG_RETENTION_DAYS environment variable
  3. Cleanup runs on a scheduled interval without blocking normal operations Plans: 1 plan

Plans:

  • 09-01-PLAN.md — Retention cleanup service with APScheduler, configurable retention period, and cascading deletion

Phase 10: Audit & Observability

Goal: All config backup operations are logged as audit events for compliance and troubleshooting Depends on: Phase 3, Phase 4, Phase 5 Requirements: OBS-01, OBS-02 Success Criteria (what must be TRUE):

  1. config_snapshot_created audit event logged when a new snapshot is stored
  2. config_snapshot_skipped_duplicate audit event logged when a duplicate snapshot is detected
  3. config_diff_generated audit event logged when a diff is created between snapshots
  4. config_backup_manual_trigger audit event logged when an operator triggers a manual backup Plans: 1 plan

Plans:

  • 10-01-PLAN.md — Audit event emission for all config backup operations

Progress

Execution Order: Phases execute in numeric order: 1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7 -> 8 -> 9 -> 10 Note: Phase 9 depends only on Phase 3 and Phase 10 depends on Phases 3/4/5, so Phases 9 and 10 can execute in parallel with Phases 6-8 if desired.

Phase Plans Complete Status Completed
1. Database Schema 1/1 Complete 2026-03-13
2. Poller Config Collection 2/2 Complete 2026-03-13
3. Snapshot Ingestion 0/1 Not started -
4. Manual Backup Trigger 1/1 Complete 2026-03-13
5. Diff Engine 2/2 Complete 2026-03-13
6. History API 2/2 Complete 2026-03-13
7. Config History UI 1/1 Complete 2026-03-13
8. Diff Viewer & Download 1/2 In Progress
9. Retention & Cleanup 1/1 Complete 2026-03-13
10. Audit & Observability 1/1 Complete 2026-03-13