diff --git a/docs/README.md b/docs/README.md index 6990f6a..80ca903 100644 --- a/docs/README.md +++ b/docs/README.md @@ -1,6 +1,8 @@ # The Other Dude -**Fleet management for MikroTik RouterOS devices.** Built for MSPs who manage hundreds of routers across multiple tenants. Think "UniFi Controller, but for MikroTik." +**Fleet management platform for MikroTik RouterOS.** + +Monitor routers, detect configuration drift, manage backups, and safely push configuration changes across hundreds of devices. Built for MSPs and network engineers managing MikroTik fleets. The Other Dude is a self-hosted, multi-tenant platform that gives you centralized visibility, configuration management, real-time monitoring, and zero-knowledge security across your entire MikroTik fleet -- from a single pane of glass. @@ -8,6 +10,16 @@ The Other Dude is a self-hosted, multi-tenant platform that gives you centralize ## Features +### Highlights + +- **Router Fleet Monitoring** -- Real-time CPU, memory, disk, traffic, and wireless metrics across every device. Configurable alerts with email, Slack, and webhook notifications. +- **Configuration Drift Detection** -- Automated config snapshots with full version history and side-by-side diffs. Know when configs change and what changed. +- **Safe Configuration Pushes** -- Two-phase config push with automatic panic-revert. Push confidently to remote devices without risking lockouts. +- **Backup Management** -- Automated configuration backups on a schedule. One-click restore to any previous version. +- **Network Topology Visibility** -- Interactive topology map showing device interconnections and shared subnets. + +--- + ### Fleet - **Dashboard** -- At-a-glance fleet health with device counts, uptime sparklines, status breakdowns per organization, and an "APs Needing Attention" card highlighting wireless issues. 
diff --git a/docs/website/docs/manage-multiple-mikrotik-routers.html b/docs/website/docs/manage-multiple-mikrotik-routers.html new file mode 100644 index 0000000..8a2359c --- /dev/null +++ b/docs/website/docs/manage-multiple-mikrotik-routers.html @@ -0,0 +1,304 @@ + + + + + + Manage Multiple MikroTik Routers + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+ Early Access — This software is in active development and testing. It is not yet ready for production use. +
+
+ + + + + +
+
+ +

How to Manage Multiple MikroTik Routers from One Dashboard

+ +

The Problem

+ +

At five routers, WinBox tabs are manageable. You open a session per device, keep credentials in your head or a password manager, and move on. At fifty routers spread across multiple client sites, the tab approach breaks down: you're writing bash scripts to loop SSH connections, hoping your expect scripts handle timeouts gracefully, and maintaining a firmware spreadsheet that's perpetually out of date.

+ +

At two hundred routers or more, you're dealing with a different class of problem entirely. Pushing a firewall rule change means coordinating across dozens of simultaneous connections. A firmware upgrade cycle requires tracking which devices are on which RouterOS version, which hardware models support the target version, and which sites have maintenance windows. Onboarding a new client means manually provisioning credentials and confirming connectivity for every device they hand you. A single device that your SSH loop skips fails silently and leaves that router out of compliance.

+ +

The operational cost compounds: credential sprawl, inconsistent configs across hardware generations, no audit trail for who changed what, and no alerting when a device goes offline at 3 AM other than a client calling your phone at 3:05 AM.

+ +

Why MikroTik Lacks Fleet Management

+ +

MikroTik's tooling was designed for single-device administration. That's not a criticism — WinBox is genuinely good at what it does. But the design assumptions don't scale to fleet operations:

+ + + +

The result is that most MikroTik fleet operators end up with a collection of tribal knowledge, SSH scripts of varying quality, and a Nagios instance that tells them about outages after the fact.

+ +

What MSPs and Network Teams Need

+ +

A workable MikroTik router fleet management solution needs to address several distinct operational concerns:

+ + + +

How The Other Dude Manages MikroTik Fleets

+ +

The Other Dude was built specifically for this operational context. Here's what the platform provides:

+ +

Fleet dashboard. A single view shows device count, online/offline ratio, uptime sparklines, and per-device CPU and memory across your entire fleet. The device table is virtual-scrolled and handles hundreds of routers without performance degradation.

+ +

Batch configuration. Apply a config template to multiple devices simultaneously. Each device gets its own result — success, failure, or partial — and the operation is logged with a timestamp and the identity of who ran it. Config templates support variable substitution, so you can define a standard firewall ruleset once and apply it across sites with different IP ranges without editing the template for each device.
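As a rough illustration of variable substitution with per-device results, the sketch below renders one firewall rule for two sites using Python's standard `string.Template`. The template text, variable names, and site values are invented for the example, and the platform's own template syntax may differ.

```python
from string import Template

# Hypothetical template: one RouterOS rule with per-site placeholders.
FIREWALL_TEMPLATE = Template(
    "/ip firewall filter add chain=input src-address=$mgmt_subnet "
    "action=accept comment=\"$site mgmt\""
)

# Hypothetical per-device values.
SITES = {
    "branch-01": {"site": "branch-01", "mgmt_subnet": "10.1.0.0/24"},
    "branch-02": {"site": "branch-02", "mgmt_subnet": "10.2.0.0/24"},
}


def render_all(template, sites):
    """Render the template once per device and record a result per device."""
    results = {}
    for device, values in sites.items():
        try:
            results[device] = ("ok", template.substitute(values))
        except KeyError as missing:
            # A missing variable fails only this device, not the whole batch.
            results[device] = ("error", f"missing variable {missing}")
    return results


rendered = render_all(FIREWALL_TEMPLATE, SITES)
```

Using `substitute` rather than `safe_substitute` means a device with an incomplete variable set is reported as an error instead of receiving a half-rendered rule.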

+ +

Bulk command execution. Run arbitrary RouterOS CLI commands across a device group. Useful for one-off queries (what's the uptime on all devices at site X?) or scripted changes that don't fit a template pattern.

+ +

Firmware tracking. The fleet view shows RouterOS version per device. You can filter to find devices below a target version and plan upgrade operations accordingly, accounting for hardware model compatibility.
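Filtering by version is easy to get wrong with plain string comparison, because "7.9" sorts after "7.12" as text. A minimal sketch of a numeric comparison, assuming version strings of the form `major.minor[.patch]` and using made-up device names:

```python
import re


def parse_version(version):
    """Turn '7.12.1' or '6.49.10 (stable)' into a comparable tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def below_target(devices, target):
    """Return the names of devices running a version older than the target."""
    wanted = parse_version(target)
    return sorted(
        name for name, version in devices.items()
        if parse_version(version) < wanted
    )


# Hypothetical fleet inventory: device name -> reported RouterOS version.
fleet = {"edge-a": "6.49.10", "edge-b": "7.12.1", "edge-c": "7.9"}
outdated = below_target(fleet, "7.12")
```

This ignores pre-release suffixes such as `rc` or `beta`; a production comparison would need a rule for those.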

+ +

Subnet scanner. Discover MikroTik devices on a given network segment. New devices show up in the scan results and can be added to the fleet directly.
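How the scanner identifies devices is not described here. One plausible approach, shown purely as an assumption, is to enumerate the hosts in a segment and probe the default RouterOS API ports (8728, and 8729 for TLS):

```python
import ipaddress
import socket

API_PORTS = (8728, 8729)  # RouterOS API and API-over-TLS default ports


def candidate_hosts(cidr):
    """List usable host addresses in a network segment."""
    network = ipaddress.ip_network(cidr, strict=False)
    return [str(ip) for ip in network.hosts()]


def port_open(host, port, timeout_s=0.5):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def scan(cidr):
    """Return hosts that answer on at least one API port (assumed heuristic)."""
    return [
        host for host in candidate_hosts(cidr)
        if any(port_open(host, port) for port in API_PORTS)
    ]


hosts = candidate_hosts("192.168.88.0/30")
```

An open API port only suggests a MikroTik device; confirming it would still require a successful login or another identification step.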

+ +

Geographic map view. Devices can be assigned coordinates and viewed on a map, useful for ISPs and MSPs with geographically distributed deployments.

+ +

Multi-tenant support. Tenant data isolation is enforced at the PostgreSQL layer using Row-Level Security policies, not just filtered in the application. One tenant's devices, configs, and credentials are not accessible to another tenant regardless of application bugs.

+ +

RBAC. Four roles: super_admin, admin, operator, and viewer. Permissions are checked server-side on every request. Viewer accounts can monitor but cannot execute commands or push configs.

+ +

Credential encryption. All device credentials are encrypted at rest using AES-256-GCM. Keys are managed separately from the database.

+ +

Architecture Overview

+ +

For teams evaluating self-hosted options, the stack is straightforward:

+ + + +

There's no SaaS dependency, no phone-home requirement, and no cloud account needed. Everything runs on hardware you control.

+ + + +
+
+ + + + + + diff --git a/docs/website/docs/mikrotik-centralized-management.html b/docs/website/docs/mikrotik-centralized-management.html new file mode 100644 index 0000000..06f3dbe --- /dev/null +++ b/docs/website/docs/mikrotik-centralized-management.html @@ -0,0 +1,282 @@ + + + + + + MikroTik Centralized Management Platform + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+ +

Centralized Management for MikroTik Router Fleets

+ +

The Problem

+ +

Managing a handful of MikroTik routers is straightforward. You open WinBox, make your changes, and move on. But at some point — ten devices, fifty, a few hundred — that approach falls apart. Individual device management doesn't scale.

+ +

Without centralized MikroTik management, the same problems surface at every shop that runs RouterOS at any meaningful scale. Configuration inconsistencies accumulate across sites: one router is still running a deprecated DHCP pool, another has a firewall rule that was supposed to be temporary six months ago, a third has credentials that haven't rotated since a contractor left. Firmware versions drift across the fleet. Visibility disappears: you're not sure which devices are healthy until something breaks and a client calls.

+ +

The operational overhead compounds. Every change requires logging into each device individually. Rollbacks mean remembering what the previous state was, if you wrote it down at all. Audits are manual. Backups are either nonexistent or ad hoc shell scripts that may or may not have run recently.

+ +

The underlying issue is that device-by-device management doesn't give you a fleet. It gives you a collection of individual routers that happen to share a vendor. Centralized management is what turns that collection into something you can actually operate.

+ +

The Current MikroTik Management Landscape

+ +

The existing tools for RouterOS configuration management each cover part of the problem:

+ + + +

None of these options alone provides comprehensive MikroTik router fleet management across the full lifecycle: inventory, configuration, monitoring, backups, security, and operations in one place.

+ +

What Centralized Management Should Cover

+ +

A complete centralized management platform for MikroTik fleets needs to address six problem areas:

+ +
  1. Visibility: A fleet-wide dashboard showing device inventory, health status, active alerts, bandwidth utilization, and wireless issues. You need to know the state of every device without logging into any of them.
  2. Configuration: The ability to browse, edit, push, and track configuration changes across any device in the fleet. That includes safe push mechanisms with automatic rollback if a change breaks connectivity, template-based operations for applying consistent configs across groups of devices, and a change history that tells you what changed, when, and who made the change.
  3. Monitoring: Real-time metrics with historical trending. Threshold-based alerts with configurable notification channels. The ability to suppress alerts during scheduled maintenance windows so on-call staff aren't paged for expected downtime.
  4. Backups: Automated configuration snapshots that run without human intervention, a version timeline you can navigate, side-by-side diff views between any two snapshots, and a tested restore path. Backups that exist but can't be restored aren't backups.
  5. Security: Encrypted credential storage, a complete audit trail of all management actions, role-based access control so operators can do their jobs without needing admin privileges, and tenant isolation for MSPs managing multiple clients.
  6. Operations: Firmware version tracking and upgrade management, bulk command execution, VPN management, and maintenance workflow support.
+ +

How The Other Dude Provides Centralized MikroTik Management

+ +

The Other Dude is a self-hosted platform built specifically for MikroTik router fleet management. Here's what it covers across each area:

+ +

Fleet Visibility: The main dashboard surfaces device health, active alerts, bandwidth utilization, and wireless issues at a glance. The device table uses virtual scrolling to handle hundreds of devices without performance degradation. A geographic map and a network topology view (built on ReactFlow with Dagre layout) give you spatial and logical context for the fleet. A built-in subnet scanner handles device discovery when you're onboarding a new site.

+ +

Configuration Management: The config editor exposes the full RouterOS path hierarchy, letting you navigate and edit any config section the same way WinBox does — but with fleet-wide reach. Config pushes use a two-phase process: the change is applied, a connectivity check runs, and if the check fails the router automatically reverts to its previous state. This eliminates the risk of locking yourself out with a bad firewall rule or routing change. Simple Config mode provides a streamlined UI for common tasks — IP addressing, DHCP, firewall basics — modeled after consumer router interfaces for operators who don't need full RouterOS syntax exposure. Templates support variable substitution for batch operations across groups of devices.

+ +

Monitoring and Alerts: Health, traffic, and wireless metrics are stored in TimescaleDB hypertables, which handle high-frequency time-series data efficiently without requiring separate infrastructure. Alerts are threshold-based and configurable per device or device group. Real-time updates push via SSE backed by NATS JetStream. Maintenance windows suppress alerts for scheduled work so you're not managing noise during planned outages.

+ +

Security: Authentication uses SRP-6a — a zero-knowledge protocol where the server never sees your password. Device credentials are encrypted at rest using AES-256-GCM with per-tenant envelope encryption via OpenBao Transit. Role-based access control supports four roles (super_admin, admin, operator, viewer) with appropriate permission boundaries at each level. PostgreSQL Row-Level Security enforces tenant isolation at the database layer — one tenant's data is never accessible to another's, regardless of application-layer logic. Every management action is recorded in an immutable audit trail.

+ +

Self-Hosted: The entire platform deploys via Docker Compose. Your device credentials, configuration history, and monitoring data stay on your infrastructure. The platform is open source and available on GitHub.

+ +

Getting Started

+ +

Getting The Other Dude running against your first MikroTik device takes about ten minutes. Clone the repository, run setup.py to walk through the initial configuration, point the platform at your first router's IP, and you'll have a connected device with monitoring active and the first config backup in the timeline. Full setup instructions, including Docker Compose prerequisites and initial credential configuration, are in the Quick Start guide.

+ + + +
+
+ + + + + + diff --git a/docs/website/docs/mikrotik-configuration-drift.html b/docs/website/docs/mikrotik-configuration-drift.html new file mode 100644 index 0000000..30255b8 --- /dev/null +++ b/docs/website/docs/mikrotik-configuration-drift.html @@ -0,0 +1,173 @@ + + + + + + MikroTik Configuration Drift Detection + + + + + + + + + + + + + + + + + + + + + + + +
+
+ +

How to Detect Configuration Drift in MikroTik Routers

+ +

Configuration drift is one of the quieter failure modes in network management. Routers that were identical at deployment gradually diverge — through manual fixes, firmware upgrades, emergency changes, and accumulated tweaks. This article explains why drift happens specifically with RouterOS, why it is difficult to detect, and what an effective solution looks like in practice.

+ +

The Problem

+ +

Configuration drift describes the gap between the intended state of a device and its actual running configuration. For a single router this is manageable. Across a fleet of dozens or hundreds of MikroTik devices, it becomes a real operational hazard.

+ +

The pattern is familiar: an engineer connects via WinBox to resolve an outage, adds a static route or adjusts a firewall rule, and moves on. The fix never makes it into documentation or a change ticket. Later, a firmware upgrade silently adds new default values. Someone else modifies the same firewall rule "temporarily" during a maintenance window and forgets to revert it.

+ +

After six months, the device is running something no one fully understands. If the hardware fails, reproducing that config from scratch is guesswork.

+ +

Why RouterOS Makes This Hard

+ +

RouterOS does not include a native mechanism for tracking configuration changes or comparing configs across devices. This is not a complaint — it is just a fact of how the platform is designed, and it matters when you are trying to build operational processes around it.

+ +

A few specific pain points:

+ + + +

Common Workarounds

+ +

Engineers who have hit this problem have developed several approaches, each with real limitations.

+ +

Scheduled /export to FTP or SFTP. This is the most common approach and it does produce periodic snapshots. The problem is what happens next: text dumps pile up in a directory, and comparing them requires either manual inspection or custom scripting. When a device exports 800 lines of config, spotting a single changed firewall rule by eye is unreliable.

+ +

The Dude. MikroTik's own monitoring tool tracks device health and topology well. It does not track configuration changes. It will tell you a router is up; it will not tell you its firewall rules changed overnight.

+ +

Custom diff scripts. Some teams build shell scripts that pull exports, normalize whitespace, strip firmware-version noise, and run diff. This can work, but these scripts are fragile. They break on RouterOS upgrades, fail silently when a device is unreachable, and tend to accumulate exceptions and special cases until the person who wrote them is the only one who understands them.
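The core of such a script is usually small. The sketch below assumes exports whose header lines start with `#` (as the timestamp and version comment at the top of `/export` output does) and compares two exports as unordered sets of lines. The sample exports are illustrative only.

```python
def normalize(export_text):
    """Keep only meaningful lines: no comments, blanks, or stray whitespace."""
    kept = []
    for line in export_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            kept.append(line)
    return kept


def changed_lines(old_export, new_export):
    """Return (added, removed) lines, ignoring order and comment noise."""
    old_lines = normalize(old_export)
    new_lines = normalize(new_export)
    added = [line for line in new_lines if line not in old_lines]
    removed = [line for line in old_lines if line not in new_lines]
    return added, removed


# Illustrative exports: only the header and one firewall rule differ.
old = """# 2025-01-01 02:00:00 by RouterOS 7.12
/ip firewall filter
add chain=input action=accept protocol=icmp
"""
new = """# 2025-01-02 02:00:00 by RouterOS 7.13
/ip firewall filter
add chain=input action=accept protocol=icmp
add chain=input action=drop in-interface=ether1
"""
added, removed = changed_lines(old, new)
```

Ignoring order hides pure reordering noise, but it also hides reordered firewall rules, where order matters. That trade-off is one reason these scripts accumulate special cases.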

+ +

Spreadsheets. For small deployments, a spreadsheet tracking what each site should have configured is better than nothing. It does not scale, and it is only as accurate as the last time someone updated it.

+ +

What a Proper Solution Requires

+ +

Solving configuration drift effectively requires a few things working together.

+ +

First, automated, periodic snapshots from every device. Manual processes do not hold up — the snapshot needs to happen whether or not an engineer remembers to trigger it. The interval should be configurable; some environments need hourly snapshots, others daily.

+ +

Second, version history with diff visibility. Storing snapshots is only useful if you can compare them. You need to be able to see exactly what changed between two points in time — not just that something changed, but which lines were added, removed, or modified. A side-by-side diff view makes this fast to review.

+ +

Third, alerts when configs change unexpectedly. Drift you don't know about is the dangerous kind. An alert when a device's config changes between polling cycles lets you investigate before that change causes a problem, rather than after.

+ +

Fourth, an audit trail tied to user actions. When a config change comes from a push made through your management platform, you want to know which user initiated it, when, and what it contained. This is separate from detecting drift caused by out-of-band changes — you need both.

+ +

How The Other Dude Handles Configuration Drift

+ +

The Other Dude polls RouterOS devices on a configurable interval using the RouterOS binary API (port 8729, TLS). On each poll cycle it retrieves the full running configuration and stores it in PostgreSQL alongside a complete version history. Every stored snapshot is compared to the previous one; if anything changed, the difference is recorded.
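The compare-on-store step can be pictured with a small in-memory sketch built on Python's `difflib`. The class and its storage are hypothetical stand-ins for illustration, not the platform's PostgreSQL schema.

```python
import difflib


class SnapshotHistory:
    """Hypothetical in-memory version history for one device."""

    def __init__(self):
        self.versions = []  # list of (snapshot_text, diff_lines)

    def add(self, snapshot):
        """Store a snapshot and return its diff against the previous one."""
        previous = self.versions[-1][0] if self.versions else ""
        diff = list(difflib.unified_diff(
            previous.splitlines(),
            snapshot.splitlines(),
            fromfile="previous",
            tofile="current",
            lineterm="",
        ))
        self.versions.append((snapshot, diff))
        return diff  # empty list means nothing changed


history = SnapshotHistory()
history.add("/ip address\nadd address=10.0.0.1/24")
unchanged = history.add("/ip address\nadd address=10.0.0.1/24")
changed = history.add("/ip address\nadd address=10.0.0.2/24")
```

An empty diff on a poll cycle means the config is unchanged; a non-empty diff that no platform push accounts for is the signal for an out-of-band change.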

+ +

The web UI includes a side-by-side diff viewer. You can select any two snapshots for a device — or compare two different devices — and see exactly which lines differ. This makes it straightforward to answer questions like "what changed on this router between Tuesday and Thursday" or "why does this branch site have different firewall rules than the others."

+ +

Config changes pushed through the platform are recorded in an audit trail with full user attribution. If someone pushes a new firewall ruleset or modifies an interface address, that action is logged with the user, timestamp, and the exact config diff applied. Out-of-band changes made directly via WinBox or SSH will show up in the next polling cycle as an unexpected diff.

+ +

For safe config pushes, The Other Dude uses a two-phase approach: changes are applied to the device, and the platform waits for a confirmation that the device is still reachable. If the device goes silent after the change — which can happen if a firewall or routing change cuts off the management path — the platform automatically reverts to the previous config. This significantly reduces the risk of locking yourself out of a remote device.
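The two-phase flow can be sketched against a simulated router. One assumption here goes beyond the text: the revert is armed on the device before the change is applied, so it can still fire when the management path is cut. `FakeRouter` is an in-memory stand-in, not a RouterOS client.

```python
class FakeRouter:
    """In-memory stand-in for a router, used only to illustrate the flow."""

    def __init__(self, config):
        self.config = config
        self.saved = None
        self.revert_armed = False
        self.reachable = True

    def arm_revert(self):
        """Remember the current config and arm a device-side revert."""
        self.saved = self.config
        self.revert_armed = True

    def apply(self, new_config, breaks_management=False):
        self.config = new_config
        self.reachable = not breaks_management

    def cancel_revert(self):
        self.revert_armed = False

    def timer_expires(self):
        """What the device does if nobody cancelled the revert in time."""
        if self.revert_armed:
            self.config = self.saved
            self.revert_armed = False
            self.reachable = True


def two_phase_push(router, new_config, breaks_management=False):
    router.arm_revert()                       # safety net goes in first
    router.apply(new_config, breaks_management)
    if router.reachable:                      # connectivity check passed
        router.cancel_revert()
        return "committed"
    router.timer_expires()                    # simulate the timer firing
    return "reverted"


good = FakeRouter("v1")
bad = FakeRouter("v1")
good_result = two_phase_push(good, "v2")
bad_result = two_phase_push(bad, "v2", breaks_management=True)
```

The ordering is the point of the design: a revert that is only scheduled after the change has been applied cannot help once the device is unreachable.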

+ +

For fleet-scale work, the platform supports config templates with variable substitution. You can define a template for a class of site (branch office, retail location, distribution hub) and push it across a batch of devices with per-device values filled in. This makes it easier to maintain consistency across similar sites and to identify which devices have diverged from that common baseline. To be clear: the current implementation detects config changes between snapshots. Full desired-state compliance checking — where the system continuously validates each device against a canonical template and flags deviations — is not yet implemented, but the snapshot and diff infrastructure is designed to support it.

+ +

Related Guides

+ + + +

View on GitHub

+ +
+
+ + + + + + diff --git a/docs/website/docs/mikrotik-router-backup-automation.html b/docs/website/docs/mikrotik-router-backup-automation.html new file mode 100644 index 0000000..eeecea0 --- /dev/null +++ b/docs/website/docs/mikrotik-router-backup-automation.html @@ -0,0 +1,290 @@ + + + + + + MikroTik Router Backup Automation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+ +

How to Automate MikroTik Router Backups

+ +

The Problem

+ +

Most MikroTik routers get configured once and then left alone. Over months and years, someone adds firewall rules, tweaks NAT entries, adjusts OSPF timers. Nobody writes any of it down. The config that's running in production is the only record of what was done.

+ +

Then something breaks. A firmware upgrade goes sideways and the router reboots into a partially migrated config. A new hire cleans up "unused" firewall rules and takes down a VPN tunnel. The NAND flash in an RB2011 starts throwing read errors. In every one of these cases, the question is the same: what was the config before this happened?

+ +

The answer is usually bad. An export someone ran eight months ago sitting in a random directory. An FTP server that stopped receiving backups when the disk filled up six months ago and no one noticed until now. A Scheduler script that was working fine until someone changed the FTP password. MikroTik backup automation is one of those things that either works reliably or doesn't work at all — there's rarely a middle state where it's "mostly working."

+ +

At scale this gets worse fast. If you manage fifty routers, you might be on top of it. If you manage five hundred, manual backup processes will fail silently across a meaningful percentage of your fleet at any given time.

+ +

RouterOS Backup Methods

+ +

RouterOS has two built-in mechanisms for saving configuration, and both have real tradeoffs.

+ +

/system backup save produces a binary .backup file. It captures the full configuration including passwords and certificates. You can restore it with one command and come back to exactly the state the device was in. The catch: it's device-specific and version-dependent. Restoring a backup from RouterOS 6 to a device running RouterOS 7 will fail or produce unexpected results. You can't open the file in a text editor to see what's in it. You can't diff two backups to find what changed.

+ +

/export produces a human-readable text file containing the RouterOS commands needed to recreate the configuration. It's possible to partially import an export on a different device, and you can read it with any text editor. The tradeoffs: it doesn't include passwords or private keys, it omits settings that are at their default values, and the ordering of entries can vary between RouterOS versions. Two exports of the same config taken on different firmware versions may look different even if nothing changed.

+ +

Neither method runs automatically. To get scheduled backups, the traditional approach is a Scheduler + FTP or SFTP script on each device. This works, but it requires per-device configuration, a working FTP server, error handling for failed transfers, and some way to detect when backups stop arriving. In a fleet of hundreds of devices, that's a lot of moving parts.

+ +

What Goes Wrong Without Automated Backups

+ +

The consequences of not having reliable backup automation tend to be invisible until they aren't:

+ + + +

What a Backup System Should Do

+ +

A reliable MikroTik backup automation solution needs to address the failure modes of the manual approach:

+ + + +

How The Other Dude Automates Backups

+ +

The Other Dude handles MikroTik configuration management through a Go-based poller that connects to each device using the RouterOS binary API over TLS (port 8729). There are no per-device backup scripts, no FTP server to maintain, and no Scheduler entries to configure on each router.

+ +

The backup process works as follows:

+ + + +

Because the backup system is built on top of the same API connection the poller uses for everything else, there's no separate backup infrastructure to maintain. A device that's reachable for monitoring is reachable for backups.

+ +

Related Guides

+ + + +
+
+ + + + + + diff --git a/docs/website/docs/mikrotik-router-monitoring.html b/docs/website/docs/mikrotik-router-monitoring.html new file mode 100644 index 0000000..43cd4a0 --- /dev/null +++ b/docs/website/docs/mikrotik-router-monitoring.html @@ -0,0 +1,304 @@ + + + + + + MikroTik Router Monitoring at Scale + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+ +
MikroTik Monitoring — The Other Dude
+ +

How to Monitor MikroTik Routers at Scale

+ +

If you manage more than a handful of MikroTik routers, "monitoring" stops meaning "is this device pingable" and starts meaning something harder. You need to know which of your 200 routers is spiking CPU before a user files a ticket. You need to find the access point with degraded wireless signal before the site calls in. You need bandwidth utilization trends to make capacity decisions, not just point-in-time readings. And you need to know the moment a device goes offline at 2 AM, not when someone shows up for work.

+ +

That's what real MikroTik router monitoring looks like in production.

+ +

The Problem with MikroTik Monitoring at Scale

+ +

Individual devices are easy. RouterOS has good per-device tooling. The problem is the fleet. When you're managing dozens or hundreds of routers across multiple sites, you have no single place to answer questions like:

+ + + +

These are fleet-level questions. They require a centralized data store, consistent polling, and a UI that surfaces the signal instead of burying you in noise.

+ +

Native RouterOS Monitoring Options

+ +

RouterOS gives you several monitoring tools. Each has real limitations when applied at fleet scale.

+ + + +

What MikroTik Monitoring Software Should Include

+ +

A purpose-built MikroTik monitoring solution should handle the full picture, not just availability pings.

+ + + +

How The Other Dude Monitors MikroTik Routers

+ +

The Other Dude was built specifically for MikroTik fleet management. The monitoring stack is not bolted on — it's the core of what the platform does.

+ +

Collection via the RouterOS binary API. The Go-based poller connects to each device over the RouterOS binary API on TLS port 8729. This is not SNMP. There are no OIDs, no MIB files, no polling configuration per metric type. The API returns structured data directly from RouterOS resources, which is faster, more reliable, and requires no per-device SNMP configuration.

+ +

Three metric families. Each poll cycle collects health metrics (CPU, memory, disk, temperature), interface metrics (per-interface traffic rates calculated from cumulative counter deltas), and wireless metrics (client count, signal strength in dBm, CCQ per wireless interface). All three are stored in TimescaleDB hypertables with automatic time-based bucketing for efficient range queries.
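Deriving a rate from cumulative counters comes down to one subtraction and one division. A minimal sketch, assuming byte counters and treating a counter that goes backwards as a reset (for example after a reboot) rather than guessing at a wrap:

```python
def rate_bps(prev_bytes, curr_bytes, interval_s):
    """Bits per second between two cumulative byte-counter readings.

    Returns None when the counter went backwards, so the caller can skip
    that sample instead of recording a bogus spike.
    """
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    if curr_bytes < prev_bytes:
        return None
    return (curr_bytes - prev_bytes) * 8 / interval_s


normal = rate_bps(1_000, 2_000, 10)     # 1000 bytes in 10 s
after_reset = rate_bps(5_000, 100, 10)  # counter restarted from zero
```

Skipping the sample after a reset loses one data point but avoids a false traffic spike in the charts. How the poller itself handles resets is not stated here.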

+ +

Real-time browser updates. Metrics flow from the poller into NATS JetStream, then out to connected browsers via Server-Sent Events. The dashboard reflects current device state without polling the database on every page load.
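On the wire, a Server-Sent Events stream is plain text: optional `id:` and `event:` fields, one or more `data:` lines, and a blank line that ends the event. A small sketch of framing a metric update follows; the event name and payload fields are invented for the example.

```python
import json


def sse_event(event, payload, event_id=None):
    """Format one Server-Sent Events frame with a JSON payload."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    lines.append(f"event: {event}")
    # Compact JSON contains no newlines, so a single data line is enough.
    lines.append(f"data: {json.dumps(payload, separators=(',', ':'))}")
    return "\n".join(lines) + "\n\n"


frame = sse_event("metrics", {"cpu": 42}, event_id=7)
```

The `id` field lets a reconnecting browser send `Last-Event-ID`, which pairs naturally with a replayable stream on the server side.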

+ +

Fleet health dashboard. The main view shows aggregate fleet health — how many devices are online, which have active alerts, uptime sparklines per device, and bandwidth charts for the busiest links. The "APs Needing Attention" card surfaces wireless access points with degraded signal or low CCQ so you can find problems before users do.

+ +

Per-device detail. Each device has its own page with health graphs over configurable time windows, per-interface traffic charts, and wireless metrics broken down by interface. You can see exactly what a device was doing at any point in its history.

+ +

Alert rules with duration thresholds. Alert rules combine a metric, a threshold, and a duration_polls count. A rule for "CPU > 90%" with duration_polls = 5 only fires after five consecutive polling intervals above the threshold. This eliminates noise from transient spikes. New tenants receive a default set of alert rules covering CPU, memory, disk, offline detection, wireless signal, and CCQ — sensible baselines that you can tune without starting from zero.
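The consecutive-poll logic can be sketched in a few lines. Only `duration_polls` comes from the text above; the class and method names are illustrative.

```python
class AlertRule:
    """Fires only after `duration_polls` consecutive breaches."""

    def __init__(self, threshold, duration_polls):
        self.threshold = threshold
        self.duration_polls = duration_polls
        self.streak = 0  # consecutive polls above the threshold

    def observe(self, value):
        """Feed one poll result; return True while the alert is firing."""
        if value > self.threshold:
            self.streak += 1
        else:
            self.streak = 0  # any healthy poll resets the count
        return self.streak >= self.duration_polls


rule = AlertRule(threshold=90, duration_polls=5)
samples = [95, 97, 50, 91, 92, 93, 94, 95]
states = [rule.observe(value) for value in samples]
```

In this run the two early spikes never fire because the third sample resets the streak; the alert fires only on the last sample, the fifth consecutive reading above 90.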

+ +

Notification channels. Alerts are delivered via email, webhook, or Slack. Maintenance windows let you suppress alerts during planned work without disabling the rules themselves.
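Suppression during a maintenance window reduces to a time-range check made before a notification is sent. A minimal sketch with a made-up window, assuming timezone-aware timestamps:

```python
from datetime import datetime, timezone


def in_maintenance(when, windows):
    """True if `when` falls inside any (start, end) window; end is exclusive."""
    return any(start <= when < end for start, end in windows)


def should_notify(alert_time, windows):
    """Rules keep evaluating; only the notification is held back."""
    return not in_maintenance(alert_time, windows)


# Hypothetical window: planned work from 01:00 to 03:00 UTC.
windows = [(
    datetime(2025, 1, 10, 1, 0, tzinfo=timezone.utc),
    datetime(2025, 1, 10, 3, 0, tzinfo=timezone.utc),
)]
during = should_notify(datetime(2025, 1, 10, 2, 0, tzinfo=timezone.utc), windows)
after = should_notify(datetime(2025, 1, 10, 3, 0, tzinfo=timezone.utc), windows)
```

Checking at notification time, rather than disabling the rule, keeps the alert state accurate for the moment the window closes.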

+ +

Network topology map. An interactive topology view shows device interconnections across your fleet, giving you a structural context for interpreting monitoring data.

+ +

Related Guides

+ + + +
+
+ + + + + +