Per-Tenant VPN Network Isolation — Design Spec
Overview
Isolate WireGuard VPN networks per tenant so that devices in one tenant's VPN cannot reach devices in another tenant's VPN. Each tenant gets a unique /24 subnet auto-allocated from 10.10.0.0/16, with iptables rules blocking cross-subnet traffic.
Branch: main (this is a security fix, not SaaS-specific)
Design Decisions
- Single `wg0` interface: WireGuard handles thousands of peers on one interface with negligible performance impact, so per-tenant interfaces are unnecessary.
- Per-tenant `/24` subnets: allocated from `10.10.0.0/16`, giving 255 tenants (index 1–255). Index 0 is reserved. Expandable to `10.0.0.0/8` if needed (note: `_next_available_ip()` materializes all hosts in the subnet, so subnets larger than `/24` require refactoring that function).
- Auto-allocation only: `setup_vpn()` picks the next available subnet. No manual override.
- Global config sync: one `wg0.conf` with all tenants' peers, rebuilt on any VPN change and protected by a PostgreSQL advisory lock to prevent concurrent writes.
- Global server keypair: a single WireGuard server keypair stored in `system_settings`, replacing per-tenant server keys. Generated on the first `setup_vpn()` call or during migration.
- iptables isolation: cross-subnet traffic blocked at the WireGuard container's firewall. IPv6 blocked too.
- Device-side config is untrusted: isolation relies entirely on server-side enforcement (AllowedIPs `/32` + iptables DROP). A malicious device operator changing their `allowed-address` to `10.10.0.0/16` on their router gains nothing, because the server only routes their assigned `/32`.
Data Model Changes
Modified: vpn_config
| Column | Change | Description |
|---|---|---|
| `subnet_index` | New column, integer, unique, not null | Maps to third octet: index 1 = 10.10.1.0/24 |
| `subnet` | Default changes | No longer 10.10.0.0/24; derived from subnet_index |
| `server_address` | Default changes | No longer 10.10.0.1/24; derived as 10.10.{index}.1/24 |
| `server_private_key` | Deprecated | Kept in table for rollback safety but no longer used. Global key in system_settings is authoritative. |
| `server_public_key` | Deprecated | Same as above: kept but unused. All peers use the global public key. |
New: system_settings entries
| Key | Description |
|---|---|
| `vpn_server_private_key` | Global WireGuard server private key (encrypted with CREDENTIAL_ENCRYPTION_KEY) |
| `vpn_server_public_key` | Global WireGuard server public key (plaintext) |
Allocation Logic
subnet_index = first available integer in range [1, 255] not already in vpn_config
subnet = 10.10.{subnet_index}.0/24
server_address = 10.10.{subnet_index}.1/24
Allocation query (single statement, gap-filling):
SELECT MIN(x) FROM generate_series(1, 255) AS x
WHERE x NOT IN (SELECT subnet_index FROM vpn_config)
If no index is available (the query returns NULL) → 422 "VPN subnet pool exhausted".
The query alone does not stop two concurrent transactions from picking the same index; the unique constraint on subnet_index catches that race. On conflict, retry once.
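The allocation rules above can be mirrored in plain Python, for example to unit-test the derivation without a database. This is a minimal sketch, not the service's code: `first_free_index` and `subnet_for_index` are hypothetical helper names, and the set of used indexes is assumed to come from `vpn_config`.

```python
import ipaddress
from typing import Iterable, Optional, Tuple

VPN_POOL = ipaddress.ip_network("10.10.0.0/16")
MIN_INDEX, MAX_INDEX = 1, 255  # index 0 is reserved


def first_free_index(used: Iterable[int]) -> Optional[int]:
    """Return the lowest index in [1, 255] not in `used`, or None if exhausted."""
    taken = set(used)
    for candidate in range(MIN_INDEX, MAX_INDEX + 1):
        if candidate not in taken:
            return candidate
    return None  # caller maps this to 422 "VPN subnet pool exhausted"


def subnet_for_index(index: int) -> Tuple[str, str]:
    """Derive (subnet, server_address) from a subnet index."""
    if not MIN_INDEX <= index <= MAX_INDEX:
        raise ValueError(f"subnet_index out of range: {index}")
    subnet = ipaddress.ip_network(f"10.10.{index}.0/24")
    if not subnet.subnet_of(VPN_POOL):
        raise ValueError(f"{subnet} is outside {VPN_POOL}")
    server = subnet.network_address + 1  # first host, e.g. 10.10.1.1
    return str(subnet), f"{server}/{subnet.prefixlen}"
```

With indexes 1, 2 and 4 in use, `first_free_index` returns 3, which matches the gap-filling behavior of the SQL query.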
VPN Service Changes
setup_vpn(db, tenant_id, endpoint)
Current behavior: creates VpnConfig with hardcoded 10.10.0.0/24 and generates a per-tenant server keypair.
New behavior:
- Get or create global server keypair: check `system_settings` for `vpn_server_private_key`. If not found, generate a new keypair and store both the private key (encrypted) and public key. This happens on the first `setup_vpn()` call on a fresh install.
- Allocate next `subnet_index` using the gap-filling query
- Set `subnet = 10.10.{index}.0/24`
- Set `server_address = 10.10.{index}.1/24`
- Store the global public key in `server_public_key` (for backward compat / display)
- Call `sync_wireguard_config(db)` (global, not per-tenant)
sync_wireguard_config(db)
Current signature: sync_wireguard_config(db, tenant_id) — builds config for one tenant.
New signature: sync_wireguard_config(db) — builds config for ALL tenants.
Concurrency protection: acquire a PostgreSQL advisory lock (pg_advisory_xact_lock(hash)) before writing. This prevents two simultaneous peer additions from producing a corrupt wg0.conf.
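The spec does not say how the `hash` argument is derived. One possible approach, sketched here under that assumption, is to hash a fixed lock name down to a signed 64-bit integer, since `pg_advisory_xact_lock` takes a `bigint`. The lock name, the helper `advisory_lock_key`, and the `:key` placeholder style are all illustrative choices, not part of the existing code.

```python
import hashlib


def advisory_lock_key(name: str) -> int:
    """Map a lock name to a stable signed 64-bit integer (PostgreSQL bigint)."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    # Use the first 8 bytes; signed=True keeps the value inside the bigint range.
    return int.from_bytes(digest[:8], byteorder="big", signed=True)


# Hypothetical lock name: any constant works if every writer uses the same one.
WG_SYNC_LOCK_KEY = advisory_lock_key("wireguard:wg0.conf")

# Statement the sync function would run inside its transaction.
# The ":key" placeholder style is an assumption; use whatever the DB driver expects.
LOCK_SQL = "SELECT pg_advisory_xact_lock(:key)"
```

Python's built-in `hash()` is unsuitable here because string hashes are randomized per process, so two workers would compute different keys and never contend for the same lock.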
Atomic write: write to a temp file in the same directory, then os.rename() it to wg0.conf. A rename within a single filesystem is atomic, so the WireGuard container never reads a partially written file.
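A minimal sketch of the temp-file-then-rename pattern follows. It uses `os.replace()` rather than the `os.rename()` named above; both are atomic on POSIX when source and destination share a filesystem, and `os.replace()` also overwrites an existing target on Windows. The helper name `write_atomically` and the temp-file naming are assumptions.

```python
import os
import tempfile


def write_atomically(path: str, content: str, mode: int = 0o600) -> None:
    """Write `content` to `path` so readers see either the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the target directory (same filesystem),
    # otherwise the final rename is not atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".wg0.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(content)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes hit disk before renaming
        os.chmod(tmp_path, mode)  # the file holds a private key, keep it owner-only
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temp file if anything failed before the rename completed
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```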
New behavior:
- Acquire advisory lock
- Read global server private key from `system_settings` (decrypt it)
- Query ALL enabled `VpnConfig` rows (across all tenants, using admin engine to bypass RLS)
- For each, query enabled `VpnPeer` rows
- Build single `wg0.conf`:
[Interface]
Address = 10.10.0.1/16
ListenPort = 51820
PrivateKey = {global_server_private_key}
# --- Tenant: {tenant_name} (10.10.1.0/24) ---
[Peer]
PublicKey = {peer_public_key}
PresharedKey = {preshared_key}
AllowedIPs = 10.10.1.2/32
# --- Tenant: {tenant_name_2} (10.10.2.0/24) ---
[Peer]
PublicKey = {peer_public_key}
PresharedKey = {preshared_key}
AllowedIPs = 10.10.2.2/32
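- Write to temp file, `os.rename()` to `wg0.conf`
- Touch `.reload` flag
- Release advisory lock (`pg_advisory_xact_lock` has no explicit unlock; it is released automatically when the transaction commits or rolls back)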
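The config-building step can be sketched as a pure function that takes already-loaded rows and returns the file text. The `Peer` and `TenantVpn` dataclasses and `render_wg_conf` are hypothetical stand-ins for the `VpnPeer` and `VpnConfig` models, and `additional_allowed_ips` is left out for brevity. Because the tenant name ends up in a comment line, the sketch collapses whitespace in it so a name containing a newline cannot inject extra config lines.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Peer:
    public_key: str
    preshared_key: str
    assigned_ip: str  # e.g. "10.10.1.2"


@dataclass
class TenantVpn:
    name: str
    subnet: str  # e.g. "10.10.1.0/24"
    peers: List[Peer] = field(default_factory=list)


def render_wg_conf(server_private_key: str, tenants: List[TenantVpn]) -> str:
    """Render one wg0.conf containing every enabled tenant's peers."""
    lines = [
        "[Interface]",
        "Address = 10.10.0.1/16",
        "ListenPort = 51820",
        f"PrivateKey = {server_private_key}",
    ]
    for tenant in tenants:
        # Collapse whitespace (including newlines) so the name stays on one comment line
        safe_name = " ".join(tenant.name.split())
        lines.append(f"# --- Tenant: {safe_name} ({tenant.subnet}) ---")
        for peer in tenant.peers:
            lines += [
                "[Peer]",
                f"PublicKey = {peer.public_key}",
                f"PresharedKey = {peer.preshared_key}",
                f"AllowedIPs = {peer.assigned_ip}/32",
            ]
    return "\n".join(lines) + "\n"
```

With an empty tenant list the function returns only the `[Interface]` section, which is the empty-state case the test plan checks for.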
_next_available_ip(db, tenant_id, config)
No changes needed — already scoped to tenant_id and uses the config's subnet. With unique subnets per tenant, IPs are naturally isolated. Note: this function materializes all /24 hosts into a list, which is fine for /24 (253 entries) but must be refactored if subnets larger than /24 are ever used.
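If larger subnets are ever needed, the lookup could walk the host range one address at a time instead of building a list first. The sketch below is an assumption about what such a refactor might look like, not the current `_next_available_ip()`: it takes the subnet, server address, and used addresses as plain values rather than `db`, `tenant_id`, and `config`.

```python
import ipaddress
from typing import Optional, Set


def next_available_ip(subnet: str, server_ip: str, used_ips: Set[str]) -> Optional[str]:
    """Return the lowest free host address in `subnet`, or None if it is full."""
    network = ipaddress.ip_network(subnet)
    reserved = {ipaddress.ip_address(server_ip)}
    taken = {ipaddress.ip_address(ip) for ip in used_ips}
    # hosts() is iterated one address at a time; the loop stops at the first free one
    for host in network.hosts():
        if host not in reserved and host not in taken:
            return str(host)
    return None
```

For a fresh `10.10.1.0/24` with the server at `10.10.1.1`, the first address handed out is `10.10.1.2`, which matches the peer addresses shown in the `wg0.conf` example.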
add_peer(db, tenant_id, device_id, ...)
Changes:
- Calls `sync_wireguard_config(db)` instead of `sync_wireguard_config(db, tenant_id)`
- Validate `additional_allowed_ips`: if provided, reject any subnet that overlaps with `10.10.0.0/16` (the VPN address space). Only non-VPN subnets are allowed (e.g., `192.168.1.0/24` for site-to-site routing). This prevents a tenant from claiming another tenant's VPN subnet in their AllowedIPs.
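The overlap check can be sketched with the standard `ipaddress` module. The function name `validate_additional_allowed_ips` and the choice to normalize host bits with `strict=False` are assumptions. An overlap test matters more than a simple containment test: a supernet such as `10.0.0.0/8` or `0.0.0.0/0` is not inside `10.10.0.0/16`, yet it still covers every tenant subnet.

```python
import ipaddress
from typing import Iterable, List

VPN_ADDRESS_SPACE = ipaddress.ip_network("10.10.0.0/16")


def validate_additional_allowed_ips(entries: Iterable[str]) -> List[str]:
    """Return normalized CIDRs, or raise ValueError on malformed or overlapping input."""
    validated = []
    for entry in entries:
        # strict=False accepts "192.168.1.5/24" and normalizes it to "192.168.1.0/24"
        network = ipaddress.ip_network(entry.strip(), strict=False)
        if network.overlaps(VPN_ADDRESS_SPACE):
            raise ValueError(
                "Additional allowed IPs must not overlap the VPN address space (10.10.0.0/16)"
            )
        validated.append(str(network))
    return validated
```

The endpoint would translate the `ValueError` into the 422 response listed under Error Handling.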
remove_peer(db, tenant_id, peer_id)
Minor change: calls sync_wireguard_config(db) instead of sync_wireguard_config(db, tenant_id).
Tenant deletion hook
When a tenant is deleted (CASCADE deletes vpn_config and vpn_peers), call sync_wireguard_config(db) to regenerate wg0.conf without the deleted tenant's peers. Add this to the tenant deletion endpoint.
read_wg_status()
No changes — status is keyed by peer public key, which is unique globally. The existing get_peer_handshake() lookup continues to work.
WireGuard Container Changes
iptables Isolation Rules
Update docker-data/wireguard/custom-cont-init.d/10-forwarding.sh:
#!/bin/sh
# Enable forwarding between Docker network and WireGuard tunnel
# Idempotent: check before adding to prevent duplicates on restart
iptables -C FORWARD -i eth0 -o wg0 -j ACCEPT 2>/dev/null || iptables -A FORWARD -i eth0 -o wg0 -j ACCEPT
iptables -C FORWARD -i wg0 -o eth0 -j ACCEPT 2>/dev/null || iptables -A FORWARD -i wg0 -o eth0 -j ACCEPT
# Block all wg0 -> wg0 forwarding (tenant isolation)
# This drops every peer-to-peer packet, so peers in 10.10.1.0/24 cannot reach
# peers in 10.10.2.0/24 (or other peers in their own subnet)
iptables -C FORWARD -i wg0 -o wg0 -j DROP 2>/dev/null || iptables -A FORWARD -i wg0 -o wg0 -j DROP
# Block IPv6 forwarding on wg0 (prevent link-local bypass)
ip6tables -C FORWARD -i wg0 -j DROP 2>/dev/null || ip6tables -A FORWARD -i wg0 -j DROP
# NAT for return traffic
iptables -t nat -C POSTROUTING -o wg0 -j MASQUERADE 2>/dev/null || iptables -t nat -A POSTROUTING -o wg0 -j MASQUERADE
echo "WireGuard forwarding and tenant isolation rules applied"
Rules use iptables -C (check) before -A (append) to be idempotent across container restarts.
The key isolation layers:
- WireGuard AllowedIPs: the server accepts a peer's packets only if their source address is within that peer's AllowedIPs (its assigned `/32`), and routes only that range back to it (cryptographic enforcement via cryptokey routing)
- iptables `wg0 → wg0` DROP: blocks any traffic that enters and exits the tunnel interface (peer-to-peer)
- iptables IPv6 DROP: prevents link-local IPv6 bypass
- Separate subnets: no IP collisions between tenants
- `additional_allowed_ips` validation: blocks tenants from claiming VPN address space
Server Address
The [Interface] Address changes from 10.10.0.1/24 to 10.10.0.1/16 so the server can route to all tenant subnets.
Routing Changes
Poller & API
No changes needed. Both already route 10.10.0.0/16 via the WireGuard container.
setup.py
Update prepare_data_dirs() to write the updated forwarding script with idempotent rules and IPv6 blocking.
RouterOS Command Generation
onboard_device() and get_peer_config()
These generate RouterOS commands for device setup. Changes:
- `allowed-address` changes from `10.10.0.0/24` to `10.10.{index}.0/24` (tenant's specific subnet)
- `endpoint-address` and `endpoint-port` unchanged
- Server public key changes to the global server public key (read from `system_settings`)
Migration
Database Migration
- Generate global server keypair:
  - Create keypair using `generate_wireguard_keypair()`
  - Store in `system_settings`: `vpn_server_private_key` (encrypted), `vpn_server_public_key` (plaintext)
- Add `subnet_index` column to `vpn_config` (integer, unique, not null). Because existing rows are only backfilled in the next step, add the column as nullable first and apply the not-null constraint after the backfill.
- For existing VpnConfig rows (may be multiple if multiple tenants have VPN):
  - Assign sequential `subnet_index` values starting from 1
  - Update `subnet` to `10.10.{index}.0/24`
  - Update `server_address` to `10.10.{index}.1/24`
- For existing VpnPeer rows:
  - Remap IPs: `10.10.0.X` → `10.10.{tenant's index}.X` (preserve the host octet)
  - Example: Tenant A (index 1) peer at `10.10.0.2` → `10.10.1.2`. Tenant B (index 2) peer at `10.10.0.2` → `10.10.2.2`. No collision.
- Regenerate `wg0.conf` using the new global sync function
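The peer remapping rule is simple enough to express as a small helper, which also makes the migration easy to test in isolation. `remap_peer_ip` is a hypothetical name, and accepting an optional `/prefix` suffix on the stored address is an assumption about how peer IPs are stored.

```python
import ipaddress

LEGACY_SUBNET = ipaddress.ip_network("10.10.0.0/24")


def remap_peer_ip(old_ip: str, subnet_index: int) -> str:
    """Move an address from 10.10.0.X to 10.10.{subnet_index}.X, keeping X."""
    address = ipaddress.ip_address(old_ip.split("/")[0])  # tolerate "10.10.0.2/32"
    if address not in LEGACY_SUBNET:
        raise ValueError(f"{old_ip} is not in the legacy {LEGACY_SUBNET} subnet")
    if not 1 <= subnet_index <= 255:
        raise ValueError(f"subnet_index out of range: {subnet_index}")
    host_octet = int(address) & 0xFF  # last octet is preserved
    return f"10.10.{subnet_index}.{host_octet}"
```

Applied to the example above, the two tenants' peers at `10.10.0.2` map to `10.10.1.2` and `10.10.2.2` respectively.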
Device-Side Update Required
This is a breaking change for existing VPN peers. After migration:
- Devices need updated RouterOS commands:
  - New server public key (global key replaces per-tenant key)
  - New VPN IP address (`10.10.0.X` → `10.10.{index}.X`)
  - New allowed-address (`10.10.{index}.0/24`)
- The API should expose a "regenerate commands" endpoint or show a banner in the UI indicating that VPN reconfiguration is needed.
Migration Communication
After the migration runs:
- Log a warning with the list of affected devices
- Show a banner in the VPN UI: "VPN network updated — devices need reconfiguration. Click here for updated commands."
- The existing "View Setup Commands" button in the UI will show the correct updated commands.
API Changes
Modified Endpoints
| Method | Path | Change |
|---|---|---|
| `POST` | `/api/tenants/{id}/vpn` | setup_vpn allocates subnet_index, uses global server key |
| `GET` | `/api/tenants/{id}/vpn` | Returns tenant's specific subnet info |
| `GET` | `/api/tenants/{id}/vpn/peers/{id}/config` | Returns commands with tenant-specific subnet and global server key |
| `POST` | `/api/tenants/{id}/vpn/peers` | Validates additional_allowed_ips doesn't overlap 10.10.0.0/16 |
| `DELETE` | `/api/tenants/{id}` | Calls sync_wireguard_config(db) after cascade delete |
No New Endpoints
The isolation is transparent — tenants don't need to know about it.
Error Handling
| Scenario | HTTP Status | Message |
|---|---|---|
| No available subnet index (255 tenants with VPN) | 422 | "VPN subnet pool exhausted" |
| Subnet index conflict (race condition) | — | Retry allocation once |
| `additional_allowed_ips` overlaps VPN space | 422 | "Additional allowed IPs must not overlap the VPN address space (10.10.0.0/16)" |
Testing
- Create two tenants with VPN enabled → verify they get different subnets (`10.10.1.0/24`, `10.10.2.0/24`)
- Add peers in both → verify IPs don't collide
- From tenant A's device, attempt to ping tenant B's device → verify it's blocked
- Verify `wg0.conf` contains peers from both tenants with correct subnets
- Verify iptables rules are in place after container restart (idempotent)
- Verify `additional_allowed_ips` with `10.10.x.x` subnet is rejected
- Delete a tenant → verify `wg0.conf` is regenerated without its peers
- Disable a tenant's VPN → verify peers excluded from `wg0.conf`
- Empty state (no enabled tenants) → verify `wg0.conf` has only `[Interface]` section
- Migration: multiple tenants sharing `10.10.0.0/24` → verify correct remapping to unique subnets
Audit Logging
- Subnet allocated (tenant_id, subnet_index, subnet)
- Global server keypair generated (first-run event)
- VPN config regenerated (triggered by which operation)
Out of Scope
- Multiple WireGuard interfaces (not needed at current scale)
- Manual subnet assignment
- IPv6 VPN support (IPv6 is blocked as a security measure)
- Per-tenant WireGuard listen ports
- VPN-level rate limiting or bandwidth quotas