Compare commits


10 Commits

Author SHA1 Message Date
Jason Staack
fd70a21328 chore: gitignore incidents directory
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 07:55:47 -05:00
Jason Staack
e1d81b40ac fix: cap NATS JetStream streams to prevent OOM crash
WIRELESS_REGISTRATIONS stream had a 256MB MaxBytes cap in a 256MB
container — guaranteed to crash under load. ALERT_EVENTS and
OPERATION_EVENTS had no byte limit at all.

- Reduce WIRELESS_REGISTRATIONS MaxBytes from 256MB to 128MB
- Add 16MB MaxBytes cap to ALERT_EVENTS and OPERATION_EVENTS
- Bump NATS container memory limit from 256MB to 384MB
- Add restart: unless-stopped to NATS in base compose
- Bump version to 9.8.2
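As a sanity check on those numbers, a quick sketch (the 64MB server-overhead figure is an assumption for illustration, not from the project):

```python
MB = 1024 * 1024

def worst_case_bytes(stream_caps_mb: dict[str, int], overhead_mb: int = 64) -> int:
    """Sum of per-stream MaxBytes plus an assumed server overhead."""
    return (sum(stream_caps_mb.values()) + overhead_mb) * MB

# Before: WIRELESS_REGISTRATIONS alone could fill the whole 256MB container,
# and the two uncapped streams (0 here) could grow without bound on top of that.
before = worst_case_bytes({"WIRELESS_REGISTRATIONS": 256, "ALERT_EVENTS": 0, "OPERATION_EVENTS": 0})
# After: 128 + 16 + 16 = 160MB of stream data inside a 384MB limit.
after = worst_case_bytes({"WIRELESS_REGISTRATIONS": 128, "ALERT_EVENTS": 16, "OPERATION_EVENTS": 16})

assert before > 256 * MB  # exceeds the old container limit even before unbounded growth
assert after < 384 * MB   # fits the new one with headroom
```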

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 07:52:07 -05:00
Jason Staack
231154d28b fix(lint): format SNMP and credential profile files
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 18:42:28 -05:00
Jason Staack
e22163c55f fix(ci): format setup.py, register CredentialProfile model
- Run ruff format on setup.py to fix pre-existing style violations
- Add CredentialProfile import to models/__init__.py so SQLAlchemy
  can resolve the Device.credential_profile relationship in tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 18:38:09 -05:00
Jason Staack
b1ac1cce24 feat: v9.8.1 pre-built Docker images and GHCR release workflow
Setup.py now asks whether to pull pre-built images from GHCR
(recommended) or build from source. Pre-built mode skips the
15-minute compile step entirely.

- Add .github/workflows/release.yml (builds+pushes 4 images on tag)
- Add docker-compose.build.yml (source-build overlay)
- Switch docker-compose.prod.yml from build: to image: refs
- Add --build-mode CLI arg and wizard step to setup.py
- Bump version to 9.8.1 across all files
- Document TOD_VERSION env var in CONFIGURATION.md
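The image-tag selection in docker-compose.prod.yml amounts to a default-fallback lookup. A small illustrative sketch (the function is hypothetical, not part of setup.py):

```python
import os

def image_ref(name: str, prefix: str = "ghcr.io/staack/the-other-dude") -> str:
    # Mirrors compose's ${TOD_VERSION:-latest}: unset OR empty falls back to latest.
    tag = os.environ.get("TOD_VERSION") or "latest"
    return f"{prefix}/{name}:{tag}"

os.environ.pop("TOD_VERSION", None)
assert image_ref("api") == "ghcr.io/staack/the-other-dude/api:latest"

os.environ["TOD_VERSION"] = "9.8.2"
assert image_ref("api") == "ghcr.io/staack/the-other-dude/api:9.8.2"
```

Note that compose's `:-` form falls back on an *empty* value as well as an unset one, which is why `or` is used rather than a plain `get` default.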

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 18:33:12 -05:00
Jason Staack
0c1ffe0e39 fix(ci): resolve all lint and test failures
- Go: nil-safe profile cache in SNMPCollector, updated test assertion
- ESLint: fix conditional useQuery hook in SNMPMetricsSection
- ESLint: remove unused CREDENTIAL_TYPE_LABELS, ChevronDown/Right,
  EmptyState import, advancedOpen state
- TypeScript: replace empty interface with type alias

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 17:45:06 -05:00
Jason Staack
023e45c908 fix(website): update status version to 9.8.0
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 13:02:49 -05:00
Jason Staack
a9e3c79cca blog: SNMP Works (And the UI Got Out of the Way)
v9.8 release post covering SNMP support, UI simplification,
limitations, and where the project is heading.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 12:59:00 -05:00
Jason Staack
4b0cc056bd fix(website): restore MikroTik prominence, add SNMP as feature line item
MikroTik is the product. SNMP is a feature. Restored firmware management
and SRP-6a to feature list. SNMP added as one line, not a replacement.
Titles stay MikroTik-first.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 12:56:22 -05:00
Jason Staack
2cc139bc0e docs(website): add SNMP support to homepage for v9.8.0
Updated title, meta descriptions, OG/Twitter cards, structured data,
tagline, and feature list to reflect multi-vendor SNMP monitoring.
MikroTik remains prominent — SNMP is positioned alongside it.
Version bumped to 9.8.0 in structured data.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 12:54:55 -05:00
32 changed files with 888 additions and 286 deletions

.github/workflows/release.yml (new file, +87)

@@ -0,0 +1,87 @@
name: Release

on:
  push:
    tags: ["v*"]

permissions:
  contents: write
  packages: write

env:
  REGISTRY: ghcr.io
  IMAGE_PREFIX: ghcr.io/staack/the-other-dude

jobs:
  build-and-push:
    name: Build & Push Docker Images
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Extract version from tag
        id: version
        run: echo "version=${GITHUB_REF_NAME#v}" >> "$GITHUB_OUTPUT"

      - name: Log in to GHCR
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      # Build and push each image sequentially to avoid OOM on the runner.
      # Each multi-stage build (Go, Python/pip, Node/tsc) peaks at 1-2 GB.
      - name: Build & push API
        uses: docker/build-push-action@v6
        with:
          context: .
          file: infrastructure/docker/Dockerfile.api
          push: true
          tags: |
            ${{ env.IMAGE_PREFIX }}/api:${{ steps.version.outputs.version }}
            ${{ env.IMAGE_PREFIX }}/api:latest
          cache-from: type=gha,scope=api
          cache-to: type=gha,mode=max,scope=api

      - name: Build & push Poller
        uses: docker/build-push-action@v6
        with:
          context: ./poller
          file: poller/Dockerfile
          push: true
          tags: |
            ${{ env.IMAGE_PREFIX }}/poller:${{ steps.version.outputs.version }}
            ${{ env.IMAGE_PREFIX }}/poller:latest
          cache-from: type=gha,scope=poller
          cache-to: type=gha,mode=max,scope=poller

      - name: Build & push Frontend
        uses: docker/build-push-action@v6
        with:
          context: .
          file: infrastructure/docker/Dockerfile.frontend
          push: true
          tags: |
            ${{ env.IMAGE_PREFIX }}/frontend:${{ steps.version.outputs.version }}
            ${{ env.IMAGE_PREFIX }}/frontend:latest
          cache-from: type=gha,scope=frontend
          cache-to: type=gha,mode=max,scope=frontend

      - name: Build & push WinBox Worker
        uses: docker/build-push-action@v6
        with:
          context: ./winbox-worker
          file: winbox-worker/Dockerfile
          platforms: linux/amd64
          push: true
          tags: |
            ${{ env.IMAGE_PREFIX }}/winbox-worker:${{ steps.version.outputs.version }}
            ${{ env.IMAGE_PREFIX }}/winbox-worker:latest
          cache-from: type=gha,scope=winbox-worker
          cache-to: type=gha,mode=max,scope=winbox-worker
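The `${GITHUB_REF_NAME#v}` parameter expansion in the version step removes a single leading `v` from the tag name if present. The Python equivalent:

```python
def version_from_tag(ref_name: str) -> str:
    # Shell's ${GITHUB_REF_NAME#v} strips the shortest leading match of "v";
    # str.removeprefix does the same for a literal prefix.
    return ref_name.removeprefix("v")

assert version_from_tag("v9.8.2") == "9.8.2"
assert version_from_tag("9.8.2") == "9.8.2"  # no-op when the prefix is absent
```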

.gitignore (+3)

@@ -42,6 +42,9 @@ Thumbs.db
 # Helm local overrides (contain dev credentials)
 infrastructure/helm/values-local.yaml
+# Incident reports (internal)
+incidents/
+
 # Local-only planning and design docs
 .planning/
 .superpowers/

VERSION

@@ -1 +1 @@
-9.8.0
+9.8.2


@@ -41,12 +41,8 @@ def upgrade() -> None:
         """)
     )
-    conn.execute(
-        sa.text("ALTER TABLE credential_profiles ENABLE ROW LEVEL SECURITY")
-    )
-    conn.execute(
-        sa.text("ALTER TABLE credential_profiles FORCE ROW LEVEL SECURITY")
-    )
+    conn.execute(sa.text("ALTER TABLE credential_profiles ENABLE ROW LEVEL SECURITY"))
+    conn.execute(sa.text("ALTER TABLE credential_profiles FORCE ROW LEVEL SECURITY"))
     conn.execute(
         sa.text("""
@@ -63,20 +59,13 @@ def upgrade() -> None:
         """)
     )
-    conn.execute(
-        sa.text("GRANT SELECT ON credential_profiles TO poller_user")
-    )
-    conn.execute(
-        sa.text("GRANT SELECT, INSERT, UPDATE, DELETE ON credential_profiles TO app_user")
-    )
+    conn.execute(sa.text("GRANT SELECT ON credential_profiles TO poller_user"))
+    conn.execute(sa.text("GRANT SELECT, INSERT, UPDATE, DELETE ON credential_profiles TO app_user"))

 def downgrade() -> None:
     conn = op.get_bind()
     conn.execute(
-        sa.text(
-            "DROP POLICY IF EXISTS credential_profiles_tenant_isolation"
-            " ON credential_profiles"
-        )
+        sa.text("DROP POLICY IF EXISTS credential_profiles_tenant_isolation ON credential_profiles")
     )
     op.drop_table("credential_profiles")


@@ -630,12 +630,8 @@ def upgrade() -> None:
     )
     # -- RLS: system profiles visible to all tenants -----------------------
-    conn.execute(
-        sa.text("ALTER TABLE snmp_profiles ENABLE ROW LEVEL SECURITY")
-    )
-    conn.execute(
-        sa.text("ALTER TABLE snmp_profiles FORCE ROW LEVEL SECURITY")
-    )
+    conn.execute(sa.text("ALTER TABLE snmp_profiles ENABLE ROW LEVEL SECURITY"))
+    conn.execute(sa.text("ALTER TABLE snmp_profiles FORCE ROW LEVEL SECURITY"))
     conn.execute(
         sa.text("""
             CREATE POLICY snmp_profiles_tenant_isolation
@@ -648,12 +644,8 @@ def upgrade() -> None:
         """)
     )
-    conn.execute(
-        sa.text("GRANT SELECT ON snmp_profiles TO poller_user")
-    )
-    conn.execute(
-        sa.text("GRANT SELECT, INSERT, UPDATE, DELETE ON snmp_profiles TO app_user")
-    )
+    conn.execute(sa.text("GRANT SELECT ON snmp_profiles TO poller_user"))
+    conn.execute(sa.text("GRANT SELECT, INSERT, UPDATE, DELETE ON snmp_profiles TO app_user"))
     # -- Seed 6 system profiles --------------------------------------------
     for profile in SEED_PROFILES:
@@ -679,10 +671,5 @@ def upgrade() -> None:
 def downgrade() -> None:
     conn = op.get_bind()
-    conn.execute(
-        sa.text(
-            "DROP POLICY IF EXISTS snmp_profiles_tenant_isolation"
-            " ON snmp_profiles"
-        )
-    )
+    conn.execute(sa.text("DROP POLICY IF EXISTS snmp_profiles_tenant_isolation ON snmp_profiles"))
     op.drop_table("snmp_profiles")


@@ -29,25 +29,12 @@ def upgrade() -> None:
     conn.execute(sa.text("SET lock_timeout = '3s'"))
     conn.execute(
-        sa.text(
-            "ALTER TABLE devices"
-            " ADD COLUMN device_type TEXT NOT NULL DEFAULT 'routeros'"
-        )
-    )
-    conn.execute(
-        sa.text(
-            "ALTER TABLE devices"
-            " ADD COLUMN snmp_port INTEGER DEFAULT 161"
-        )
-    )
-    conn.execute(
-        sa.text(
-            "ALTER TABLE devices"
-            " ADD COLUMN snmp_version TEXT"
-        )
-    )
+        sa.text("ALTER TABLE devices ADD COLUMN device_type TEXT NOT NULL DEFAULT 'routeros'")
+    )
+    conn.execute(sa.text("ALTER TABLE devices ADD COLUMN snmp_port INTEGER DEFAULT 161"))
+    conn.execute(sa.text("ALTER TABLE devices ADD COLUMN snmp_version TEXT"))
     conn.execute(
         sa.text(
@@ -69,18 +56,8 @@ def upgrade() -> None:
 def downgrade() -> None:
     conn = op.get_bind()
-    conn.execute(
-        sa.text("ALTER TABLE devices DROP COLUMN IF EXISTS credential_profile_id")
-    )
-    conn.execute(
-        sa.text("ALTER TABLE devices DROP COLUMN IF EXISTS snmp_profile_id")
-    )
-    conn.execute(
-        sa.text("ALTER TABLE devices DROP COLUMN IF EXISTS snmp_version")
-    )
-    conn.execute(
-        sa.text("ALTER TABLE devices DROP COLUMN IF EXISTS snmp_port")
-    )
-    conn.execute(
-        sa.text("ALTER TABLE devices DROP COLUMN IF EXISTS device_type")
-    )
+    conn.execute(sa.text("ALTER TABLE devices DROP COLUMN IF EXISTS credential_profile_id"))
+    conn.execute(sa.text("ALTER TABLE devices DROP COLUMN IF EXISTS snmp_profile_id"))
+    conn.execute(sa.text("ALTER TABLE devices DROP COLUMN IF EXISTS snmp_version"))
+    conn.execute(sa.text("ALTER TABLE devices DROP COLUMN IF EXISTS snmp_port"))
+    conn.execute(sa.text("ALTER TABLE devices DROP COLUMN IF EXISTS device_type"))


@@ -44,11 +44,7 @@ def upgrade() -> None:
     conn.execute(sa.text("SELECT create_hypertable('snmp_metrics', 'time')"))
-    conn.execute(
-        sa.text(
-            "SELECT add_retention_policy('snmp_metrics', INTERVAL '90 days')"
-        )
-    )
+    conn.execute(sa.text("SELECT add_retention_policy('snmp_metrics', INTERVAL '90 days')"))
     conn.execute(
         sa.text("""
@@ -57,12 +53,8 @@ def upgrade() -> None:
         """)
     )
-    conn.execute(
-        sa.text("ALTER TABLE snmp_metrics ENABLE ROW LEVEL SECURITY")
-    )
-    conn.execute(
-        sa.text("ALTER TABLE snmp_metrics FORCE ROW LEVEL SECURITY")
-    )
+    conn.execute(sa.text("ALTER TABLE snmp_metrics ENABLE ROW LEVEL SECURITY"))
+    conn.execute(sa.text("ALTER TABLE snmp_metrics FORCE ROW LEVEL SECURITY"))
     conn.execute(
         sa.text("""
@@ -75,9 +67,7 @@ def upgrade() -> None:
         """)
     )
-    conn.execute(
-        sa.text("GRANT SELECT, INSERT ON snmp_metrics TO app_user")
-    )
+    conn.execute(sa.text("GRANT SELECT, INSERT ON snmp_metrics TO app_user"))

 def downgrade() -> None:


@@ -144,7 +144,7 @@ class Settings(BaseSettings):
     # App settings
     APP_NAME: str = "TOD - The Other Dude"
-    APP_VERSION: str = "9.8.0"
+    APP_VERSION: str = "9.8.2"
     DEBUG: bool = False

     @field_validator("CREDENTIAL_ENCRYPTION_KEY")


@@ -22,6 +22,7 @@ from app.models.config_backup import RouterConfigSnapshot, RouterConfigDiff, Rou
 from app.models.device_interface import DeviceInterface
 from app.models.wireless_link import WirelessLink, LinkState
 from app.models.site_alert import SiteAlertRule, SiteAlertEvent
+from app.models.credential_profile import CredentialProfile

 __all__ = [
     "Tenant",
@@ -55,4 +56,5 @@ __all__ = [
     "LinkState",
     "SiteAlertRule",
     "SiteAlertEvent",
+    "CredentialProfile",
 ]


@@ -113,7 +113,10 @@ async def update_profile(
     """Update a credential profile. Requires operator role or above."""
     await _check_tenant_access(current_user, tenant_id, db)
     return await credential_profile_service.update_profile(
-        db=db, tenant_id=tenant_id, profile_id=profile_id, data=data,
+        db=db,
+        tenant_id=tenant_id,
+        profile_id=profile_id,
+        data=data,
         user_id=current_user.user_id,
     )
@@ -136,7 +139,9 @@ async def delete_profile(
     """
     await _check_tenant_access(current_user, tenant_id, db)
     await credential_profile_service.delete_profile(
-        db=db, tenant_id=tenant_id, profile_id=profile_id,
+        db=db,
+        tenant_id=tenant_id,
+        profile_id=profile_id,
         user_id=current_user.user_id,
     )


@@ -31,7 +31,11 @@ from sqlalchemy.ext.asyncio import AsyncSession
 from app.config import settings
 from app.database import get_db
-from app.middleware.rbac import require_operator_or_above, require_scope, require_tenant_admin_or_above
+from app.middleware.rbac import (
+    require_operator_or_above,
+    require_scope,
+    require_tenant_admin_or_above,
+)
 from app.middleware.tenant_context import CurrentUser, get_current_user
 from app.routers.devices import _check_tenant_access
 from app.schemas.snmp_profile import (
@@ -252,7 +256,7 @@ async def update_profile(
     sql = f"""
         UPDATE snmp_profiles
-        SET {', '.join(set_clauses)}
+        SET {", ".join(set_clauses)}
         WHERE id = :profile_id AND tenant_id = :tenant_id
         RETURNING id, tenant_id, name, description, sys_object_id, vendor,
                   category, is_system, created_at, updated_at


@@ -46,9 +46,7 @@ class CredentialProfileCreate(BaseModel):
     @classmethod
     def validate_credential_type(cls, v: str) -> str:
         if v not in VALID_CREDENTIAL_TYPES:
-            raise ValueError(
-                f"credential_type must be one of: {', '.join(VALID_CREDENTIAL_TYPES)}"
-            )
+            raise ValueError(f"credential_type must be one of: {', '.join(VALID_CREDENTIAL_TYPES)}")
         return v

     @model_validator(mode="after")
@@ -141,9 +139,7 @@ class CredentialProfileUpdate(BaseModel):
         if v is None:
             return v
         if v not in VALID_CREDENTIAL_TYPES:
-            raise ValueError(
-                f"credential_type must be one of: {', '.join(VALID_CREDENTIAL_TYPES)}"
-            )
+            raise ValueError(f"credential_type must be one of: {', '.join(VALID_CREDENTIAL_TYPES)}")
         return v

     @model_validator(mode="after")
@@ -151,9 +147,14 @@ class CredentialProfileUpdate(BaseModel):
         """Validate credential fields only when credential_type or credential fields change."""
         # Collect which credential fields were provided
         cred_fields = {
-            "username", "password", "community",
-            "security_level", "auth_protocol", "auth_passphrase",
-            "priv_protocol", "priv_passphrase",
+            "username",
+            "password",
+            "community",
+            "security_level",
+            "auth_protocol",
+            "auth_passphrase",
+            "priv_protocol",
+            "priv_passphrase",
         }
         has_cred_changes = any(getattr(self, f) is not None for f in cred_fields)


@@ -65,7 +65,9 @@ def _build_credential_json(data: CredentialProfileCreate | CredentialProfileUpda
         raise ValueError(f"Unknown credential_type: {ct}")

-def _profile_response(profile: CredentialProfile, device_count: int = 0) -> CredentialProfileResponse:
+def _profile_response(
+    profile: CredentialProfile, device_count: int = 0
+) -> CredentialProfileResponse:
     """Build a CredentialProfileResponse from an ORM instance."""
     return CredentialProfileResponse(
         id=profile.id,
@@ -116,9 +118,11 @@ async def get_profiles(
     credential_type: str | None = None,
 ) -> CredentialProfileListResponse:
     """List all credential profiles for a tenant."""
-    query = select(CredentialProfile).where(
-        CredentialProfile.tenant_id == tenant_id
-    ).order_by(CredentialProfile.name)
+    query = (
+        select(CredentialProfile)
+        .where(CredentialProfile.tenant_id == tenant_id)
+        .order_by(CredentialProfile.name)
+    )

     if credential_type:
         query = query.where(CredentialProfile.credential_type == credential_type)
@@ -141,10 +145,7 @@ async def get_profiles(
     for row in count_result:
         device_counts[row.credential_profile_id] = row.cnt

-    responses = [
-        _profile_response(p, device_count=device_counts.get(p.id, 0))
-        for p in profiles
-    ]
+    responses = [_profile_response(p, device_count=device_counts.get(p.id, 0)) for p in profiles]

     return CredentialProfileListResponse(profiles=responses)
@@ -211,9 +212,14 @@ async def update_profile(
     # Determine if credential re-encryption is needed
     cred_fields = {
-        "username", "password", "community",
-        "security_level", "auth_protocol", "auth_passphrase",
-        "priv_protocol", "priv_passphrase",
+        "username",
+        "password",
+        "community",
+        "security_level",
+        "auth_protocol",
+        "auth_passphrase",
+        "priv_protocol",
+        "priv_passphrase",
     }
     has_cred_changes = any(getattr(data, f) is not None for f in cred_fields)
     type_changed = data.credential_type is not None
@@ -241,13 +247,18 @@ async def update_profile(
         action="credential_profile.update",
         resource_type="credential_profile",
         resource_id=str(profile.id),
-        details={"name": profile.name, "updated_fields": list(data.model_dump(exclude_unset=True).keys())},
+        details={
+            "name": profile.name,
+            "updated_fields": list(data.model_dump(exclude_unset=True).keys()),
+        },
     )

     return _profile_response(profile, device_count=dc)

-def _merge_update(data: CredentialProfileUpdate, profile: CredentialProfile) -> CredentialProfileUpdate:
+def _merge_update(
+    data: CredentialProfileUpdate, profile: CredentialProfile
+) -> CredentialProfileUpdate:
     """For partial credential updates, overlay data onto existing profile type.

     When credential_type is not changing but individual credential fields are,


@@ -63,6 +63,7 @@ async def ensure_sse_streams() -> None:
             name="ALERT_EVENTS",
             subjects=["alert.fired.>", "alert.resolved.>"],
             max_age=3600,  # 1 hour retention
+            max_bytes=16 * 1024 * 1024,  # 16MB cap
         )
     )
     logger.info("nats.stream.ensured", stream="ALERT_EVENTS")
@@ -72,6 +73,7 @@ async def ensure_sse_streams() -> None:
             name="OPERATION_EVENTS",
             subjects=["firmware.progress.>"],
             max_age=3600,  # 1 hour retention
+            max_bytes=16 * 1024 * 1024,  # 16MB cap
         )
     )
     logger.info("nats.stream.ensured", stream="OPERATION_EVENTS")


@@ -4,7 +4,7 @@ build-backend = "hatchling.build"

 [project]
 name = "the-other-dude-backend"
-version = "9.8.0"
+version = "9.8.2"
 description = "MikroTik Fleet Management Portal - Backend API"
 requires-python = ">=3.12"
 dependencies = [

docker-compose.build.yml (new file, +28)

@@ -0,0 +1,28 @@
# docker-compose.build.yml -- Build-from-source override
#
# Adds build contexts so Docker Compose builds images locally instead of
# pulling pre-built images from GHCR.
#
# Usage:
#   docker compose -f docker-compose.yml -f docker-compose.prod.yml \
#     -f docker-compose.build.yml --env-file .env.prod up -d --build

services:
  api:
    build:
      context: .
      dockerfile: infrastructure/docker/Dockerfile.api
  poller:
    build:
      context: ./poller
      dockerfile: ./Dockerfile
  frontend:
    build:
      context: .
      dockerfile: infrastructure/docker/Dockerfile.frontend
  winbox-worker:
    build:
      context: ./winbox-worker


@@ -1,5 +1,10 @@
 # docker-compose.prod.yml -- Production environment override
-# Usage: docker compose -f docker-compose.yml -f docker-compose.prod.yml --env-file .env.prod up -d
+#
+# Pre-built images (recommended):
+#   docker compose -f docker-compose.yml -f docker-compose.prod.yml --env-file .env.prod up -d
+#
+# Build from source:
+#   docker compose -f docker-compose.yml -f docker-compose.prod.yml -f docker-compose.build.yml --env-file .env.prod up -d

 services:
   postgres:
@@ -13,9 +18,7 @@ services:
       retries: 5

   api:
-    build:
-      context: .
-      dockerfile: infrastructure/docker/Dockerfile.api
+    image: ghcr.io/staack/the-other-dude/api:${TOD_VERSION:-latest}
     container_name: tod_api
     env_file: .env.prod
     environment:
@@ -67,9 +70,7 @@ services:
       - tod_remote_worker

   poller:
-    build:
-      context: ./poller
-      dockerfile: ./Dockerfile
+    image: ghcr.io/staack/the-other-dude/poller:${TOD_VERSION:-latest}
     container_name: tod_poller
     env_file: .env.prod
     cap_add:
@@ -135,6 +136,7 @@ services:
         max-file: "3"

   winbox-worker:
+    image: ghcr.io/staack/the-other-dude/winbox-worker:${TOD_VERSION:-latest}
     environment:
       LOG_LEVEL: info
       MAX_CONCURRENT_SESSIONS: 10
@@ -146,9 +148,7 @@ services:
     restart: unless-stopped

   frontend:
-    build:
-      context: .
-      dockerfile: infrastructure/docker/Dockerfile.frontend
+    image: ghcr.io/staack/the-other-dude/frontend:${TOD_VERSION:-latest}
     container_name: tod_frontend
     ports:
       - "3000:80"


@@ -70,10 +70,11 @@ services:
       interval: 5s
       timeout: 5s
       retries: 5
+    restart: unless-stopped
     deploy:
       resources:
         limits:
-          memory: 256M
+          memory: 384M
     networks:
       - tod


@@ -9,7 +9,8 @@ TOD uses Pydantic Settings for configuration. All values can be set via environm
 | Variable | Default | Description |
 |----------|---------|-------------|
 | `APP_NAME` | `TOD - The Other Dude` | Application display name |
-| `APP_VERSION` | `9.7.2` | Semantic version string (see VERSION file at project root) |
+| `APP_VERSION` | `9.8.2` | Semantic version string (see VERSION file at project root) |
+| `TOD_VERSION` | `latest` | Docker image tag for pre-built images (set by setup.py) |
 | `ENVIRONMENT` | `dev` | Runtime environment: `dev`, `staging`, or `production` |
 | `DEBUG` | `false` | Enable debug mode |
 | `CORS_ORIGINS` | `http://localhost:3000,http://localhost:5173,http://localhost:8080` | Comma-separated list of allowed CORS origins |


@@ -138,6 +138,13 @@
             <p class="blog-subtitle">Updates, insights, and the occasional rant about MikroTik fleet management.</p>
             <ul class="blog-list">
+                <li>
+                    <a href="snmp-works.html">
+                        <div class="blog-list-date">March 22, 2026</div>
+                        <div class="blog-list-title">SNMP Works (And the UI Got Out of the Way)</div>
+                        <div class="blog-list-excerpt">v9.8 adds SNMP device monitoring. It polls real devices, collects real data, and shows it alongside your MikroTik fleet. The UI also got simpler.</div>
+                    </a>
+                </li>
                 <li>
                     <a href="500-devices-broke-the-api.html">
                         <div class="blog-list-date">March 21, 2026</div>


@@ -0,0 +1,210 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>SNMP Works (And the UI Got Out of the Way) — The Other Dude Blog</title>
<meta name="description" content="v9.8 adds SNMP device monitoring. It polls real devices, collects real data, and shows it alongside your MikroTik fleet. The UI also got simpler.">
<meta name="keywords" content="MikroTik, SNMP monitoring, fleet management, network management, The Other Dude, SNMP poller">
<meta name="author" content="The Other Dude">
<meta name="robots" content="index, follow">
<meta name="theme-color" content="#eae7de">
<link rel="canonical" href="https://theotherdude.net/blog/snmp-works.html">
<link rel="icon" href="data:image/svg+xml,<svg xmlns='http://www.w3.org/2000/svg' viewBox='0 0 64 64'><rect x='2' y='2' width='60' height='60' rx='8' fill='none' stroke='%238B1A1A' stroke-width='2'/><path d='M32 18 L46 32 L32 46 L18 32 Z' fill='%238B1A1A'/><path d='M32 19 L38 32 L32 45 L26 32 Z' fill='%232A9D8F'/><circle cx='32' cy='32' r='5' fill='%238B1A1A'/><circle cx='32' cy='32' r='2.5' fill='%232A9D8F'/></svg>">
<!-- Open Graph -->
<meta property="og:type" content="article">
<meta property="og:title" content="SNMP Works (And the UI Got Out of the Way) — The Other Dude">
<meta property="og:description" content="v9.8 adds SNMP device monitoring. It polls real devices, collects real data, and shows it alongside your MikroTik fleet. The UI also got simpler.">
<meta property="og:url" content="https://theotherdude.net/blog/snmp-works.html">
<meta property="og:site_name" content="The Other Dude">
<meta property="article:published_time" content="2026-03-22">
<!-- Structured Data -->
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "BlogPosting",
"headline": "SNMP Works (And the UI Got Out of the Way)",
"description": "v9.8 adds SNMP device monitoring. It polls real devices, collects real data, and shows it alongside your MikroTik fleet. The UI also got simpler.",
"datePublished": "2026-03-22",
"author": {
"@type": "Organization",
"name": "The Other Dude"
},
"publisher": {
"@type": "Organization",
"name": "The Other Dude",
"url": "https://theotherdude.net"
},
"mainEntityOfPage": "https://theotherdude.net/blog/snmp-works.html"
}
</script>
<!-- Fonts -->
<link rel="stylesheet" href="../style.css?v=3">
<style>
/* Warm Precision overrides */
:root {
--background: #eae7de;
--surface: #f6f4ec;
--elevated: #f0ede4;
--border: rgba(40,36,28,0.12);
--text-primary: #1a1810;
--text-secondary: #5e5a4e;
--text-muted: #8a8578;
--accent: #8a7a48;
}
body { background-color: #eae7de; color: #1a1810; }
.site-nav { background: #e0dcd2 !important; border-bottom: 1px solid rgba(40,36,28,0.12); }
.site-nav .nav-logo span, .site-nav .nav-link, .site-nav .nav-cta { color: #1a1810 !important; }
.site-nav .nav-link:hover { color: #8a7a48 !important; }
.dark { /* prevent dark mode */ }
</style>
<style>
.blog-post {
max-width: 720px;
margin: 0 auto;
padding: 80px 24px 120px;
}
.blog-post-meta {
color: var(--text-muted);
font-size: 14px;
margin-bottom: 8px;
}
.blog-post h1 {
font-family: "Manrope", sans-serif;
font-weight: 700;
font-size: 2.5rem;
line-height: 1.2;
color: var(--text-primary);
margin-bottom: 40px;
}
.blog-post h2 {
font-family: "Manrope", sans-serif;
font-weight: 600;
font-size: 1.4rem;
color: var(--text-primary);
margin-top: 48px;
margin-bottom: 16px;
}
.blog-post p {
color: var(--text-secondary);
font-size: 1.05rem;
line-height: 1.75;
margin-bottom: 20px;
}
.blog-post p strong {
color: var(--text-primary);
}
.blog-post a {
color: var(--accent);
text-decoration: underline;
text-underline-offset: 3px;
}
.blog-post a:hover {
color: var(--text-primary);
}
.blog-post .back-link {
display: inline-block;
margin-bottom: 32px;
font-size: 14px;
text-decoration: none;
color: var(--text-muted);
}
.blog-post .back-link:hover {
color: var(--accent);
}
@media (max-width: 480px) {
.blog-post h1 { font-size: 1.8rem; }
.blog-post { padding: 60px 20px 80px; }
}
</style>
</head>
<body>
<nav class="site-nav">
<div class="nav-inner container">
<a href="../index.html" class="nav-logo">
<svg class="nav-logo-mark" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 64 64" width="32" height="32" aria-label="The Other Dude logo">
<rect x="2" y="2" width="60" height="60" rx="8" fill="none" stroke="#8B1A1A" stroke-width="2"/>
<rect x="6" y="6" width="52" height="52" rx="5" fill="none" stroke="#F5E6C8" stroke-width="1.5"/>
<rect x="8" y="8" width="48" height="48" rx="4" fill="#8B1A1A" opacity="0.15"/>
<path d="M32 8 L56 32 L32 56 L8 32 Z" fill="none" stroke="#8B1A1A" stroke-width="2"/>
<path d="M32 13 L51 32 L32 51 L13 32 Z" fill="none" stroke="#F5E6C8" stroke-width="1.5"/>
<path d="M32 18 L46 32 L32 46 L18 32 Z" fill="#8B1A1A"/>
<path d="M32 19 L38 32 L32 45 L26 32 Z" fill="#2A9D8F"/>
<path d="M19 32 L32 26 L45 32 L32 38 Z" fill="#F5E6C8"/>
<circle cx="32" cy="32" r="5" fill="#8B1A1A"/>
<circle cx="32" cy="32" r="2.5" fill="#2A9D8F"/>
<path d="M10 10 L16 10 L10 16 Z" fill="#2A9D8F" opacity="0.7"/>
<path d="M54 10 L54 16 L48 10 Z" fill="#2A9D8F" opacity="0.7"/>
<path d="M10 54 L16 54 L10 48 Z" fill="#2A9D8F" opacity="0.7"/>
<path d="M54 54 L48 54 L54 48 Z" fill="#2A9D8F" opacity="0.7"/>
</svg>
<span>The Other Dude</span>
</a>
<div class="nav-links">
<a href="../docs.html" class="nav-link">Docs</a>
<a href="index.html" class="nav-link">Blog</a>
<a href="https://github.com/staack/the-other-dude" class="nav-link" rel="noopener">GitHub</a>
</div>
</div>
</nav>
<main>
<article class="blog-post">
<a href="index.html" class="back-link">&larr; Back to Blog</a>
<div class="blog-post-meta">March 22, 2026</div>
<h1>SNMP Works (And the UI Got Out of the Way)</h1>
<p>v9.8 ships SNMP support. Add an SNMP device and TOD will poll it, collect its metrics, and show it in the fleet table alongside your MikroTik gear. Interface traffic, CPU, memory, uptime — the same data you see for your Tiks, from any device that speaks SNMP.</p>
<p>It is early. It works. I am not going to oversell it.</p>
<h2>What "Working" Means Right Now</h2>
<p>You create a credential profile — basically a community string and SNMP version — and then add a device by IP. TOD probes the device, figures out what it is from the sysObjectID, assigns a collection profile, and starts polling. Standard MIB-II data flows into the same hypertables the MikroTik poller uses. Same charts, same fleet table, same everything.</p>
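<p>That auto-detection step can be sketched as a prefix match on the device's <code>sysObjectID</code> against known enterprise numbers. The OIDs below are real IANA assignments (14988 is MikroTik, 41112 is Ubiquiti), but the function and the profile names other than <code>generic-snmp</code> are illustrative, not TOD's actual code:</p>

```go
package main

import (
	"fmt"
	"strings"
)

// classifyBySysObjectID maps a device's sysObjectID (OID 1.3.6.1.2.1.1.2.0)
// to a collection-profile name. Vendor prefixes live under the IANA
// enterprise arc 1.3.6.1.4.1; anything unrecognized falls back to the
// generic profile.
func classifyBySysObjectID(sysObjectID string) string {
	switch {
	case strings.HasPrefix(sysObjectID, "1.3.6.1.4.1.14988."): // MikroTik
		return "mikrotik-snmp"
	case strings.HasPrefix(sysObjectID, "1.3.6.1.4.1.41112."): // Ubiquiti
		return "ubiquiti"
	default:
		return "generic-snmp"
	}
}

func main() {
	fmt.Println(classifyBySysObjectID("1.3.6.1.4.1.14988.1")) // mikrotik-snmp
}
```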
<p>There are seven built-in profiles: generic, switch, router, access point, UPS, MikroTik-over-SNMP, and Ubiquiti. The generic profile covers most devices out of the box. If you need something vendor-specific, you can build a custom profile with your own OIDs.</p>
<p>SNMPv1, v2c, and v3 are all supported. Counter64 is preferred over Counter32 automatically. BulkWalk is wrapped in timeouts so a misbehaving device does not hang your poller. The basics are covered.</p>
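<p>The Counter64 preference is worth a number. A 32-bit octet counter holds 2^32 bytes, about 4.3 GB, which wraps in roughly 34 seconds at line-rate gigabit, so the 64-bit <code>ifHCInOctets</code> column is the only sane choice on fast links. Either way, a poller has to tolerate a wrap between samples. Here is a sketch of the standard delta calculation, illustrative rather than TOD's exact code:</p>

```go
package main

import "fmt"

// counterDelta returns the increase between two successive samples of an SNMP
// counter, tolerating at most one wrap between polls. Counter32 wraps at
// 2^32; for Counter64, uint64 subtraction is already modular and handles the
// wrap on its own.
func counterDelta(prev, cur uint64, is64 bool) uint64 {
	if cur >= prev {
		return cur - prev
	}
	if is64 {
		return cur - prev // modular uint64 arithmetic covers the wrap
	}
	return (uint64(1) << 32) - prev + cur // Counter32 wrapped once
}

func main() {
	// Counter32 wrapped from near 2^32 back down to 10: the real delta is 16.
	fmt.Println(counterDelta(4294967290, 10, false)) // 16
}
```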
<h2>Why This Matters</h2>
<p>Nobody runs a pure MikroTik network. There is always a managed switch, a UPS, an access point from another vendor, a piece of infrastructure that only speaks SNMP. Before v9.8, those devices were invisible to TOD. You had to run a separate monitoring system to see them, or just not monitor them at all.</p>
<p>Now they show up in the same fleet table. Same status indicators, same interface graphs. You do not have to context-switch between two different tools to understand what your network is doing.</p>
<p>That is the entire point. Not fancy SNMP features. Just visibility.</p>
<h2>The UI Got Simpler</h2>
<p>While working on SNMP, I also stripped a lot of noise out of the interface. Gradients are gone. Shimmer loading placeholders are gone. Visual effects that existed because they looked nice rather than because they communicated something useful — gone.</p>
<p>The reason is practical. When you are looking at real data from real devices — when a number on the screen represents actual traffic on an actual interface — decorative UI gets in the way. You want to read data, not admire the container it is sitting in.</p>
<p>The interface should disappear. You should be thinking about your network, not about the tool you are using to look at it. Every visual element that does not carry information is a distraction. So I removed the ones that were not carrying their weight.</p>
<h2>What Is Not Done</h2>
<p>SNMP trap reception is not implemented. The poller collects data on a schedule — it does not listen for unsolicited events from devices. That is planned but not built yet.</p>
<p>SNMP SET operations are not supported. This is read-only monitoring. You cannot push configuration to SNMP devices through TOD. That may never be a feature — SNMP config management is a different problem with a lot of vendor-specific complexity.</p>
<p>The custom profile builder works but needs more polish. Uploading vendor MIBs and browsing OID trees is functional, but the UX is not where I want it yet. The standard path — add a device, let auto-detection handle it — is solid. The power-user path needs more work.</p>
<p>Bulk add supports SNMP devices but subnet discovery is still basic. It works, but it is not as smooth as it should be.</p>
<h2>Where This Is Going</h2>
<p>This is the version where TOD stopped being a MikroTik-only tool and started being a network management system. The MikroTik support is still the core — that is what the deep integration is for, that is where WinBox and config push and firmware management live. SNMP is the layer that lets everything else in the building show up on the same screen.</p>
<p>It is not a finished product. It is a system that is getting more useful every week. If you manage MikroTik gear alongside other network hardware and you are tired of running two monitoring systems, this is worth looking at.</p>
</article>
</main>
</body>
</html>


@@ -4,8 +4,8 @@
 <meta charset="UTF-8">
 <meta name="viewport" content="width=device-width, initial-scale=1.0">
 <title>The Other Dude — MikroTik Fleet Management</title>
-<meta name="description" content="MikroTik fleet management. Self-hosted. Source-available. Monitor devices, push configuration, track changes in git.">
+<meta name="description" content="Network fleet management for MikroTik and SNMP devices. Self-hosted. Source-available. Monitor MikroTik routers alongside switches, APs, and UPSes from a single pane of glass.">
-<meta name="keywords" content="MikroTik, RouterOS, fleet management, network management, WinBox browser, MikroTik monitoring, MikroTik configuration, router management, self-hosted, open source, source-available">
+<meta name="keywords" content="MikroTik, RouterOS, SNMP monitoring, fleet management, network management, WinBox browser, MikroTik monitoring, MikroTik configuration, SNMP poller, multi-vendor NMS, router management, self-hosted, open source, source-available">
 <meta name="author" content="The Other Dude">
 <meta name="robots" content="index, follow">
 <meta name="google-site-verification" content="d2QVuWrLJlzOQPnA-SAJuvajEHGYbusvJ4eDdZbWSBU">
@@ -16,7 +16,7 @@
 <!-- Open Graph -->
 <meta property="og:type" content="website">
 <meta property="og:title" content="The Other Dude — MikroTik Fleet Management">
-<meta property="og:description" content="MikroTik fleet management. Self-hosted. Source-available.">
+<meta property="og:description" content="Network fleet management for MikroTik and SNMP devices. Self-hosted. Source-available.">
 <meta property="og:url" content="https://theotherdude.net/">
 <meta property="og:image" content="https://theotherdude.net/assets/og-image.png">
 <meta property="og:site_name" content="The Other Dude">
@@ -25,7 +25,7 @@
 <!-- Twitter Card -->
 <meta name="twitter:card" content="summary_large_image">
 <meta name="twitter:title" content="The Other Dude — MikroTik Fleet Management">
-<meta name="twitter:description" content="MikroTik fleet management. Self-hosted. Source-available.">
+<meta name="twitter:description" content="Network fleet management for MikroTik and SNMP devices. Self-hosted. Source-available.">
 <meta name="twitter:image" content="https://theotherdude.net/assets/og-image.png">
 <!-- Structured Data -->
@@ -36,7 +36,7 @@
 "name": "The Other Dude",
 "applicationCategory": "NetworkApplication",
 "operatingSystem": "Linux, Docker",
-"description": "MikroTik RouterOS fleet management. Self-hosted. Source-available under BSL 1.1.",
+"description": "Network fleet management for MikroTik and SNMP devices. Self-hosted. Source-available under BSL 1.1.",
 "url": "https://theotherdude.net",
 "offers": {
 "@type": "Offer",
@@ -49,12 +49,13 @@
 "Track config changes in git",
 "Manage firmware versions",
 "WinBox in the browser",
+"SNMP device monitoring alongside MikroTik",
 "VPN overlay for NAT traversal",
 "Multi-tenant with row-level security",
 "Zero-knowledge authentication (SRP-6a)"
 ],
 "softwareRequirements": "Docker, PostgreSQL 17, Redis, NATS",
-"softwareVersion": "9.7.2",
+"softwareVersion": "9.8.2",
 "license": "https://mariadb.com/bsl11/"
 }
 </script>
@@ -461,7 +462,7 @@
 <div class="container">
 <header class="wp-header">
 <h1>The Other Dude</h1>
-<p class="wp-tagline">MikroTik fleet management. Self-hosted. Source-available.</p>
+<p class="wp-tagline">MikroTik fleet management. Now with SNMP support. Self-hosted. Source-available.</p>
 <div class="wp-header-links">
 <a href="https://github.com/staack/the-other-dude" rel="noopener">GitHub</a>
 <a href="docs.html">Documentation</a>
@@ -478,6 +479,7 @@
 <li>Track config changes in git</li>
 <li>Manage firmware versions</li>
 <li>WinBox in the browser</li>
+<li>SNMP device monitoring — switches, APs, UPSes alongside your Tiks</li>
 <li>VPN overlay for NAT traversal</li>
 <li>Multi-tenant with row-level security</li>
 <li>Zero-knowledge authentication (SRP-6a)</li>
@@ -545,7 +547,7 @@
 <section class="wp-section">
 <h2>Status</h2>
 <table class="wp-status-table">
-<tr><td>Version</td><td>9.7.2</td></tr>
+<tr><td>Version</td><td>9.8.2</td></tr>
 <tr><td>License</td><td>BSL 1.1 (converts to Apache 2.0 in 2030)</td></tr>
 <tr><td>Free tier</td><td>250 devices</td></tr>
 <tr><td>Stability</td><td>Breaking changes expected before v11</td></tr>


@@ -1,7 +1,7 @@
{ {
"name": "frontend", "name": "frontend",
"private": true, "private": true,
"version": "9.8.0", "version": "9.8.2",
"type": "module", "type": "module",
"scripts": { "scripts": {
"dev": "vite", "dev": "vite",


@@ -15,14 +15,14 @@ interface SNMPMetricsSectionProps {
  * and are shown by InterfaceGauges. Custom OID charting is Phase 20 (PROF-03).
  */
 export function SNMPMetricsSection({ tenantId, snmpProfileId }: SNMPMetricsSectionProps) {
-  if (!snmpProfileId) return null
   const { data: profile } = useQuery({
     queryKey: ['snmp-profile', tenantId, snmpProfileId],
     queryFn: () => snmpProfilesApi.get(tenantId, snmpProfileId!),
-    enabled: !!snmpProfileId,
+    enabled: !!snmpProfileId && !!tenantId,
   })
+  if (!snmpProfileId) return null
   return (
     <div className="rounded-sm border border-border-default bg-panel px-3 py-2">
       <div className="flex items-center gap-2 mb-1">


@@ -35,11 +35,6 @@ interface CredentialProfilesPageProps {
 type CredentialType = 'routeros' | 'snmp_v2c' | 'snmp_v3'
 type SecurityLevel = 'no_auth_no_priv' | 'auth_no_priv' | 'auth_priv'
-const CREDENTIAL_TYPE_LABELS: Record<CredentialType, string> = {
-  routeros: 'RouterOS',
-  snmp_v2c: 'SNMP v2c',
-  snmp_v3: 'SNMP v3',
-}
 const SECURITY_LEVELS: { value: SecurityLevel; label: string }[] = [
   { value: 'no_auth_no_priv', label: 'No Auth, No Privacy' },


@@ -10,8 +10,6 @@ import {
   X,
   Network,
   Copy,
-  ChevronDown,
-  ChevronRight,
 } from 'lucide-react'
 import {
   snmpProfilesApi,
@@ -33,7 +31,6 @@ import {
   SelectValue,
 } from '@/components/ui/select'
 import { Tabs, TabsList, TabsTrigger, TabsContent } from '@/components/ui/tabs'
-import { EmptyState } from '@/components/ui/empty-state'
 import { OIDTreeBrowser } from '@/components/settings/OIDTreeBrowser'
 import { ProfileTestPanel } from '@/components/settings/ProfileTestPanel'
@@ -208,7 +205,6 @@ export function SNMPProfileEditorPage({ tenantId }: SNMPProfileEditorPageProps)
 const [pollGroups, setPollGroups] = useState<Record<PollGroupKey, PollGroup>>(buildEmptyPollGroups)
 const [activePollGroup, setActivePollGroup] = useState<PollGroupKey>('standard')
 const [selectedOids, setSelectedOids] = useState<Set<string>>(new Set())
-const [advancedOpen, setAdvancedOpen] = useState(false)
 // ─── MIB state ─────────────────────────────────────────────────────────


@@ -495,7 +495,7 @@ export interface CredentialProfileCreate {
   priv_passphrase?: string
 }
-export interface CredentialProfileUpdate extends Partial<CredentialProfileCreate> {}
+export type CredentialProfileUpdate = Partial<CredentialProfileCreate>
 export const credentialProfilesApi = {
   list: (tenantId: string, credentialType?: string) =>


@@ -3,7 +3,7 @@ name: tod
 description: The Other Dude — MikroTik fleet management platform
 type: application
 version: 1.0.0
-appVersion: "9.8.0"
+appVersion: "9.8.2"
 kubeVersion: ">=1.28.0-0"
 keywords:
 - mikrotik


@@ -163,7 +163,7 @@ func NewPublisher(natsURL string) (*Publisher, error) {
   Name: "WIRELESS_REGISTRATIONS",
   Subjects: []string{"wireless.registrations.>"},
   MaxAge: 30 * 24 * time.Hour, // 30-day retention
-  MaxBytes: 256 * 1024 * 1024, // 256MB cap
+  MaxBytes: 128 * 1024 * 1024, // 128MB cap
   Discard: jetstream.DiscardOld,
 })
 if err != nil {
if err != nil { if err != nil {


@@ -68,12 +68,14 @@ func (c *SNMPCollector) Collect(ctx context.Context, dev store.Device, pub *bus.
 profileID := ""
 if dev.SNMPProfileID != nil {
   profileID = *dev.SNMPProfileID
-} else {
+} else if c.profiles != nil {
   profileID = c.profiles.GetGenericID()
   if profileID == "" {
     return fmt.Errorf("device %s: no SNMP profile assigned and no generic-snmp fallback found", dev.ID)
   }
   slog.Debug("using generic-snmp fallback profile", "device_id", dev.ID)
+} else {
+  return fmt.Errorf("device %s: no SNMP profile assigned and profile cache not available", dev.ID)
 }
 profile := c.profiles.Get(profileID)
 if profile == nil {


@@ -20,7 +20,7 @@ func TestSNMPCollectorImplementsCollector(t *testing.T) {
 }
 // TestSNMPCollectorCollect_NilProfileID verifies that Collect returns an error
-// when the device has no SNMPProfileID set.
+// when the device has no SNMPProfileID and the profile cache is nil.
 func TestSNMPCollectorCollect_NilProfileID(t *testing.T) {
 collector := NewSNMPCollector(nil, nil, nil, DefaultSNMPConfig())
 dev := store.Device{
@@ -32,7 +32,7 @@ func TestSNMPCollectorCollect_NilProfileID(t *testing.T) {
 err := collector.Collect(context.Background(), dev, &bus.Publisher{})
 require.Error(t, err)
-assert.Contains(t, err.Error(), "no SNMP profile")
+assert.Contains(t, err.Error(), "no SNMP profile assigned")
 }
 // TestSNMPCollectorCollect_UnknownProfileID verifies that Collect returns an error
// TestSNMPCollectorCollect_UnknownProfileID verifies that Collect returns an error // TestSNMPCollectorCollect_UnknownProfileID verifies that Collect returns an error

setup.py

@@ -40,10 +40,14 @@ INIT_SQL_TEMPLATE = PROJECT_ROOT / "scripts" / "init-postgres.sql"
 INIT_SQL_PROD = PROJECT_ROOT / "scripts" / "init-postgres-prod.sql"
 COMPOSE_BASE = "docker-compose.yml"
 COMPOSE_PROD = "docker-compose.prod.yml"
+COMPOSE_BUILD_OVERRIDE = "docker-compose.build.yml"
 COMPOSE_CMD = [
-    "docker", "compose",
-    "-f", COMPOSE_BASE,
-    "-f", COMPOSE_PROD,
+    "docker",
+    "compose",
+    "-f",
+    COMPOSE_BASE,
+    "-f",
+    COMPOSE_PROD,
 ]
 REQUIRED_PORTS = {
@@ -58,20 +62,40 @@ REQUIRED_PORTS = {
 # ── Color helpers ────────────────────────────────────────────────────────────
 def _supports_color() -> bool:
     return hasattr(sys.stdout, "isatty") and sys.stdout.isatty()
 _COLOR = _supports_color()
 def _c(code: str, text: str) -> str:
     return f"\033[{code}m{text}\033[0m" if _COLOR else text
-def green(t: str) -> str: return _c("32", t)
-def yellow(t: str) -> str: return _c("33", t)
-def red(t: str) -> str: return _c("31", t)
-def cyan(t: str) -> str: return _c("36", t)
-def bold(t: str) -> str: return _c("1", t)
-def dim(t: str) -> str: return _c("2", t)
+def green(t: str) -> str:
+    return _c("32", t)
+def yellow(t: str) -> str:
+    return _c("33", t)
+def red(t: str) -> str:
+    return _c("31", t)
+def cyan(t: str) -> str:
+    return _c("36", t)
+def bold(t: str) -> str:
+    return _c("1", t)
+def dim(t: str) -> str:
+    return _c("2", t)
 def banner(text: str) -> None:
@@ -123,7 +147,9 @@ def _collect_environment() -> dict:
     try:
         r = subprocess.run(
             ["docker", "version", "--format", "{{.Server.Version}}"],
-            capture_output=True, text=True, timeout=5,
+            capture_output=True,
+            text=True,
+            timeout=5,
         )
         if r.returncode == 0:
             env["docker"] = r.stdout.strip()
@@ -133,7 +159,9 @@ def _collect_environment() -> dict:
     try:
         r = subprocess.run(
             ["docker", "compose", "version", "--short"],
-            capture_output=True, text=True, timeout=5,
+            capture_output=True,
+            text=True,
+            timeout=5,
         )
         if r.returncode == 0:
             env["compose"] = r.stdout.strip()
@@ -144,15 +172,17 @@ def _collect_environment() -> dict:
         if sys.platform == "darwin":
             r = subprocess.run(
                 ["sysctl", "-n", "hw.memsize"],
-                capture_output=True, text=True, timeout=5,
+                capture_output=True,
+                text=True,
+                timeout=5,
             )
             if r.returncode == 0:
-                env["ram_gb"] = round(int(r.stdout.strip()) / (1024 ** 3))
+                env["ram_gb"] = round(int(r.stdout.strip()) / (1024**3))
         else:
             with open("/proc/meminfo") as f:
                 for line in f:
                     if line.startswith("MemTotal:"):
-                        env["ram_gb"] = round(int(line.split()[1]) * 1024 / (1024 ** 3))
+                        env["ram_gb"] = round(int(line.split()[1]) * 1024 / (1024**3))
                         break
     except Exception:
         pass
@@ -166,7 +196,10 @@ def _get_app_version() -> tuple[str, str]:
     try:
         r = subprocess.run(
             ["git", "describe", "--tags", "--always"],
-            capture_output=True, text=True, timeout=5, cwd=PROJECT_ROOT,
+            capture_output=True,
+            text=True,
+            timeout=5,
+            cwd=PROJECT_ROOT,
         )
         if r.returncode == 0:
             version = r.stdout.strip()
@@ -175,7 +208,10 @@
     try:
         r = subprocess.run(
             ["git", "rev-parse", "--short", "HEAD"],
-            capture_output=True, text=True, timeout=5, cwd=PROJECT_ROOT,
+            capture_output=True,
+            text=True,
+            timeout=5,
+            cwd=PROJECT_ROOT,
         )
         if r.returncode == 0:
             build_id = r.stdout.strip()
@@ -202,9 +238,15 @@ class SetupTelemetry:
         self._environment = _collect_environment()
         self._app_version, self._build_id = _get_app_version()
-    def step(self, step_name: str, result: str, duration_ms: int | None = None,
-             error_message: str | None = None, error_code: str | None = None,
-             metrics: dict | None = None) -> None:
+    def step(
+        self,
+        step_name: str,
+        result: str,
+        duration_ms: int | None = None,
+        error_message: str | None = None,
+        error_code: str | None = None,
+        metrics: dict | None = None,
+    ) -> None:
         """Emit a single setup step event. No-op if disabled."""
         if not self.enabled:
             return
@@ -249,8 +291,14 @@ class SetupTelemetry:
 # ── Input helpers ────────────────────────────────────────────────────────────
-def ask(prompt: str, default: str = "", required: bool = False,
-        secret: bool = False, validate=None) -> str:
+def ask(
+    prompt: str,
+    default: str = "",
+    required: bool = False,
+    secret: bool = False,
+    validate=None,
+) -> str:
     """Prompt the user for input with optional default, validation, and secret mode."""
     suffix = f" [{default}]" if default else ""
     full_prompt = f" {prompt}{suffix}: "
@@ -265,7 +313,9 @@ def ask(
         if default:
             return default
         if required:
-            raise SystemExit(f"EOF reached and no default for required field: {prompt}")
+            raise SystemExit(
+                f"EOF reached and no default for required field: {prompt}"
+            )
         return ""
     value = value.strip()
@@ -311,6 +361,7 @@ def mask_secret(value: str) -> str:
 # ── Validators ───────────────────────────────────────────────────────────────
+
 def validate_password_strength(value: str) -> str | None:
     if len(value) < 12:
         return "Password must be at least 12 characters."
@@ -333,6 +384,7 @@ def validate_domain(value: str) -> str | None:
 # ── System checks ────────────────────────────────────────────────────────────
+
 def check_python_version() -> bool:
     if sys.version_info < (3, 10):
         fail(f"Python 3.10+ required, found {sys.version}")
@@ -345,7 +397,9 @@ def check_docker() -> bool:
     try:
         result = subprocess.run(
             ["docker", "info"],
-            capture_output=True, text=True, timeout=10,
+            capture_output=True,
+            text=True,
+            timeout=10,
         )
         if result.returncode != 0:
             fail("Docker is not running. Start Docker and try again.")
@@ -361,7 +415,9 @@ def check_docker() -> bool:
     try:
         result = subprocess.run(
             ["docker", "compose", "version"],
-            capture_output=True, text=True, timeout=10,
+            capture_output=True,
+            text=True,
+            timeout=10,
         )
         if result.returncode != 0:
             fail("Docker Compose v2 is not available.")
@@ -381,7 +437,9 @@ def check_ram() -> None:
     if sys.platform == "darwin":
         result = subprocess.run(
             ["sysctl", "-n", "hw.memsize"],
-            capture_output=True, text=True, timeout=5,
+            capture_output=True,
+            text=True,
+            timeout=5,
         )
         if result.returncode != 0:
             return
@@ -395,7 +453,7 @@
     else:
         return
-    ram_gb = ram_bytes / (1024 ** 3)
+    ram_gb = ram_bytes / (1024**3)
     if ram_gb < 4:
         warn(f"Only {ram_gb:.1f} GB RAM detected. 4 GB+ recommended for builds.")
     else:
@@ -459,8 +517,8 @@ def preflight(args: argparse.Namespace) -> bool:
     """Run all pre-flight checks. Returns True if OK to proceed."""
     banner("TOD Production Setup")
     print(" This wizard will configure your production environment,")
-    print(" generate secrets, bootstrap OpenBao, build images, and")
-    print(" start the stack.")
+    print(" generate secrets, bootstrap OpenBao, pull or build images,")
+    print(" and start the stack.")
     print()
     section("Pre-flight Checks")
@@ -483,6 +541,7 @@
 # ── Secret generation ────────────────────────────────────────────────────────
+
 def generate_jwt_secret() -> str:
     return secrets.token_urlsafe(64)
@@ -501,6 +560,7 @@ def generate_admin_password() -> str:
 # ── Wizard sections ─────────────────────────────────────────────────────────
+
 def wizard_database(config: dict, args: argparse.Namespace) -> None:
     section("Database")
     info("PostgreSQL superuser password — used for migrations and admin operations.")
@@ -578,8 +638,12 @@ def wizard_admin(config: dict, args: argparse.Namespace) -> None:
     error = validate_password_strength(password)
     while error:
         warn(error)
-        password = ask("Admin password", secret=True, required=True,
-                       validate=validate_password_strength)
+        password = ask(
+            "Admin password",
+            secret=True,
+            required=True,
+            validate=validate_password_strength,
+        )
         error = None  # ask() already validated
     config["admin_password"] = password
     config["admin_password_generated"] = False
@@ -625,7 +689,9 @@ def wizard_email(config: dict, args: argparse.Namespace) -> None:
     config["smtp_host"] = ask("SMTP host", required=True)
     config["smtp_port"] = ask("SMTP port", default="587")
     config["smtp_user"] = ask("SMTP username (optional)")
-    config["smtp_password"] = ask("SMTP password (optional)", secret=True) if config["smtp_user"] else ""
+    config["smtp_password"] = (
+        ask("SMTP password (optional)", secret=True) if config["smtp_user"] else ""
+    )
     config["smtp_from"] = ask("From address", required=True, validate=validate_email)
     config["smtp_tls"] = ask_yes_no("Use TLS?", default=True)
@@ -641,16 +707,22 @@ def wizard_domain(config: dict, args: argparse.Namespace) -> None:
             raise SystemExit(1)
         raw = args.domain
     else:
-        raw = ask("Production domain (e.g. tod.example.com)", required=True, validate=validate_domain)
+        raw = ask(
+            "Production domain (e.g. tod.example.com)",
+            required=True,
+            validate=validate_domain,
+        )
     domain = re.sub(r"^https?://", "", raw).rstrip("/")
     config["domain"] = domain
     # Determine protocol — default HTTPS for production, allow HTTP for LAN/dev
     if args.non_interactive:
-        use_https = not getattr(args, 'no_https', False)
+        use_https = not getattr(args, "no_https", False)
     else:
-        use_https = ask_yes_no("Use HTTPS? (disable for LAN/dev without TLS)", default=True)
+        use_https = ask_yes_no(
+            "Use HTTPS? (disable for LAN/dev without TLS)", default=True
+        )
     protocol = "https" if use_https else "http"
     config["app_base_url"] = f"{protocol}://{domain}"
@@ -659,7 +731,9 @@ def wizard_domain(config: dict, args: argparse.Namespace) -> None:
     ok(f"APP_BASE_URL={protocol}://{domain}")
     ok(f"CORS_ORIGINS={protocol}://{domain}")
     if not use_https:
-        warn("Running without HTTPS — cookies will not be Secure. Fine for LAN, not for public internet.")
+        warn(
+            "Running without HTTPS — cookies will not be Secure. Fine for LAN, not for public internet."
+        )
 # ── Reverse proxy ───────────────────────────────────────────────────────────
@@ -678,7 +752,7 @@ PROXY_CONFIGS = {
         "filename": None,  # derived from domain
         "placeholders": {
             "tod.example.com": None,  # replaced with domain
-            "YOUR_TOD_HOST": None,    # replaced with host IP
+            "YOUR_TOD_HOST": None,  # replaced with host IP
         },
     },
     "nginx": {
@@ -783,13 +857,16 @@ def _write_system_file(path: pathlib.Path, content: str) -> bool:
     # Ensure parent directory exists
     subprocess.run(
         ["sudo", "mkdir", "-p", str(path.parent)],
-        check=True, timeout=30,
+        check=True,
+        timeout=30,
     )
     # Write via sudo tee
     result = subprocess.run(
         ["sudo", "tee", str(path)],
-        input=content, text=True,
-        capture_output=True, timeout=30,
+        input=content,
+        text=True,
+        capture_output=True,
+        timeout=30,
     )
     if result.returncode != 0:
         fail(f"sudo tee failed: {result.stderr.strip()}")
@@ -813,7 +890,9 @@ def wizard_reverse_proxy(config: dict, args: argparse.Namespace) -> None:
         proxy_val = args.proxy or "skip"
         if proxy_val == "skip":
             config["proxy_configured"] = False
-            info("Skipped. Example configs are in infrastructure/reverse-proxy-examples/")
+            info(
+                "Skipped. Example configs are in infrastructure/reverse-proxy-examples/"
+            )
             return
         valid_proxies = list(PROXY_CONFIGS.keys())
         if proxy_val not in valid_proxies:
@@ -823,7 +902,9 @@ def wizard_reverse_proxy(config: dict, args: argparse.Namespace) -> None:
     else:
         if not ask_yes_no("Configure a reverse proxy now?", default=True):
             config["proxy_configured"] = False
-            info("Skipped. Example configs are in infrastructure/reverse-proxy-examples/")
+            info(
+                "Skipped. Example configs are in infrastructure/reverse-proxy-examples/"
+            )
             return
     # Detect installed proxies
@@ -858,7 +939,9 @@ def wizard_reverse_proxy(config: dict, args: argparse.Namespace) -> None:
             idx = int(choice) - 1
             if idx == len(choices):
                 config["proxy_configured"] = False
-                info("Skipped. Example configs are in infrastructure/reverse-proxy-examples/")
+                info(
+                    "Skipped. Example configs are in infrastructure/reverse-proxy-examples/"
+                )
                 return
             if 0 <= idx < len(choices):
                 break
@@ -916,7 +999,7 @@ def wizard_reverse_proxy(config: dict, args: argparse.Namespace) -> None:
     print(f" {dim('...')}")
     print()
-    custom_path = ask(f"Write config to", default=str(out_path))
+    custom_path = ask("Write config to", default=str(out_path))
     out_path = pathlib.Path(custom_path)
     if out_path.exists():
@@ -958,7 +1041,9 @@ def wizard_reverse_proxy(config: dict, args: argparse.Namespace) -> None:
         info("Traefik watches for file changes — no reload needed.")
 
 
-def wizard_telemetry(config: dict, telem: SetupTelemetry, args: argparse.Namespace) -> None:
+def wizard_telemetry(
+    config: dict, telem: SetupTelemetry, args: argparse.Namespace
+) -> None:
     section("Anonymous Diagnostics")
     info("TOD can send anonymous setup and runtime diagnostics to help")
     info("identify common failures. No personal data, IPs, hostnames,")
@@ -989,8 +1074,60 @@ def wizard_telemetry(config: dict, telem: SetupTelemetry, args: argparse.Namespa
         info("No diagnostics will be sent.")
 
 
+def _read_version() -> str:
+    """Read the version string from the VERSION file."""
+    version_file = PROJECT_ROOT / "VERSION"
+    if version_file.exists():
+        return version_file.read_text().strip()
+    return "latest"
+
+
+def wizard_build_mode(config: dict, args: argparse.Namespace) -> None:
+    """Ask whether to use pre-built images or build from source."""
+    section("Build Mode")
+    version = _read_version()
+    config["tod_version"] = version
+    if args.non_interactive:
+        mode = getattr(args, "build_mode", None) or "prebuilt"
+        config["build_mode"] = mode
+        if mode == "source":
+            COMPOSE_CMD.extend(["-f", COMPOSE_BUILD_OVERRIDE])
+            ok(f"Build from source (v{version})")
+        else:
+            ok(f"Pre-built images from GHCR (v{version})")
+        return
+    print(f" TOD v{bold(version)} can be installed two ways:")
+    print()
+    print(f" {bold('1.')} {green('Pre-built images')} {dim('(recommended)')}")
+    print(" Pull ready-to-run images from GitHub Container Registry.")
+    print(" Fast install, no compilation needed.")
+    print()
+    print(f" {bold('2.')} Build from source")
+    print(" Compile Go, Python, and Node.js locally.")
+    print(" Requires 4+ GB RAM and takes 5-15 minutes.")
+    print()
+    while True:
+        choice = input(" Choice [1/2]: ").strip()
+        if choice in ("1", ""):
+            config["build_mode"] = "prebuilt"
+            ok("Pre-built images from GHCR")
+            break
+        elif choice == "2":
+            config["build_mode"] = "source"
+            COMPOSE_CMD.extend(["-f", COMPOSE_BUILD_OVERRIDE])
+            ok("Build from source")
+            break
+        else:
+            warn("Please enter 1 or 2.")
+
+
 # ── Summary ──────────────────────────────────────────────────────────────────
 
 
 def show_summary(config: dict, args: argparse.Namespace) -> bool:
     banner("Configuration Summary")
@@ -1008,7 +1145,9 @@ def show_summary(config: dict, args: argparse.Namespace) -> bool:
     print(f" {bold('Admin Account')}")
     print(f" Email = {config['admin_email']}")
-    print(f" Password = {'(auto-generated)' if config.get('admin_password_generated') else mask_secret(config['admin_password'])}")
+    print(
+        f" Password = {'(auto-generated)' if config.get('admin_password_generated') else mask_secret(config['admin_password'])}"
+    )
     print()
     print(f" {bold('Email')}")
@@ -1041,6 +1180,14 @@ def show_summary(config: dict, args: argparse.Namespace) -> bool:
         print(f" TELEMETRY_ENABLED = {dim('false')}")
     print()
 
+    print(f" {bold('Build Mode')}")
+    if config.get("build_mode") == "source":
+        print(" Mode = Build from source")
+    else:
+        print(f" Mode = {green('Pre-built images')}")
+    print(f" Version = {config.get('tod_version', 'latest')}")
+    print()
     print(f" {bold('OpenBao')}")
     print(f" {dim('(will be captured automatically during bootstrap)')}")
     print()
@@ -1054,6 +1201,7 @@ def show_summary(config: dict, args: argparse.Namespace) -> bool:
+
 # ── File writers ─────────────────────────────────────────────────────────────
 
 
 def write_env_prod(config: dict) -> None:
     """Write the .env.prod file."""
     db = config["postgres_db"]
@@ -1065,12 +1213,12 @@ def write_env_prod(config: dict) -> None:
     smtp_block = ""
     if config.get("smtp_configured"):
         smtp_block = f"""\
-SMTP_HOST={config['smtp_host']}
-SMTP_PORT={config['smtp_port']}
-SMTP_USER={config.get('smtp_user', '')}
-SMTP_PASSWORD={config.get('smtp_password', '')}
-SMTP_USE_TLS={'true' if config.get('smtp_tls') else 'false'}
-SMTP_FROM_ADDRESS={config['smtp_from']}"""
+SMTP_HOST={config["smtp_host"]}
+SMTP_PORT={config["smtp_port"]}
+SMTP_USER={config.get("smtp_user", "")}
+SMTP_PASSWORD={config.get("smtp_password", "")}
+SMTP_USE_TLS={"true" if config.get("smtp_tls") else "false"}
+SMTP_FROM_ADDRESS={config["smtp_from"]}"""
     else:
         smtp_block = """\
 # Email not configured — re-run setup.py to add SMTP
@@ -1097,8 +1245,8 @@ APP_USER_DATABASE_URL=postgresql+asyncpg://app_user:{app_pw}@postgres:5432/{db}
 POLLER_DATABASE_URL=postgres://poller_user:{poll_pw}@postgres:5432/{db}?sslmode=disable
 # --- Security ---
-JWT_SECRET_KEY={config['jwt_secret']}
-CREDENTIAL_ENCRYPTION_KEY={config['encryption_key']}
+JWT_SECRET_KEY={config["jwt_secret"]}
+CREDENTIAL_ENCRYPTION_KEY={config["encryption_key"]}
 # --- OpenBao (KMS) ---
 OPENBAO_ADDR=http://openbao:8200
@@ -1106,21 +1254,22 @@ OPENBAO_TOKEN=PLACEHOLDER_RUN_SETUP
 BAO_UNSEAL_KEY=PLACEHOLDER_RUN_SETUP
 # --- Admin Bootstrap ---
-FIRST_ADMIN_EMAIL={config['admin_email']}
-FIRST_ADMIN_PASSWORD={config['admin_password']}
+FIRST_ADMIN_EMAIL={config["admin_email"]}
+FIRST_ADMIN_PASSWORD={config["admin_password"]}
 # --- Email ---
 {smtp_block}
 # --- Web ---
-APP_BASE_URL={config['app_base_url']}
-CORS_ORIGINS={config['cors_origins']}
+APP_BASE_URL={config["app_base_url"]}
+CORS_ORIGINS={config["cors_origins"]}
 # --- Application ---
 ENVIRONMENT=production
 LOG_LEVEL=info
 DEBUG=false
 APP_NAME=TOD - The Other Dude
+TOD_VERSION={config.get("tod_version", "latest")}
 # --- Storage ---
 GIT_STORE_PATH=/data/git-store
@@ -1151,7 +1300,7 @@ CONFIG_BACKUP_MAX_CONCURRENT=10
 # --- Telemetry ---
 # Opt-in anonymous diagnostics. Set to false to disable.
-TELEMETRY_ENABLED={'true' if config.get('telemetry_enabled') else 'false'}
+TELEMETRY_ENABLED={"true" if config.get("telemetry_enabled") else "false"}
 TELEMETRY_COLLECTOR_URL={_TELEMETRY_COLLECTOR}
 """
@@ -1245,7 +1394,8 @@ def prepare_data_dirs() -> None:
         try:
             subprocess.run(
                 ["sudo", "chown", "-R", f"{APPUSER_UID}:{APPUSER_UID}", str(path)],
-                check=True, timeout=10,
+                check=True,
+                timeout=10,
             )
             ok(f"{d} (owned by appuser via sudo)")
         except Exception:
@@ -1261,14 +1411,17 @@ def prepare_data_dirs() -> None:
         try:
             subprocess.run(
                 ["sudo", "chmod", "-R", "777", str(path)],
-                check=True, timeout=10,
+                check=True,
+                timeout=10,
             )
             ok(f"{d} (world-writable via sudo)")
         except Exception:
             warn(f"{d} — could not set permissions, VPN config sync may fail")
     # Create/update WireGuard forwarding init script (always overwrite for isolation rules)
-    fwd_script = PROJECT_ROOT / "docker-data/wireguard/custom-cont-init.d/10-forwarding.sh"
+    fwd_script = (
+        PROJECT_ROOT / "docker-data/wireguard/custom-cont-init.d/10-forwarding.sh"
+    )
     fwd_script.write_text("""\
 #!/bin/sh
 # Enable forwarding between Docker network and WireGuard tunnel
@@ -1299,8 +1452,10 @@ echo "WireGuard forwarding and tenant isolation rules applied"
 # ── Docker operations ────────────────────────────────────────────────────────
 
 
-def run_compose(*args, check: bool = True, capture: bool = False,
-                timeout: int = 600) -> subprocess.CompletedProcess:
+def run_compose(
+    *args, check: bool = True, capture: bool = False, timeout: int = 600
+) -> subprocess.CompletedProcess:
     """Run a docker compose command with the prod overlay."""
     cmd = COMPOSE_CMD + ["--env-file", str(ENV_PROD)] + list(args)
     return subprocess.run(
@@ -1332,8 +1487,16 @@ def bootstrap_openbao(config: dict) -> bool:
     healthy = False
     while time.time() < deadline:
         result = subprocess.run(
-            ["docker", "inspect", "--format", "{{.State.Health.Status}}", "tod_openbao"],
-            capture_output=True, text=True, timeout=10,
+            [
+                "docker",
+                "inspect",
+                "--format",
+                "{{.State.Health.Status}}",
+                "tod_openbao",
+            ],
+            capture_output=True,
+            text=True,
+            timeout=10,
         )
         status = result.stdout.strip()
         if status == "healthy":
@@ -1365,10 +1528,12 @@ def bootstrap_openbao(config: dict) -> bool:
         # Update .env.prod
         env_content = ENV_PROD.read_text()
-        env_content = env_content.replace("OPENBAO_TOKEN=PLACEHOLDER_RUN_SETUP",
-                                          f"OPENBAO_TOKEN={root_token}")
-        env_content = env_content.replace("BAO_UNSEAL_KEY=PLACEHOLDER_RUN_SETUP",
-                                          f"BAO_UNSEAL_KEY={unseal_key}")
+        env_content = env_content.replace(
+            "OPENBAO_TOKEN=PLACEHOLDER_RUN_SETUP", f"OPENBAO_TOKEN={root_token}"
+        )
+        env_content = env_content.replace(
+            "BAO_UNSEAL_KEY=PLACEHOLDER_RUN_SETUP", f"BAO_UNSEAL_KEY={unseal_key}"
+        )
         ENV_PROD.write_text(env_content)
         ENV_PROD.chmod(0o600)
@@ -1380,7 +1545,9 @@ def bootstrap_openbao(config: dict) -> bool:
         # OpenBao was already initialized — check if .env.prod has real values
         env_content = ENV_PROD.read_text()
         if "PLACEHOLDER_RUN_SETUP" in env_content:
-            warn("Could not find credentials in logs (OpenBao may already be initialized).")
+            warn(
+                "Could not find credentials in logs (OpenBao may already be initialized)."
+            )
             warn("Check 'docker compose logs openbao' and update .env.prod manually.")
             return False
         else:
@@ -1388,6 +1555,38 @@ def bootstrap_openbao(config: dict) -> bool:
             return True
 
 
+def pull_images() -> bool:
+    """Pull pre-built images from GHCR."""
+    section("Pulling Images")
+    info("Downloading pre-built images from GitHub Container Registry...")
+    print()
+    services = ["api", "poller", "frontend", "winbox-worker"]
+    for i, service in enumerate(services, 1):
+        info(f"[{i}/{len(services)}] Pulling {service}...")
+        try:
+            run_compose("pull", service, timeout=600)
+            ok(f"{service} pulled successfully")
+        except subprocess.CalledProcessError:
+            fail(f"Failed to pull {service}")
+            print()
+            warn("Check your internet connection and that the image exists.")
+            warn("To retry:")
+            info(
+                f" docker compose -f {COMPOSE_BASE} -f {COMPOSE_PROD} "
+                f"--env-file .env.prod pull {service}"
+            )
+            return False
+        except subprocess.TimeoutExpired:
+            fail(f"Pull of {service} timed out (10 min)")
+            return False
+    print()
+    ok("All images ready")
+    return True
+
+
 def build_images() -> bool:
     """Build Docker images one at a time to avoid OOM."""
     section("Building Images")
@@ -1405,7 +1604,10 @@ def build_images() -> bool:
             fail(f"Failed to build {service}")
             print()
             warn("To retry this build:")
-            info(f" docker compose -f {COMPOSE_BASE} -f {COMPOSE_PROD} build {service}")
+            info(
+                f" docker compose -f {COMPOSE_BASE} -f {COMPOSE_PROD} "
+                f"-f {COMPOSE_BUILD_OVERRIDE} build {service}"
+            )
             return False
         except subprocess.TimeoutExpired:
             fail(f"Build of {service} timed out (15 min)")
@@ -1456,10 +1658,16 @@ def health_check(config: dict) -> None:
         for container, label in list(pending.items()):
             try:
                 result = subprocess.run(
-                    ["docker", "inspect", "--format",
-                     "{{if .State.Health}}{{.State.Health.Status}}{{else}}{{.State.Status}}{{end}}",
-                     container],
-                    capture_output=True, text=True, timeout=5,
+                    [
+                        "docker",
+                        "inspect",
+                        "--format",
+                        "{{if .State.Health}}{{.State.Health.Status}}{{else}}{{.State.Status}}{{end}}",
+                        container,
+                    ],
+                    capture_output=True,
+                    text=True,
+                    timeout=5,
                 )
                 status = result.stdout.strip()
                 if status in ("healthy", "running"):
@@ -1491,7 +1699,7 @@ def health_check(config: dict) -> None:
         if config.get("admin_password_generated"):
             print(f" Password: {bold(config['admin_password'])}")
         else:
-            print(f" Password: (the password you entered)")
+            print(" Password: (the password you entered)")
         print()
         info("Change the admin password after your first login.")
     else:
@@ -1501,6 +1709,7 @@ def health_check(config: dict) -> None:
+
 # ── Main ─────────────────────────────────────────────────────────────────────
 
 
 def _timed(telem: SetupTelemetry, step_name: str, func, *args, **kwargs):
     """Run func, emit a telemetry event with timing. Returns func's result."""
     t0 = time.monotonic()
@@ -1512,8 +1721,11 @@ def _timed(telem: SetupTelemetry, step_name: str, func, *args, **kwargs):
     except Exception as e:
         duration_ms = int((time.monotonic() - t0) * 1000)
         telem.step(
-            step_name, "failure", duration_ms=duration_ms,
-            error_message=str(e), error_code=type(e).__name__,
+            step_name,
+            "failure",
+            duration_ms=duration_ms,
+            error_message=str(e),
+            error_code=type(e).__name__,
         )
         raise
@@ -1525,41 +1737,93 @@ def _build_parser() -> argparse.ArgumentParser:
         formatter_class=argparse.RawDescriptionHelpFormatter,
     )
     parser.add_argument(
-        "--non-interactive", action="store_true",
+        "--non-interactive",
+        action="store_true",
         help="Skip all prompts, use defaults + provided flags",
     )
-    parser.add_argument("--postgres-password", type=str, default=None,
-                        help="PostgreSQL superuser password")
-    parser.add_argument("--admin-email", type=str, default=None,
-                        help="Admin email (default: admin@the-other-dude.dev)")
-    parser.add_argument("--admin-password", type=str, default=None,
-                        help="Admin password (auto-generated if not provided)")
-    parser.add_argument("--domain", type=str, default=None,
-                        help="Production domain (e.g. tod.example.com)")
-    parser.add_argument("--smtp-host", type=str, default=None,
-                        help="SMTP host (skip email config if not provided)")
-    parser.add_argument("--smtp-port", type=str, default=None,
-                        help="SMTP port (default: 587)")
-    parser.add_argument("--smtp-user", type=str, default=None,
-                        help="SMTP username")
-    parser.add_argument("--smtp-password", type=str, default=None,
-                        help="SMTP password")
-    parser.add_argument("--smtp-from", type=str, default=None,
-                        help="SMTP from address")
-    parser.add_argument("--smtp-tls", action="store_true", default=False,
-                        help="Use TLS for SMTP (default: true in non-interactive)")
-    parser.add_argument("--no-smtp-tls", action="store_true", default=False,
-                        help="Disable TLS for SMTP")
-    parser.add_argument("--no-https", action="store_true", default=False,
-                        help="Use HTTP instead of HTTPS (for LAN/dev without TLS)")
-    parser.add_argument("--proxy", type=str, default=None,
-                        help="Reverse proxy type: caddy, nginx, apache, haproxy, traefik, skip")
-    parser.add_argument("--telemetry", action="store_true", default=False,
-                        help="Enable anonymous diagnostics")
-    parser.add_argument("--no-telemetry", action="store_true", default=False,
-                        help="Disable anonymous diagnostics")
-    parser.add_argument("--yes", "-y", action="store_true", default=False,
-                        help="Auto-confirm summary (don't prompt for confirmation)")
+    parser.add_argument(
+        "--postgres-password",
+        type=str,
+        default=None,
+        help="PostgreSQL superuser password",
+    )
+    parser.add_argument(
+        "--admin-email",
+        type=str,
+        default=None,
+        help="Admin email (default: admin@the-other-dude.dev)",
+    )
+    parser.add_argument(
+        "--admin-password",
+        type=str,
+        default=None,
+        help="Admin password (auto-generated if not provided)",
+    )
+    parser.add_argument(
+        "--domain",
+        type=str,
+        default=None,
+        help="Production domain (e.g. tod.example.com)",
+    )
+    parser.add_argument(
+        "--smtp-host",
+        type=str,
+        default=None,
+        help="SMTP host (skip email config if not provided)",
+    )
+    parser.add_argument(
+        "--smtp-port", type=str, default=None, help="SMTP port (default: 587)"
+    )
+    parser.add_argument("--smtp-user", type=str, default=None, help="SMTP username")
+    parser.add_argument("--smtp-password", type=str, default=None, help="SMTP password")
+    parser.add_argument("--smtp-from", type=str, default=None, help="SMTP from address")
+    parser.add_argument(
+        "--smtp-tls",
+        action="store_true",
+        default=False,
+        help="Use TLS for SMTP (default: true in non-interactive)",
+    )
+    parser.add_argument(
+        "--no-smtp-tls", action="store_true", default=False, help="Disable TLS for SMTP"
+    )
+    parser.add_argument(
+        "--no-https",
+        action="store_true",
+        default=False,
+        help="Use HTTP instead of HTTPS (for LAN/dev without TLS)",
+    )
+    parser.add_argument(
+        "--proxy",
+        type=str,
+        default=None,
+        help="Reverse proxy type: caddy, nginx, apache, haproxy, traefik, skip",
+    )
+    parser.add_argument(
+        "--telemetry",
+        action="store_true",
+        default=False,
+        help="Enable anonymous diagnostics",
+    )
+    parser.add_argument(
+        "--no-telemetry",
+        action="store_true",
+        default=False,
+        help="Disable anonymous diagnostics",
+    )
+    parser.add_argument(
+        "--build-mode",
+        type=str,
+        default=None,
+        choices=["prebuilt", "source"],
+        help="Image source: prebuilt (pull from GHCR) or source (compile locally)",
+    )
+    parser.add_argument(
+        "--yes",
+        "-y",
+        action="store_true",
+        default=False,
+        help="Auto-confirm summary (don't prompt for confirmation)",
+    )
     return parser
@@ -1575,15 +1839,20 @@ def main() -> int:
     def handle_sigint(sig, frame):
         nonlocal env_written
-        telem.step("setup_total", "failure",
-                   duration_ms=int((time.monotonic() - setup_start) * 1000),
-                   error_message="User cancelled (SIGINT)")
+        telem.step(
+            "setup_total",
+            "failure",
+            duration_ms=int((time.monotonic() - setup_start) * 1000),
+            error_message="User cancelled (SIGINT)",
+        )
         print()
         if not env_written:
             info("Aborted before writing .env.prod — no files changed.")
         else:
             warn(f".env.prod was already written to {ENV_PROD}")
-            info("OpenBao tokens may still be placeholders if bootstrap didn't complete.")
+            info(
+                "OpenBao tokens may still be placeholders if bootstrap didn't complete."
+            )
         sys.exit(1)
     signal.signal(signal.SIGINT, handle_sigint)
@@ -1602,6 +1871,7 @@ def main() -> int:
     # Phase 2: Wizard
     try:
+        wizard_build_mode(config, args)
         wizard_database(config, args)
         wizard_security(config)
         wizard_admin(config, args)
@@ -1610,16 +1880,20 @@ def main() -> int:
         wizard_reverse_proxy(config, args)
         telem.step("wizard", "success")
     except Exception as e:
-        telem.step("wizard", "failure",
-                   error_message=str(e), error_code=type(e).__name__)
+        telem.step(
+            "wizard", "failure", error_message=str(e), error_code=type(e).__name__
+        )
         raise
     # Summary
     if not show_summary(config, args):
         info("Setup cancelled.")
-        telem.step("setup_total", "failure",
-                   duration_ms=int((time.monotonic() - setup_start) * 1000),
-                   error_message="User cancelled at summary")
+        telem.step(
+            "setup_total",
+            "failure",
+            duration_ms=int((time.monotonic() - setup_start) * 1000),
+            error_message="User cancelled at summary",
+        )
         return 1
     # Phase 3: Write files and prepare directories
@@ -1631,8 +1905,9 @@ def main() -> int:
         prepare_data_dirs()
         telem.step("write_config", "success")
     except Exception as e:
-        telem.step("write_config", "failure",
-                   error_message=str(e), error_code=type(e).__name__)
+        telem.step(
+            "write_config", "failure", error_message=str(e), error_code=type(e).__name__
+        )
         raise
     # Phase 4: OpenBao
@@ -1642,36 +1917,63 @@ def main() -> int:
     if bao_ok:
         telem.step("openbao_bootstrap", "success", duration_ms=duration_ms)
     else:
-        telem.step("openbao_bootstrap", "failure", duration_ms=duration_ms,
-                   error_message="OpenBao did not become healthy or credentials not found")
-        if not ask_yes_no("Continue without OpenBao credentials? (stack will need manual fix)", default=False):
+        telem.step(
+            "openbao_bootstrap",
+            "failure",
+            duration_ms=duration_ms,
+            error_message="OpenBao did not become healthy or credentials not found",
+        )
+        if not ask_yes_no(
+            "Continue without OpenBao credentials? (stack will need manual fix)",
+            default=False,
+        ):
             warn("Fix OpenBao credentials in .env.prod and re-run setup.py.")
-            telem.step("setup_total", "failure",
-                       duration_ms=int((time.monotonic() - setup_start) * 1000),
-                       error_message="Aborted after OpenBao failure")
+            telem.step(
+                "setup_total",
+                "failure",
+                duration_ms=int((time.monotonic() - setup_start) * 1000),
+                error_message="Aborted after OpenBao failure",
+            )
             return 1
-    # Phase 5: Build
+    # Phase 5: Build or Pull
     t0 = time.monotonic()
-    if not build_images():
+    if config.get("build_mode") == "source":
+        images_ok = build_images()
+        step_name = "build_images"
+        fail_msg = "Docker build failed"
+        retry_hint = "Fix the build error and re-run setup.py to continue."
+    else:
+        images_ok = pull_images()
+        step_name = "pull_images"
+        fail_msg = "Image pull failed"
+        retry_hint = "Check your connection and re-run setup.py to continue."
+    if not images_ok:
         duration_ms = int((time.monotonic() - t0) * 1000)
-        telem.step("build_images", "failure", duration_ms=duration_ms)
-        warn("Fix the build error and re-run setup.py to continue.")
-        telem.step("setup_total", "failure",
-                   duration_ms=int((time.monotonic() - setup_start) * 1000),
-                   error_message="Docker build failed")
+        telem.step(step_name, "failure", duration_ms=duration_ms)
+        warn(retry_hint)
+        telem.step(
+            "setup_total",
+            "failure",
+            duration_ms=int((time.monotonic() - setup_start) * 1000),
+            error_message=fail_msg,
+        )
         return 1
     duration_ms = int((time.monotonic() - t0) * 1000)
-    telem.step("build_images", "success", duration_ms=duration_ms)
+    telem.step(step_name, "success", duration_ms=duration_ms)
     # Phase 6: Start
     t0 = time.monotonic()
     if not start_stack():
         duration_ms = int((time.monotonic() - t0) * 1000)
         telem.step("start_stack", "failure", duration_ms=duration_ms)
-        telem.step("setup_total", "failure",
-                   duration_ms=int((time.monotonic() - setup_start) * 1000),
-                   error_message="Stack failed to start")
+        telem.step(
+            "setup_total",
+            "failure",
+            duration_ms=int((time.monotonic() - setup_start) * 1000),
+            error_message="Stack failed to start",
+        )
         return 1
     duration_ms = int((time.monotonic() - t0) * 1000)
     telem.step("start_stack", "success", duration_ms=duration_ms)
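Read as a whole, the Phase 5 rewrite in the final hunk is a small dispatch: pick the image-acquisition step from `config["build_mode"]`, run it, and report under the matching telemetry step name. A standalone sketch of that selection logic (the helper name `choose_image_step` is illustrative, not a function in the repo; the strings are taken verbatim from the diff):

```python
# Sketch of the Phase 5 build-or-pull dispatch from the diff above.
# choose_image_step is a hypothetical helper; setup.py inlines this logic
# and then calls its real build_images()/pull_images().

def choose_image_step(config: dict) -> tuple[str, str, str]:
    """Return (telemetry step name, failure message, retry hint)."""
    if config.get("build_mode") == "source":
        return (
            "build_images",
            "Docker build failed",
            "Fix the build error and re-run setup.py to continue.",
        )
    # Anything else, including a missing key, falls back to pre-built images.
    return (
        "pull_images",
        "Image pull failed",
        "Check your connection and re-run setup.py to continue.",
    )

step, _, _ = choose_image_step({"build_mode": "source"})
assert step == "build_images"
step, fail_msg, _ = choose_image_step({})  # prebuilt is the default
assert step == "pull_images" and fail_msg == "Image pull failed"
```

Treating "not source" as prebuilt mirrors `wizard_build_mode`, where pressing Enter at the prompt also selects pre-built images.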