the-other-dude/docs/superpowers/specs/2026-03-14-setup-script-design.md
Jason Staack c7c9f4d71e docs: add setup script implementation plan
7-task plan covering database rename, login page fix, setup.py
wizard with OpenBao bootstrap, sequential builds, and health checks.
Also fixes spec OpenBao timeout to 60s.
2026-03-14 09:52:58 -05:00


# TOD Production Setup Script — Design Spec
## Overview
An interactive Python setup wizard (`setup.py`) that walks a sysadmin through configuring and deploying TOD (The Other Dude) for production. The script minimizes manual configuration by auto-generating secrets, capturing OpenBao credentials automatically, building images sequentially, and verifying service health.
**Target audience:** Technical sysadmins unfamiliar with this specific project.
## Design Decisions
- **Python 3.10+** — already required by the stack, enables rich input handling and colored output.
- **Linear wizard with opinionated defaults** — grouped sections, auto-generate everything possible, only prompt for genuine human decisions.
- **Integrated OpenBao bootstrap** — script starts the OpenBao container, captures unseal key and root token, updates `.env.prod` automatically (no manual copy-paste).
- **Sequential image builds** — builds api, poller, frontend, winbox-worker one at a time to avoid OOM on low-RAM machines.
- **Re-runnable** — safe to run again; detects existing `.env.prod` and offers to overwrite, back up (`.env.prod.backup.<ISO timestamp>`), or abort.
## Prerequisite: Database Rename
The codebase currently uses `mikrotik` as the database name. Before the setup script can use `tod`, these files must be updated:
- `docker-compose.yml` — default `POSTGRES_DB` and healthcheck (`pg_isready -d`)
- `docker-compose.prod.yml` — hardcoded poller `DATABASE_URL` (change to `${POLLER_DATABASE_URL}`)
- `docker-compose.staging.yml` — if applicable
- `scripts/init-postgres.sql` (`GRANT CONNECT ON DATABASE` statements)
- `.env.example` — all URL references
The setup script will use `POSTGRES_DB=tod`. These file changes are part of the implementation, not runtime.
The `docker-compose.prod.yml` change is required for the generated credentials to take effect: as long as the poller's `DATABASE_URL` stays hardcoded, the setup script's value is not used. It must read `DATABASE_URL: ${POLLER_DATABASE_URL}`.
## Script Flow
### Phase 1: Pre-flight Checks
- Verify Python 3.10+
- Verify Docker Engine and Docker Compose v2 are installed and the daemon is running
- Check for existing `.env.prod` — if found, offer: overwrite / back up and create new / abort
- Warn if less than 4 GB of RAM is available
- Check if key ports are in use (5432, 6379, 4222, 8001, 3000, 51820) and warn
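
The port check can be done with the stdlib `socket` module listed under Dependencies. The following is a minimal sketch, not the script's actual code: the helper names `tcp_port_in_use` and `busy_ports` are hypothetical, and it probes only TCP on localhost. If 51820 is the WireGuard port (which uses UDP), a TCP probe will not detect it, so that port would need a separate check.

```python
import socket

# Ports listed in the pre-flight checks above.
KEY_PORTS = (5432, 6379, 4222, 8001, 3000, 51820)


def tcp_port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


def busy_ports(ports=KEY_PORTS) -> list[int]:
    """Return the subset of ports that already have a TCP listener."""
    return [port for port in ports if tcp_port_in_use(port)]
```

Because the spec treats a busy port as a warning rather than an error, a caller would print the result of `busy_ports()` and continue.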
### Phase 2: Interactive Configuration (Linear Wizard)
Six sections, presented in order:
#### 2.1 Database
| Prompt | Default | Notes |
|--------|---------|-------|
| PostgreSQL superuser password | (required, no default) | Validated non-empty, min 12 chars |

Auto-generated:
- `POSTGRES_DB=tod`
- `app_user` password via `secrets.token_urlsafe(24)` (yields 32 URL-safe base64 chars)
- `poller_user` password via `secrets.token_urlsafe(24)` (yields 32 URL-safe base64 chars)
- `DATABASE_URL=postgresql+asyncpg://postgres:<pw>@postgres:5432/tod`
- `SYNC_DATABASE_URL=postgresql+psycopg2://postgres:<pw>@postgres:5432/tod`
- `APP_USER_DATABASE_URL=postgresql+asyncpg://app_user:<app_pw>@postgres:5432/tod`
- `POLLER_DATABASE_URL=postgres://poller_user:<poller_pw>@postgres:5432/tod`
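
The auto-generated database values above could be assembled roughly as follows. This is a sketch under assumptions: `build_database_env` is a hypothetical helper, and percent-encoding the superuser password is an addition not stated in the spec. It is included because a human-entered password may contain characters such as `@` or `/` that would otherwise break the URL; the generated passwords use only URL-safe characters and need no encoding.

```python
import secrets
from urllib.parse import quote


def build_database_env(superuser_password: str) -> dict[str, str]:
    """Build the database-related entries for .env.prod."""
    app_pw = secrets.token_urlsafe(24)     # 32 URL-safe characters
    poller_pw = secrets.token_urlsafe(24)  # 32 URL-safe characters
    # Assumption: percent-encode the typed password before embedding it in a URL.
    pw = quote(superuser_password, safe="")
    target = "postgres:5432/tod"
    return {
        "POSTGRES_DB": "tod",
        "POSTGRES_USER": "postgres",
        "POSTGRES_PASSWORD": superuser_password,
        "DATABASE_URL": f"postgresql+asyncpg://postgres:{pw}@{target}",
        "SYNC_DATABASE_URL": f"postgresql+psycopg2://postgres:{pw}@{target}",
        "APP_USER_DATABASE_URL": f"postgresql+asyncpg://app_user:{app_pw}@{target}",
        "POLLER_DATABASE_URL": f"postgres://poller_user:{poller_pw}@{target}",
    }
```

Whether each consumer of these URLs decodes percent-encoded passwords should be verified before relying on this.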
#### 2.2 Security
No prompts. Auto-generated:
- `JWT_SECRET_KEY` via `secrets.token_urlsafe(64)` (yields 86 URL-safe base64 chars)
- `CREDENTIAL_ENCRYPTION_KEY` via `base64(secrets.token_bytes(32))` (yields 44 base64 chars)
- Display both values to the user with a "save these somewhere safe" note
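
As an illustration of the two generators above, here is a minimal sketch. The function name is hypothetical, and it assumes that `base64(...)` in the spec means the standard alphabet (`base64.b64encode`) rather than the URL-safe variant; both produce 44 characters for 32 bytes.

```python
import base64
import secrets


def generate_security_secrets() -> dict[str, str]:
    """Generate the JWT signing secret and the credential encryption key."""
    return {
        # 64 random bytes encoded as 86 URL-safe base64 characters (no padding).
        "JWT_SECRET_KEY": secrets.token_urlsafe(64),
        # 32 random bytes encoded as 44 standard base64 characters (with padding).
        "CREDENTIAL_ENCRYPTION_KEY": base64.b64encode(secrets.token_bytes(32)).decode("ascii"),
    }
```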
#### 2.3 Admin Account
| Prompt | Default | Notes |
|--------|---------|-------|
| Admin email | `admin@the-other-dude.dev` | Validated as email-like |
| Admin password | (enter or press Enter to generate) | Min 12 chars if manual; generated passwords are 24 chars |
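
The admin prompts could be validated along these lines. This is a sketch only: the regex is a deliberately loose "email-like" check, the helper names are hypothetical, and `secrets.token_urlsafe(18)` is an assumed way to arrive at the 24-character generated password (18 random bytes encode to exactly 24 characters).

```python
import re
import secrets

# Loose check: something@something.something, with no whitespace or extra '@'.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def is_email_like(value: str) -> bool:
    """Return True if value looks roughly like an email address."""
    return bool(EMAIL_RE.match(value))


def resolve_admin_password(entered: str) -> str:
    """Return the entered password, or generate one if the user pressed Enter."""
    if not entered:
        return secrets.token_urlsafe(18)  # 24 characters
    if len(entered) < 12:
        raise ValueError("Admin password must be at least 12 characters")
    return entered
```

In the wizard, `entered` would come from `getpass.getpass()` so the password is not echoed to the terminal.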
#### 2.4 Email (Optional)
| Prompt | Default | Notes |
|--------|---------|-------|
| Configure SMTP now? | No | If no, skip with reminder |
| SMTP host | (required if yes) | |
| SMTP port | 587 | |
| SMTP username | (optional) | |
| SMTP password | (optional) | |
| From address | (required if yes) | |
| Use TLS? | Yes | |
#### 2.5 Web / Domain
| Prompt | Default | Notes |
|--------|---------|-------|
| Production domain | (required) | e.g. `tod.staack.com` |

Auto-derived:
- `APP_BASE_URL=https://<domain>`
- `CORS_ORIGINS=https://<domain>`
#### 2.6 Summary & Confirmation
Display all settings grouped by section. Secrets are partially masked (first 8 chars + `...`). Ask for confirmation before writing.
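
The partial masking described above might look like the following sketch. `mask_secret` is a hypothetical name, and hiding values of 8 characters or fewer entirely is an assumption made here so that short secrets are never shown in full.

```python
def mask_secret(value: str, visible: int = 8) -> str:
    """Show the first `visible` characters followed by '...'."""
    if len(value) <= visible:
        # Assumption: never reveal a secret that fits inside the visible window.
        return "..."
    return value[:visible] + "..."
```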
### Phase 3: Write `.env.prod`
Write the file with section comments and timestamp header. Also generate `scripts/init-postgres-prod.sql` with the generated `app_user` and `poller_user` passwords baked in (PostgreSQL init scripts don't support env var substitution).
Format:
```bash
# ============================================================
# TOD Production Environment — generated by setup.py
# Generated: <ISO timestamp>
# ============================================================
# --- Database ---
POSTGRES_DB=tod
POSTGRES_USER=postgres
POSTGRES_PASSWORD=<input>
DATABASE_URL=postgresql+asyncpg://postgres:<pw>@postgres:5432/tod
SYNC_DATABASE_URL=postgresql+psycopg2://postgres:<pw>@postgres:5432/tod
APP_USER_DATABASE_URL=postgresql+asyncpg://app_user:<app_pw>@postgres:5432/tod
POLLER_DATABASE_URL=postgres://poller_user:<poller_pw>@postgres:5432/tod
# --- Security ---
JWT_SECRET_KEY=<generated>
CREDENTIAL_ENCRYPTION_KEY=<generated>
# --- OpenBao (KMS) ---
OPENBAO_ADDR=http://openbao:8200
OPENBAO_TOKEN=PLACEHOLDER_RUN_SETUP
BAO_UNSEAL_KEY=PLACEHOLDER_RUN_SETUP
# --- Admin Bootstrap ---
FIRST_ADMIN_EMAIL=<input>
FIRST_ADMIN_PASSWORD=<input-or-generated>
# --- Email ---
# <configured block or "unconfigured" note>
SMTP_HOST=
SMTP_PORT=587
SMTP_USER=
SMTP_PASSWORD=
SMTP_USE_TLS=true
SMTP_FROM_ADDRESS=noreply@example.com
# --- Web ---
APP_BASE_URL=https://<domain>
CORS_ORIGINS=https://<domain>
# --- Application ---
ENVIRONMENT=production
LOG_LEVEL=info
DEBUG=false
APP_NAME=TOD - The Other Dude
# --- Storage ---
GIT_STORE_PATH=/data/git-store
FIRMWARE_CACHE_DIR=/data/firmware-cache
WIREGUARD_CONFIG_PATH=/data/wireguard
WIREGUARD_GATEWAY=wireguard
CONFIG_RETENTION_DAYS=90
# --- Redis & NATS ---
REDIS_URL=redis://redis:6379/0
NATS_URL=nats://nats:4222
# --- Poller ---
POLL_INTERVAL_SECONDS=60
CONNECTION_TIMEOUT_SECONDS=10
COMMAND_TIMEOUT_SECONDS=30
# --- Remote Access ---
TUNNEL_PORT_MIN=49000
TUNNEL_PORT_MAX=49100
TUNNEL_IDLE_TIMEOUT=300
SSH_RELAY_PORT=8080
SSH_IDLE_TIMEOUT=900
# --- Config Backup ---
CONFIG_BACKUP_INTERVAL=21600
CONFIG_BACKUP_MAX_CONCURRENT=10
```
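
One way to produce a file in this shape is to render sections from a nested dictionary. The sketch below is illustrative: `render_env` and `write_env` are hypothetical names, the banner width is approximate, and restricting the file to owner-only permissions is an assumption added because the file contains secrets.

```python
from datetime import datetime, timezone
from pathlib import Path


def render_env(title: str, sections: dict[str, dict[str, str]]) -> str:
    """Render grouped KEY=value pairs with a banner and per-section comments."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    bar = "# " + "=" * 60
    lines = [bar, f"# {title}", f"# Generated: {stamp}", bar]
    for name, values in sections.items():
        lines.append(f"# --- {name} ---")
        lines.extend(f"{key}={value}" for key, value in values.items())
    return "\n".join(lines) + "\n"


def write_env(path: Path, text: str) -> None:
    """Write the rendered text and restrict access to the file owner."""
    path.write_text(text, encoding="utf-8")
    # Assumption: 0600 permissions; the spec does not specify a file mode.
    path.chmod(0o600)
```

Dictionaries preserve insertion order, so sections appear in the order they were collected by the wizard.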
### Phase 4: OpenBao Bootstrap
1. Start postgres and openbao containers only: `docker compose -f docker-compose.yml -f docker-compose.prod.yml --env-file .env.prod up -d postgres openbao`
2. Wait for openbao container to be healthy (timeout 60s)
3. Run `docker compose logs openbao 2>&1` and parse the `OPENBAO_TOKEN=` and `BAO_UNSEAL_KEY=` lines using regex (init.sh prints these to stdout during container startup, which is captured in Docker logs)
4. Update `.env.prod` by replacing the `PLACEHOLDER_RUN_SETUP` values with the captured credentials
5. On failure: `.env.prod` retains placeholders, print instructions for manual capture via `docker compose logs openbao`
### Phase 5: Build Images
Build sequentially to avoid OOM:
```bash
docker compose -f docker-compose.yml -f docker-compose.prod.yml build api
docker compose -f docker-compose.yml -f docker-compose.prod.yml build poller
docker compose -f docker-compose.yml -f docker-compose.prod.yml build frontend
docker compose -f docker-compose.yml -f docker-compose.prod.yml build winbox-worker
```
Show progress for each. On failure: stop, report which image failed, suggest rerunning.
### Phase 6: Start Stack
```bash
docker compose -f docker-compose.yml -f docker-compose.prod.yml --env-file .env.prod up -d
```
### Phase 7: Health Check
- Poll service health for up to 60 seconds
- Report status of: postgres, redis, nats, openbao, api, poller, frontend, winbox-worker
- On success: print access URL (`https://<domain>`) and admin credentials
- On timeout: report which services are unhealthy, suggest `docker compose logs <service>`
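
A polling loop for this phase might be structured as below. This is a sketch: `wait_for_healthy` is a hypothetical name, the 2-second interval is an assumption, and how a single service's health is determined is left to the `is_healthy` callback, since the spec does not say (inspecting container health status through the Docker CLI is one possibility).

```python
import time
from typing import Callable

SERVICES = ("postgres", "redis", "nats", "openbao", "api", "poller", "frontend", "winbox-worker")


def wait_for_healthy(
    is_healthy: Callable[[str], bool],
    timeout: float = 60.0,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list[str]:
    """Poll until all services are healthy or the timeout expires.

    Returns the services that are still unhealthy (empty list means success).
    """
    deadline = time.monotonic() + timeout
    pending = list(SERVICES)
    while True:
        # Only re-check services that have not yet reported healthy.
        pending = [name for name in pending if not is_healthy(name)]
        if not pending or time.monotonic() >= deadline:
            return pending
        sleep(interval)
```

The returned list maps directly onto the timeout behavior above: each remaining name gets a `docker compose logs <service>` suggestion.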
## Database Init Script
`scripts/init-postgres.sql` hardcodes `app_password` and `poller_password`. Since PostgreSQL's `docker-entrypoint-initdb.d` scripts don't support environment variable substitution, the setup script generates `scripts/init-postgres-prod.sql` with the actual generated passwords baked in. The `docker-compose.prod.yml` volume mount will be updated to use this file instead.
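
One possible way to generate the production file is to treat the existing script as a template and substitute the hardcoded passwords. The sketch below assumes they appear as the single-quoted SQL literals `'app_password'` and `'poller_password'`; the real file's layout is not shown in this spec, so both that assumption and the name `render_init_sql` are hypothetical.

```python
def render_init_sql(template: str, app_pw: str, poller_pw: str) -> str:
    """Return the init SQL with generated passwords replacing the hardcoded ones."""
    for password in (app_pw, poller_pw):
        # Guard against breaking out of the SQL string literal.
        if "'" in password or "\\" in password:
            raise ValueError("Generated password contains an unsafe character")
    return (
        template
        .replace("'app_password'", f"'{app_pw}'")
        .replace("'poller_password'", f"'{poller_pw}'")
    )
```

Passwords from `secrets.token_urlsafe` contain only letters, digits, `-`, and `_`, so the guard should never trigger for generated values.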
## Login Page Fix
`frontend/src/routes/login.tsx` lines 235-241 contain a "First time?" hint showing `.env` credential names. This will be wrapped in `{import.meta.env.DEV && (...)}` so it only appears in development builds. Vite's production build strips DEV-gated code entirely.
## Error Handling
| Scenario | Behavior |
|----------|----------|
| Docker not installed/running | Fail early with clear message |
| Existing `.env.prod` | Offer: overwrite / back up / abort |
| Port already in use | Warn (non-blocking) with which port and likely culprit |
| OpenBao init fails | `.env.prod` retains placeholders, print manual capture steps |
| Image build fails | Stop, show failed image, suggest retry command |
| Health check timeout (60s) | Report unhealthy services, suggest log commands |
| Ctrl+C before Phase 3 | Graceful exit, no files written |
| Ctrl+C during/after Phase 3 | `.env.prod` exists (possibly with placeholders), noted on exit |
## Re-runnability
- Detects existing `.env.prod` and offers choices
- Does not silently regenerate secrets when valid ones exist; offers to keep or regenerate them
- OpenBao re-init is idempotent (init.sh handles already-initialized state)
- Image rebuilds are safe (Docker layer caching)
- Backup naming: `.env.prod.backup.<ISO timestamp>`
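
The backup step could be implemented as in this sketch. `backup_env_file` is a hypothetical name, and the timestamp uses the ISO 8601 basic format (no colons) as an assumption, chosen so the filename works on filesystems that reject `:`.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_env_file(env_path: Path) -> Path:
    """Copy env_path to <name>.backup.<timestamp> and return the backup path."""
    stamp = datetime.now().strftime("%Y%m%dT%H%M%S")
    backup = env_path.with_name(f"{env_path.name}.backup.{stamp}")
    # copy2 preserves file metadata such as modification time and mode bits.
    shutil.copy2(env_path, backup)
    return backup
```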
## Dependencies
- Python 3.10+ (stdlib only — no pip packages required)
- Docker Engine 24+
- Docker Compose v2
- Stdlib modules: `secrets`, `subprocess`, `shutil`, `json`, `re`, `datetime`, `pathlib`, `getpass`, `socket` (for port checks)