---
phase: 09-retention-cleanup
plan: 01
subsystem: database
tags: [apscheduler, retention, postgresql, prometheus, cascade-delete]

# Dependency graph
requires:
  - phase: 01-database-schema
    provides: router_config_snapshots table with CASCADE FK constraints
provides:
  - Automatic retention cleanup of expired config snapshots
  - CONFIG_RETENTION_DAYS env var for configurable retention period
  - Prometheus metrics for cleanup observability
affects: []

# Tech tracking
tech-stack:
  added: []
  patterns: [APScheduler IntervalTrigger for periodic maintenance jobs]

key-files:
  created:
    - backend/app/services/retention_service.py
    - backend/tests/test_retention_service.py
  modified:
    - backend/app/config.py
    - backend/app/main.py

key-decisions:
  - "make_interval(days => :days) for parameterized PostgreSQL interval (no string concatenation)"
  - "24h IntervalTrigger with 1h jitter to stagger cleanup across instances"
  - "AdminAsyncSessionLocal (bypasses RLS) since retention is cross-tenant system operation"

patterns-established:
  - "IntervalTrigger pattern for periodic maintenance jobs (vs CronTrigger for scheduled backups)"

requirements-completed: [STOR-03, STOR-04]

# Metrics
duration: 2min
completed: 2026-03-13
---

# Phase 9 Plan 1: Retention Cleanup Summary

**Daily APScheduler job deletes config snapshots older than CONFIG_RETENTION_DAYS (default 90) with CASCADE FK cleanup of diffs and changes**

## Performance

- **Duration:** 2 min
- **Started:** 2026-03-13T04:31:48Z
- **Completed:** 2026-03-13T04:34:12Z
- **Tasks:** 2
- **Files modified:** 4

## Accomplishments

- Retention service with parameterized SQL DELETE using make_interval for safe interval binding
- APScheduler IntervalTrigger running every 24h with 1h jitter for stagger
- Prometheus counter and histogram for cleanup observability
- Wired into main.py lifespan with non-fatal startup pattern

## Task Commits

Each task was committed atomically:

1. **Task 1 (RED): Add failing tests** - `00bdde9` (test)
2. **Task 1 (GREEN): Implement retention service + config setting** - `a9f7a45` (feat)
3. **Task 2: Wire retention scheduler into lifespan** - `4d62bc9` (feat)

## Files Created/Modified

- `backend/app/services/retention_service.py` - Retention cleanup logic, scheduler, Prometheus metrics
- `backend/tests/test_retention_service.py` - 4 unit tests for cleanup function
- `backend/app/config.py` - Added CONFIG_RETENTION_DAYS setting (default 90)
- `backend/app/main.py` - Wired start/stop retention scheduler into lifespan

## Decisions Made

- Used make_interval(days => :days) for parameterized PostgreSQL interval (avoids string concatenation SQL injection risk)
- 24h IntervalTrigger with 1h jitter to stagger cleanup across instances
- AdminAsyncSessionLocal bypasses RLS since retention is a cross-tenant system operation

## Deviations from Plan

None - plan executed exactly as written.

## Issues Encountered

None

## User Setup Required

None - no external service configuration required. CONFIG_RETENTION_DAYS defaults to 90 if not set.

## Next Phase Readiness

- Retention cleanup is fully operational, ready for phase 10
- No blockers

---
*Phase: 09-retention-cleanup*
*Completed: 2026-03-13*
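
## Appendix: Retention DELETE Sketch

The parameterized DELETE with CASCADE cleanup described under Decisions Made can be illustrated with a minimal sketch. It is not the code in `retention_service.py`. It assumes a `created_at` column and a hypothetical `router_config_diffs` child table, neither of which is named in this summary, and it uses an in-memory SQLite database as a stand-in for PostgreSQL so that it runs without a server. The PostgreSQL statement is kept as a constant for comparison only.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# PostgreSQL form described in this summary. "created_at" is an assumed
# column name. The day count is a bound parameter, never spliced into the SQL.
POSTGRES_DELETE = (
    "DELETE FROM router_config_snapshots "
    "WHERE created_at < now() - make_interval(days => :days)"
)

# SQLite stand-in with the same shape: the age threshold is still bound.
SQLITE_DELETE = (
    "DELETE FROM router_config_snapshots "
    "WHERE julianday('now') - julianday(created_at) > :days"
)


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory schema with a CASCADE foreign key."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE router_config_snapshots (
            id INTEGER PRIMARY KEY,
            created_at TEXT NOT NULL
        );
        -- Hypothetical child table standing in for diffs and changes.
        CREATE TABLE router_config_diffs (
            id INTEGER PRIMARY KEY,
            snapshot_id INTEGER NOT NULL
                REFERENCES router_config_snapshots(id) ON DELETE CASCADE
        );
        """
    )
    return conn


def insert_snapshot(conn: sqlite3.Connection, snapshot_id: int, age_days: int) -> None:
    """Insert a snapshot that is age_days old, plus one dependent diff row."""
    created = datetime.now(timezone.utc) - timedelta(days=age_days)
    conn.execute(
        "INSERT INTO router_config_snapshots (id, created_at) VALUES (?, ?)",
        (snapshot_id, created.strftime("%Y-%m-%d %H:%M:%S")),
    )
    conn.execute(
        "INSERT INTO router_config_diffs (snapshot_id) VALUES (?)",
        (snapshot_id,),
    )


def cleanup_expired_snapshots(conn: sqlite3.Connection, retention_days: int) -> int:
    """Delete snapshots older than retention_days; return the snapshot count."""
    if retention_days < 1:
        raise ValueError("retention_days must be at least 1")
    cursor = conn.execute(SQLITE_DELETE, {"days": retention_days})
    conn.commit()
    # rowcount covers the snapshots only; cascaded child rows are not counted.
    return cursor.rowcount


if __name__ == "__main__":
    db = build_demo_db()
    insert_snapshot(db, 1, age_days=100)  # past a 90-day window
    insert_snapshot(db, 2, age_days=10)   # within the window
    print("deleted snapshots:", cleanup_expired_snapshots(db, 90))
```

Binding the day count keeps the value out of the SQL text entirely, which is the injection-safety property the `make_interval(days => :days)` decision relies on. Leaving child-row removal to the foreign key means the cleanup job issues a single DELETE against the parent table.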