
PR Preview Cleanup Improvements

Summary

Analysis of PR preview environment cleanup process, identifying gaps and proposing improvements for deletion ordering and resource cleanup.

Current Cleanup Flow

1. PR closed → GitHub Actions cleanup workflow triggers
2. Workflow deletes RabbitMQ vhost (direct API call)
3. Workflow deletes pr-{n}/ directory from cluster-gitops
4. ArgoCD ApplicationSet removes all Applications for PR
5. Each Application's finalizer deletes its resources
6. Kubernetes cascade-deletes everything when namespace is deleted
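The flow above is driven by a close-triggered GitHub Actions workflow. A minimal sketch of the trigger follows; the workflow name, host, and credential handling are illustrative assumptions, not the actual workflow file:

```yaml
# Hypothetical trigger sketch; the real workflow may differ.
name: pr-preview-cleanup
on:
  pull_request:
    types: [closed]   # fires on merge and on close-without-merge

jobs:
  cleanup-pr-preview:
    runs-on: ubuntu-latest
    steps:
      - name: Delete RabbitMQ vhost
        run: |
          # RabbitMQ management API: DELETE /api/vhosts/{name}
          # (host and credentials are assumptions)
          curl -fsS -u "$RABBITMQ_USER:$RABBITMQ_PASS" -X DELETE \
            "https://rabbitmq.example.com/api/vhosts/pr-${{ github.event.pull_request.number }}"
```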

Resource Inventory

Managed by pr-{n}-namespace Application

| Resource | External System | Cleanup Behavior |
|---|---|---|
| Namespace | - | Cascade-deletes all contained resources |
| ExternalSecret | GCP Secret Manager | Secret synced, no external cleanup needed |
| Secret (mongodb-pr-password) | - | Deleted with namespace |
| AtlasDatabaseUser CR | MongoDB Atlas | Atlas Operator deletes user from Atlas |
| Connection Secret | - | Created by Atlas Operator, deleted with namespace |
| ServiceAccount, Role, RoleBinding | - | Deleted with namespace |
| Job (db-reset) | - | Deleted with namespace |
| ConfigMap (db-reset-marker) | - | Deleted with namespace |

Managed by Service Applications

| Resource | External System | Cleanup Behavior |
|---|---|---|
| Deployments, Services | - | Deleted with namespace |
| Ingress | external-dns, cert-manager | DNS records removed, certs cleaned |
| ServiceAccounts | - | Deleted with namespace |

External Resources (Not Kubernetes-managed)

| Resource | Current Cleanup | Gap |
|---|---|---|
| RabbitMQ vhost | ✅ Workflow deletes before git push | None |
| MongoDB database (syrf_pr_{n}) | NOT DELETED | Data orphaned in Atlas |

Identified Issues

Issue 1: Orphaned MongoDB Databases (HIGH PRIORITY)

Problem: When a PR closes, the AtlasDatabaseUser is deleted (removing access), but the database itself (syrf_pr_{n}) and all its collections remain in MongoDB Atlas indefinitely.

Impact:

- Data accumulates over time
- Potential storage costs
- Security concern (orphaned data)

Solution: Add database cleanup to workflow before deleting cluster-gitops files.

```yaml
- name: Drop MongoDB database
  env:
    MONGO_ADMIN_URI: ${{ secrets.MONGO_ADMIN_URI }}
    PR_NUM: ${{ github.event.pull_request.number }}  # PR number for the database name
  run: |
    mongosh "$MONGO_ADMIN_URI" --eval "
      const dbName = 'syrf_pr_${PR_NUM}';
      db.getSiblingDB(dbName).dropDatabase();
      print('Dropped database ' + dbName);
    "
```

Issue 2: No Guaranteed Deletion Order

Problem: ArgoCD ApplicationSet doesn't guarantee deletion order when removing Applications. All apps for a PR may be deleted simultaneously.

Potential Effects:

- Services may lose MongoDB connectivity before graceful shutdown
- DNS records might not be cleaned if the Ingress is cascade-deleted before the external-dns finalizer runs

Assessment: For ephemeral PR previews, this is acceptable. Services don't need graceful shutdown, and DNS cleanup usually works.

If stricter ordering is needed in the future, options include:

1. Sequential deletion in workflow (delete services, wait, delete namespace)
2. App-of-Apps pattern with sync waves
3. Custom controller with finalizer ordering
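Option 1 could be sketched as extra workflow steps. The `argocd` CLI availability and the `pr-{n}-<service>` Application naming convention are both assumptions here:

```yaml
# Hypothetical sequential deletion; app naming and CLI access are assumed.
- name: Delete service Applications first
  run: |
    for app in $(argocd app list -o name | grep "pr-${PR_NUM}-" | grep -v namespace); do
      argocd app delete "$app" --yes
    done

- name: Wait for pods to terminate
  run: kubectl wait --for=delete pods --all -n "pr-${PR_NUM}" --timeout=120s

- name: Delete the namespace Application last
  run: argocd app delete "pr-${PR_NUM}-namespace" --yes
```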

Issue 3: Potential DNS Record Orphaning

Problem: If namespace cascade-deletes Ingress before external-dns processes the deletion, DNS records may not be removed.

Assessment: Low probability, but worth monitoring. external-dns runs a periodic reconciliation loop that should eventually remove stale records (assuming it is configured with --policy=sync rather than upsert-only).

Mitigation: Periodic audit of DNS records vs active PR previews.
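The audit could be a scheduled workflow that diffs provider DNS records against open PRs. In this sketch, the `pr-{n}.preview.example.com` record naming and the use of the `gh` CLI are assumptions:

```yaml
# Hypothetical scheduled audit; domain naming is assumed.
name: dns-orphan-audit
on:
  schedule:
    - cron: "0 6 * * *"   # daily

jobs:
  audit:
    runs-on: ubuntu-latest
    steps:
      - name: Flag DNS records without an open PR
        run: |
          gh pr list --state open --json number --jq '.[].number' | sort > open_prs.txt
          # List pr-*.preview.example.com records via the DNS provider's API,
          # extract their PR numbers, and alert on any number missing from open_prs.txt.
```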

Deletion Order Considerations

Why Order Could Matter

| Scenario | If Deleted First | Impact |
|---|---|---|
| AtlasDatabaseUser | Services lose DB auth | Log errors (harmless for PRs) |
| Namespace | Everything cascade-deleted | Fast but potentially messy |
| Services | Clean pod termination | Ideal but slower |
| Ingress | DNS/cert cleanup may race | Usually fine |

ArgoCD Sync Waves and Deletion

Important: Sync waves on Applications affect sync order, not deletion order when ApplicationSet removes them.

  • During creation: wave -1 syncs before wave 0
  • During deletion: order is NOT guaranteed by waves

For controlled deletion order, you would need one of:

- App-of-Apps pattern (waves affect child app processing)
- Sequential deletion in the workflow
- Custom controller
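For reference, a sync wave is set with an annotation on each child Application in an App-of-Apps; the Application name below is illustrative:

```yaml
# Hypothetical child Application; the annotation is the point.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: pr-123-namespace               # illustrative name
  annotations:
    argocd.argoproj.io/sync-wave: "-1" # syncs before wave-0 apps
spec:
  # ... source, destination, syncPolicy ...
```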

Recommendations

Short-term (Implement Now)

1. Add MongoDB database cleanup to workflow
   - Priority: HIGH
   - Effort: LOW (simple mongosh command)
   - Prevents data accumulation

Medium-term (Consider for Future)

1. Monitor DNS record cleanup
   - Add a periodic audit script
   - Alert on orphaned records
2. Document cleanup expectations
   - Set expectations that PR cleanup is "best effort"
   - Not production-grade graceful shutdown

Long-term (If Needed)

1. Implement sequential deletion (only if issues arise)
   - More complex workflow
   - Slower cleanup
   - Only worth it if the current approach causes problems

Implementation Notes

MongoDB Cleanup Requirements

To drop a PR database, you need:

- A MongoDB connection string with admin privileges
- Access to run dropDatabase() on syrf_pr_{n}

Options:

1. Use Atlas admin API (REST call)
2. Use mongosh with admin connection string
3. Create a cleanup Job in Kubernetes (similar to db-reset)
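Option 3 could mirror the existing db-reset Job pattern. In this sketch the container image and the admin secret name are assumptions:

```yaml
# Hypothetical cleanup Job; image and secret names are assumed.
apiVersion: batch/v1
kind: Job
metadata:
  name: db-cleanup
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: mongosh
          image: mongo:7                # assumed image
          command: ["mongosh"]
          args:
            - "$(MONGO_ADMIN_URI)"      # expanded by Kubernetes from env
            - --eval
            - "db.getSiblingDB('syrf_pr_' + process.env.PR_NUM).dropDatabase()"
          env:
            - name: PR_NUM
              value: "123"              # injected per PR in practice
            - name: MONGO_ADMIN_URI
              valueFrom:
                secretKeyRef:
                  name: mongo-admin     # assumed secret name
                  key: uri
```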

Workflow Integration Point

MongoDB cleanup should happen:

- AFTER RabbitMQ vhost deletion
- BEFORE deleting cluster-gitops files
- While the AtlasDatabaseUser still exists (has credentials)

```yaml
cleanup-pr-preview:
  steps:
    - name: Delete RabbitMQ vhost
      # ... existing ...

    - name: Drop MongoDB database  # NEW
      # ... mongosh command ...

    - name: Delete cluster-gitops files
      # ... existing ...
```

Decision Log

| Date | Decision | Rationale |
|---|---|---|
| 2026-01-15 | Accept cascade deletion for PR previews | Ephemeral environments don't need graceful shutdown |
| 2026-01-15 | Prioritize MongoDB cleanup | Real data accumulation issue vs theoretical ordering issues |