# MongoDB Permissions for Data Snapshot Automation

This document explains how MongoDB permissions work for the snapshot/restore feature, clarifying common misconceptions and documenting the actual permission model.
## Critical Misconception: No Wildcard Database Permissions

⚠️ MongoDB does NOT support wildcard patterns for database names in role grants.

You cannot do this:
```javascript
// ❌ THIS IS NOT POSSIBLE
db.grantRolesToUser("cleanup-user", [
  { role: "dbAdmin", db: "syrf_pr_*" }  // INVALID - wildcards don't work
])
```
Each role must be granted on a specific, named database or using cluster-wide built-in roles.
## MongoDB Role Scoping Options

### Option 1: Specific Database Roles

Roles are granted per database. Each database must be explicitly named:
```javascript
// ✅ Valid - explicit database names
db.grantRolesToUser("pr-123-user", [
  { role: "readWrite", db: "syrf_pr_123" },  // Specific to PR 123
  { role: "dbAdmin", db: "syrf_pr_123" },    // Can drop its own DB
  { role: "read", db: "syrf_snapshot" }      // Read from snapshot
])
```
Implication: You must create or update permissions for each new PR database. This is what the Atlas Operator does automatically.

### Option 2: Cluster-Wide Built-in Roles

MongoDB has cluster-wide admin roles that apply to ALL databases:
| Role | Access Level | Databases Affected |
|---|---|---|
| `readAnyDatabase` | Read | ALL databases |
| `readWriteAnyDatabase` | Read/Write | ALL databases |
| `dbAdminAnyDatabase` | Admin (drop, create, etc.) | ALL databases |
| `userAdminAnyDatabase` | User management | ALL databases |
```javascript
// ✅ Valid - cluster-wide role
db.grantRolesToUser("admin-user", [
  { role: "dbAdminAnyDatabase", db: "admin" }  // Can drop ANY database
])
```
⚠️ WARNING: These roles provide access to ALL databases including production! Use with extreme caution.
## Users in the Data Snapshot System

### 1. Snapshot Producer User

Purpose: Weekly CronJob that copies `syrftest` → `syrf_snapshot`
| Database | Role | Purpose |
|---|---|---|
| `syrftest` | `read` | Read production data (read-only) |
| `syrf_snapshot` | `readWrite` | Write snapshot data |
| `syrf_snapshot` | `dbAdmin` | Drop collections before refresh |
```javascript
// Created manually in Atlas Console
{
  "user": "snapshot-producer",
  "roles": [
    { "role": "read", "db": "syrftest" },
    { "role": "readWrite", "db": "syrf_snapshot" },
    { "role": "dbAdmin", "db": "syrf_snapshot" }
  ]
}
```
Security: Cannot write to production. Even a bug cannot corrupt syrftest.
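The weekly copy itself is not specified above; as one sketch, it could be a `mongodump | mongorestore` pipeline that renames the namespace. The `SNAPSHOT_URI` variable, the `refresh_snapshot` wrapper, and the `DRY_RUN` switch are hypothetical — the real CronJob command may differ:

```shell
#!/usr/bin/env bash
# Hypothetical refresh script for the snapshot-producer CronJob.
SRC_DB="${SRC_DB:-syrftest}"        # read-only source
DST_DB="${DST_DB:-syrf_snapshot}"   # snapshot target

refresh_snapshot() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    # Print the pipeline instead of running it (no cluster needed).
    echo "mongodump --db=$SRC_DB --archive | mongorestore --archive --drop --nsFrom='$SRC_DB.*' --nsTo='$DST_DB.*'"
    return 0
  fi
  # --archive streams the dump; --drop replaces stale snapshot collections;
  # --nsFrom/--nsTo rewrite the database name during restore.
  mongodump --uri="$SNAPSHOT_URI" --db="$SRC_DB" --archive \
    | mongorestore --uri="$SNAPSHOT_URI" --archive --drop \
        --nsFrom="$SRC_DB.*" --nsTo="$DST_DB.*"
}

DRY_RUN=1 refresh_snapshot   # prints the pipeline a real run would execute
```

Because the producer only has `read` on `syrftest`, a bug in this job can at worst corrupt the snapshot, never production.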
### 2. PR-Specific Users (via Atlas Operator)

Purpose: Each PR gets an isolated user with access only to its own database
| Database | Role | Purpose |
|---|---|---|
| `syrf_pr_N` | `readWrite` | Application read/write |
| `syrf_pr_N` | `dbAdmin` | Can drop its own database |
| `syrf_snapshot` | `read` | Read from snapshot for restore |
```yaml
# Created automatically by Atlas Operator for each PR
apiVersion: atlas.mongodb.com/v1
kind: AtlasDatabaseUser
spec:
  username: syrf-pr-123-user
  roles:
    - roleName: readWrite
      databaseName: syrf_pr_123
    - roleName: dbAdmin
      databaseName: syrf_pr_123
    - roleName: read
      databaseName: syrf_snapshot  # When use-snapshot label present
```
Security: Each PR user is completely isolated. Cannot access other PR databases or production.
### 3. Staging User (Existing)

Purpose: Application access for the staging environment
| Database | Role | Purpose |
|---|---|---|
| `syrf_staging` | `readWrite` | Application access |
| `syrf_staging` | `dbAdmin` | Schema management |
⚠️ Problem: Staging user does NOT have access to syrf_pr_* databases. It cannot drop PR databases.
## The Cleanup Problem

When a PR is closed, who drops the `syrf_pr_N` database?

### Challenge

- The PR user (`syrf-pr-N-user`) has `dbAdmin` on its own database and CAN drop it
- BUT the user is managed by the Atlas Operator via a CRD in the PR namespace
- When the PR closes, the namespace (and CRD) gets deleted
- The Atlas Operator then deletes the MongoDB user
- Race condition: can we drop the database before the user is deleted?
### Option A: PreDelete Hook with PR User (Current Approach)

The PR user drops its own database before ArgoCD deletes the namespace:
```yaml
# ArgoCD PreDelete hook - runs BEFORE namespace deletion
apiVersion: batch/v1
kind: Job
metadata:
  annotations:
    argocd.argoproj.io/hook: PreDelete
spec:
  template:
    spec:
      restartPolicy: Never  # required for Job pods
      containers:
        - name: cleanup
          image: mongo:7.0
          command: ["/bin/bash", "-c"]
          args:
            - |
              mongosh "$MONGODB_URI" --eval "
                db.getSiblingDB('syrf_pr_${PR_NUM}').dropDatabase();
              "
          env:
            - name: MONGODB_URI
              valueFrom:
                secretKeyRef:
                  name: mongodb-credentials  # PR user's credentials
                  key: connectionString
```
Why it works:
1. PreDelete hook runs FIRST (before any resource deletion)
2. PR user credentials still exist at this point
3. PR user has dbAdmin on syrf_pr_N (its own DB only)
4. Database is dropped
5. THEN ArgoCD deletes the namespace
6. THEN Atlas Operator cleans up the MongoDB user
Pros:
- Least privilege: each user only drops its own database
- No cluster-wide admin access needed
Cons:
- If the hook fails, the database is orphaned
- Requires ArgoCD to respect hook ordering
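One way to narrow the hook-failure window: wrap the drop command in a small retry loop so a transient connection error doesn't immediately orphan the database. A sketch only — `MAX_ATTEMPTS` and `RETRY_DELAY` are hypothetical knobs, and the Job still fails (leaving an orphan for manual cleanup) if every attempt fails:

```shell
# Retry a command a few times before giving up; the final exit status
# becomes the Job's exit status, so ArgoCD can surface a failed hook.
retry() {
  local attempts="${MAX_ATTEMPTS:-3}" delay="${RETRY_DELAY:-5}" i
  for ((i = 1; i <= attempts; i++)); do
    "$@" && return 0
    echo "attempt $i/$attempts failed: $*" >&2
    if ((i < attempts)); then sleep "$delay"; fi
  done
  return 1
}

# In the PreDelete hook this would wrap the drop, e.g.:
# retry mongosh "$MONGODB_URI" --eval "db.getSiblingDB('syrf_pr_${PR_NUM}').dropDatabase()"
```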
### Option B: Dedicated Cleanup User with dbAdminAnyDatabase

Create a powerful service account for cleanup operations:
Pros:
- Can clean up any orphaned database
- Works as a fallback if PreDelete hooks fail

Cons:
- EXTREME RISK: can drop production (`syrftest`) or any other database!
- Requires defense-in-depth: script validation, allowlists, etc.
Mandatory Safety Script:
```bash
# MUST validate before ANY operation
validate_target_db() {
  local target="$1"

  # ONLY allow syrf_pr_N databases
  if [[ ! "$target" =~ ^syrf_pr_[0-9]+$ ]]; then
    echo "FATAL: Cannot target database: $target"
    exit 1
  fi

  # Explicit blocklist
  case "$target" in
    syrftest|syrfdev|syrf_snapshot|syrf_staging|admin|local|config)
      echo "FATAL: Protected database: $target"
      exit 1
      ;;
  esac
}
```
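To see the guard's behavior, here is a variant that returns a status instead of calling `exit`, so it can be exercised interactively; `check_target_db` is a hypothetical name, and the checks are the same allowlist regex and blocklist as above:

```shell
# Same checks as validate_target_db, but returning a status instead of
# exiting, so callers (and tests) can branch on the result.
check_target_db() {
  local target="$1"
  if [[ ! "$target" =~ ^syrf_pr_[0-9]+$ ]]; then
    echo "REJECT: $target"
    return 1
  fi
  case "$target" in
    syrftest|syrfdev|syrf_snapshot|syrf_staging|admin|local|config)
      echo "REJECT: protected: $target"
      return 1
      ;;
  esac
  echo "OK: $target"
}

check_target_db syrf_pr_123                # OK: syrf_pr_123
check_target_db syrftest || true           # REJECT: syrftest
check_target_db "syrf_pr_1;admin" || true  # REJECT: injection-style names fail the regex
```

Note that the blocklist is pure defense-in-depth: every protected name already fails the allowlist regex, so a typo in one check is still caught by the other.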
### Option C: Atlas Admin API

Use the MongoDB Atlas API instead of database users:
```bash
# Use Atlas API to drop database
curl -X DELETE \
  "https://cloud.mongodb.com/api/atlas/v2/groups/{projectId}/clusters/{clusterName}/databases/syrf_pr_123" \
  -H "Authorization: Bearer $ATLAS_API_KEY"
```
Pros:
- Doesn't require a powerful MongoDB user
- Uses the same API key that Atlas Operator already has

Cons:
- More complex implementation
- Different authentication mechanism
### Option D: Accept Orphans + Manual Cleanup

Don't automatically drop databases. Let them accumulate and clean them up periodically.
```bash
# Manual cleanup script (run occasionally by admin)
mongosh "$ADMIN_URI" --eval "
  db.adminCommand('listDatabases').databases
    .filter(d => d.name.startsWith('syrf_pr_'))
    .forEach(d => {
      print('Dropping: ' + d.name);
      db.getSiblingDB(d.name).dropDatabase();
    });
"
```
Pros:
- Simplest implementation
- No permission issues

Cons:
- Databases accumulate (storage cost)
- Requires manual intervention
## Recommended Approach

- Primary: Option A (PreDelete hook with PR user)
- Fallback: Option D (manual cleanup for orphans)

The PreDelete hook handles the normal case. For edge cases where hooks fail, periodic manual cleanup (weekly/monthly) handles orphans.
### Why NOT Option B?

A user with dbAdminAnyDatabase is dangerous. Even with script validation, a single bug or typo could drop production. The blast radius is too large.
### Why NOT Option C?

Added complexity without significant benefit. The Atlas API approach requires different authentication and more code.
## Implementation Notes

### Current State (pr-preview.yml)

The cleanup-tags job in the workflow currently uses staging credentials (the mongo-db secret).

Problem: The staging user does not have access to the syrf_pr_* databases, so this cleanup step cannot drop them.
Fix Options:
1. Use PreDelete hook (primary cleanup mechanism)
2. Accept that this fallback won't work (rely on PreDelete hook)
3. Create syrf-cleanup-admin user (increases risk surface)
### Recommended Fix

Remove the MongoDB cleanup from the cleanup-tags job and rely on the PreDelete hook for database cleanup. The workflow should only:
- Delete namespace from cluster-gitops
- Delete ArgoCD Application (if needed)
- Trust that PreDelete hook dropped the database
If databases are orphaned, clean them up manually/periodically.
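That periodic orphan check can be reduced to text processing: compare the `syrf_pr_*` databases that exist against the PR numbers that are still open. A sketch; in practice the first list would come from `listDatabases` via mongosh and the second from the Git host's API (e.g. `gh pr list`), both assumed here to be plain files:

```shell
# find_orphans EXISTING_DBS_FILE OPEN_PRS_FILE
#   EXISTING_DBS_FILE: one database name per line
#   OPEN_PRS_FILE:     one open PR number per line
# Prints the syrf_pr_N databases whose PR is no longer open.
find_orphans() {
  local dbs="$1" open="$2"
  grep -E '^syrf_pr_[0-9]+$' "$dbs" \
    | sed 's/^syrf_pr_//' \
    | grep -Fxv -f "$open" \
    | sed 's/^/syrf_pr_/'
}
```

The output is a candidate list for the manual cleanup script above; a human reviews it before any `dropDatabase()` runs.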
## Summary Table
| User | syrftest | syrf_snapshot | syrf_staging | syrf_pr_N |
|---|---|---|---|---|
| `snapshot-producer` | 📖 READ | ✏️ WRITE | ❌ | ❌ |
| `syrf-pr-N-user` | ❌ | 📖 READ | ❌ | ✏️ WRITE + 🗑️ DROP |
| Staging user | ❌ | ❌ | ✏️ WRITE | ❌ |
| Production user | ✏️ WRITE | ❌ | ❌ | ❌ |
| `syrf-cleanup-admin`* | ⚠️ DROP | ⚠️ DROP | ⚠️ DROP | ⚠️ DROP |
*Only if Option B is implemented (not recommended)