MongoDB Permissions for Data Snapshot Automation

This document explains how MongoDB permissions work for the snapshot/restore feature, clarifying common misconceptions and documenting the actual permission model.


Critical Misconception: No Wildcard Database Permissions

⚠️ MongoDB does NOT support wildcard patterns for database names in role grants.

You cannot do this:

// ❌ THIS IS NOT POSSIBLE
db.grantRolesToUser("cleanup-user", [
  { role: "dbAdmin", db: "syrf_pr_*" }  // INVALID - wildcards don't work
])

Each role must be granted either on a specific, named database or through one of the cluster-wide built-in roles.
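Because patterns are not expanded, any "all PR databases" grant has to be enumerated explicitly. The following is a minimal sketch of that expansion; `expandPrGrants` is a hypothetical helper, not part of MongoDB or this project, and it assumes PR numbers are positive integers.

```javascript
// Hypothetical helper: turn a list of PR numbers into explicit role grants,
// one per named database, since "syrf_pr_*" is not expanded by MongoDB.
function expandPrGrants(role, prNumbers) {
  return prNumbers.map((n) => {
    if (!Number.isInteger(n) || n <= 0) {
      throw new Error(`Invalid PR number: ${n}`);
    }
    return { role, db: `syrf_pr_${n}` };
  });
}

// In mongosh the result could be passed to grantRolesToUser, for example:
//   db.grantRolesToUser("cleanup-user", expandPrGrants("dbAdmin", [123, 124]))
console.log(JSON.stringify(expandPrGrants("dbAdmin", [123, 124])));
```

The grant list has to be regenerated whenever a PR opens or closes, which is the maintenance cost that per-PR users avoid.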


MongoDB Role Scoping Options

Option 1: Specific Database Roles

Roles are granted per database. Each database must be explicitly named:

// ✅ Valid - explicit database names
db.grantRolesToUser("pr-123-user", [
  { role: "readWrite", db: "syrf_pr_123" },  // Specific to PR 123
  { role: "dbAdmin", db: "syrf_pr_123" },    // Can drop its own DB
  { role: "read", db: "syrf_snapshot" }      // Read from snapshot
])

Implication: You must create/update permissions for each new PR database. This is what the Atlas Operator does automatically.

Option 2: Cluster-Wide Built-in Roles

MongoDB has built-in cluster-wide roles that apply to ALL databases (except the internal local and config databases):

| Role | Access Level | Databases Affected |
|------|--------------|--------------------|
| readAnyDatabase | Read | ALL databases |
| readWriteAnyDatabase | Read/Write | ALL databases |
| dbAdminAnyDatabase | Admin (drop, create, etc.) | ALL databases |
| userAdminAnyDatabase | User management | ALL databases |

// ✅ Valid - cluster-wide role
db.grantRolesToUser("admin-user", [
  { role: "dbAdminAnyDatabase", db: "admin" }  // Can drop ANY database
])

⚠️ WARNING: These roles provide access to ALL databases including production! Use with extreme caution.
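One way to keep these roles from slipping into a user definition unnoticed is a small check over the role list. The sketch below assumes user definitions shaped like the `{ user, roles: [{ role, db }] }` documents used in this page; `findClusterWideRoles` and the sample users are illustrative names only.

```javascript
// The four cluster-wide roles listed in the table above.
const CLUSTER_WIDE_ROLES = new Set([
  "readAnyDatabase",
  "readWriteAnyDatabase",
  "dbAdminAnyDatabase",
  "userAdminAnyDatabase",
]);

// Return the names of any cluster-wide roles held by a user definition.
function findClusterWideRoles(userDoc) {
  return userDoc.roles
    .map((r) => r.role)
    .filter((name) => CLUSTER_WIDE_ROLES.has(name));
}

const adminUser = {
  user: "admin-user",
  roles: [{ role: "dbAdminAnyDatabase", db: "admin" }],
};
const scopedUser = {
  user: "pr-123-user",
  roles: [{ role: "readWrite", db: "syrf_pr_123" }],
};

console.log(findClusterWideRoles(adminUser));  // flagged
console.log(findClusterWideRoles(scopedUser)); // nothing flagged
```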


Users in the Data Snapshot System

1. Snapshot Producer User

Purpose: Weekly CronJob that copies syrftest → syrf_snapshot

| Database | Role | Purpose |
|----------|------|---------|
| syrftest | read | Read production data (read-only) |
| syrf_snapshot | readWrite | Write snapshot data |
| syrf_snapshot | dbAdmin | Drop collections before refresh |

// Created manually in Atlas Console
{
  "user": "snapshot-producer",
  "roles": [
    { "role": "read", "db": "syrftest" },
    { "role": "readWrite", "db": "syrf_snapshot" },
    { "role": "dbAdmin", "db": "syrf_snapshot" }
  ]
}

Security: This user cannot write to production. Even a bug in the snapshot job cannot corrupt syrftest.

2. PR-Specific Users (via Atlas Operator)

Purpose: Each PR gets an isolated user with access only to its database

| Database | Role | Purpose |
|----------|------|---------|
| syrf_pr_N | readWrite | Application read/write |
| syrf_pr_N | dbAdmin | Can drop its own database |
| syrf_snapshot | read | Read from snapshot for restore |

# Created automatically by Atlas Operator for each PR
apiVersion: atlas.mongodb.com/v1
kind: AtlasDatabaseUser
spec:
  username: syrf-pr-123-user
  roles:
    - roleName: readWrite
      databaseName: syrf_pr_123
    - roleName: dbAdmin
      databaseName: syrf_pr_123
    - roleName: read
      databaseName: syrf_snapshot  # When use-snapshot label present

Security: Each PR user is isolated from the others. It cannot access other PR databases or production.
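The per-PR role set can be derived from the PR number alone. The sketch below builds only the fields that appear in the manifest above; `buildPrUserManifest` is a hypothetical helper, and a real AtlasDatabaseUser resource will likely need further fields (metadata, project and password references) that are not shown in this document.

```javascript
// Hypothetical helper: build the AtlasDatabaseUser fields shown above
// for one PR. The snapshot read role is added only when requested,
// mirroring the "use-snapshot label present" condition.
function buildPrUserManifest(prNumber, useSnapshot) {
  if (!Number.isInteger(prNumber) || prNumber <= 0) {
    throw new Error(`Invalid PR number: ${prNumber}`);
  }
  const dbName = `syrf_pr_${prNumber}`;
  const roles = [
    { roleName: "readWrite", databaseName: dbName },
    { roleName: "dbAdmin", databaseName: dbName },
  ];
  if (useSnapshot) {
    roles.push({ roleName: "read", databaseName: "syrf_snapshot" });
  }
  return {
    apiVersion: "atlas.mongodb.com/v1",
    kind: "AtlasDatabaseUser",
    spec: { username: `syrf-pr-${prNumber}-user`, roles },
  };
}

console.log(JSON.stringify(buildPrUserManifest(123, true), null, 2));
```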

3. Staging User (Existing)

Purpose: Application access for staging environment

| Database | Role | Purpose |
|----------|------|---------|
| syrf_staging | readWrite | Application access |
| syrf_staging | dbAdmin | Schema management |

⚠️ Problem: Staging user does NOT have access to syrf_pr_* databases. It cannot drop PR databases.
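The reasoning can be made mechanical: a user can drop a database only if it holds dbAdmin on that exact database or dbAdminAnyDatabase. The sketch below is a simplification that considers only the roles used in this document; `canDropDatabase` and the username `staging-user` are hypothetical.

```javascript
// Simplified check: can this user definition drop the given database?
// Only dbAdmin (exact database match) and dbAdminAnyDatabase are considered.
function canDropDatabase(userDoc, dbName) {
  return userDoc.roles.some((r) => {
    if (r.role === "dbAdmin") {
      return r.db === dbName;
    }
    if (r.role === "dbAdminAnyDatabase") {
      // The AnyDatabase roles do not cover local and config.
      return dbName !== "local" && dbName !== "config";
    }
    return false;
  });
}

const stagingUser = {
  user: "staging-user", // hypothetical username
  roles: [
    { role: "readWrite", db: "syrf_staging" },
    { role: "dbAdmin", db: "syrf_staging" },
  ],
};
const prUser123 = {
  user: "syrf-pr-123-user",
  roles: [
    { role: "readWrite", db: "syrf_pr_123" },
    { role: "dbAdmin", db: "syrf_pr_123" },
    { role: "read", db: "syrf_snapshot" },
  ],
};

console.log(canDropDatabase(stagingUser, "syrf_pr_123")); // false
console.log(canDropDatabase(prUser123, "syrf_pr_123"));   // true
console.log(canDropDatabase(prUser123, "syrf_pr_124"));   // false
```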


The Cleanup Problem

When a PR is closed, who drops the syrf_pr_N database?

Challenge

  1. The PR user (syrf-pr-N-user) has dbAdmin on its own database and CAN drop it
  2. BUT the user is managed by Atlas Operator via a CRD in the PR namespace
  3. When the PR closes, the namespace (and CRD) gets deleted
  4. Atlas Operator then deletes the MongoDB user
  5. Race condition: Can we drop the database before the user is deleted?

Option A: PreDelete Hook with PR User (Current Approach)

The PR user drops its own database before ArgoCD deletes the namespace:

# ArgoCD PreDelete hook - runs BEFORE namespace deletion
apiVersion: batch/v1
kind: Job
metadata:
  annotations:
    argocd.argoproj.io/hook: PreDelete
spec:
  template:
    spec:
      restartPolicy: Never  # Job pods must use Never or OnFailure
      containers:
        - name: cleanup
          image: mongo:7.0
          command: ["/bin/bash", "-c"]
          args:
            - |
              mongosh "$MONGODB_URI" --eval "
                db.getSiblingDB('syrf_pr_${PR_NUM}').dropDatabase();
              "
          env:
            - name: PR_NUM  # Must be set per PR; "123" is an example value
              value: "123"
            - name: MONGODB_URI
              valueFrom:
                secretKeyRef:
                  name: mongodb-credentials  # PR user's credentials
                  key: connectionString

Why it works:

  1. PreDelete hook runs FIRST (before any resource deletion)
  2. PR user credentials still exist at this point
  3. PR user has dbAdmin on syrf_pr_N (its own DB only)
  4. Database is dropped
  5. THEN ArgoCD deletes the namespace
  6. THEN Atlas Operator cleans up the MongoDB user

Pros:

  - Least privilege: each user only drops its own database
  - No cluster-wide admin access needed

Cons:

  - If hook fails, database is orphaned
  - Requires ArgoCD to respect hook ordering
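Since a failed hook leaves an orphaned database, the drop step is worth writing so that it fails loudly rather than silently. The sketch below is one possible shape for that step as a mongosh-style function: it assumes `db` exposes `getSiblingDB(name)` as in mongosh and that `dropDatabase()` returns a document with `ok: 1` on success. `dropPrDatabase` and `fakeDb` are hypothetical names, and `fakeDb` only stands in for a real connection so the example runs without a server.

```javascript
// Drop one PR database and throw if the drop is not acknowledged.
// An uncaught error should make the mongosh process, and so the Job, fail.
function dropPrDatabase(db, prNumber) {
  if (!Number.isInteger(prNumber) || prNumber <= 0) {
    throw new Error(`Refusing to drop: invalid PR number ${prNumber}`);
  }
  const name = `syrf_pr_${prNumber}`;
  const result = db.getSiblingDB(name).dropDatabase();
  if (!result || result.ok !== 1) {
    throw new Error(`Drop of ${name} was not acknowledged`);
  }
  return name;
}

// Stand-in for the mongosh `db` object; records which names were dropped.
function fakeDb(droppedNames) {
  return {
    getSiblingDB: (name) => ({
      dropDatabase: () => {
        droppedNames.push(name);
        return { ok: 1, dropped: name };
      },
    }),
  };
}

const droppedNames = [];
dropPrDatabase(fakeDb(droppedNames), 123);
console.log(droppedNames);
```

Validating the PR number before building the database name also guards against an unset variable turning the target into a bare `syrf_pr_`.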

Option B: Dedicated Cleanup User with dbAdminAnyDatabase

Create a powerful service account for cleanup operations:

{
  "user": "syrf-cleanup-admin",
  "roles": [
    { "role": "dbAdminAnyDatabase", "db": "admin" }
  ]
}

Pros:

  - Can clean up any orphan database
  - Works as fallback if PreDelete hooks fail

Cons:

  - EXTREME RISK: Can drop production (syrftest) or any other database!
  - Requires defense-in-depth: script validation, allowlists, etc.

Mandatory Safety Script:

# MUST validate before ANY operation
validate_target_db() {
  local target="$1"

  # ONLY allow syrf_pr_N databases
  if [[ ! "$target" =~ ^syrf_pr_[0-9]+$ ]]; then
    echo "FATAL: Cannot target database: $target"
    exit 1
  fi

  # Explicit blocklist
  case "$target" in
    syrftest|syrfdev|syrf_snapshot|syrf_staging|admin|local|config)
      echo "FATAL: Protected database: $target"
      exit 1
      ;;
  esac
}
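If the cleanup logic runs inside mongosh rather than bash, the same two checks can be expressed in JavaScript. This is a sketch of an equivalent guard, not the project's implementation; `assertDroppable` is a hypothetical name, and the pattern and protected list are copied from the shell function above.

```javascript
// Allowlist pattern and protected names, mirroring the shell function above.
const PR_DB_PATTERN = /^syrf_pr_[0-9]+$/;
const PROTECTED_DBS = new Set([
  "syrftest", "syrfdev", "syrf_snapshot", "syrf_staging",
  "admin", "local", "config",
]);

// Return the name unchanged if it may be dropped; throw otherwise.
function assertDroppable(target) {
  if (typeof target !== "string" || !PR_DB_PATTERN.test(target)) {
    throw new Error(`FATAL: Cannot target database: ${target}`);
  }
  // Redundant with the pattern today, kept as a second line of defense
  // in case the pattern is ever loosened.
  if (PROTECTED_DBS.has(target)) {
    throw new Error(`FATAL: Protected database: ${target}`);
  }
  return target;
}

console.log(assertDroppable("syrf_pr_123"));
```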

Option C: Atlas Admin API

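Use the MongoDB Atlas Admin API instead of a database user:

# Use Atlas API to drop database
# Illustrative only: confirm the endpoint path and auth scheme against the
# Atlas Administration API reference before relying on this call.
curl -X DELETE \
  "https://cloud.mongodb.com/api/atlas/v2/groups/{projectId}/clusters/{clusterName}/databases/syrf_pr_123" \
  -H "Authorization: Bearer $ATLAS_API_KEY"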

Pros:

  - Doesn't require a powerful MongoDB user
  - Uses the same API key that Atlas Operator already has

Cons:

  - More complex implementation
  - Different authentication mechanism

Option D: Accept Orphans + Manual Cleanup

Don't automatically drop databases. Let them accumulate and clean up periodically.

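# Manual cleanup script (run occasionally by admin)
# WARNING: drops EVERY syrf_pr_N database, including those of open PRs.
# Run it only when no PR previews are active.
mongosh "$ADMIN_URI" --eval "
  db.adminCommand('listDatabases').databases
    .filter(d => /^syrf_pr_[0-9]+\$/.test(d.name))  // only syrf_pr_N names
    .forEach(d => {
      print('Dropping: ' + d.name);
      db.getSiblingDB(d.name).dropDatabase();
    });
"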

Pros:

  - Simplest implementation
  - No permission issues

Cons:

  - Databases accumulate (storage cost)
  - Requires manual intervention
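A manual sweep is safer if it first separates orphans from databases that still belong to open PRs. The sketch below shows that selection step only; `findOrphanedPrDatabases` is a hypothetical helper, and it assumes the list of open PR numbers is obtained elsewhere (for example from the Git hosting API), which this document does not cover.

```javascript
// Return the syrf_pr_N databases whose PR number is not in the open list.
// Names that do not match syrf_pr_<digits> are never returned.
function findOrphanedPrDatabases(databaseNames, openPrNumbers) {
  const open = new Set(openPrNumbers);
  return databaseNames.filter((name) => {
    const match = /^syrf_pr_([0-9]+)$/.exec(name);
    return match !== null && !open.has(Number(match[1]));
  });
}

const allDatabases = [
  "admin", "syrftest", "syrf_snapshot",
  "syrf_pr_101", "syrf_pr_123", "syrf_pr_backup",
];

// PR 123 is still open, so only syrf_pr_101 is reported as an orphan.
console.log(findOrphanedPrDatabases(allDatabases, [123]));
```

In mongosh, the `databaseNames` argument could come from `db.adminCommand('listDatabases').databases.map(d => d.name)`, as in the script above.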


Recommended Approach

Primary: Option A (PreDelete hook with PR user)

Fallback: Option D (manual cleanup for orphans)

The PreDelete hook handles the normal case. For edge cases where hooks fail, periodic manual cleanup (weekly/monthly) handles orphans.

Why NOT Option B?

Having a user with dbAdminAnyDatabase is dangerous. Even with script validation, a single bug or typo could drop production. The blast radius is too large.

Why NOT Option C?

Added complexity without significant benefit. The Atlas API approach requires different auth and more code.


Implementation Notes

Current State (pr-preview.yml)

The cleanup-tags job in the workflow currently uses staging credentials (mongo-db secret):

- name: Cleanup MongoDB database
  env:
    MONGO_URI: ${{ secrets.MONGO_DB_URI }}  # Staging credentials

Problem: Staging user doesn't have access to syrf_pr_* databases.

Fix Options:

  1. Use PreDelete hook (primary cleanup mechanism)
  2. Accept that this fallback won't work (rely on PreDelete hook)
  3. Create syrf-cleanup-admin user (increases risk surface)

Remove the MongoDB cleanup from the cleanup-tags job and rely on the PreDelete hook for database cleanup. The workflow should only:

  - Delete namespace from cluster-gitops
  - Delete ArgoCD Application (if needed)
  - Trust that PreDelete hook dropped the database

If databases are orphaned, clean them up manually/periodically.


Summary Table

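| User | syrftest | syrf_snapshot | syrf_staging | syrf_pr_N |
|------|----------|---------------|--------------|-----------|
| snapshot-producer | 📖 READ | ✏️ WRITE | | |
| syrf-pr-N-user | | 📖 READ | | ✏️ WRITE + 🗑️ DROP |
| Staging user | | | ✏️ WRITE | |
| Production user | ✏️ WRITE | | | |
| syrf-cleanup-admin* | ⚠️ DROP | ⚠️ DROP | ⚠️ DROP | ⚠️ DROP |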

*Only if Option B is implemented (not recommended)
