
Runbook: Renaming syrftest to syrf-prod

Status: DEFERRED - This runbook is preserved for future use. The current decision is to keep syrftest as the production database name to avoid migration complexity. Execute this runbook when/if the team decides to rename the production database.

Objective: Rename the production logical database from syrftest to syrf-prod with minimal downtime.

Primary Constraint: The pmStudy collection carries 8GB of indexes against the 4GB of RAM on the current M20 tier. Rebuilding these indexes without scaling up will cause severe latency.

Estimated Maintenance Window: 30–45 minutes (including buffer).


✅ Phase 1: Pre-Flight Checklist

Perform these steps between 24 hours and 1 hour before the maintenance window.

  1. Provision "Jumpbox" VM:

    • Create a small Ubuntu VM (e.g., AWS t3.medium) in AWS Ireland (eu-west-1), the same region as the Atlas cluster.
    • Why: Keeps traffic inside AWS's regional network for maximum speed.
    • Install Tools on VM:

      wget https://fastdl.mongodb.org/tools/db/mongodb-database-tools-ubuntu2204-x86_64-100.9.4.tgz
      tar -zxvf mongodb-database-tools-*.tgz
      sudo cp mongodb-database-tools-*/bin/* /usr/local/bin/
      
  2. Verify Access:

    • Allow the VM's private IP in the MongoDB Atlas Network Access whitelist.
    • Test connection from VM: mongosh "mongodb+srv://<CLUSTER_HOST>" --username admin
  3. Prepare Application:
    • Prepare the new environment variable string: .../syrf-prod?retryWrites=true...
    • Ensure you have a way to stop/start the app services quickly.
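
For reference, the only configuration change at cutover is the database name in the connection string. A sketch, assuming the app reads MONGODB_URI from a .env file (the variable name, host, and options here are illustrative):

```shell
# Hypothetical .env entries -- adapt the key name and host to your setup.
# Before (current production):
MONGODB_URI="mongodb+srv://admin:<PASSWORD>@cluster0.abcde.mongodb.net/syrftest?retryWrites=true&w=majority"

# After (staged now, swapped in during the window):
MONGODB_URI="mongodb+srv://admin:<PASSWORD>@cluster0.abcde.mongodb.net/syrf-prod?retryWrites=true&w=majority"
```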

🚀 Phase 2: Execution (Maintenance Window)

Step 1: Scale Up Cluster (Critical)

Time required: ~5-10 mins (Rolling update)

  1. Log in to MongoDB Atlas.
  2. Navigate to Database > Cluster0 > Configuration.
  3. Change Tier from M20 to M40 (16GB RAM).
    • Reasoning: You have 8GB of indexes. M20 (4GB RAM) will crash/thrash during index rebuilds. M40 ensures indexes fit in RAM for a fast rebuild.
  4. Wait for the cluster status to return to Green/Active before proceeding.
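
If you'd rather script the scale-up than click through the UI, the Atlas CLI can do the same thing. A sketch, assuming the CLI is installed and authenticated (`atlas auth login`) and the cluster is named Cluster0 as above:

```shell
# Request the tier change (a rolling update, same as the UI flow).
atlas clusters update Cluster0 --tier M40

# Poll until stateName returns to IDLE before proceeding.
atlas clusters describe Cluster0 --output json | grep stateName
```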

Step 2: Stop Traffic

  1. Stop your application containers/services.
    • Ensure no writes are hitting syrftest to prevent data loss.
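
To confirm that writes have genuinely stopped before dumping, you can inspect in-flight operations from the Jumpbox. A sketch using the $currentOp aggregation stage (assumes an admin-level connection string in $ATLAS_URI; the namespace filter is illustrative):

```shell
# List active write operations against syrftest; empty output means traffic has stopped.
mongosh "$ATLAS_URI" --quiet --eval '
  db.getSiblingDB("admin").aggregate([
    { $currentOp: {} },
    { $match: { active: true,
                op: { $in: ["insert", "update", "remove"] },
                ns: /^syrftest\./ } }
  ]).forEach(op => printjson(op))
'
```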

Step 3: Run the Migration Pipeline

Run this from the Jumpbox VM, not your local machine.

# 1. Export Connection Variables
# Replace with your actual credentials
export ATLAS_URI="mongodb+srv://admin:YOUR_PASSWORD_HERE@cluster0.abcde.mongodb.net"
export OLD_DB="syrftest"
export NEW_DB="syrf-prod"

# 2. Run the Streaming Migration
# This pipes dump directly to restore without saving to disk.
# --numParallelCollections=4: Optimises throughput
# --stopOnError: Halts immediately if a critical issue occurs

echo "Starting migration from $OLD_DB to $NEW_DB..."

# Note: mongorestore gets the base URI only; the target database comes from --nsTo.
mongodump --uri="$ATLAS_URI/$OLD_DB" --archive | \
mongorestore --uri="$ATLAS_URI" --archive \
  --nsFrom="$OLD_DB.*" \
  --nsTo="$NEW_DB.*" \
  --numParallelCollections=4 \
  --stopOnError
  What to expect:

    • You will see a flurry of activity as documents are copied.
    • The Pause: The terminal will pause at the end. Do not panic. This is the M40 cluster rebuilding the 31 indexes for pmStudy. It may take 5–15 minutes.
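
Rather than staring at a silent terminal during the pause, you can watch the index builds from a second shell. A sketch (assumes the same $ATLAS_URI as the migration; the fields shown follow the $currentOp output, which can vary by server version):

```shell
# Show in-progress index builds on the cluster; re-run to watch progress.
mongosh "$ATLAS_URI" --quiet --eval '
  db.getSiblingDB("admin").aggregate([
    { $currentOp: {} },
    { $match: { "command.createIndexes": { $exists: true } } }
  ]).forEach(op => print(op.ns + " :: " + (op.msg || "building")))
'
```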

🔍 Phase 3: Verification

Run these commands in mongosh or check via the Atlas UI Data Explorer.

1. Check Document Counts:

use syrf-prod
// Should match ~3.1M
db.pmStudy.countDocuments()
// Should match ~4.6k
db.pmInvestigator.countDocuments()

2. Verify Views: Check that the view pmInvestigatorEmail exists and returns data. mongorestore normally recreates views at the end of the restore, but they can be skipped if their underlying collections aren't found, so confirm explicitly:

db.pmInvestigatorEmail.findOne()

3. Verify Indexes: Ensure pmStudy has 31 indexes.

db.pmStudy.getIndexes().length
// Expected Output: 31
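
The three checks above can also be wrapped in a script so nothing is missed under time pressure. A sketch: `check_count` is a hypothetical helper, the floors are rounded down from the counts quoted above, and the live queries only run when $ATLAS_URI is exported:

```shell
#!/usr/bin/env bash
# Hypothetical helper: fail loudly when a count falls below its floor.
check_count() {
  local label="$1" actual="$2" minimum="$3"
  if [ "$actual" -lt "$minimum" ]; then
    echo "FAIL: $label has $actual, expected at least $minimum" >&2
    return 1
  fi
  echo "OK: $label ($actual)"
}

if [ -n "${ATLAS_URI:-}" ]; then
  studies=$(mongosh "$ATLAS_URI/syrf-prod" --quiet --eval 'db.pmStudy.countDocuments()')
  investigators=$(mongosh "$ATLAS_URI/syrf-prod" --quiet --eval 'db.pmInvestigator.countDocuments()')
  indexes=$(mongosh "$ATLAS_URI/syrf-prod" --quiet --eval 'db.pmStudy.getIndexes().length')

  check_count "pmStudy docs"        "$studies"       3000000
  check_count "pmInvestigator docs" "$investigators" 4500
  check_count "pmStudy indexes"     "$indexes"       31
fi
```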

🟢 Phase 4: Cutover & Cleanup

  1. Update Application Config:
    • Change MONGODB_URI (or equivalent) in your .env or secrets manager to point to syrf-prod.
  2. Start Application:
    • Boot up your services.
    • Tail logs to ensure successful connection and no Unauthorized or NamespaceNotFound errors.
  3. Scale Down:
    • Once the app is stable (e.g., after 1 hour of monitoring), go back to Atlas.
    • Scale Cluster0 back down to M20.
    • Note: Using M40 for <1 hour will cost pennies.
  4. Drop Old DB (Optional but Recommended later):
    • Keep syrftest for 24-48 hours as a backup.
    • Delete it when confident, either via the Atlas UI or from mongosh: use syrftest followed by db.dropDatabase()

⚠️ Rollback Plan

If the migration fails or takes too long:

  1. Abort Command: Ctrl+C the migration script on the VM.
  2. Revert App Config: Ensure app config still points to syrftest.
  3. Restart App: Bring services back online pointing to the old DB.
  4. Cleanup: Drop the partially created syrf-prod database to avoid confusion.
  5. Scale Down: Return cluster to M20.
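
The rollback's cleanup and scale-down steps can be run from the same Jumpbox session. A sketch (assumes the Step 3 variables are still exported; the Atlas CLI command is optional, the UI works equally well):

```shell
# Drop the partially restored target so nothing points at it by mistake.
mongosh "$ATLAS_URI" --quiet --eval 'db.getSiblingDB("syrf-prod").dropDatabase()'

# Return the cluster to its normal tier.
atlas clusters update Cluster0 --tier M20
```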