MongoDB Index Cleanup — Database Commands¶

This document provides the mongosh commands to drop unused indexes from existing databases. These commands must be run after the application code changes have been deployed (to prevent the application from recreating the indexes on next startup).

Important Notes¶

All commands use dropIndex() which is safe — it only removes the index, not the data.
On Atlas, index drops are generally online (they do not block normal read/write operations), but dropping large indexes can take noticeable time and may affect query performance. Prefer to run these during low-traffic periods and monitor the cluster.
If an index needs to be recreated, use createIndex() — index builds typically run in the background on Atlas but can still impact performance while they are in progress.
Run these against each database that has the indexes: syrftest (production), syrf_staging, and any syrf_pr_* databases.
Connect with an admin user that has dbAdmin or dbAdminAnyDatabase role.

Production Database (`syrftest`)¶

Connect to the production cluster:

mongosh "mongodb+srv://cluster0.siwfo.mongodb.net/syrftest" --username <admin_user>

pmStudy — Drop manually-added / legacy indexes¶

These indexes are NOT created by application code and will not be recreated:

use syrftest;

// Legacy AnnotationInfo indexes (old schema name, now ExtractionInfo)
db.pmStudy.dropIndex("AnnotationInfo.Annotations.Id_1");
db.pmStudy.dropIndex("AnnotationInfo.Sessions.Id_1");
db.pmStudy.dropIndex("AnnotationInfo.Annotations.ParentId_1");
db.pmStudy.dropIndex("AnnotationInfo.Annotations.AnnotatorId_1_ProjectId_1");

// Manually-created indexes that are never used
db.pmStudy.dropIndex("idx_proj_annProjId");
db.pmStudy.dropIndex("idx_proj_scrnProjId");

// Redundant single-field indexes (covered by compound indexes)
db.pmStudy.dropIndex("ScreeningInfo.AgreementMeasure.NumberScreened_1");
db.pmStudy.dropIndex("ScreeningInfo.AgreementMeasure.AbsoluteAgreementRatio_1");
db.pmStudy.dropIndex("SystematicSearchId_1");

pmStudy — Drop indexes that are also in application code¶

Run these AFTER deploying the code changes that remove them from StudyRepository.InitialiseIndexesAsync():

// Unused compound indexes created by application code
db.pmStudy.dropIndex("ProjectId_1_SystematicSearchId_1_ReferenceFileId_1");
db.pmStudy.dropIndex("ScreeningInfo.Inclusion_1");
db.pmStudy.dropIndex("ExtractionInfo.Annotations.ProjectId_1_ExtractionInfo.Annotations.QuestionId_1");
db.pmStudy.dropIndex("ProjectId_1_ExtractionInfo.Annotations.QuestionId_1");
db.pmStudy.dropIndex("ExtractionInfo.Annotations.QuestionId_1");

// NOTE: The pmStudy text index is intentionally NOT dropped here.
// It is still created by StudyRepository.InitialiseIndexesAsync() and used by
// StudyController.GetTableData() via the $text query in CreateMongoFindQuery().
// Drop it only after the text search code is removed or replaced.

pmProject — Drop all custom indexes¶

db.pmProject.dropIndex("Keywords_text_Name_text_IsPublic_1");
db.pmProject.dropIndex({ "Stages.Id": 1 });
db.pmProject.dropIndex("Memberships.InvestigatorId_1");
db.pmProject.dropIndex("Registrations.InvestigatorId_1");
db.pmProject.dropIndex("SystematicSearchIds_1");

pmInvestigator — Drop legacy indexes¶

db.pmInvestigator.dropIndex("ChangeEmailVerify.Code_1");
db.pmInvestigator.dropIndex("Email_1");
db.pmInvestigator.dropIndex("ChangeEmailVerify.VerificationCodes.Code_1");

pmRiskOfBiasAiJob — Drop unused indexes¶

db.pmRiskOfBiasAiJob.dropIndex("StudyId_1");
db.pmRiskOfBiasAiJob.dropIndex("ProjectId_1_StudyId_1");

Preview Cluster Databases¶

Connect to the Preview cluster and repeat for each PR database:

mongosh "mongodb+srv://preview.siwfo.mongodb.net/" --username <admin_user>

Script: Drop indexes from all PR databases¶

// List all PR databases and drop unused indexes from each
const prDbs = db.adminCommand({ listDatabases: 1 }).databases
  .map(d => d.name)
  .filter(name => name.startsWith("syrf_pr_") || name === "syrf_staging");

prDbs.forEach(dbName => {
  print(`\n=== Processing ${dbName} ===`);
  const targetDb = db.getSiblingDB(dbName);

  // Helper: safely get index names for a collection (returns [] if collection doesn't exist)
  function safeGetIndexNames(collection) {
    try {
      return collection.getIndexes().map(i => i.name);
    } catch (e) {
      print(`  Skipping ${collection.getName()} — collection does not exist`);
      return [];
    }
  }

  // pmStudy indexes (only drop ones that exist — some PRs may not have all)
  // NOTE: text index intentionally excluded — still used by application code
  const studyIndexNames = [
    "AnnotationInfo.Annotations.Id_1",
    "AnnotationInfo.Sessions.Id_1",
    "AnnotationInfo.Annotations.ParentId_1",
    "AnnotationInfo.Annotations.AnnotatorId_1_ProjectId_1",
    "idx_proj_annProjId",
    "idx_proj_scrnProjId",
    "ScreeningInfo.AgreementMeasure.NumberScreened_1",
    "ScreeningInfo.AgreementMeasure.AbsoluteAgreementRatio_1",
    "SystematicSearchId_1",
    "ProjectId_1_SystematicSearchId_1_ReferenceFileId_1",
    "ScreeningInfo.Inclusion_1",
    "ExtractionInfo.Annotations.ProjectId_1_ExtractionInfo.Annotations.QuestionId_1",
    "ProjectId_1_ExtractionInfo.Annotations.QuestionId_1",
    "ExtractionInfo.Annotations.QuestionId_1"
  ];

  const existingStudyIndexes = safeGetIndexNames(targetDb.pmStudy);
  studyIndexNames.forEach(name => {
    if (existingStudyIndexes.includes(name)) {
      targetDb.pmStudy.dropIndex(name);
      print(`  Dropped pmStudy.${name}`);
    }
  });

  // pmProject indexes
  const projectIndexNames = [
    "Keywords_text_Name_text_IsPublic_1",
    "Memberships.InvestigatorId_1",
    "Registrations.InvestigatorId_1",
    "SystematicSearchIds_1"
  ];

  const existingProjectIndexes = safeGetIndexNames(targetDb.pmProject);
  projectIndexNames.forEach(name => {
    if (existingProjectIndexes.includes(name)) {
      targetDb.pmProject.dropIndex(name);
      print(`  Dropped pmProject.${name}`);
    }
  });

  // pmProject — drop Stages.Id index by key spec (name varies across environments)
  if (existingProjectIndexes.some(n => n.includes("Stages"))) {
    try {
      targetDb.pmProject.dropIndex({ "Stages.Id": 1 });
      print(`  Dropped pmProject Stages.Id index`);
    } catch (e) {
      print(`  Stages.Id index not found or already dropped`);
    }
  }

  // pmInvestigator legacy indexes
  const investIndexNames = [
    "ChangeEmailVerify.Code_1",
    "Email_1",
    "ChangeEmailVerify.VerificationCodes.Code_1"
  ];

  const existingInvestIndexes = safeGetIndexNames(targetDb.pmInvestigator);
  investIndexNames.forEach(name => {
    if (existingInvestIndexes.includes(name)) {
      targetDb.pmInvestigator.dropIndex(name);
      print(`  Dropped pmInvestigator.${name}`);
    }
  });

  // pmRiskOfBiasAiJob indexes
  const robIndexNames = ["StudyId_1", "ProjectId_1_StudyId_1"];
  const existingRobIndexes = safeGetIndexNames(targetDb.pmRiskOfBiasAiJob);
  robIndexNames.forEach(name => {
    if (existingRobIndexes.includes(name)) {
      targetDb.pmRiskOfBiasAiJob.dropIndex(name);
      print(`  Dropped pmRiskOfBiasAiJob.${name}`);
    }
  });
});

Stale PR Databases on Production Cluster¶

The following PR databases exist on the production cluster and should be dropped entirely (they are leftovers from before PR environments were moved to the Preview cluster):

// Connect to production cluster
use admin;

// Verify these are genuinely stale (check PR status first)
db.getSiblingDB("syrf_pr_2224").dropDatabase();
db.getSiblingDB("syrf_pr_2234").dropDatabase();
db.getSiblingDB("syrf_pr_2245").dropDatabase();
db.getSiblingDB("syrf_pr_2246").dropDatabase();
db.getSiblingDB("syrf_pr_2247").dropDatabase();

Verification¶

After dropping indexes, verify the results:

// Check remaining indexes on pmStudy
db.pmStudy.getIndexes().forEach(idx => {
  print(`${idx.name}: ${JSON.stringify(idx.key)}`);
});

// Check total index size
db.pmStudy.stats().indexSizes;

// Check overall database size
db.stats();

Rollback¶

If any removed index is needed, recreate it:

// Example: recreate the text index if needed
db.pmStudy.createIndex(
  { Abstract: "text", "PublicationName.AlternateName": "text",
    "PublicationName.Name": "text", Title: "text", Keywords: "text" },
  { background: true }
);

Index builds on Atlas M20 with 3.1M documents typically complete in 2-5 minutes and run in the background without blocking queries.