Enhanced Database Seeding for Preview Environments

Overview

This document outlines a comprehensive strategy for database seeding in PR preview environments. The goal is to provide rich, realistic test data that allows developers and testers to immediately explore all features of SyRF without manual setup.

Problem Statement

Current seeding creates minimal data:

  • One "Seed Bot" owner that no real user can log in as
  • One project with no screening decisions
  • A few studies with no annotations
  • No way for logged-in users to interact with seeded data

Result: Users must create their own projects from scratch, defeating the purpose of preview environments.

Design Principles

  1. First User Becomes Owner: The first real user to log in inherits ownership of all seed projects
  2. Valid Application States: All seed data must be achievable through normal application workflows
  3. Multi-State Coverage: Seed projects at different workflow stages (screening, annotation, complete)
  4. Realistic Data: Use real-looking study data, annotation questions, and screening decisions
  5. Idempotent: Seeding runs once, ownership transfer runs once

Architecture

Seeding Approach Decision

Decision: Use Application Service Layer seeding (calling domain methods directly) rather than API-driven or UI-driven seeding.

Alternatives Considered:

| Approach | Pros | Cons |
|----------|------|------|
| UI-Driven (Playwright) | Tests complete stack including JS validation | Extremely slow, fragile CSS selectors, complex setup, overkill for data creation |
| API-Driven (HTTP Client) | Tests API contract | Requires running API during seeding, auth complexity, chicken-egg problem at startup |
| Application Service Layer | Same business logic as APIs, no HTTP overhead, runs at startup, testable | Doesn't test HTTP layer |

Rationale:

  1. The frontend contains view logic (form validation, category mappings), not additional business logic
  2. The backend domain is the source of truth for data integrity
  3. Seeding must run at startup before the API is fully available
  4. Using Project.UpsertCustomAnnotationQuestion() produces the same database state as the UI path

Separate E2E Testing: A Playwright test suite should verify seeded data is accessible via the UI, but this is separate from the seeding process itself.

Seed Data Hierarchy

Seed Investigators (fake reviewers)
├── Seed Bot (system owner - transfers to first real user)
├── Seed Reviewer Alpha
└── Seed Reviewer Beta

Seed Projects
├── "Quick Start Demo" (public, ready to screen)
│   ├── 10 unscreened studies
│   └── Screening stage active
├── "Screening In Progress" (public, dual screening 50% done)
│   ├── 30 studies total
│   ├── 15 studies: both reviewers screened (Include/Exclude)
│   ├── 10 studies: one reviewer screened
│   ├── 5 studies: unscreened
│   └── Shows agreement metrics
├── "Ready for Annotation" (public, screening complete)
│   ├── 20 studies (all included after screening)
│   ├── Annotation stage active
│   ├── 10 annotation questions configured
│   └── 5 studies partially annotated
├── "Complete Review" (private, fully done)
│   ├── 15 studies fully annotated
│   ├── Multiple experiments/cohorts extracted
│   └── Ready for export demonstration
└── "Private Research" (private, requires approval)
    ├── 8 studies
    ├── AutoApproveJoinRequests = false
    └── Demonstrates join request workflow
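
The hierarchy above can also be written down as data, which makes it easy to sanity-check counts before any seeding code runs. This is a minimal sketch: the `SeedProjectSpec` record is hypothetical and not part of the existing codebase, and the visibility and auto-approve values are taken from the Project Configurations section of this document.

```csharp
using System;
using System.Linq;

// Top-level usage: the five seed projects and their headline settings.
var projects = new[]
{
    new SeedProjectSpec("Quick Start Demo", IsPublic: true, AutoApproveJoinRequests: true, StudyCount: 10),
    new SeedProjectSpec("Screening In Progress", IsPublic: true, AutoApproveJoinRequests: true, StudyCount: 30),
    new SeedProjectSpec("Ready for Annotation", IsPublic: true, AutoApproveJoinRequests: true, StudyCount: 20),
    new SeedProjectSpec("Complete Review", IsPublic: false, AutoApproveJoinRequests: true, StudyCount: 15),
    new SeedProjectSpec("Private Research", IsPublic: false, AutoApproveJoinRequests: false, StudyCount: 8),
};

Console.WriteLine($"{projects.Length} projects, {projects.Sum(p => p.StudyCount)} studies");
// prints "5 projects, 83 studies"

// Hypothetical description of one seed project (illustration only).
public sealed record SeedProjectSpec(string Name, bool IsPublic, bool AutoApproveJoinRequests, int StudyCount);
```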

Ownership Transfer Mechanism

┌─────────────────────────────────────────────────────────────────┐
│                    Application Startup                          │
│                           │                                     │
│                    DatabaseSeeder.Execute()                     │
│                           │                                     │
│          ┌────────────────┴────────────────┐                    │
│          │                                 │                    │
│    First startup?                   Already seeded?             │
│    Seed all data                    Skip seeding                │
│          │                                 │                    │
│          └────────────────┬────────────────┘                    │
└───────────────────────────┼─────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│                    First Real User Logs In                      │
│                           │                                     │
│              UserHasRegisteredHandler.HandleAsync()             │
│                           │                                     │
│       SeedDataOwnershipTransfer.TransferToFirstRealUser()       │
│                           │                                     │
│          ┌────────────────┴────────────────┐                    │
│          │                                 │                    │
│    For each seed project:            Update flag:               │
│    - Transfer ownership              OwnershipTransferred=true  │
│    - Add user as Admin               (prevents re-run)          │
│    - Remove Seed Bot membership                                 │
│          │                                 │                    │
│          └────────────────┬────────────────┘                    │
└───────────────────────────┼─────────────────────────────────────┘

Component Design

1. SeedDataConstants.cs (Enhanced)

public static class SeedDataConstants
{
    // Seed Investigators
    public static readonly Guid SeedBotId = new("00000000-0000-0000-0000-000000000001");
    public static readonly Guid SeedReviewerAlphaId = new("00000000-0000-0000-0000-000000000010");
    public static readonly Guid SeedReviewerBetaId = new("00000000-0000-0000-0000-000000000011");

    // Seed Projects
    public static readonly Guid QuickStartProjectId = new("00000000-0000-0000-0000-000000000100");
    public static readonly Guid ScreeningInProgressProjectId = new("00000000-0000-0000-0000-000000000101");
    public static readonly Guid ReadyForAnnotationProjectId = new("00000000-0000-0000-0000-000000000102");
    public static readonly Guid CompleteReviewProjectId = new("00000000-0000-0000-0000-000000000103");
    public static readonly Guid PrivateResearchProjectId = new("00000000-0000-0000-0000-000000000104");

    // Ownership transfer flag (stored in a simple collection)
    public const string OwnershipTransferredKey = "SeedData:OwnershipTransferred";
}

2. DatabaseSeeder.cs (Enhanced Structure)

public class DatabaseSeeder : IRunAtInit
{
    public void Execute()
    {
        _unitOfWork.CreateMappings();
        CleanupCorruptedSeedDataIfNeeded();

        // Idempotency guard: skip when seeding is disabled or the database already has data
        if (!IsSeedingEnabled() || DatabaseHasData()) return;

        _logger.LogInformation("Starting enhanced database seeding");

        // Phase 1: Create seed investigators
        SeedInvestigators();

        // Phase 2: Create projects with different configurations
        SeedQuickStartProject();
        SeedScreeningInProgressProject();
        SeedReadyForAnnotationProject();
        SeedCompleteReviewProject();
        SeedPrivateResearchProject();

        _logger.LogInformation("Database seeding completed");
    }
}

3. SeedDataOwnershipTransfer.cs (New)

/// <summary>
/// Transfers ownership of seed projects to the first real user.
/// Called from UserHasRegisteredHandler after creating the investigator.
/// </summary>
public class SeedDataOwnershipTransfer
{
    public async Task TransferToFirstRealUser(Guid realUserId)
    {
        if (await HasOwnershipBeenTransferred()) return;
        if (!await SeedProjectsExist()) return;

        _logger.LogInformation("Transferring seed data ownership to {UserId}", realUserId);

        var seedProjectIds = new[]
        {
            SeedDataConstants.QuickStartProjectId,
            SeedDataConstants.ScreeningInProgressProjectId,
            SeedDataConstants.ReadyForAnnotationProjectId,
            SeedDataConstants.CompleteReviewProjectId,
            SeedDataConstants.PrivateResearchProjectId
        };

        foreach (var projectId in seedProjectIds)
        {
            var project = await _unitOfWork.Projects.GetOrDefaultAsync(projectId);
            if (project == null) continue;

            // Transfer ownership
            project.TransferOwnership(realUserId);

            // Add as admin if not already a member
            if (!project.IsMember(realUserId))
            {
                project.AddDirectMembership(realUserId, isAdmin: true);
            }

            // TODO: remove the Seed Bot's membership, as shown in the ownership transfer diagram

            await _unitOfWork.SaveAsync(project);
        }

        await MarkOwnershipTransferred();
    }
}
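
TransferToFirstRealUser checks the flag at the start and sets it at the end, so two users registering at the same moment could both pass the check. One way to make the "runs once" guarantee explicit is an atomic claim. The sketch below uses a hypothetical in-memory `OwnershipTransferGate` purely to illustrate the idea; it is not part of the design above, and a persisted flag would need an equivalent atomic write in the database.

```csharp
using System;
using System.Threading;

// Top-level usage: only the first claim succeeds.
var gate = new OwnershipTransferGate();
Console.WriteLine(gate.TryClaim()); // prints "True"
Console.WriteLine(gate.TryClaim()); // prints "False"

// Hypothetical in-memory stand-in for the "SeedData:OwnershipTransferred" flag.
public sealed class OwnershipTransferGate
{
    private int _claimed; // 0 = not yet claimed, 1 = claimed

    // Atomically flips 0 -> 1 and returns true for exactly one caller,
    // even when several threads call it concurrently.
    public bool TryClaim() => Interlocked.CompareExchange(ref _claimed, 1, 0) == 0;
}
```

With a gate like this, the caller would claim first and run the transfer only when the claim succeeds, rather than checking and marking in two separate steps.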

Seed Data Specifications

Seed Investigators

| ID | Name | Email | Purpose |
|----|------|-------|---------|
| ...0001 | SyRF Seed Bot | seedbot@syrf.org.uk | Project owner (transfers) |
| ...0010 | Dr. Alpha Reviewer | alpha@syrf-seed.local | Screening decisions |
| ...0011 | Dr. Beta Reviewer | beta@syrf-seed.local | Screening decisions |

Project Configurations

Quick Start Demo

  • Visibility: Public
  • Auto-approve: Yes
  • Agreement Mode: Single screening
  • Stages: Screening (active), Annotation (inactive)
  • Studies: 10 unscreened
  • Purpose: Immediate hands-on screening

Screening In Progress

  • Visibility: Public
  • Auto-approve: Yes
  • Agreement Mode: Automated dual screening (33% threshold)
  • Stages: Screening (active)
  • Studies: 30 total
      ◦ 15 fully screened (both reviewers)
      ◦ 10 partially screened (one reviewer)
      ◦ 5 unscreened
  • Decisions:
      ◦ 10 included (agreement)
      ◦ 3 excluded (agreement)
      ◦ 2 disagreement (needs reconciliation)
  • Purpose: Show screening progress, agreement metrics
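
The distribution above can be expressed as a small plan that a seeder could iterate over, and the counts can be checked before anything is written to the database. This is a sketch under assumptions: `Decision`, `ScreeningPlanEntry`, and `ScreeningInProgressPlan` are hypothetical names, and because the specification does not say which reviewer screens the 10 partially screened studies or what they decide, the sketch arbitrarily uses an Include from reviewer Alpha.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Top-level usage: build the plan and summarise it.
var plan = ScreeningInProgressPlan.Build();
int dual = plan.Count(e => e.Alpha is not null && e.Beta is not null);
int agreed = plan.Count(e => e.Alpha is not null && e.Alpha == e.Beta);
Console.WriteLine($"{plan.Count} studies, {dual} dual-screened, {agreed} in agreement");
// prints "30 studies, 15 dual-screened, 13 in agreement"

public enum Decision { Include, Exclude }

// One planned study with the decision (if any) of each seed reviewer.
public sealed record ScreeningPlanEntry(int StudyIndex, Decision? Alpha, Decision? Beta);

public static class ScreeningInProgressPlan
{
    public static IReadOnlyList<ScreeningPlanEntry> Build()
    {
        var plan = new List<ScreeningPlanEntry>();

        // Appends `count` entries that share the same pair of decisions.
        void Add(int count, Decision? alpha, Decision? beta)
        {
            for (int n = 0; n < count; n++)
                plan.Add(new ScreeningPlanEntry(plan.Count, alpha, beta));
        }

        Add(10, Decision.Include, Decision.Include); // included (agreement)
        Add(3, Decision.Exclude, Decision.Exclude);  // excluded (agreement)
        Add(2, Decision.Include, Decision.Exclude);  // disagreement
        Add(10, Decision.Include, null);             // one reviewer only (assumed: Alpha, Include)
        Add(5, null, null);                          // unscreened
        return plan;
    }
}
```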

Ready for Annotation

  • Visibility: Public
  • Auto-approve: Yes
  • Agreement Mode: Completed
  • Stages: Annotation (active)
  • Studies: 20 (all passed screening)
  • Annotation Questions: 10 configured
      ◦ 3 study-level questions
      ◦ 4 experiment-level questions
      ◦ 3 outcome questions
  • Annotations: 5 studies partially annotated
  • Purpose: Data extraction workflow

Complete Review

  • Visibility: Private
  • Auto-approve: Yes
  • Agreement Mode: Completed
  • Stages: All complete
  • Studies: 15 fully annotated
  • Annotations: Complete for all studies
  • Purpose: Export demonstration, completed project view

Private Research

  • Visibility: Private
  • Auto-approve: No
  • Agreement Mode: Single screening
  • Studies: 8
  • Purpose: Join request workflow demonstration

Annotation Questions

[
  {
    "id": "...0200",
    "question": "What species was used?",
    "type": "dropdown",
    "category": "Study",
    "options": ["Mouse", "Rat", "Other rodent", "Non-rodent"]
  },
  {
    "id": "...0201",
    "question": "Was randomization performed?",
    "type": "boolean",
    "category": "RiskOfBias"
  },
  {
    "id": "...0202",
    "question": "Sample size per group",
    "type": "integer",
    "category": "Experiment"
  },
  {
    "id": "...0203",
    "question": "Intervention description",
    "type": "textbox",
    "category": "Treatment"
  },
  {
    "id": "...0204",
    "question": "Primary outcome measure",
    "type": "string",
    "category": "OutcomeAssessment"
  }
]
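
If these question definitions are kept as JSON, they can be read with System.Text.Json before being handed to the domain layer. The sketch below is illustrative only: `SeedQuestion` is a hypothetical record, the JSON is a two-entry excerpt of the list above with the elided IDs left as written, and the raw string literal assumes C# 11 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Two-entry excerpt of the question list; IDs are left elided as in the document.
string json = """
[
  { "id": "...0200", "question": "What species was used?", "type": "dropdown",
    "category": "Study", "options": ["Mouse", "Rat", "Other rodent", "Non-rodent"] },
  { "id": "...0201", "question": "Was randomization performed?", "type": "boolean",
    "category": "RiskOfBias" }
]
""";

// JSON keys are lower-case, record parameters are PascalCase, so match case-insensitively.
var options = new JsonSerializerOptions { PropertyNameCaseInsensitive = true };
List<SeedQuestion> questions =
    JsonSerializer.Deserialize<List<SeedQuestion>>(json, options) ?? new List<SeedQuestion>();

foreach (var q in questions)
    Console.WriteLine($"{q.Category}: {q.Question} ({q.Type}, {q.Options?.Count ?? 0} options)");

// Hypothetical shape for one seed annotation question; Options is absent for non-dropdown types.
public sealed record SeedQuestion(
    string Id, string Question, string Type, string Category, List<string>? Options = null);
```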

Sample Studies (Enhanced)

Studies should have realistic titles and abstracts drawn from preclinical research. Use an embedded JSON resource with 50+ studies covering:

  • Different disease models (stroke, Parkinson's, Alzheimer's, spinal cord injury)
  • Various interventions (pharmacological, cell therapy, exercise)
  • Multiple outcome types (behavioral, histological, molecular)
  • Range of publication years (2015-2024)
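
Reading an embedded JSON resource at startup could look roughly like the sketch below. `EmbeddedResourceReader` is a hypothetical helper, and the approach assumes the JSON file is included in the project as an embedded resource so that it appears in the assembly's manifest.

```csharp
using System;
using System.IO;
using System.Reflection;

public static class EmbeddedResourceReader
{
    // Returns the text of the first embedded resource whose manifest name ends with nameSuffix.
    // Manifest names are usually prefixed with the default namespace, hence the suffix match.
    public static string ReadText(Assembly assembly, string nameSuffix)
    {
        string? name = Array.Find(
            assembly.GetManifestResourceNames(),
            n => n.EndsWith(nameSuffix, StringComparison.Ordinal));

        if (name is null)
            throw new FileNotFoundException($"No embedded resource ending in '{nameSuffix}'.");

        using Stream stream = assembly.GetManifestResourceStream(name)!;
        using var reader = new StreamReader(stream);
        return reader.ReadToEnd();
    }
}
```

A seeder could then call something like `EmbeddedResourceReader.ReadText(typeof(DatabaseSeeder).Assembly, "SampleStudies.json")` and deserialize the result.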

Implementation Plan

Phase 1: Infrastructure (Low Risk)

  1. Add new GUIDs to SeedDataConstants.cs
  2. Create SeedDataOwnershipTransfer.cs class
  3. Add TransferOwnership and AddDirectMembership methods to Project
  4. Integrate with UserHasRegisteredHandler

Phase 2: Multi-Project Seeding (Medium Risk)

  1. Expand DatabaseSeeder with project creation methods
  2. Create seed investigators (Alpha, Beta reviewers)
  3. Add project memberships for seed reviewers
  4. Configure stages and agreement modes

Phase 3: Screening Decisions (Medium Risk)

  1. Create screening decisions for Screening In Progress project
  2. Calculate and store agreement metrics
  3. Mark studies with appropriate ScreeningInfo states

Phase 4: Annotation Data (Higher Complexity)

  1. Add annotation questions to projects
  2. Create annotation sessions
  3. Populate extraction data for Complete Review project

Phase 5: Extended Sample Data (Content)

  1. Expand SampleStudies.json to 50+ studies
  2. Ensure variety in study characteristics
  3. Add realistic abstracts

Testing Strategy

  1. Unit Tests: Test ownership transfer logic in isolation
  2. Integration Tests: Verify seeding creates valid database state
  3. E2E Tests: Confirm first user can access seeded projects
  4. Manual Testing: Verify all workflow states are reachable

Rollback Plan

If seeding causes issues:

  1. Set SYRF_SEED_DATA_ENABLED=false in the environment
  2. Delete seed data using cleanup methods
  3. Revert to previous seeder version
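
The first rollback step relies on the seeder honouring the environment switch. A minimal sketch of such a check follows; `SeedingSwitch` is a hypothetical helper, and treating an unset or unparseable value as "disabled" is an assumption, since this document does not state the default.

```csharp
using System;

public static class SeedingSwitch
{
    public const string VariableName = "SYRF_SEED_DATA_ENABLED";

    // Testable core: the variable lookup is passed in as a function.
    // Assumption: anything other than a parseable "true" (case-insensitive) disables seeding.
    public static bool IsEnabled(Func<string, string?> readVariable) =>
        bool.TryParse(readVariable(VariableName), out bool enabled) && enabled;

    // Convenience overload that reads the real process environment.
    public static bool IsEnabled() => IsEnabled(Environment.GetEnvironmentVariable);
}
```

Passing the lookup as a function lets unit tests cover the enabled, disabled, and unset cases without touching the process environment.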

Success Criteria

  1. First user immediately sees 5 projects in their dashboard
  2. User can screen studies in "Quick Start Demo"
  3. User can view progress in "Screening In Progress"
  4. User can annotate studies in "Ready for Annotation"
  5. User can export data from "Complete Review"
  6. Any later user must request access to "Private Research" (the first user already owns it after the ownership transfer)

Appendix: Research References

Best practices consulted: