Enhanced Database Seeding for Preview Environments
Overview
This document outlines a comprehensive strategy for database seeding in PR preview environments. The goal is to provide rich, realistic test data that allows developers and testers to immediately explore all features of SyRF without manual setup.
Problem Statement
The current seeding creates minimal data:
- One "Seed Bot" owner, which no real user can log in as
- One project with no screening decisions
- A few studies with no annotations
- No way for logged-in users to interact with the seeded data
Result: Users must create their own projects from scratch, defeating the purpose of preview environments.
Design Principles
- First User Becomes Owner: The first real user to log in inherits ownership of all seed projects
- Valid Application States: All seed data must be achievable through normal application workflows
- Multi-State Coverage: Seed projects at different workflow stages (screening, annotation, complete)
- Realistic Data: Use real-looking study data, annotation questions, and screening decisions
- Idempotent: Seeding runs at most once per database, and ownership transfer runs at most once
Architecture
Seeding Approach Decision
Decision: Use Application Service Layer seeding (domain methods directly) rather than API-driven or UI-driven seeding.
Alternatives Considered:
| Approach | Pros | Cons |
|---|---|---|
| UI-Driven (Playwright) | Tests complete stack including JS validation | Extremely slow, fragile CSS selectors, complex setup, overkill for data creation |
| API-Driven (HTTP Client) | Tests API contract | Requires running API during seeding, auth complexity, chicken-and-egg problem at startup |
| Application Service Layer ✓ | Same business logic as APIs, no HTTP overhead, runs at startup, testable | Doesn't test HTTP layer |
Rationale:
- The frontend contains view logic (form validation, category mappings), not additional business logic
- The backend domain is the source of truth for data integrity
- Seeding must run at startup before the API is fully available
- Using `Project.UpsertCustomAnnotationQuestion()` produces the same database state as the UI path
Separate E2E Testing: A Playwright test suite should verify seeded data is accessible via the UI, but this is separate from the seeding process itself.
Seed Data Hierarchy
```
Seed Investigators (fake reviewers)
├── Seed Bot (system owner - transfers to first real user)
├── Seed Reviewer Alpha
└── Seed Reviewer Beta

Seed Projects
├── "Quick Start Demo" (public, ready to screen)
│   ├── 10 unscreened studies
│   └── Screening stage active
│
├── "Screening In Progress" (public, dual screening 50% done)
│   ├── 30 studies total
│   ├── 15 studies: both reviewers screened (Include/Exclude)
│   ├── 10 studies: one reviewer screened
│   ├── 5 studies: unscreened
│   └── Shows agreement metrics
│
├── "Ready for Annotation" (public, screening complete)
│   ├── 20 studies (all included after screening)
│   ├── Annotation stage active
│   ├── 10 annotation questions configured
│   └── 5 studies partially annotated
│
├── "Complete Review" (private, fully done)
│   ├── 15 studies fully annotated
│   ├── Multiple experiments/cohorts extracted
│   └── Ready for export demonstration
│
└── "Private Research" (private, requires approval)
    ├── 8 studies
    ├── AutoApproveJoinRequests = false
    └── Demonstrates join request workflow
```
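The hierarchy fixes a handful of numbers that the seeder code must agree with. A minimal sketch that records them as data, so a unit test can catch drift between this document and the seeder. `SeedProjectPlan` and `SeedPlan` are hypothetical names, not existing SyRF types. The auto-approve values come from the Project Configurations section.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical summary of one seed project (not an existing SyRF type).
public sealed record SeedProjectPlan(
    string Name, bool IsPublic, bool AutoApproveJoinRequests, int StudyCount);

public static class SeedPlan
{
    // Values copied from the hierarchy above and the Project Configurations section.
    public static readonly IReadOnlyList<SeedProjectPlan> Projects = new[]
    {
        new SeedProjectPlan("Quick Start Demo",      IsPublic: true,  AutoApproveJoinRequests: true,  StudyCount: 10),
        new SeedProjectPlan("Screening In Progress", IsPublic: true,  AutoApproveJoinRequests: true,  StudyCount: 30),
        new SeedProjectPlan("Ready for Annotation",  IsPublic: true,  AutoApproveJoinRequests: true,  StudyCount: 20),
        new SeedProjectPlan("Complete Review",       IsPublic: false, AutoApproveJoinRequests: true,  StudyCount: 15),
        new SeedProjectPlan("Private Research",      IsPublic: false, AutoApproveJoinRequests: false, StudyCount: 8),
    };

    // Total number of studies the seeder has to place across all projects.
    public static int TotalStudies => Projects.Sum(p => p.StudyCount);
}
```

These counts add up to 83 studies, which is more than the 50+ study pool described under Sample Studies. Either the pool needs to grow, or some studies must be reused across projects.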
Ownership Transfer Mechanism
```
Application Startup
└── DatabaseSeeder.Execute()
    ├── First startup?   → seed all data
    └── Already seeded?  → skip seeding
            │
            ▼
First Real User Logs In
└── UserHasRegisteredHandler.HandleAsync()
    └── SeedDataOwnershipTransfer.TransferToFirstRealUser()
        ├── For each seed project:
        │   - Transfer ownership
        │   - Add user as Admin
        │   - Remove Seed Bot membership
        └── Update flag: OwnershipTransferred = true (prevents re-run)
```
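The transfer flow can be checked in isolation with an in-memory model. In this sketch, `ToyProject` and `ToyOwnershipTransfer` are hypothetical stand-ins. The real `Project` aggregate, its persistence, and the persisted flag are not modelled. The sketch also performs the "Remove Seed Bot membership" step from the diagram, using a hypothetical `RemoveMembership` method.

```csharp
using System;
using System.Collections.Generic;

// Toy stand-in for the Project aggregate (hypothetical, not SyRF's class).
public sealed class ToyProject
{
    public Guid Id { get; }
    public Guid OwnerId { get; private set; }
    public Dictionary<Guid, bool> Members { get; } = new Dictionary<Guid, bool>(); // userId -> isAdmin

    public ToyProject(Guid id, Guid ownerId)
    {
        Id = id;
        OwnerId = ownerId;
        Members[ownerId] = true; // the seed owner starts as an admin member
    }

    public void TransferOwnership(Guid newOwnerId) => OwnerId = newOwnerId;
    public bool IsMember(Guid userId) => Members.ContainsKey(userId);
    public void AddDirectMembership(Guid userId, bool isAdmin) => Members[userId] = isAdmin;
    public void RemoveMembership(Guid userId) => Members.Remove(userId);
}

public sealed class ToyOwnershipTransfer
{
    private readonly Guid _seedBotId;
    private bool _transferred; // stands in for the persisted OwnershipTransferred flag

    public ToyOwnershipTransfer(Guid seedBotId) => _seedBotId = seedBotId;

    // Returns true only for the call that actually performed the transfer.
    public bool TransferToFirstRealUser(Guid realUserId, IEnumerable<ToyProject> seedProjects)
    {
        if (_transferred) return false;

        foreach (var project in seedProjects)
        {
            project.TransferOwnership(realUserId);
            if (!project.IsMember(realUserId))
                project.AddDirectMembership(realUserId, isAdmin: true);
            project.RemoveMembership(_seedBotId);
        }

        _transferred = true;
        return true;
    }
}
```

A second call, for example from the next user to register, returns `false` and leaves ownership unchanged. This is the "ownership transfer runs once" principle.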
Component Design
1. SeedDataConstants.cs (Enhanced)
```csharp
using System;

public static class SeedDataConstants
{
    // Seed Investigators
    public static readonly Guid SeedBotId = new("00000000-0000-0000-0000-000000000001");
    public static readonly Guid SeedReviewerAlphaId = new("00000000-0000-0000-0000-000000000010");
    public static readonly Guid SeedReviewerBetaId = new("00000000-0000-0000-0000-000000000011");

    // Seed Projects
    public static readonly Guid QuickStartProjectId = new("00000000-0000-0000-0000-000000000100");
    public static readonly Guid ScreeningInProgressProjectId = new("00000000-0000-0000-0000-000000000101");
    public static readonly Guid ReadyForAnnotationProjectId = new("00000000-0000-0000-0000-000000000102");
    public static readonly Guid CompleteReviewProjectId = new("00000000-0000-0000-0000-000000000103");
    public static readonly Guid PrivateResearchProjectId = new("00000000-0000-0000-0000-000000000104");

    // Ownership transfer flag (stored in a simple collection)
    public const string OwnershipTransferredKey = "SeedData:OwnershipTransferred";
}
```
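All seed IDs share an all-zero prefix, so they are easy to recognise, for example during cleanup. The sketch below shows a hypothetical `SeedIds` helper, not present in the codebase, that builds IDs from that pattern and tests membership. It treats the trailing twelve digits as hexadecimal, so `FromNumber(0x10)` reproduces `SeedReviewerAlphaId`.

```csharp
using System;

// Hypothetical helper (not in the codebase) for the reserved seed GUID range.
public static class SeedIds
{
    private const string Prefix = "00000000-0000-0000-0000-";

    // n is written as 12 lowercase hex digits, e.g. 0x100 -> ...000000000100.
    public static Guid FromNumber(long n)
    {
        if (n < 0 || n > 0xFFFF_FFFF_FFFF)
            throw new ArgumentOutOfRangeException(nameof(n));
        return new Guid(Prefix + n.ToString("x12"));
    }

    // True for any non-empty GUID inside the all-zero-prefix range.
    public static bool IsSeedId(Guid id) =>
        id != Guid.Empty && id.ToString().StartsWith(Prefix, StringComparison.Ordinal);
}
```

Randomly generated (version 4) GUIDs always carry a `4` at the start of the third group, so they never fall into this range.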
2. DatabaseSeeder.cs (Enhanced Structure)
```csharp
using Microsoft.Extensions.Logging;

public class DatabaseSeeder : IRunAtInit
{
    // Injected dependencies (_unitOfWork, _logger) and the helper
    // methods called below are omitted for brevity.

    public void Execute()
    {
        _unitOfWork.CreateMappings();
        CleanupCorruptedSeedDataIfNeeded();

        // Seed only when seeding is enabled and the database is still empty.
        if (!IsSeedingEnabled() || DatabaseHasData()) return;

        _logger.LogInformation("Starting enhanced database seeding");

        // Phase 1: Create seed investigators
        SeedInvestigators();

        // Phase 2: Create projects with different configurations
        SeedQuickStartProject();
        SeedScreeningInProgressProject();
        SeedReadyForAnnotationProject();
        SeedCompleteReviewProject();
        SeedPrivateResearchProject();

        _logger.LogInformation("Database seeding completed");
    }
}
```
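The Rollback Plan names `SYRF_SEED_DATA_ENABLED` as the switch that turns seeding off, so `IsSeedingEnabled()` presumably reads it. Below is a minimal sketch of a tolerant parser, assuming the variable is read directly from the process environment. `SeedingSwitch` and the extra `"1"`/`"0"` spellings are assumptions. The default is passed in because this document does not say whether seeding is on or off when the variable is unset.

```csharp
using System;

// Hypothetical parser for the seeding switch; not existing SyRF code.
public static class SeedingSwitch
{
    public const string VariableName = "SYRF_SEED_DATA_ENABLED";

    // Unset or unrecognised values fall back to the caller's default.
    public static bool Parse(string? raw, bool defaultValue)
    {
        if (string.IsNullOrWhiteSpace(raw)) return defaultValue;
        var value = raw.Trim();
        if (bool.TryParse(value, out var parsed)) return parsed; // "true"/"false", any casing
        if (value == "1") return true;
        if (value == "0") return false;
        return defaultValue;
    }

    public static bool IsSeedingEnabled(bool defaultValue) =>
        Parse(Environment.GetEnvironmentVariable(VariableName), defaultValue);
}
```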
3. SeedDataOwnershipTransfer.cs (New)
```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Extensions.Logging;

/// <summary>
/// Transfers ownership of seed projects to the first real user.
/// Called from UserHasRegisteredHandler after creating the investigator.
/// </summary>
public class SeedDataOwnershipTransfer
{
    // Injected dependencies (_unitOfWork, _logger) and the flag helpers
    // (HasOwnershipBeenTransferred, SeedProjectsExist, MarkOwnershipTransferred)
    // are omitted for brevity.

    public async Task TransferToFirstRealUser(Guid realUserId)
    {
        // Run at most once, and only if seeding actually created projects.
        if (await HasOwnershipBeenTransferred()) return;
        if (!await SeedProjectsExist()) return;

        _logger.LogInformation("Transferring seed data ownership to {UserId}", realUserId);

        var seedProjectIds = new[]
        {
            SeedDataConstants.QuickStartProjectId,
            SeedDataConstants.ScreeningInProgressProjectId,
            SeedDataConstants.ReadyForAnnotationProjectId,
            SeedDataConstants.CompleteReviewProjectId,
            SeedDataConstants.PrivateResearchProjectId
        };

        foreach (var projectId in seedProjectIds)
        {
            var project = await _unitOfWork.Projects.GetOrDefaultAsync(projectId);
            if (project == null) continue; // tolerate a partially seeded database

            // Transfer ownership
            project.TransferOwnership(realUserId);

            // Add as admin if not already a member
            if (!project.IsMember(realUserId))
            {
                project.AddDirectMembership(realUserId, isAdmin: true);
            }

            // Not shown: removing the Seed Bot's membership (see the flow above).
            await _unitOfWork.SaveAsync(project);
        }

        await MarkOwnershipTransferred();
    }
}
```
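`TransferToFirstRealUser` checks the flag at the start but sets it only after the loop. If two users register at almost the same moment, both can pass `HasOwnershipBeenTransferred()` and both run the transfer. One way to close that gap is to claim the flag atomically before doing any work. The in-memory sketch below uses compare-and-swap. `OnceFlag` is a hypothetical name, and in the real system the equivalent would be a single conditional database write that succeeds only if the flag is not yet set.

```csharp
using System.Threading;

// Hypothetical in-memory once-only flag built on compare-and-swap.
public sealed class OnceFlag
{
    private int _state; // 0 = unclaimed, 1 = claimed

    // Atomically flips 0 -> 1; exactly one caller ever sees true.
    public bool TryClaim() => Interlocked.CompareExchange(ref _state, 1, 0) == 0;

    public bool IsClaimed => Volatile.Read(ref _state) == 1;
}
```

This involves a trade-off. With the original ordering, a run that crashes midway can simply be retried, because the `IsMember` check makes a repeat harmless. Claiming first prevents a double transfer, but a crash then leaves the flag set with the transfer incomplete.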
Seed Data Specifications
Seed Investigators
| ID | Name | Email | Purpose |
|---|---|---|---|
| ...0001 | SyRF Seed Bot | seedbot@syrf.org.uk | Project owner (ownership transfers to first real user) |
| ...0010 | Dr. Alpha Reviewer | alpha@syrf-seed.local | Screening decisions |
| ...0011 | Dr. Beta Reviewer | beta@syrf-seed.local | Screening decisions |
Project Configurations
Quick Start Demo
- Visibility: Public
- Auto-approve: Yes
- Agreement Mode: Single screening
- Stages: Screening (active), Annotation (inactive)
- Studies: 10 unscreened
- Purpose: Immediate hands-on screening
Screening In Progress
- Visibility: Public
- Auto-approve: Yes
- Agreement Mode: Automated dual screening (33% threshold)
- Stages: Screening (active)
- Studies: 30 total
  - 15 fully screened (both reviewers)
  - 10 partially screened (one reviewer)
  - 5 unscreened
- Decisions (on the 15 fully screened studies):
  - 10 included (agreement)
  - 3 excluded (agreement)
  - 2 in disagreement (need reconciliation)
- Purpose: Show screening progress, agreement metrics
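The breakdown above is easy to get subtly wrong when it is hand-coded. A minimal sketch that generates the 30-study layout deterministically, so every preview environment receives identical data. `ScreeningDecision`, `StudyDecisions` and `ScreeningInProgressPlan` are hypothetical names. This document does not say who screened the 10 single-reviewer studies or how they were decided, so the sketch arbitrarily assigns them all to Alpha as Include.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum ScreeningDecision { Include, Exclude }

// One study's seeded decisions; null means that reviewer has not screened it.
public sealed record StudyDecisions(int StudyIndex, ScreeningDecision? Alpha, ScreeningDecision? Beta);

public static class ScreeningInProgressPlan
{
    // Appends `count` studies that all share the same pair of decisions.
    private static void Append(List<StudyDecisions> plan, int count, ScreeningDecision? alpha, ScreeningDecision? beta)
    {
        for (int k = 0; k < count; k++)
            plan.Add(new StudyDecisions(plan.Count, alpha, beta));
    }

    // Fixed order, so the output is identical on every run.
    public static List<StudyDecisions> Build()
    {
        var plan = new List<StudyDecisions>();
        Append(plan, 10, ScreeningDecision.Include, ScreeningDecision.Include); // agreed include
        Append(plan, 3,  ScreeningDecision.Exclude, ScreeningDecision.Exclude); // agreed exclude
        Append(plan, 2,  ScreeningDecision.Include, ScreeningDecision.Exclude); // disagreement
        Append(plan, 10, ScreeningDecision.Include, null);                      // one reviewer (assumed: Alpha, Include)
        Append(plan, 5,  null, null);                                           // unscreened
        return plan;
    }
}
```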
Ready for Annotation
- Visibility: Public
- Auto-approve: Yes
- Agreement Mode: Completed
- Stages: Annotation (active)
- Studies: 20 (all passed screening)
- Annotation Questions: 10 configured
  - 3 study-level questions
  - 4 experiment-level questions
  - 3 outcome questions
- Annotations: 5 studies partially annotated
- Purpose: Data extraction workflow
Complete Review
- Visibility: Private
- Auto-approve: Yes
- Agreement Mode: Completed
- Stages: All complete
- Studies: 15 fully annotated
- Annotations: Complete for all studies
- Purpose: Export demonstration, completed project view
Private Research
- Visibility: Private
- Auto-approve: No
- Agreement Mode: Single screening
- Studies: 8
- Purpose: Join request workflow demonstration
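A minimal sketch of the join-request behaviour that "Private Research" is meant to demonstrate. `JoinableProject` and `JoinOutcome` are hypothetical, and SyRF's real membership API may differ. The only behaviour taken from this document is that `AutoApproveJoinRequests = false` sends requests into an approval queue instead of granting membership immediately.

```csharp
using System;
using System.Collections.Generic;

public enum JoinOutcome { Joined, PendingApproval, AlreadyMember }

// Hypothetical model of the join flow; not SyRF's Project class.
public sealed class JoinableProject
{
    private readonly HashSet<Guid> _members = new HashSet<Guid>();
    private readonly HashSet<Guid> _pending = new HashSet<Guid>();

    public bool AutoApproveJoinRequests { get; }

    public JoinableProject(Guid ownerId, bool autoApproveJoinRequests)
    {
        _members.Add(ownerId);
        AutoApproveJoinRequests = autoApproveJoinRequests;
    }

    public JoinOutcome RequestToJoin(Guid userId)
    {
        if (_members.Contains(userId)) return JoinOutcome.AlreadyMember;
        if (AutoApproveJoinRequests)
        {
            _members.Add(userId);
            return JoinOutcome.Joined;
        }
        _pending.Add(userId); // waits for an admin decision
        return JoinOutcome.PendingApproval;
    }

    // Called by an admin; returns false when there was no pending request.
    public bool Approve(Guid userId)
    {
        if (!_pending.Remove(userId)) return false;
        _members.Add(userId);
        return true;
    }

    public bool IsMember(Guid userId) => _members.Contains(userId);
}
```

The owner gets `AlreadyMember`. Because ownership of every seed project passes to the first real user, demonstrating the approval queue requires a second account.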
Annotation Questions
```json
[
  {
    "id": "...0200",
    "question": "What species was used?",
    "type": "dropdown",
    "category": "Study",
    "options": ["Mouse", "Rat", "Other rodent", "Non-rodent"]
  },
  {
    "id": "...0201",
    "question": "Was randomization performed?",
    "type": "boolean",
    "category": "RiskOfBias"
  },
  {
    "id": "...0202",
    "question": "Sample size per group",
    "type": "integer",
    "category": "Experiment"
  },
  {
    "id": "...0203",
    "question": "Intervention description",
    "type": "textbox",
    "category": "Treatment"
  },
  {
    "id": "...0204",
    "question": "Primary outcome measure",
    "type": "string",
    "category": "OutcomeAssessment"
  }
]
```
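If the questions are shipped as JSON like this, a malformed entry should fail at startup rather than surface in the UI. A minimal sketch that loads the list with `System.Text.Json` and rejects dropdowns that have no options. `SeedQuestion` and `SeedQuestionLoader` are hypothetical, and the validation rule is an assumption rather than an existing SyRF check.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical DTO for one entry of the JSON above; "options" is only present for dropdowns.
public sealed record SeedQuestion(string Id, string Question, string Type, string Category, List<string>? Options);

public static class SeedQuestionLoader
{
    private static readonly JsonSerializerOptions JsonOptions = new JsonSerializerOptions
    {
        PropertyNameCaseInsensitive = true // JSON uses camelCase, the record uses PascalCase
    };

    public static List<SeedQuestion> Load(string json)
    {
        var questions = JsonSerializer.Deserialize<List<SeedQuestion>>(json, JsonOptions)
            ?? throw new InvalidOperationException("Question list was null.");

        foreach (var q in questions)
        {
            // A dropdown with no options could not be answered in the UI.
            if (q.Type == "dropdown" && (q.Options == null || q.Options.Count == 0))
                throw new InvalidOperationException($"Dropdown question '{q.Id}' has no options.");
        }
        return questions;
    }
}
```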
Sample Studies (Enhanced)
Studies should have realistic titles and abstracts drawn from preclinical research. Use an embedded JSON resource containing 50+ studies covering:
- Different disease models (stroke, Parkinson's, Alzheimer's, spinal cord injury)
- Various interventions (pharmacological, cell therapy, exercise)
- Multiple outcome types (behavioral, histological, molecular)
- Range of publication years (2015-2024)
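Reading the embedded resource and checking it against these requirements could look like the sketch below. `Assembly.GetManifestResourceStream` and `JsonSerializer.Deserialize(Stream, ...)` are standard .NET APIs. `SampleStudy`, its fields, and `SampleStudyLoader` are hypothetical, and treating 2015-2024 as a hard bound is an interpretation of the list above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Reflection;
using System.Text.Json;

// Hypothetical shape of one entry in SampleStudies.json.
public sealed record SampleStudy(string Title, string Abstract, int Year);

public static class SampleStudyLoader
{
    private static readonly JsonSerializerOptions JsonOptions = new JsonSerializerOptions
    {
        PropertyNameCaseInsensitive = true
    };

    // The manifest resource name depends on the project's default namespace and
    // folder layout, so the caller supplies it instead of this sketch guessing it.
    public static List<SampleStudy> FromEmbeddedResource(Assembly assembly, string resourceName, int minimumCount)
    {
        using var stream = assembly.GetManifestResourceStream(resourceName)
            ?? throw new InvalidOperationException($"Embedded resource '{resourceName}' not found.");
        return FromStream(stream, minimumCount);
    }

    public static List<SampleStudy> FromStream(Stream stream, int minimumCount)
    {
        var studies = JsonSerializer.Deserialize<List<SampleStudy>>(stream, JsonOptions)
            ?? new List<SampleStudy>();

        if (studies.Count < minimumCount)
            throw new InvalidOperationException($"Expected at least {minimumCount} studies, found {studies.Count}.");

        foreach (var study in studies)
        {
            if (string.IsNullOrWhiteSpace(study.Title))
                throw new InvalidOperationException("A sample study has an empty title.");
            if (study.Year < 2015 || study.Year > 2024)
                throw new InvalidOperationException($"'{study.Title}' has year {study.Year}, outside 2015-2024.");
        }
        return studies;
    }
}
```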
Implementation Plan
Phase 1: Infrastructure (Low Risk)
- Add new GUIDs to `SeedDataConstants.cs`
- Create the `SeedDataOwnershipTransfer.cs` class
- Add `TransferOwnership` and `AddDirectMembership` methods to `Project`
- Integrate with `UserHasRegisteredHandler`
Phase 2: Multi-Project Seeding (Medium Risk)
- Expand `DatabaseSeeder` with project creation methods
- Create seed investigators (Alpha, Beta reviewers)
- Add project memberships for seed reviewers
- Configure stages and agreement modes
Phase 3: Screening Decisions (Medium Risk)
- Create screening decisions for Screening In Progress project
- Calculate and store agreement metrics
- Mark studies with appropriate ScreeningInfo states
Phase 4: Annotation Data (Higher Complexity)
- Add annotation questions to projects
- Create annotation sessions
- Populate extraction data for Complete Review project
Phase 5: Extended Sample Data (Content)
- Expand SampleStudies.json to 50+ studies
- Ensure variety in study characteristics
- Add realistic abstracts
Testing Strategy
- Unit Tests: Test ownership transfer logic in isolation
- Integration Tests: Verify seeding creates valid database state
- E2E Tests: Confirm first user can access seeded projects
- Manual Testing: Verify all workflow states are reachable
Rollback Plan
If seeding causes issues:
- Set `SYRF_SEED_DATA_ENABLED=false` in the environment
- Delete seed data using the cleanup methods
- Revert to the previous seeder version
Success Criteria
- First user immediately sees 5 projects in their dashboard
- User can screen studies in "Quick Start Demo"
- User can view progress in "Screening In Progress"
- User can annotate studies in "Ready for Annotation"
- User can export data from "Complete Review"
- Any user other than the owner must request access to "Private Research"
Related Documentation
Appendix: Research References
Best practices consulted: