Skip to content

Quartz Service Data Isolation Analysis

Executive Summary

Current State: Quartz data is fully isolated between environments. Both MassTransit saga tables and native Quartz scheduler tables now use environment-specific SQL schemas.

Implementation Status: ✅ COMPLETED (2025-12-30)

The isolation was implemented by:

  1. Adding dynamic TablePrefix in Program.cs using syrfEnvironment.RuntimeEnvironment
  2. Removing hardcoded prefix from appsettings.json
  3. Creating SQL migration script for new schemas (staging, production, preview)

See Migration Script for database setup.


Previous State (Before Fix)

Risk Level: MEDIUM-HIGH - The native Quartz scheduler tables were shared across ALL environments due to a hardcoded table prefix, creating potential for:

  • Job interference between environments
  • Debugging complexity
  • Security concerns with production/staging sharing tables
  • All PR previews competing for the same job slots

Current Architecture

Database Connection

All environments connect to the same Azure SQL Server database:

Component Value
Server syrf.database.windows.net
Database syrf-quartz
GCP Secret camarades-quartz-sql-server-credentials

Both syrf-staging and syrf-production namespaces use the same GCP secret, meaning they connect to the identical database instance.

Schema Isolation (MassTransit Saga Tables)

The MassTransit job saga tables ARE properly isolated using environment-specific schemas:

Environment Schema Name Tables
Production production production.JobSaga, production.JobTypeSaga, production.JobAttemptSaga
Staging staging staging.JobSaga, staging.JobTypeSaga, staging.JobAttemptSaga
PR Previews preview preview.JobSaga, preview.JobTypeSaga, preview.JobAttemptSaga

How it works: The SYRF__RuntimeEnvironment environment variable (set per namespace) determines the schema name at runtime via SyrfEnvironment.RuntimeEnvironment.

// Program.cs line 29
var syrfEnvironment = new SyrfEnvironment(runtimeEnvironment);

// Used in EF Core migration
m.MigrationsHistoryTable($"__{nameof(JobServiceSagaOverrideDbContext)}", syrfEnvironment.RuntimeEnvironment);

Quartz Native Tables (NOW ISOLATED ✅)

RESOLVED: The native Quartz scheduler tables are now properly isolated between environments.

The table prefix is dynamically set in Program.cs:

q.UsePersistentStore(s =>
{
    s.TablePrefix = $"[{syrfEnvironment.RuntimeEnvironment}].QRTZ_";
    // ...
});

Each environment now uses its own set of tables:

Environment Tables
Production [production].QRTZ_JOB_DETAILS, [production].QRTZ_TRIGGERS, etc.
Staging [staging].QRTZ_JOB_DETAILS, [staging].QRTZ_TRIGGERS, etc.
Preview [preview].QRTZ_JOB_DETAILS, [preview].QRTZ_TRIGGERS, etc.

What Data is Affected

Properly Isolated (Per Environment Schema)

Table Purpose Isolation
JobSaga MassTransit job lifecycle state (submitted, started, completed, faulted) Schema-based
JobTypeSaga Job type concurrency limits and active job counts Schema-based
JobAttemptSaga Individual retry attempts with status Schema-based
__JobServiceSagaOverrideDbContext EF Core migration history Schema-based

Now Isolated (Per Environment Schema) ✅

Table Purpose Status
QRTZ_JOB_DETAILS Registered job definitions ✅ Isolated per environment
QRTZ_TRIGGERS Trigger definitions for scheduled jobs ✅ Isolated per environment
QRTZ_FIRED_TRIGGERS Currently executing triggers ✅ Isolated per environment
QRTZ_CRON_TRIGGERS Cron-based schedules ✅ Isolated per environment
QRTZ_LOCKS Scheduler cluster locks ✅ Isolated per environment
QRTZ_SCHEDULER_STATE Scheduler instance tracking ✅ Isolated per environment

Previous Implications (Now Resolved)

Note: These issues have been resolved with the implementation of environment-specific table prefixes.

1. Job Interference (RESOLVED ✅)

Quartz schedulers from different environments register as nodes in the same "cluster".

Now each environment has its own set of QRTZ_* tables, so:

  • Trigger definitions are isolated per environment
  • Job group names only need to be unique within an environment
  • Scheduler state queries return data only from that environment

2. Security Concerns (RESOLVED ✅)

Staging environment has access to production job metadata.

Now each environment has complete isolation:

  • Staging cannot see production job metadata
  • PR preview environments are isolated from production
  • Job parameters in QRTZ_JOB_DETAILS are environment-specific

3. Debugging Complexity (RESOLVED ✅)

When investigating job issues, data from all environments is intermixed.

Now debugging is cleaner:

  • Each environment's scheduler state is isolated
  • Lock contention metrics are per-environment
  • Node tracking is separated by environment

4. PR Preview Impact (RESOLVED ✅)

Each PR preview environment now gets its own isolated schema:

  • Schema naming: preview_{prNumber} (e.g., preview_2247)
  • Complete isolation between PR environments
  • No job interference between PRs

How it works: The SYRF__PrNumber environment variable (set by Helm for preview deployments) is used to create per-PR schemas. If a PR number is available in a preview environment, the schema becomes preview_{prNumber} instead of just preview.

Automatic Cleanup: When a PR is closed, the cleanup-tags job in .github/workflows/pr-preview.yml automatically drops the corresponding preview_{prNumber} schema and all its QRTZ_* tables from the SQL database.


Recommendations

Effort: Low | Risk: Low | Impact: High

Add environment-specific Quartz table prefix via Helm values:

# In _env-blocks.tpl, add to quartz env block:
- name: SYRF__Quartz__quartz.jobStore.tablePrefix
  value: "[{{ .Values.environment.name }}].QRTZ_"

Pros:

  • Simple configuration change
  • No data migration required (new schema, existing remains untouched)
  • Each environment gets its own Quartz tables
  • Backward compatible

Cons:

  • Need to create new QRTZ_* tables for each environment
  • PR previews still share preview prefix (but isolated from staging/production)

Implementation Steps:

  1. Add quartz.jobStore.tablePrefix to Helm templates
  2. Create QRTZ_* tables in staging and production schemas
  3. Deploy Quartz with new config (will auto-create tables if missing)
  4. Optionally clean up old [development].QRTZ_* tables

Option 2: Separate Database Per Environment

Effort: Medium | Risk: Medium | Impact: Highest

Create environment-specific SQL databases:

  • syrf-quartz-staging
  • syrf-quartz-production
  • syrf-quartz-preview

Pros:

  • Complete isolation at database level
  • Clearer security boundaries
  • Independent backups and recovery
  • Can use different performance tiers

Cons:

  • Requires new GCP secrets per environment
  • Azure SQL billing increase (3x databases)
  • Migration effort for existing data
  • More infrastructure to manage

Option 3: PR-Specific Preview Isolation

Effort: Medium | Risk: Low | Impact: Medium

Extend schema isolation to use PR number for previews:

# For preview environments
environment:
  name: preview-{{ .prNumber }}  # e.g., preview-2246

Pros:

  • Each PR gets fully isolated job storage
  • No interference between PR previews
  • Better debugging experience

Cons:

  • Requires PR number injection into Helm values
  • More schemas to manage
  • Need cleanup job for orphaned preview schemas

Option 4: Disable Quartz in Previews

Effort: Low | Risk: Low | Impact: Low

For PR previews, disable the persistent job store:

# In preview.values.yaml
quartz:
  useInMemoryStore: true

Pros:

  • No database pollution from PRs
  • Simpler preview infrastructure
  • PRs don't need SQL Server access

Cons:

  • Jobs don't survive pod restarts in previews
  • Limited testing of scheduling functionality
  • May miss scheduling-related bugs

Priority Recommendation

Immediate (Option 1): Implement environment-specific table prefix

  • Fixes the staging/production sharing issue
  • Low risk, high value

Short-term (Option 4): Disable persistent store for previews

  • Reduces database pollution
  • Simplifies preview infrastructure

Future consideration (Option 3): PR-specific isolation

  • Only if debugging across PRs becomes problematic

Current Environment Configuration

Verified Configuration Per Environment

Environment RuntimeEnvironment Saga Schema Quartz Prefix MongoDB
Production production production [production].QRTZ_ syrftest
Staging staging staging [staging].QRTZ_ syrf_staging
PR Previews preview preview [preview_{n}].QRTZ_ syrf_pr_{n}

Note: Quartz prefix is dynamically set in Program.cs based on syrfEnvironment.RuntimeEnvironment. For preview environments, if SYRF__PrNumber is set, the schema becomes preview_{prNumber}. Schemas and tables are automatically created at startup via QuartzSchemaInitializer.

Jobs Managed by Quartz

  1. Search Import Jobs - Literature search workflow orchestration
  2. Reference File Parsing - BibTeX, Endnote, PubMed parsing (5 min timeout, 5 concurrent)
  3. Risk of Bias Processing - AI-assisted RoB calculations (2 hour timeout, 3 concurrent)
  4. Bulk Study Updates - Batch study data imports
  5. Screening Stats Update - Agreement threshold calculations

Schema vs Database Isolation: Trade-offs Analysis

Why Use Same Database with Different Schemas?

Benefits of Schema-Based Isolation:

Benefit Description
Lower Cost Single database = single billing unit. No per-database overhead costs.
Simpler Operations Single backup strategy, single connection pool, unified monitoring
Easier Cross-Environment Queries Can join across schemas for debugging/reporting if needed
Atomic Schema Changes Migrations can be coordinated in a single transaction
Resource Efficiency Schemas share memory, cache, and compute resources

Trade-offs:

Concern Schema Isolation Separate Databases
Security Boundary Application-level (RBAC) Database-level (connection string)
Resource Contention Possible "noisy neighbor" Complete isolation
Compliance Requirements May not satisfy strict audits Better for SOC2/HIPAA
Independent Scaling Not possible Per-database performance tiers
Backup/Restore Granularity Whole database only Per-environment recovery

Clarification: Security Concern Details

The security concern in this document specifically refers to the Quartz native tables (QRTZ_*) which are NOT schema-isolated - they all use the hardcoded [development] schema prefix.

What IS properly secured (schema-isolated):

  • staging.JobSaga, production.JobSaga - Different schemas, proper isolation
  • With proper SQL Server RBAC, staging connection could be restricted to only staging.* objects

What is NOT secured (shared tables):

  • [development].QRTZ_JOB_DETAILS - Contains job metadata from ALL environments
  • [development].QRTZ_TRIGGERS - Trigger definitions from ALL environments
  • Any environment's Quartz instance can read/write these tables

Why This Matters:

  • Job parameters in QRTZ_JOB_DETAILS may contain business data (project IDs, user IDs, etc.)
  • A staging environment breach could expose production job scheduling patterns
  • No audit trail separation between environments

When Schema Isolation is Sufficient

According to Microsoft's multi-tenant SaaS patterns documentation, schema-based isolation is appropriate when:

  • Environments have similar trust levels (both internal)
  • No regulatory requirement for physical data separation
  • Cost optimization is a priority
  • Operational simplicity outweighs isolation benefits

For SyRF's Quartz use case, schema isolation is sufficient for staging vs production because:

  1. Both are internal environments managed by the same team
  2. No sensitive PII is stored in Quartz tables (just job metadata)
  3. The risk is operational (debugging confusion) rather than data breach

Azure SQL Pricing: Schemas vs Databases

Current Architecture Cost Model

SyRF currently uses a single Azure SQL Database (syrf-quartz), billed as one unit.

Cost Comparison

Based on Azure SQL Database pricing:

Approach Billing Model Approximate Monthly Cost*
Single DB, Multiple Schemas (current) 1x database ~$75-150 (S2/S3 tier)
Separate DBs, Same Server 3x database ~$225-450 (3x S2/S3)
Elastic Pool Pool resources ~$150-200 (100 eDTUs)

*Estimates based on General Purpose tier, Europe West region

Key Azure Pricing Facts

  1. Databases are billed individually - Creating 3 databases means 3x the per-database charges
  2. Schemas have zero additional cost - They're just logical namespaces within a database
  3. Elastic Pools can reduce multi-DB costs - Up to 13x reduction for databases with variable workloads
  4. Storage is per-database - Multiple databases duplicate base storage allocation

Recommendation: Elastic Pool for Multi-Database Approach

If choosing separate databases (Option 2), use an Azure SQL Elastic Pool:

Configuration Standard Elastic Pool
Pool Size 100 eDTUs
Max DBs 3 (staging, production, preview)
Approx. Monthly Cost ~$150-200
Savings vs 3 Single DBs ~40-60%

When Elastic Pool Makes Sense:

  • Databases have different peak usage times (staging used during dev, production at business hours)
  • Total combined utilization < pool capacity most of the time
  • At least 2-3 databases with similar resource needs

When to Avoid Elastic Pool:

  • All databases have sustained high utilization
  • Need serverless auto-pause (not supported in pools)
  • Databases need significantly different performance tiers

Decision Matrix

Criterion Schema Isolation Separate DBs Elastic Pool
Monthly Cost $ (lowest) $$$ $$
Security Boundary Application-level Connection-string Connection-string
Operational Complexity Low High Medium
Independent Scaling No Yes Limited
Backup Granularity Whole DB Per-environment Per-pool
Best For Internal envs, cost-sensitive Strict compliance Multiple DBs, variable load

For SyRF: Schema Isolation is Optimal

Given:

  • Internal-only environments (no external tenant isolation needed)
  • Low Quartz database load (mostly metadata, not high-throughput)
  • Cost-conscious infrastructure approach
  • Operational simplicity preference

Recommendation: Stay with single database, fix the table prefix (Option 1)



Appendix: Verification Commands

# Check environment variables in Quartz pods
kubectl get deployment syrf-quartz -n syrf-staging -o jsonpath='{.spec.template.spec.containers[0].env}' | jq '.[] | select(.name | test("Runtime|Environment"; "i"))'

# Verify current Quartz table prefix (from running pod)
kubectl exec -n syrf-staging deploy/syrf-quartz -- cat /app/appsettings.json | jq '.Quartz'

# Check which secrets are used
kubectl get externalsecret quartz-sql-server-credentials -n syrf-staging -o yaml | grep -A5 "remoteRef:"

# Verify saga schema in SQL Server (requires SQL access)
SELECT SCHEMA_NAME FROM INFORMATION_SCHEMA.SCHEMATA WHERE SCHEMA_NAME IN ('staging', 'production', 'preview', 'development');