Quartz Service Data Isolation Analysis¶
Executive Summary¶
Current State: Quartz data is fully isolated between environments. Both MassTransit saga tables and native Quartz scheduler tables now use environment-specific SQL schemas.
Implementation Status: ✅ COMPLETED (2025-12-30)
The isolation was implemented by:
- Adding dynamic
TablePrefixin Program.cs usingsyrfEnvironment.RuntimeEnvironment - Removing hardcoded prefix from appsettings.json
- Creating SQL migration script for new schemas (
staging,production,preview)
See Migration Script for database setup.
Previous State (Before Fix)¶
Risk Level: MEDIUM-HIGH - The native Quartz scheduler tables were shared across ALL environments due to a hardcoded table prefix, creating potential for:
- Job interference between environments
- Debugging complexity
- Security concerns with production/staging sharing tables
- All PR previews competing for the same job slots
Current Architecture¶
Database Connection¶
All environments connect to the same Azure SQL Server database:
| Component | Value |
|---|---|
| Server | syrf.database.windows.net |
| Database | syrf-quartz |
| GCP Secret | camarades-quartz-sql-server-credentials |
Both syrf-staging and syrf-production namespaces use the same GCP secret, meaning they connect to the identical database instance.
Schema Isolation (MassTransit Saga Tables)¶
The MassTransit job saga tables ARE properly isolated using environment-specific schemas:
| Environment | Schema Name | Tables |
|---|---|---|
| Production | production |
production.JobSaga, production.JobTypeSaga, production.JobAttemptSaga |
| Staging | staging |
staging.JobSaga, staging.JobTypeSaga, staging.JobAttemptSaga |
| PR Previews | preview |
preview.JobSaga, preview.JobTypeSaga, preview.JobAttemptSaga |
How it works: The SYRF__RuntimeEnvironment environment variable (set per namespace) determines the schema name at runtime via SyrfEnvironment.RuntimeEnvironment.
// Program.cs line 29
var syrfEnvironment = new SyrfEnvironment(runtimeEnvironment);
// Used in EF Core migration
m.MigrationsHistoryTable($"__{nameof(JobServiceSagaOverrideDbContext)}", syrfEnvironment.RuntimeEnvironment);
Quartz Native Tables (NOW ISOLATED ✅)¶
RESOLVED: The native Quartz scheduler tables are now properly isolated between environments.
The table prefix is dynamically set in Program.cs:
q.UsePersistentStore(s =>
{
s.TablePrefix = $"[{syrfEnvironment.RuntimeEnvironment}].QRTZ_";
// ...
});
Each environment now uses its own set of tables:
| Environment | Tables |
|---|---|
| Production | [production].QRTZ_JOB_DETAILS, [production].QRTZ_TRIGGERS, etc. |
| Staging | [staging].QRTZ_JOB_DETAILS, [staging].QRTZ_TRIGGERS, etc. |
| Preview | [preview].QRTZ_JOB_DETAILS, [preview].QRTZ_TRIGGERS, etc. |
What Data is Affected¶
Properly Isolated (Per Environment Schema)¶
| Table | Purpose | Isolation |
|---|---|---|
JobSaga |
MassTransit job lifecycle state (submitted, started, completed, faulted) | Schema-based |
JobTypeSaga |
Job type concurrency limits and active job counts | Schema-based |
JobAttemptSaga |
Individual retry attempts with status | Schema-based |
__JobServiceSagaOverrideDbContext |
EF Core migration history | Schema-based |
Now Isolated (Per Environment Schema) ✅¶
| Table | Purpose | Status |
|---|---|---|
QRTZ_JOB_DETAILS |
Registered job definitions | ✅ Isolated per environment |
QRTZ_TRIGGERS |
Trigger definitions for scheduled jobs | ✅ Isolated per environment |
QRTZ_FIRED_TRIGGERS |
Currently executing triggers | ✅ Isolated per environment |
QRTZ_CRON_TRIGGERS |
Cron-based schedules | ✅ Isolated per environment |
QRTZ_LOCKS |
Scheduler cluster locks | ✅ Isolated per environment |
QRTZ_SCHEDULER_STATE |
Scheduler instance tracking | ✅ Isolated per environment |
Previous Implications (Now Resolved)¶
Note: These issues have been resolved with the implementation of environment-specific table prefixes.
1. Job Interference (RESOLVED ✅)¶
Quartz schedulers from different environments register as nodes in the same "cluster".
Now each environment has its own set of QRTZ_* tables, so:
- Trigger definitions are isolated per environment
- Job group names only need to be unique within an environment
- Scheduler state queries return data only from that environment
2. Security Concerns (RESOLVED ✅)¶
Staging environment has access to production job metadata.
Now each environment has complete isolation:
- Staging cannot see production job metadata
- PR preview environments are isolated from production
- Job parameters in
QRTZ_JOB_DETAILSare environment-specific
3. Debugging Complexity (RESOLVED ✅)¶
When investigating job issues, data from all environments is intermixed.
Now debugging is cleaner:
- Each environment's scheduler state is isolated
- Lock contention metrics are per-environment
- Node tracking is separated by environment
4. PR Preview Impact (RESOLVED ✅)¶
Each PR preview environment now gets its own isolated schema:
- Schema naming:
preview_{prNumber}(e.g.,preview_2247) - Complete isolation between PR environments
- No job interference between PRs
How it works: The SYRF__PrNumber environment variable (set by Helm for preview deployments) is used to create per-PR schemas. If a PR number is available in a preview environment, the schema becomes preview_{prNumber} instead of just preview.
Automatic Cleanup: When a PR is closed, the cleanup-tags job in .github/workflows/pr-preview.yml automatically drops the corresponding preview_{prNumber} schema and all its QRTZ_* tables from the SQL database.
Recommendations¶
Option 1: Environment-Specific Table Prefix (Recommended)¶
Effort: Low | Risk: Low | Impact: High
Add environment-specific Quartz table prefix via Helm values:
# In _env-blocks.tpl, add to quartz env block:
- name: SYRF__Quartz__quartz.jobStore.tablePrefix
value: "[{{ .Values.environment.name }}].QRTZ_"
Pros:
- Simple configuration change
- No data migration required (new schema, existing remains untouched)
- Each environment gets its own Quartz tables
- Backward compatible
Cons:
- Need to create new QRTZ_* tables for each environment
- PR previews still share
previewprefix (but isolated from staging/production)
Implementation Steps:
- Add
quartz.jobStore.tablePrefixto Helm templates - Create QRTZ_* tables in
stagingandproductionschemas - Deploy Quartz with new config (will auto-create tables if missing)
- Optionally clean up old
[development].QRTZ_*tables
Option 2: Separate Database Per Environment¶
Effort: Medium | Risk: Medium | Impact: Highest
Create environment-specific SQL databases:
syrf-quartz-stagingsyrf-quartz-productionsyrf-quartz-preview
Pros:
- Complete isolation at database level
- Clearer security boundaries
- Independent backups and recovery
- Can use different performance tiers
Cons:
- Requires new GCP secrets per environment
- Azure SQL billing increase (3x databases)
- Migration effort for existing data
- More infrastructure to manage
Option 3: PR-Specific Preview Isolation¶
Effort: Medium | Risk: Low | Impact: Medium
Extend schema isolation to use PR number for previews:
Pros:
- Each PR gets fully isolated job storage
- No interference between PR previews
- Better debugging experience
Cons:
- Requires PR number injection into Helm values
- More schemas to manage
- Need cleanup job for orphaned preview schemas
Option 4: Disable Quartz in Previews¶
Effort: Low | Risk: Low | Impact: Low
For PR previews, disable the persistent job store:
Pros:
- No database pollution from PRs
- Simpler preview infrastructure
- PRs don't need SQL Server access
Cons:
- Jobs don't survive pod restarts in previews
- Limited testing of scheduling functionality
- May miss scheduling-related bugs
Priority Recommendation¶
Immediate (Option 1): Implement environment-specific table prefix
- Fixes the staging/production sharing issue
- Low risk, high value
Short-term (Option 4): Disable persistent store for previews
- Reduces database pollution
- Simplifies preview infrastructure
Future consideration (Option 3): PR-specific isolation
- Only if debugging across PRs becomes problematic
Current Environment Configuration¶
Verified Configuration Per Environment¶
| Environment | RuntimeEnvironment | Saga Schema | Quartz Prefix | MongoDB |
|---|---|---|---|---|
| Production | production |
production |
[production].QRTZ_ |
syrftest |
| Staging | staging |
staging |
[staging].QRTZ_ |
syrf_staging |
| PR Previews | preview |
preview |
[preview_{n}].QRTZ_ |
syrf_pr_{n} |
Note: Quartz prefix is dynamically set in Program.cs based on
syrfEnvironment.RuntimeEnvironment. For preview environments, ifSYRF__PrNumberis set, the schema becomespreview_{prNumber}. Schemas and tables are automatically created at startup viaQuartzSchemaInitializer.
Jobs Managed by Quartz¶
- Search Import Jobs - Literature search workflow orchestration
- Reference File Parsing - BibTeX, Endnote, PubMed parsing (5 min timeout, 5 concurrent)
- Risk of Bias Processing - AI-assisted RoB calculations (2 hour timeout, 3 concurrent)
- Bulk Study Updates - Batch study data imports
- Screening Stats Update - Agreement threshold calculations
Schema vs Database Isolation: Trade-offs Analysis¶
Why Use Same Database with Different Schemas?¶
Benefits of Schema-Based Isolation:
| Benefit | Description |
|---|---|
| Lower Cost | Single database = single billing unit. No per-database overhead costs. |
| Simpler Operations | Single backup strategy, single connection pool, unified monitoring |
| Easier Cross-Environment Queries | Can join across schemas for debugging/reporting if needed |
| Atomic Schema Changes | Migrations can be coordinated in a single transaction |
| Resource Efficiency | Schemas share memory, cache, and compute resources |
Trade-offs:
| Concern | Schema Isolation | Separate Databases |
|---|---|---|
| Security Boundary | Application-level (RBAC) | Database-level (connection string) |
| Resource Contention | Possible "noisy neighbor" | Complete isolation |
| Compliance Requirements | May not satisfy strict audits | Better for SOC2/HIPAA |
| Independent Scaling | Not possible | Per-database performance tiers |
| Backup/Restore Granularity | Whole database only | Per-environment recovery |
Clarification: Security Concern Details¶
The security concern in this document specifically refers to the Quartz native tables (QRTZ_*) which are NOT schema-isolated - they all use the hardcoded [development] schema prefix.
What IS properly secured (schema-isolated):
staging.JobSaga,production.JobSaga- Different schemas, proper isolation- With proper SQL Server RBAC, staging connection could be restricted to only
staging.*objects
What is NOT secured (shared tables):
[development].QRTZ_JOB_DETAILS- Contains job metadata from ALL environments[development].QRTZ_TRIGGERS- Trigger definitions from ALL environments- Any environment's Quartz instance can read/write these tables
Why This Matters:
- Job parameters in
QRTZ_JOB_DETAILSmay contain business data (project IDs, user IDs, etc.) - A staging environment breach could expose production job scheduling patterns
- No audit trail separation between environments
When Schema Isolation is Sufficient¶
According to Microsoft's multi-tenant SaaS patterns documentation, schema-based isolation is appropriate when:
- Environments have similar trust levels (both internal)
- No regulatory requirement for physical data separation
- Cost optimization is a priority
- Operational simplicity outweighs isolation benefits
For SyRF's Quartz use case, schema isolation is sufficient for staging vs production because:
- Both are internal environments managed by the same team
- No sensitive PII is stored in Quartz tables (just job metadata)
- The risk is operational (debugging confusion) rather than data breach
Azure SQL Pricing: Schemas vs Databases¶
Current Architecture Cost Model¶
SyRF currently uses a single Azure SQL Database (syrf-quartz), billed as one unit.
Cost Comparison¶
Based on Azure SQL Database pricing:
| Approach | Billing Model | Approximate Monthly Cost* |
|---|---|---|
| Single DB, Multiple Schemas (current) | 1x database | ~$75-150 (S2/S3 tier) |
| Separate DBs, Same Server | 3x database | ~$225-450 (3x S2/S3) |
| Elastic Pool | Pool resources | ~$150-200 (100 eDTUs) |
*Estimates based on General Purpose tier, Europe West region
Key Azure Pricing Facts¶
- Databases are billed individually - Creating 3 databases means 3x the per-database charges
- Schemas have zero additional cost - They're just logical namespaces within a database
- Elastic Pools can reduce multi-DB costs - Up to 13x reduction for databases with variable workloads
- Storage is per-database - Multiple databases duplicate base storage allocation
Recommendation: Elastic Pool for Multi-Database Approach¶
If choosing separate databases (Option 2), use an Azure SQL Elastic Pool:
| Configuration | Standard Elastic Pool |
|---|---|
| Pool Size | 100 eDTUs |
| Max DBs | 3 (staging, production, preview) |
| Approx. Monthly Cost | ~$150-200 |
| Savings vs 3 Single DBs | ~40-60% |
When Elastic Pool Makes Sense:
- Databases have different peak usage times (staging used during dev, production at business hours)
- Total combined utilization < pool capacity most of the time
- At least 2-3 databases with similar resource needs
When to Avoid Elastic Pool:
- All databases have sustained high utilization
- Need serverless auto-pause (not supported in pools)
- Databases need significantly different performance tiers
Decision Matrix¶
| Criterion | Schema Isolation | Separate DBs | Elastic Pool |
|---|---|---|---|
| Monthly Cost | $ (lowest) | $$$ | $$ |
| Security Boundary | Application-level | Connection-string | Connection-string |
| Operational Complexity | Low | High | Medium |
| Independent Scaling | No | Yes | Limited |
| Backup Granularity | Whole DB | Per-environment | Per-pool |
| Best For | Internal envs, cost-sensitive | Strict compliance | Multiple DBs, variable load |
For SyRF: Schema Isolation is Optimal¶
Given:
- Internal-only environments (no external tenant isolation needed)
- Low Quartz database load (mostly metadata, not high-throughput)
- Cost-conscious infrastructure approach
- Operational simplicity preference
Recommendation: Stay with single database, fix the table prefix (Option 1)
Related Documentation¶
- MongoDB Reference - Database architecture for MongoDB collections
- Dependency Map - Service dependencies including Quartz
- ADR-003: Cluster Architecture - Overall cluster design
Appendix: Verification Commands¶
# Check environment variables in Quartz pods
kubectl get deployment syrf-quartz -n syrf-staging -o jsonpath='{.spec.template.spec.containers[0].env}' | jq '.[] | select(.name | test("Runtime|Environment"; "i"))'
# Verify current Quartz table prefix (from running pod)
kubectl exec -n syrf-staging deploy/syrf-quartz -- cat /app/appsettings.json | jq '.Quartz'
# Check which secrets are used
kubectl get externalsecret quartz-sql-server-credentials -n syrf-staging -o yaml | grep -A5 "remoteRef:"
# Verify saga schema in SQL Server (requires SQL access)
SELECT SCHEMA_NAME FROM INFORMATION_SCHEMA.SCHEMATA WHERE SCHEMA_NAME IN ('staging', 'production', 'preview', 'development');