Technical Plan: Advanced Screening & Filtering¶
Audience: Chris (Senior Dev), Nuri (Junior Dev)
Scope: MVP-first, hybrid cloud aware. Incorporates: PRISMA implementation plan, detailed Filter Set model + Angular Material Filter Builder UI, MongoDB query/performance strategy, forward path to annotation-based filtering.
Key Design: No MassTransit consumer for tallies—materialised tallies are updated atomically inside the Study aggregate alongside screening/annotation writes.
Phasing & Constraints¶
- MVP (3–4 sprints): Screening Profiles (immutable-on-use + clone), Stage Settings (mode required), Filter Set v2 (nested groups in storage, simple UI), Selection & Stats, studies endpoint with stageId, Reviewer UI wiring, opt-in Migration Wizard.
- Hardening (1–2 sprints): a11y polish, perf/telemetry dashboards, helpdesk SOP.
- Phase-2: PRISMA diagram/export, annotation-based filtering, tie-breaker groups, optional materialised pools for very large projects.
- Hybrid infrastructure: Angular (Material + ngrx), ASP.NET API (.NET 10), MongoDB (Atlas/GKE), RabbitMQ present but not used for tallies; on-prem file server unchanged.
Architecture Overview¶
[ Angular SPA ]
├─ Stage Settings (mode required)
├─ Filter Builder (Material) — JSON v2
├─ Reviewer Screen (Screening/Annotation/Reconciliation)
└─ Stats widgets + Study table (Pool via stageId)
▼ REST/JSON
[ ASP.NET API (.NET 10, GKE) ]
├─ ProfilesController ── CRUD/Clone
├─ StagesController ── settings incl. FilterSet
├─ StudiesController ── GET studies?stageId=… (Stage Study Pool)
├─ SelectionController ── POST select_next, GET stats
├─ DecisionsController ── POST decisions (stage context)
├─ PrismaController (P2) ── GET prisma_data
└─ Domain services: FilterCompiler, SelectionService, StatsService,
ReconciliationService, PrismaAggregator, AuditService
[ MongoDB ] — Project, Study, Reviewing_Audit
• Project: screeningProfiles[], stages[] (filterSet + settings), prismaMapping
• Study: screeningOutcomes[], extractionInfo.sessions[],
extractionInfo.sessionTallies[], reconciledAnnotations{}
[ On-prem FT Files ] — unchanged; fetched via existing endpoints
Service Boundaries & Responsibilities¶
- ProfilesService — immutable-on-use; clone semantics; where-used lookup.
- StagesService — stage settings + FilterSet persistence (with schema validation).
- FilterCompiler — validate/simplify/compile FilterSet JSON → efficient MongoDB filter(s). Knows array-matching pitfalls and merges conditions per array path.
- SelectionService — derives Selection Subset from Stage Study Pool by mode; supports saved-session routing; random selection uses an index-friendly technique (see Random Selection: Index-Friendly Approach) rather than $sample for very large pools.
- StatsService — fast counts by caller and stage; micro-cache allowed (in-memory/Redis optional).
- ReconciliationService — eligibility, self-reconciliation policy, commit reconciled annotations.
- AuditService — append-only reviewing_audit.
- PrismaAggregator (P2) — compute PRISMA metrics from Screening Outcomes + import metadata.
Consistency boundary: All per-study tallies (screening/annotation session tallies, InclusionInfo) are updated atomically with the Study document write.
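As a minimal sketch of what "updated atomically with the Study document write" could look like at the MongoDB level, the TypeScript below builds a single update that appends a session and increments the matching tally through an arrayFilters identifier. The helper name buildStartSessionUpdate, the StudyUpdate shape, and the use of _id as the study key are illustrative assumptions, not part of this plan. The sketch also assumes a sessionTallies entry for the stage already exists; if it does not, the $inc would match nothing.

```typescript
// Hypothetical shapes; field names follow the Study model in this plan.
interface Session {
  stageId: string;
  reviewerId: string;
  reconciliation: boolean;
  status: "InProgress" | "Completed";
  startedAt: string;
}

interface StudyUpdate {
  filter: Record<string, unknown>;
  update: { $push: Record<string, unknown>; $inc: Record<string, number> };
  options: { arrayFilters: Record<string, unknown>[] };
}

// Builds ONE update document, so the session append and the tally bump
// are applied together in a single single-document write.
function buildStartSessionUpdate(studyId: string, session: Session): StudyUpdate {
  const counter = session.reconciliation
    ? "numberOfReconciliationSessions"
    : "numberOfCandidateSessions";
  return {
    filter: { _id: studyId }, // assumption: studies are keyed by _id
    update: {
      $push: { "extractionInfo.sessions": session },
      // $[t] targets only the tally element selected by arrayFilters below.
      $inc: { [`extractionInfo.sessionTallies.$[t].${counter}`]: 1 },
    },
    options: { arrayFilters: [{ "t.stageId": session.stageId }] },
  };
}
```

The three parts map onto the filter, update, and options arguments of an updateOne call. Because both operators live in one update on one document, no separate consumer is needed to keep the tally in step with the sessions array.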
Data Model (MongoDB)¶
Project¶
{
screeningProfiles: [
{ id, name, criteriaText, parentId?, createdBy, createdAt, used: bool }
],
stages: [
{ id, name, studySelectionMode, hideExcluded, maxInProgress,
sessionCountTarget, selfReconciliation, filterSet }
],
prismaMapping: { taProfileId?, ftProfileId?, sourceFields, notes } // Phase-2
}
Study¶
{
screeningOutcomes: [
{ profileId, decisions[], status, updatedAt }
],
extractionInfo: {
sessions: [
{ stageId, reviewerId, reconciliation: bool,
status: "InProgress"|"Completed", startedAt, completedAt? }
],
sessionTallies: [
{ stageId, numberOfCandidateSessions,
numberOfCompletedCandidateSessions,
numberOfReconciliationSessions? }
]
},
inclusionInfo: [], // optional materialised summary per profile/stage
reconciledAnnotations: { "<questionId>": "<value|array|object>" }, // Phase-2
rand: 0.0 // double in [0,1) for index-friendly random selection
}
Indexes (created at startup)¶
- screeningOutcomes.profileId + screeningOutcomes.status (compound, multi-key)
- extractionInfo.sessionTallies.stageId
- extractionInfo.sessions.{stageId, reviewerId, reconciliation, status}
- rand (ascending) for random-by-range selection
- Phase-2: wildcard reconciledAnnotations.$** (with partial and sparse strategies)
Filter Set v2 — Storage, Semantics, Compilation¶
JSON Schema (backward-compatible & future-proof)¶
{
"version": 2,
"logic": "AND",
"rules": [
{
"type": "group",
"logic": "AND",
"rules": [
{ "type": "profileOutcome", "profileId": "<guid>", "op": "in", "values": ["Included","Conflict","Maybe"] },
{ "type": "profileOutcome", "profileId": "<guidB>", "op": "notIn", "values": ["Included"] }
]
},
{ "type": "annotation", "questionId": "ft_reason", "op": "in", "values": ["WrongPopulation","Duplicate"] }
]
}
- Types: profileOutcome (MVP), annotation (Phase-2). Additional future types (e.g., importSource, studyTag) can slot in without breaking clients.
- Nested groups from day one (UI may only allow simple cases in MVP).
C# Model & Validation¶
public enum NodeType { Group, ProfileOutcome, Annotation }
public enum Logic { And, Or }
public enum Op { In, NotIn, Eq, Neq, Any, All }
public abstract record Node(NodeType Type);
public sealed record GroupNode(Logic Logic, IReadOnlyList<Node> Rules) : Node(NodeType.Group);
public sealed record ProfileOutcomeNode(string ProfileId, Op Op, IReadOnlyList<string> Values) : Node(NodeType.ProfileOutcome);
public sealed record AnnotationNode(string QuestionId, Op Op, IReadOnlyList<string> Values) : Node(NodeType.Annotation);
public static class FilterValidator {
public static void Validate(Node n) {
// validate ids, enum values, non-empty rules,
// detect circular references if we later allow profile->stage edges
}
}
Simplifier — Make Queries Cheaper (Idempotent)¶
simplify(node):
  if node is Group(AND/OR):
    node.rules = [simplify(r) for r in node.rules]
    // Flatten nested groups with same logic
    flatten(node)
    // Merge ProfileOutcome rules targeting SAME profileId
    //   AND + (in A) + (in B)    → in (A ∩ B)
    //   AND + (in A) + (notIn B) → in (A − B); if empty → FALSE
    //   OR  + (in A) + (in B)    → in (A ∪ B)
    // Remove tautologies and contradictions
    // Drop empty groups; if group becomes empty:
    //   AND → TRUE; OR → FALSE (apply identities carefully)
  return node
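Why: MongoDB evaluates separate $elemMatch clauses on the same array independently, so two clauses can be satisfied by different array elements. Merging rules per profileId into a single $elemMatch keeps all conditions on the same element, which avoids incorrect matches and reduces the number of clauses in the query.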
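The AND-merge step described above can be sketched in TypeScript as follows. This is an illustrative sketch only: the function name mergeAnd and the ProfileRule shape are assumptions, it covers just the per-profileId merge under AND (not flattening or OR), and it returns null to stand for a group that has collapsed to FALSE.

```typescript
type Op = "in" | "notIn";

interface ProfileRule {
  type: "profileOutcome";
  profileId: string;
  op: Op;
  values: string[];
}

// Removes duplicates while keeping first-seen order.
const unique = (xs: string[]): string[] => xs.filter((x, i) => xs.indexOf(x) === i);

// Merges AND-ed rules that target the same profileId into one rule per profile.
// Returns null when the rules for some profile contradict (the AND group is FALSE).
function mergeAnd(rules: ProfileRule[]): ProfileRule[] | null {
  const state: Record<string, { allowed: string[] | null; denied: string[] }> = {};
  for (const r of rules) {
    const s = state[r.profileId] ?? (state[r.profileId] = { allowed: null, denied: [] });
    if (r.op === "in") {
      const prev = s.allowed;
      // (in A) AND (in B) → in (A ∩ B)
      s.allowed = prev === null ? [...r.values] : prev.filter(v => r.values.indexOf(v) !== -1);
    } else {
      s.denied.push(...r.values);
    }
  }
  const merged: ProfileRule[] = [];
  for (const profileId of Object.keys(state)) {
    const { allowed, denied } = state[profileId];
    if (allowed === null) {
      // Only notIn rules were seen: (notIn A) AND (notIn B) → notIn (A ∪ B)
      merged.push({ type: "profileOutcome", profileId, op: "notIn", values: unique(denied) });
    } else {
      // (in A) AND (notIn B) → in (A − B)
      const values = unique(allowed.filter(v => denied.indexOf(v) === -1));
      if (values.length === 0) return null; // empty set → contradiction
      merged.push({ type: "profileOutcome", profileId, op: "in", values });
    }
  }
  return merged;
}
```

Running the merge twice gives the same result as running it once, which is the idempotence property the simplifier aims for.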
Compiler — Array-Aware and Index-Friendly¶
FilterDefinition<BsonDocument> Compile(Node n) {
var f = Builders<BsonDocument>.Filter;
return n switch {
GroupNode g => g.Logic == Logic.And
? f.And(g.Rules.Select(Compile))
: f.Or(g.Rules.Select(Compile)),
ProfileOutcomeNode p => f.ElemMatch("screeningOutcomes", f.And(
f.Eq("profileId", p.ProfileId),
p.Op switch {
Op.In => f.In("status", p.Values),
Op.NotIn => f.Nin("status", p.Values),
_ => throw new NotSupportedException()
}
)),
AnnotationNode a => a.Op switch {
Op.In => f.In($"reconciledAnnotations.{a.QuestionId}", a.Values),
Op.NotIn => f.Nin($"reconciledAnnotations.{a.QuestionId}", a.Values),
Op.Any => f.Exists($"reconciledAnnotations.{a.QuestionId}"),
Op.All => f.All($"reconciledAnnotations.{a.QuestionId}", a.Values),
_ => throw new NotSupportedException()
},
_ => throw new NotSupportedException()
};
}
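For contract tests and front-end debugging it can help to see the plain query document a filter tree turns into. The TypeScript sketch below mirrors the In/NotIn branches of the compiler and emits an ordinary MongoDB filter object. The names FilterNode, MongoFilter, and compileFilter are illustrative, the root is assumed to be wrapped as a group node, and empty groups are assumed to have been removed by the simplifier (MongoDB rejects an empty $and or $or array).

```typescript
type FilterNode =
  | { type: "group"; logic: "AND" | "OR"; rules: FilterNode[] }
  | { type: "profileOutcome"; profileId: string; op: "in" | "notIn"; values: string[] }
  | { type: "annotation"; questionId: string; op: "in" | "notIn"; values: string[] };

type MongoFilter = Record<string, unknown>;

// Compiles a FilterSet v2 tree into a plain MongoDB query document.
function compileFilter(node: FilterNode): MongoFilter {
  switch (node.type) {
    case "group":
      if (node.rules.length === 0) {
        throw new Error("empty group: run the simplifier before compiling");
      }
      return { [node.logic === "AND" ? "$and" : "$or"]: node.rules.map(r => compileFilter(r)) };
    case "profileOutcome":
      // profileId and status sit in ONE $elemMatch so they apply to the same element.
      return {
        screeningOutcomes: {
          $elemMatch: {
            profileId: node.profileId,
            status: { [node.op === "in" ? "$in" : "$nin"]: node.values },
          },
        },
      };
    case "annotation":
      return {
        [`reconciledAnnotations.${node.questionId}`]: {
          [node.op === "in" ? "$in" : "$nin"]: node.values,
        },
      };
  }
}
```

One consequence worth checking against requirements: because notIn is compiled inside $elemMatch, a study with no outcome at all for that profile does not match a notIn rule.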
Bottlenecks & Mitigations¶
- B1: Many profiles across a huge corpus → large multi-key fan-out.
  - Mitigate: pre-filter by profileId with high selectivity; ensure compound index { profileId, status }.
- B2: Deep OR trees lead to index intersection and memory pressure.
  - Mitigate: simplifier flattens/merges; push down most selective branches first.
- B3: $sample on large matched sets is CPU heavy.
  - Mitigate: random-by-range using rand field and range query with wrap-around.
- B4: Annotation value cardinality (Phase-2) → poor selectivity for free-text.
  - Mitigate: restrict to enumerated codes; use flattened reconciledAnnotations.<qid> fields.
Angular Material Filter Builder — UI/State/Contracts¶
UX Constraints¶
- MVP exposes one pass-forward rule (Profile + outcomes), but backend stores full v2 schema.
- Live count preview (debounced) via GET studies?stageId=…&countOnly=true.
- Clear circular reference errors at save.
Components (Angular 18+/Material)¶
<app-filter-builder>
<app-group [logic]="'AND'">
<app-rule type="profileOutcome"></app-rule>
<!-- Future: nested groups; annotation rules -->
</app-group>
<mat-divider></mat-divider>
<div class="preview">
<span>Matches: {{count$ | async}}</span>
<button mat-stroked-button (click)="reset()">Reset</button>
</div>
</app-filter-builder>
- Material: mat-form-field, mat-select for profile/outcome pickers; mat-button-toggle-group for AND/OR; CDK drag-drop (cdkDropList/cdkDrag) for reordering; mat-tree optional for nested groups.
- State: ngrx store for project/stage/global; signals for component local state & derived values.
Reactive Forms + Signals¶
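import { computed } from '@angular/core';
import { toObservable } from '@angular/core/rxjs-interop';
import { debounceTime, switchMap } from 'rxjs';

// Members of the filter-builder component class (fb: FormBuilder and api are injected).
readonly ruleForm = this.fb.group({
  type: this.fb.control<'profileOutcome'|'annotation'>('profileOutcome', { nonNullable: true }),
  profileId: this.fb.control<string | null>(null),
  op: this.fb.control<'in'|'notIn'>('in', { nonNullable: true }),
  values: this.fb.control<string[]>([], { nonNullable: true })
});
// derive JSON (signal)
readonly filterJson = computed(() => serializeToJsonV2(this.rootGroup()));
// preview count (debounced): convert the signal to an observable for RxJS operators.
// toObservable needs an injection context, which a field initializer provides.
readonly count$ = toObservable(this.filterJson).pipe(
  debounceTime(300),
  switchMap(json => this.api.getPoolCount(projectId, stageId, json))
);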
Selection — Efficient, Fair, and Scalable¶
Modes & Policies¶
- screening, annotation, screeningAndAnnotation, reconciliation
- Apply per-reviewer suppression and hideExcluded where relevant
- Saved-session routing when restrictToSaved or maxInProgress reached
Random Selection: Index-Friendly Approach¶
Instead of a $sample stage with size 1 on large sets, use a precomputed rand field ∈ [0,1) with an ascending index:
r = random()
q1: match(candidates & rand >= r) sort(rand ASC) limit 1
if none: q2: match(candidates & rand < r) sort(rand ASC) limit 1
- Pros: uses an index; avoids collection scans; stable distribution.
- Refresh rand rarely (e.g., when creating a study).
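The two-query pattern above can be illustrated with an in-memory TypeScript sketch. A sorted array stands in for the rand index, and pickByRange and Candidate are illustrative names rather than part of the planned API.

```typescript
interface Candidate {
  id: string;
  rand: number; // precomputed value in [0, 1)
}

// Returns the candidate with the smallest rand >= r (q1), or wraps around
// to the smallest rand overall when nothing lies at or above r (q2).
function pickByRange(candidates: Candidate[], r: number): Candidate | undefined {
  // Sorting a copy stands in for walking the ascending rand index.
  const sorted = [...candidates].sort((a, b) => a.rand - b.rand);
  for (const c of sorted) {
    if (c.rand >= r) return c; // q1: first match in ascending order
  }
  return sorted[0]; // q2: every rand is < r, so take the lowest; undefined if empty
}
```

In use, r would come from Math.random() on each call. One property to keep in mind: with fixed rand values, a study's chance of being picked is proportional to the gap between its rand and the next lower candidate's, so selection is uniform on average rather than exactly uniform for a given pool.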
Selection Filter Build (C#)¶
var pool = FilterCompiler.Compile(stage.FilterSet);
var candidates = builder.And(builder.Eq("projectId", projectId), pool);
if (mode is Screening or ScreeningAndAnnotation && stage.HideExcluded) {
var myExcluded = builder.ElemMatch("screeningOutcomes", builder.And(
builder.Eq("profileId", stage.ActiveProfileId),
builder.ElemMatch("decisions", builder.And(
builder.Eq("reviewerId", callerId), builder.Eq("outcome", "Excluded")
))));
candidates &= !myExcluded;
}
if (mode == Reconciliation) {
candidates &= builder.ElemMatch("extractionInfo.sessionTallies",
builder.And(builder.Eq("stageId", stage.Id),
builder.Gte("numberOfCandidateSessions", stage.SessionCountTarget)));
if (!stage.SelfReconciliation) {
var mine = builder.ElemMatch("extractionInfo.sessions",
builder.And(builder.Eq("stageId", stage.Id),
builder.Eq("reviewerId", callerId),
builder.Eq("reconciliation", false)));
candidates &= !mine;
}
}
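// random-by-range: start at a random point in [0,1) and wrap around if nothing lies at or above it
var r = Random.Shared.NextDouble();
var byRand = Builders<BsonDocument>.Sort.Ascending("rand");
var result = await _studies.Find(candidates & builder.Gte("rand", r)).Sort(byRand).FirstOrDefaultAsync(ct)
    ?? await _studies.Find(candidates & builder.Lt("rand", r)).Sort(byRand).FirstOrDefaultAsync(ct);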
API Contracts (Illustrative)¶
Stage Study Pool¶
GET studies?stageId=… returns the reviewer-agnostic Stage Study Pool as IDs or records; with countOnly=true it returns only the count.
Selection¶
POST /api/projects/{projectId}/stages/{stageId}/select_next
Body: { mode: "screening" | "annotation" | "screeningAndAnnotation" | "reconciliation", restrictToSaved?: boolean }
Response: 200 Study | 204 No Content
Stats¶
GET /api/projects/{projectId}/stages/{stageId}/stats
Response: {
availableForScreening: number,
availableForAnnotation: number,
reconciliationEligible: number,
inProgress: number,
completed: number,
reconciliationInProgress: number
}
Decisions¶
POST /api/projects/{projectId}/stages/{stageId}/screening/decisions
Body: { outcome: "Included|Excluded|Conflict|Pending|Maybe?", notes?, ... }
Server infers profileId from stage.
Testing Strategy¶
Unit Tests¶
- FilterValidator/Simplifier/Compiler
- Selection eligibility per mode
- Reconciliation policy
- Decision status computation
- PRISMA aggregator math (Phase-2)
Integration Tests (API+Mongo)¶
- /select_next behaviours
- /stats counts
- studies?stageId pool correctness
- FilterSet round-trip
- Decision write atomic updates (tallies & inclusion info)
Contract Tests¶
- DTO parity (TypeScript vs C#)
- Versioned schemas
E2E Tests¶
- Stage creation (mode required)
- Filter Builder (preview counts)
- Reviewer flows (all modes)
- Migration Wizard
- PRISMA mapping & preview (Phase-2)
Performance Tests¶
- Selection with rand vs $sample
- OR-heavy filters
- Annotation filters on popular questions (Phase-2)
Property-Based Tests¶
- Simplifier equivalence (random trees → compile → run vs naive eval on sample set)
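The "naive eval" side of this property can be sketched in TypeScript as a direct evaluator over in-memory studies, plus a helper that checks two trees agree on a sample set. This is a sketch under assumptions: naiveMatches, equivalentOn, RuleNode, and SampleStudy are illustrative names, and only group and profileOutcome nodes (the MVP types) are covered.

```typescript
type RuleNode =
  | { type: "group"; logic: "AND" | "OR"; rules: RuleNode[] }
  | { type: "profileOutcome"; profileId: string; op: "in" | "notIn"; values: string[] };

interface SampleStudy {
  screeningOutcomes: { profileId: string; status: string }[];
}

// Reference semantics: evaluates a filter tree directly against one study.
function naiveMatches(node: RuleNode, study: SampleStudy): boolean {
  if (node.type === "group") {
    const results = node.rules.map(r => naiveMatches(r, study));
    // An empty AND is TRUE and an empty OR is FALSE, matching the simplifier identities.
    return node.logic === "AND" ? results.every(Boolean) : results.some(Boolean);
  }
  const { profileId, op, values } = node;
  // Mirrors $elemMatch: a single outcome element must satisfy both conditions.
  return study.screeningOutcomes.some(o => {
    const listed = values.indexOf(o.status) !== -1;
    return o.profileId === profileId && (op === "in" ? listed : !listed);
  });
}

// True when both trees give the same answer for every study in the sample.
function equivalentOn(a: RuleNode, b: RuleNode, studies: SampleStudy[]): boolean {
  return studies.every(s => naiveMatches(a, s) === naiveMatches(b, s));
}
```

A property-based test would generate random trees and sample studies, run the simplifier on each tree, and assert equivalentOn(original, simplified, studies). The same evaluator can also serve as the oracle for the compiled query's results.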
Deployment & Env Config¶
- Feature flag: features.advancedScreeningProfiles per project
- Indexes job: ensure all indexes on boot
- Config: selection.random.method = randRange (fallback $sample for tiny pools)
- K8s: HPA on p95 latency & CPU; probes at /healthz & /livez
- CI/CD: build → unit/integration → deploy Dev → smoke → UAT → Prod; canary under feature flag
Migration & Rollback¶
Migration Wizard (opt-in)¶
- Freeze: set project.migrationStatus = Freezing; block review actions
- Snapshot: copy project doc + study IDs into migrationSnapshots
- Backfill: create first Screening Profile from legacy criteria text; sweep studies to derive initial outcomes
- Verify: counts match; sample QA; write audit log
- Unfreeze: set project.migrationStatus = Complete and enable feature
Rollback¶
- If any step fails, set migrationStatus = Failed; offer Revert which restores snapshot
- Clear partial writes with job that deletes new fields where safe
Note: Reasons for exclusion are not backfilled; consider a post-hoc annotation pass in Phase-2.
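The wizard and rollback steps above amount to a small step runner: mark the project as Freezing, run the steps in order, and mark it Failed as soon as one throws. The TypeScript sketch below shows that control flow only. runMigration, Step, and the synchronous callbacks are illustrative assumptions, and keeping the status at Freezing while the intermediate steps run is also an assumption, since the plan names only the Freezing, Complete, and Failed states.

```typescript
type MigrationStatus = "Freezing" | "Complete" | "Failed";

interface Step {
  name: string;
  run: () => void; // throws on failure
}

interface MigrationResult {
  status: MigrationStatus;
  failedStep?: string;
}

// Runs the steps in order; stops at the first failure and records it.
function runMigration(
  steps: Step[],
  setStatus: (status: MigrationStatus) => void,
): MigrationResult {
  setStatus("Freezing"); // review actions are blocked from here on
  for (const step of steps) {
    try {
      step.run();
    } catch (err) {
      setStatus("Failed"); // the caller can now offer Revert from the snapshot
      return { status: "Failed", failedStep: step.name };
    }
  }
  setStatus("Complete"); // unfreeze and enable the feature
  return { status: "Complete" };
}
```

Returning the failed step name gives the Revert flow and the audit log something concrete to record, and later steps never run after a failure.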
Security & Access Control¶
- Admin-only create/edit Profiles/FilterSets/Stage Settings/PRISMA mapping
- Reviewers can view criteria text on review UI (read-only)
- Audit all decisions, session changes, reconciled commits (reviewing_audit)
Developer Checklist¶
Backend Tasks¶
- Create Screening Profile aggregate under Project
- Implement immutable-once-used + clone semantics
- Add Stage Settings fields (mode, policies)
- Implement Filter Set v2 storage + validation
- Build FilterCompiler (validate/simplify/compile)
- Create SelectionService with modes
- Implement StatsService with per-caller counts
- Add stageId param to studies endpoint
- Create decisions route with stage context
- Implement Migration Wizard (freeze/snapshot/backfill/rollback)
- Add audit entries for all actions
- Create MongoDB indexes
Frontend Tasks¶
- Stage Settings UI (mode required)
- Filter Builder component (MVP: single pass-forward rule)
- Live count preview (debounced)
- Wire "Get next" by mode
- Show active profile criteria on review UI
- Add mode banners (Reconciliation Mode)
- Implement keyboard shortcuts (J/K, 1/2/3)
- Empty-state guidance
- Saved-session resume flows
- Stats widgets
Documentation Tasks¶
- Update API documentation
- Create user guide for new features
- Helpdesk SOP export bundle
Sprint Timeline (Indicative)¶
- Sprint 1: Profiles domain/API; Stage Settings (mode required); ensure indexes; decisions write atomic tallies
- Sprint 2: FilterSet storage (v2), Simplifier + Compiler; MVP Filter Builder; studies?stageId pool
- Sprint 3: Selection (rand strategy) + Stats; Reviewer UI wiring; telemetry
- Sprint 4: Migration Wizard; E2E hardening
- Sprint 5 (Phase-2 start): PRISMA mapping + aggregator; annotation filtering storage; wildcard/partial indexes
Open Questions (Carried)¶
- Should HideExcludedStudiesFromReviewers apply in Reconciliation Mode?
- PRISMA box breakdowns beyond TA/FT (e.g., multiple phases)?