Investigation - little DOMS Data Export Failure and Annotation Corruption¶
Project: little DOMS (ac35058c-4767-4f9a-9ddc-5ba9d484c289)
Issues: #2335, #2337, #2339
PR: #2377
Date: 2026-03-07
Status: Root cause identified, fixes proposed
TL;DR¶
Two separate bugs are affecting this project:
- Wide format data export crashes because one `StringArrayAnnotation` has a null `Answer` list, which causes `string.Join(delimiter, null)` to throw `ArgumentNullException`. Secondary issue: `BoolAnnotation.GetAnswer()` returns `string?` (nullable), violating the base class contract. Fix: null guards on all annotation `GetAnswer()` methods.
- Annotations are destroyed when a user re-saves Stage 1 after completing Stage 2. Annotations are shared across all stages (not scoped per stage), so tree-shaking correctly overwrites shared-question answers — this is by design. Study-level shared questions (Items 1-5, 9, 10) round-tripped perfectly. Only per-outcome sub-questions (Items 6, 7, 8) diverged or were lost — these are children of procedural unit instances (outcome assessment procedures) linked via AA (answer-answer) relationships. The likely cause is that re-saving Stage 1 breaks AA relationships when parent procedural annotations are re-created with new IDs, orphaning the child annotations. Fix requires investigating AA relationship preservation during form re-submission.
Both bugs are latent platform-wide issues — this project triggered them first due to its multi-stage workflow and one investigator's usage pattern.
Investigator Accounts¶
- Althea Tomas: `althea.tomas@gmail.com` — CSUUID `F5XWaUIex06j3KycXnpSUA==`
- Maria Arroyo Araujo: `maria.arroyo@charite.de` — CSUUID `xr22lv0TQESE9BNC69axeA==`
Project Configuration¶
Created: 2021-02-02
Stages¶
| # | Name | Extraction | Questions |
|---|---|---|---|
| 1 | Data extraction | true | 128 |
| 2 | Risk of bias assessment | false | 18 (17 shared with Stage 1) |
- Stage 1 ID: `4jYrw5Vwq0+u1chtjeKHLA==`
- Stage 2 ID: `BHKTi2kyJ0y2MJsA0kfdcA==`
User Reports¶
#2335 — Althea Tomas: Data export error (12 Feb 2026)¶
TL;DR: Wide format annotation export fails with `ArgumentNullException` after processing 56 of 323 studies.
- Reporter: althea.tomas@gmail.com
- Action: Export annotations in wide format from "little DOMS" project
- Result: Export fails with error: `Critical export failure: Value cannot be null. (Parameter 'values')`
- Scope: Affects all project members attempting the same export
- Stack trace root: `WideFormatOptions.cs:line 76` → `StudyAnnotationsGroup` → `String.Join` receives null
#2337 — Althea Tomas: Annotations not showing (19 Feb 2026)¶
TL;DR: Completed outcome-level annotations have vanished from the UI for multiple studies.
- Reporter: Althea Mara Balota Tomas
- Action: Returned to view previously completed annotations
- Result: Outcome assessment level inputs have disappeared across all annotated studies
- Affected study IDs:
  - `2134c6d1-40a0-437e-bd4e-5a19fe303f62`
  - `b524fe36-d823-4b6f-96ff-cef2b7e4433f`
  - `f8288d5d-4c2c-4e67-86d6-3415b6bc7f6d`
  - `d95eccdf-da2f-4704-a124-2b0faaad283e`
  - `dbf0c84c-7db7-4dba-8377-f7bfa9419111`
#2339 — Maria Arroyo Araujo: Export error & annotations corrupted (20 Feb 2026)¶
TL;DR: Same two problems — export crashes in wide format AND annotations get corrupted when finalising Risk of Bias stage.
- Reporter: Arroyo Araujo, Maria
- Action: (1) Export study-level annotations in wide format; (2) Finalise RoB assessment stage
- Result: (1) Same `ArgumentNullException` crash; (2) Completing RoB stage corrupts outcome-level annotations in the Data Extraction stage
- Additional affected study: `293f69be-f9d2-4955-b70d-e337b0e10873`
- Key observation: Corruption happens specifically when finalising annotations in the "Risk of bias assessment" stage — the already-completed annotations in the "Data extraction" stage lose their outcome-level data
Chronology¶
```
2021-02-02  Project "little DOMS" created
2026-02-03  Althea starts Stage 1 (Data Extraction) on NLRP3 and Low-dose nifedipine
2026-02-05  Althea starts Stage 1 on Contralesional angiotensin and Huang-Lian-Jie-Du
2026-02-09  Althea starts Stage 1 on New alternative approaches and DPP-4 Linagliptin
            -> At this point Althea has ~144-281 annotations per study saved under Stage 1
2026-02-12  * Althea files #2335 — wide format export crashes with ArgumentNullException
              (null StringArrayAnnotation — separate bug, latent since ~2021)
2026-02-18  Althea completes Stage 2 (RoB) for 5 studies within the same evening:
              18:17 — Contralesional angiotensin Stage 2 session created (completed)
              19:01 — New alternative approaches Stage 2 session created (completed)
              19:11 — NLRP3 Stage 2 session created (completed)
              19:20 — Low-dose nifedipine Stage 2 session created (completed)
              19:31 — DPP-4 Linagliptin Stage 2 session created (completed)
            -> Stage 2 submission tree-shakes Stage 1's 17 shared-question annotations
               and replaces them with Stage 2 RoB answers
            Maria starts Stage 1:
              14:38 — ESO-WSO 2020 (completed, excluded study, 32 annotations)
              15:19 — Huang-Lian-Jie-Du session created (annotations written later)
2026-02-19  Althea completes Stage 2 for Huang-Lian-Jie-Du (12:01)
            * Althea files #2337 — "Outcome assessment level inputs have disappeared"
            -> She completed Stage 2 yesterday/today and the outcome-level annotations
               are now missing when she views Stage 2
            Maria starts Stage 1 on Carvedilol (15:32) — still in-progress today
2026-02-20  * Maria files #2339 — "same export crash + completing RoB corrupts outcome
              annotations in Data Extraction"
            -> Maria has no Stage 2 sessions: she is either reporting what she observed
               happening to the project, or tried Stage 2 but it didn't save a session
2026-02-24  Maria annotates three more Stage 1 studies:
              12:19 — Huang-Lian-Jie-Du annotations written (session from 18 Feb)
              12:35 — FGF21 session created + 40 annotations (excluded)
              12:50 — Lauric acid session + 40 annotations (excluded)
              14:08 — Contralesional angiotensin session created
2026-02-25  Maria writes annotations for Contralesional angiotensin at 14:55 (133 total)
            ** Althea re-saves Stage 1 for 3 studies — ALL annotations completely
               rewritten in single submissions (Stage 1 form, all 128 questions):
              16:11 — Contralesional angiotensin: 268 annotations rewritten
              17:31 — NLRP3: 144 annotations rewritten
              17:51 — Low-dose nifedipine: 186 annotations rewritten
2026-02-26  Maria writes New alternative approaches annotations (272 total)
2026-03-05  ** Althea re-saves Stage 1 for remaining 3 studies:
              10:29 — Huang-Lian-Jie-Du: 145 annotations rewritten
              10:42 — New alternative approaches: 281 annotations rewritten
              10:59 — DPP-4 Linagliptin: 245 annotations rewritten
```
Bug 1: Wide Format Export Crash¶
TL;DR¶
One StringArrayAnnotation has a null Answer list. When the export calls string.Join(";", null) inside that annotation's GetAnswer() method, it throws ArgumentNullException. Secondary issue: BoolAnnotation.GetAnswer() returns string?, violating the base class string return type contract.
Symptoms¶
- Wide format export fails after processing ~56 studies
- Error: `ArgumentNullException: Value cannot be null. (Parameter 'values')`
- Only affects wide format (long format and bibliographic exports work)
Stack Trace¶
```
at System.String.Join(String separator, IEnumerable`1 values)        ← CRASH: StringArrayAnnotation.GetAnswer() calls Join on null list
at System.Linq.Enumerable.SelectIListIterator`2.MoveNext()           ← iterating annotations in group
at System.String.Join(String separator, IEnumerable`1 values)        ← outer: answer cache builder
at System.Linq.Enumerable.ToDictionary(...)                          ← StudyAnnotationsGroup._answerCache
at WideFormatOptions.<>c__DisplayClass11_0.b__4(IGrouping grouping)  ← WideFormatOptions.cs:76
```
Root Cause¶
Direct crash: One StringArrayAnnotation for question "Pdf Graphs" on study "High-fructose diet during adolescent development..." has a null Answer list in MongoDB. When the export processes this study, StringArrayAnnotation.GetAnswer() calls string.Join(arrayDelimiter, Answer) where Answer is null, throwing ArgumentNullException("values").
File: src/libs/project-management/SyRF.ProjectManagement.Core/Model/StudyAggregate/Annotation.cs
Array annotation types have List<T> Answer { get; private set; } = new(); — the = new() initialiser is overridden by MongoDB deserialization when the BSON field is null:
```csharp
// All array types — no null guard on Answer
public override string GetAnswer(string arrayDelimiter = ";") => string.Join(arrayDelimiter, Answer);
// If Answer is null (from MongoDB) → ArgumentNullException("values")
```
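The initialiser-override pitfall can be modeled in a few lines. A TypeScript sketch (class and method names mirror the C# types; `Object.assign` stands in for the BSON deserializer):

```typescript
// Hypothetical model of the deserialization pitfall. The real code is C#
// with the MongoDB driver; this sketch only demonstrates the mechanism.
class StringArrayAnnotation {
  // Property initialiser, the analogue of `List<string> Answer { get; } = new();`
  answer: string[] | null = [];

  // No null guard, the analogue of `string.Join(delimiter, Answer)`
  getAnswer(delimiter = ";"): string {
    return this.answer!.join(delimiter);
  }
}

// Deserialization assigns whatever the stored document contains,
// overriding the initialiser, just like the BSON driver does.
function deserialize(json: string): StringArrayAnnotation {
  return Object.assign(new StringArrayAnnotation(), JSON.parse(json));
}

const healthy = deserialize('{"answer":["Fig 1D","Fig 1E"]}');
const empty = deserialize('{"answer":[]}');
const broken = deserialize('{"answer":null}'); // the "Pdf Graphs" record

healthy.getAnswer(); // "Fig 1D;Fig 1E"
empty.getAnswer();   // ""
// broken.getAnswer() throws at runtime: the null stored value replaced
// the `= []` default, so .join is called on null
```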
Secondary issue: BoolAnnotation.GetAnswer() (line 73) returns string? (nullable), unlike all other scalar annotation types:
```csharp
// BoolAnnotation — CAN return null if Answer (bool?) is null
public override string? GetAnswer(string arrayDelimiter) => Answer?.ToString();

// Compare: DecimalAnnotation — correct pattern with null coalescing
public override string GetAnswer(string arrayDelimiter = ";") => Answer?.ToString() ?? string.Empty;
```
In this project, all 23,697 BoolAnnotations have non-null Answer values (false: 16,205, true: 7,492), so this doesn't trigger here. However, it's a contract violation (GetAnswer base class declares string, not string?) and could cause NullReferenceException in other code paths or projects where bool? Answer is null.
Crash site: StudyAnnotationsGroup._answerCache materialises all annotations:
File: src/libs/project-management/SyRF.ProjectManagement.Core/Services/DataExportServices/AnnotationUnitGroups/StudyAnnotationsGroup.cs (lines 10-12)
```csharp
private readonly IReadOnlyDictionary<Guid, string> _answerCache = GroupAnnotations
    .GroupBy(a => a.QuestionId)
    .ToDictionary(g => g.Key, g => string.Join(";", g.Select(a => a.GetAnswer(";"))));
```
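A TypeScript model of this cache construction shows how a single null list sinks the whole export. Simplified sketch: `GetAnswer()` is reduced to a join over a possibly-null list.

```typescript
interface Ann {
  questionId: string;
  answer: string[] | null; // null only for the corrupt record
}

// One null answer list anywhere in the group aborts the whole cache build,
// because the inner join (the GetAnswer analogue) has no guard.
function buildAnswerCache(anns: Ann[]): Map<string, string> {
  const byQuestion = new Map<string, string[]>();
  for (const a of anns) {
    const answers = byQuestion.get(a.questionId) ?? [];
    answers.push(a.answer!.join(";")); // throws when a.answer is null
    byQuestion.set(a.questionId, answers);
  }
  return new Map(
    [...byQuestion].map(([q, v]) => [q, v.join(";")] as [string, string]),
  );
}

buildAnswerCache([{ questionId: "q1", answer: ["a", "b"] }]); // cache.get("q1") === "a;b"
// buildAnswerCache([{ questionId: "q2", answer: null }]) throws
```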
MongoDB Evidence¶
Annotation data across all 323 studies in this project:
| Type | Total | Null Answer | Notes |
|---|---|---|---|
| StringAnnotation | 64,254 | 19 | Null-coalesced to "" in GetAnswer() — safe |
| BoolAnnotation | 23,697 | 0 | All have values (false: 16,205, true: 7,492) |
| IntAnnotation | 13,661 | 0 | All have values |
| StringArrayAnnotation | 9,246 | 1 | Crash trigger — null list passed to String.Join |
| DecimalAnnotation | 8,278 | 0 | All have values |
The single null StringArrayAnnotation is the direct and only crash trigger in this project.
How Did the Null Annotation Get Created?¶
The "Pdf Graphs" question (016278e8-7e60-40d4-9568-d7fa42670c32) is a hidden system question (_sysHidden = true) in the "Outcome Assessment" category. It stores which PDF graphs are associated with an outcome, and is programmatically populated by the graph-selector component — users never interact with it directly.
Only 2 of 323 studies have a "Pdf Graphs" annotation database-wide (not just this project — no other project has any "Pdf Graphs" annotations). Both are by the same annotator (Joachim Wahl, wahl@amgen.com), both are non-root outcome-level subquestions with Children: [], and both survived the frontend filter because the annotator added notes:
| Study | Answer | Notes | Created (UTC) | Session Created (UTC) |
|---|---|---|---|---|
| "Prolonged diet-induced obesity..." | `[]` (empty array — safe) | # data extracted from Figures 1D and 1E | 2021-06-07 13:51 | 2021-02-07 17:10 |
| "High-fructose diet..." (crash) | `null` (crash trigger) | # extracted from Fig 6a | 2021-10-05 11:44 | 2021-06-06 09:17 |
The "Pdf Graphs" question is a runtime-injected system question — it does not appear in any project's AnnotationQuestions array (neither top-level nor as a subquestion). It is injected by the frontend's graph-selector component. The annotator used the notes field to record which figures to extract from but never actually selected any PDF graphs, leaving the answer as [] (which was then serialised to null for the second study).
How the form handles this question:
- The form always initialises "Pdf Graphs" with `answer: []` as a subquestion under each outcome (since `answerArray = true`, the default value is `[]`)
- Before submission, `filterEmptyFieldsWithNoChildren()` removes answer groups with empty arrays (lines 758-760 of `annotation-form.service.ts`)
- For 321/323 studies, the filter correctly removes the empty "Pdf Graphs" — it never reaches the backend
Why it survived the filter for 2 studies: The empty-field filter was refactored in commit 248dcd47 (tagged v3.21.0, committed 2021-06-05). Cross-referencing with the jxc (Jenkins X cluster) production helmfile:
| Date (UTC) | Production Version | Notes |
|---|---|---|
| 2021-05-10 | syrf-web v0.0.8 | Old filter (`removeEmptyFieldsWithNoChildren`) |
| 2021-06-07 19:57 | syrf-web v3.21.6 | New filter deployed (includes v3.21.0 commit) |
The two "Pdf Graphs" annotations map to these code versions:
| Annotation | Created (UTC) | Answer | Code Version |
|---|---|---|---|
| "Prolonged diet-induced obesity..." | 2021-06-07 13:51 | `[]` | Old code (v0.0.8) — created 6 hours before v3.21.6 deployed |
| "High-fructose diet..." (crash) | 2021-10-05 11:44 | `null` | New code (v3.21.6+) — created 4 months after deploy |
The first annotation ([]) was created under the old removeEmptyFieldsWithNoChildren, which had an operator precedence bug in its removal logic: (hasNoChildren && answer === '') || answer === null — this unconditionally removed null answers but let empty arrays through since [] !== '' and [] !== null.
The second annotation (null) was created under the new filterEmptyFieldsWithNoChildren, which fixed the operator precedence but only removes empty answer groups when all three conditions are met: (1) no child questions with answer groups, (2) notes are empty/null/undefined, and (3) answer is empty/null/undefined/empty-array. Both "Pdf Graphs" annotations survived the filter because the annotator (Joachim Wahl) added non-empty notes — "# data extracted from Figures 1D and 1E" on the first study and "# extracted from Fig 6a" on the crash study. Since condition (2) failed, the filter correctly kept both entries.
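The two filter generations can be sketched as predicates. These are assumed, simplified shapes based on the behaviour described above, not the exact `annotation-form.service.ts` code:

```typescript
interface AnswerGroup {
  answer: string | string[] | null;
  notes?: string | null;
  children: AnswerGroup[];
}

// Old filter: operator precedence bug. `&&` binds tighter than `||`, so
// `answer === null` removes unconditionally, while `[]` slips through
// (an empty array is neither '' nor null).
function oldShouldRemove(g: AnswerGroup): boolean {
  const hasNoChildren = g.children.length === 0;
  return (hasNoChildren && g.answer === "") || g.answer === null;
}

// New filter: removes only when ALL three hold, per the conditions above.
function newShouldRemove(g: AnswerGroup): boolean {
  const noChildren = g.children.length === 0;
  const noNotes = g.notes == null || g.notes === "";
  const emptyAnswer =
    g.answer == null || g.answer === "" ||
    (Array.isArray(g.answer) && g.answer.length === 0);
  return noChildren && noNotes && emptyAnswer;
}

// The crash-study record: empty answer but non-empty notes, so the new
// filter keeps it and it reaches the backend.
const pdfGraphs: AnswerGroup = {
  answer: [],
  notes: "# extracted from Fig 6a",
  children: [],
};
newShouldRemove(pdfGraphs); // false
```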
The null vs empty distinction: The frontend sends [] for both cases (the default for answerArray questions). The difference is in backend serialisation — when the C# MongoDB driver deserialises a StringArrayAnnotation with an empty Answer array, it may store null rather than [] depending on how the BSON was written. The = new() property initialiser is overridden during deserialisation when the BSON field is explicitly null.
Why Only This Project?¶
It's not project-specific — any project with an array annotation whose Answer list is null would crash. This project triggered it because one "Pdf Graphs" StringArrayAnnotation was saved with a null answer list. Other projects with similar data patterns would be equally vulnerable.
Database-wide scan confirms this: 5 null StringArrayAnnotation records exist across 3 projects:
| Project | Null Annotations | Studies | Annotators |
|---|---|---|---|
| ReLiSyR-MND in vivo review | 2 | 2 | 2 (different) |
| Separating surgery from stroke... | 2 | 2 | 2 (different) |
| little DOMS | 1 | 1 | 1 |
No null IntArrayAnnotation, BoolArrayAnnotation, or DecimalArrayAnnotation records were found — only StringArrayAnnotation. All 3 projects would crash on wide format export if the null annotation is encountered during processing.
Detailed breakdown of all 5 null annotations:
ReLiSyR-MND in vivo review (2 annotations)¶
Question: "What is the drug being tested?" — study-level, root, user-created AnswerArray question.
| Study | Annotator | Annotation Created (UTC) | Session Created (UTC) | Notes |
|---|---|---|---|---|
| The AMPA receptor antagonist NBQX prolongs survival in a transgenic mouse model of ALS | jennagregory488@gmail.com | 2017-06-26 21:14 | 2017-06-26 21:14 | (none) |
| Effect of physical exercise and anabolic steroid treatment on spinal motoneurons and surrounding glia of wild-type and ALS mice | malcolm.macleod@ed.ac.uk | 2017-08-25 17:57 | 2017-08-25 17:57 | (none) |
Separating surgery from stroke... (2 annotations)¶
Question: "Why have you excluded this study?" — study-level, root, user-created AnswerArray question.
| Study | Annotator | Annotation Created (UTC) | Session Created (UTC) | Notes |
|---|---|---|---|---|
| Magnetic resonance tracking of transplanted stem cells in rat brain and spinal cord | tyrafraser16@gmail.com | 2019-02-12 02:55 | 2019-02-12 02:55 | (none) |
| Compartmentation of acid-base balance in brain during complete ischemia | ash.russell@utas.edu.au | 2019-05-01 05:15 | 2019-05-01 05:15 | (none) |
little DOMS (1 annotation)¶
Question: "Pdf Graphs" — outcome-level subquestion, hidden system question.
| Study | Annotator | Annotation Created (UTC) | Session Created (UTC) | Notes |
|---|---|---|---|---|
| High-fructose diet during adolescent development increases neuroinflammation and depressive-like behavior without exacerbating outcomes after stroke | wahl@amgen.com (Joachim Wahl) | 2021-10-05 11:44 | 2021-06-06 09:17 | # extracted from Fig 6a |
Key observations across all 5 records:
- The first 4 annotations (ReLiSyR-MND + surgery/stroke) use regular user-created `AnswerArray` questions, not system-hidden ones — the null answer path isn't unique to the graph-selector frontend component.
- Annotation timestamps match session timestamps (within seconds) for the first 4, indicating annotations were created in the same submission as the session. The little DOMS annotation was written 4 months after its session was created.
- All records are old (2017–2021) — this bug has been latent for 5–9 years but only triggers on wide format export when the specific study is processed.
- No investigator records have name fields populated (only Email) — names shown above are inferred from email addresses where possible.
Fix¶
- P0 — Add null guards to array annotation types (direct crash fix)
- P1 — Add `?? string.Empty` to `BoolAnnotation.GetAnswer()` (contract violation fix)
- P2 — Defensive null filter in `StudyAnnotationsGroup._answerCache` (belt-and-braces)
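All three fixes reduce to one-line null coalescing. A minimal sketch, modeled in TypeScript for illustration (the real methods are C# in `Annotation.cs` and `StudyAnnotationsGroup.cs`; the function names here are illustrative, not the actual API):

```typescript
// P0: array types coalesce a null list before joining
const getArrayAnswer = (answer: string[] | null, delim = ";"): string =>
  (answer ?? []).join(delim);

// P1: BoolAnnotation coalesces a null scalar to the empty string,
// restoring the non-nullable base-class contract
const getBoolAnswer = (answer: boolean | null): string =>
  answer?.toString() ?? "";

// P2: belt-and-braces, skip null answers when building the export cache
const cacheValue = (answers: Array<string | null>): string =>
  answers.filter((a): a is string => a !== null).join(";");

getArrayAnswer(null);         // "" instead of a crash
getBoolAnswer(null);          // "" instead of null
cacheValue(["x", null, "y"]); // "x;y"
```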
Files to modify:
- src/libs/project-management/SyRF.ProjectManagement.Core/Model/StudyAggregate/Annotation.cs
- src/libs/project-management/SyRF.ProjectManagement.Core/Services/DataExportServices/AnnotationUnitGroups/StudyAnnotationsGroup.cs
Bug 2: Annotation Destruction via Shared-Question Tree-Shaking¶
Annotation Ownership Model (Critical Context)¶
Annotations do not belong to stages. They are shared across all stages for a given (reviewer, study, project) tuple. The `StageId` field on an annotation is provenance metadata only — it records which stage the reviewer was in when the annotation was created. It does not indicate ownership or scoping. The system expects exactly one annotation per (question, reviewer, study, project), regardless of how many stages contain that question. There are plans to change this model in the future, but this is the current architecture.
TL;DR¶
When a user re-submits Stage 1 (Data Extraction) annotations — via manual save, auto-save, or completion — the backend's tree-shaking logic removes all of the user's annotations matching Stage 1's question IDs and replaces them with the submitted values. Because annotations are shared across stages (not scoped per stage), and 17 of 18 Stage 2 (RoB) questions are also assigned to Stage 1, this overwrites the Stage 2 answers. The tree-shaking is working as designed — study-level shared questions (Items 1-5, 9, 10) round-tripped perfectly. Only per-outcome sub-questions (Items 6, 7, 8) diverged or were lost — these are children of outcome assessment procedure instances, linked to their parents via AA (answer-answer) relationships. The likely cause is that re-saving Stage 1 breaks these AA links when parent procedural annotations are re-created with new IDs, orphaning the child annotations. The Stage 2 session remains marked "completed" but its per-outcome annotation values have been replaced or lost.
Symptoms¶
- Outcome-level annotations "disappear" from Stage 2 (Risk of Bias)
- Annotations are not deleted — they're overwritten with Stage 1 values (stageId provenance changes to Stage 1)
- Affects studies where Stage 2 was completed BEFORE the user returned to edit/re-save Stage 1
- Stage 2 sessions show as `Status: 1` (completed) but have zero matching annotations
Previous Theory: Frontend Race Condition¶
The initial investigation hypothesised a frontend race condition where switching between stages caused a stale stageId to be sent in the request body. This theory is wrong. Evidence:
- Sessions have correct stageIds. All 6 affected studies have Stage 2 sessions with the correct Stage 2 stageId (`BHKTi2kyJ0y2MJsA0kfdcA==`). If the body had sent the wrong stageId, the session would also be wrong — session creation at `ExtractionInfo.cs:103` uses the body's stageId.
- Annotations were rewritten days/weeks after the Stage 2 sessions were created, ruling out a race condition during stage switching. (See full study table in MongoDB Evidence below.)
- Each study's annotations have sub-millisecond time spread (e.g., all 268 annotations for "Contralesional angiotensin" written at `2026-02-25T16:11:39.469Z`–`.470Z`), consistent with a single session submission, not a race condition.
- Human-paced gaps between studies (20–80 min) confirm the user was methodically re-saving Stage 1 for each study, not rapidly switching stages.
Could This Be Users Manually Constructing URLs?¶
No. The corruption is fully explained by normal UI usage — the user returns to Stage 1 (Data Extraction) to review or edit their work, and any save (including the 200ms-debounced auto-save on form change) triggers the destructive tree-shaking. No URL manipulation is needed.
Root Cause: Shared Questions + Tree-Shaking¶
The Question Overlap Problem¶
This project has 17 of 18 Stage 2 questions also assigned to Stage 1:
| Stage | Total Questions | Exclusive | Shared |
|---|---|---|---|
| Stage 1 (Data Extraction) | 128 | 111 | 17 |
| Stage 2 (Risk of Bias) | 18 | 1 | 17 |
The 17 shared questions include RoB items ("Item 1 – Selection bias", "Item 5 – Performance bias", etc.) and outcome-related questions. Only "Item 2 – Selection bias: Baseline characteristics" is exclusive to Stage 2.
Annotation Question Tree Structure¶
The full tree, derived from annotation root/non-root relationships in the database:
```
ROOT (study-level)
├── [Metadata] English study, Is the study excluded?*, Conflict of interest,
│   Has a protocol been published?, How many animals were used...
│
├── [Stage 1 procedure labels — each has many children]
│   ├── Cohort label?
│   │   └── [children] Species, Strain, Animal Sex, Weight [g] (upper/lower),
│   │       Age of animals, Source of animals, Type/Was comorbidity confirmed,
│   │       How is diabetes mellitus induced?, Reported excluded animals...
│   │
│   ├── Experiment label?
│   │   └── [children] Disease Models, Cohorts, Treatments (with further nesting)
│   │       └── [within Treatments] Treatment name, Dose, Route, Intervention type,
│   │           Number of interventions, Co-treatment...
│   │
│   ├── Outcome assessment procedure label?
│   │   └── [children] What is the name of the neurobehavioral outcome?,
│   │       Outcome measure, Outcome measure units, Type, average/error type,
│   │       Greater is worse, Reference group, Function addressed...
│   │       ├── Item 6 – Detection bias: Random outcome assessment   <- per outcome (shared)
│   │       ├── Item 7 – Detection bias: Blinding                    <- per outcome (shared)
│   │       └── Item 8 – Attrition bias: Incomplete outcome data     <- per outcome (shared)
│   │
│   └── Disease model induction procedure label?
│       └── [children] Model used to induce diabetes?, Type of diabetes modelled?,
│           Was timing of disease induction random?...
│
├── [Stage 1 reporting questions]
│   Reporting of excluded animals?, Were investigators blinded?,
│   Were the animals randomly housed?, Animal housing pre/post-stroke?,
│   Prespecification of in-/exclusion criteria?, etc.
│
└── [RoB items — study-level, shared with Stage 2]
    ├── Item 1 – Selection bias: Sequence generation
    ├── Item 2 – Selection bias: Baseline characteristics (two variants)  <- Stage 2 exclusive
    ├── Item 3 – Selection bias: Allocation concealment
    ├── Item 4 – Performance bias: Random housing
    ├── Item 5 – Performance bias: Blinding
    ├── Item 9 – Reporting bias: Selective outcome reporting
    ├── Item 10 - Other: Other sources of bias
    └── Is this study reconciled
```
* "Is the study excluded?" root annotation has 1 non-root child (the exclusion reason).
Items 6, 7, and 8 appear as non-root children under each outcome — meaning they are outcome-level per-instance shared questions, while Items 1–5, 9, 10 are study-level shared questions.
The Destructive Tree-Shaking¶
File: src/libs/project-management/SyRF.ProjectManagement.Core/Model/StudyAggregate/ExtractionInfo.cs (lines 76-83)
When a session is submitted, AddAnnotations removes existing annotations before adding new ones:
```csharp
// Tree Shaking — removes all annotations by this user matching ANY of the stage's question IDs
var updatedAnnotations = Annotations.Where(an => !(
    an.ProjectId == projectId &&
    (reconciliation || an.AnnotatorId == annotatorId) &&
    an.Reconciled == reconciliation &&
    stageQuestionIds.Contains(an.QuestionId) || newAnnotationIds.Contains(an.Id))
).ToList();
```
The stageQuestionIds come from the URL's stage (correct). The filter matches on QuestionId alone — it does not check an.StageId. This is consistent with the annotation ownership model: annotations are shared across stages, so the tree-shaking correctly removes the user's single annotation for each question, regardless of which stage created it.
The problem is not that tree-shaking crosses stage boundaries — that's by design. The tree-shaking correctly removes and replaces the user's annotations for all of the stage's question IDs. Study-level shared questions (Items 1-5, 9, 10) round-tripped perfectly through this process — proving the form loads them correctly regardless of stageId provenance. Only per-outcome sub-questions (Items 6, 7, 8) diverged, and these are children of outcome assessment procedure instances linked via AA relationships. The problem is upstream — in how the form reconstructs procedural sub-questions when the parent instance annotations may have changed IDs during re-submission. See "The Real Problem" below.
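The question-ID-only matching can be sketched as a simplified TypeScript model of the C# predicate above (the `projectId` and reconciliation clauses are dropped for brevity; field names follow the snippet):

```typescript
// Simplified model of the tree-shaking predicate (non-reconciliation path).
interface Annotation {
  id: string;
  questionId: string;
  annotatorId: string;
  stageId: string; // provenance only: the filter never consults it
}

function treeShake(
  annotations: Annotation[],
  annotatorId: string,
  stageQuestionIds: Set<string>,
  newAnnotationIds: Set<string>,
): Annotation[] {
  // Keep annotations that do NOT match the stage's questions for this
  // annotator, and drop anything about to be re-added under a new id.
  return annotations.filter(an => !(
    (an.annotatorId === annotatorId && stageQuestionIds.has(an.questionId)) ||
    newAnnotationIds.has(an.id)
  ));
}

// A shared question answered during Stage 2 is still shaken out by a
// Stage 1 re-save, because only questionId is checked:
const kept = treeShake(
  [{ id: "a1", questionId: "item5", annotatorId: "F5XW", stageId: "stage2" }],
  "F5XW",
  new Set(["item5"]), // item5 is in Stage 1's question list (shared)
  new Set(),
);
// kept.length === 0: the Stage 2 answer is removed
```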
The Sequence of Events¶
- Investigator completes Stage 1 (Data Extraction) — 128 questions answered, annotations saved (stageId provenance = Stage 1)
- Investigator completes Stage 2 (RoB) — 18 questions answered. For the 17 shared questions, the Stage 2 answers overwrite the Stage 1 answers (since annotations are shared, there is only one annotation per question per user). The stageId provenance on those 17 annotations changes from Stage 1 to Stage 2
- Investigator returns to Stage 1 (to review, edit, or simply opens the form). The form loads existing annotations for all 128 questions — including the 17 shared questions whose current values reflect the Stage 2 RoB answers (since the form always populates from existing candidate annotations for the reviewer on this study, regardless of which stage created them)
- A save occurs (manual or auto-save on any form change). The submission includes all 128 questions' answers
- Tree-shaking removes the user's annotations matching Stage 1's 128 question IDs — including the 17 shared-question annotations whose values were set during Stage 2. This is correct behaviour under the shared-annotation model (one annotation per question, last-write-wins)
- New Stage 1 annotations are written, all with Stage 1 stageId provenance. Study-level shared questions (Items 1-5, 9, 10) round-trip correctly — their values are identical. But per-outcome sub-questions (Items 6, 7, 8) diverge — likely because the parent procedural instance annotations were re-created with new IDs, breaking the AA relationships that link Items 6/7/8 to their parent outcome instances. The orphaned children are either populated with defaults or lost entirely
- Result: Stage 2 session still shows `Status: Completed` but its 17 shared-question answers have been replaced by Stage 1 values. The 1 exclusive Stage 2 question ("Item 2 – Baseline characteristics") was not affected by tree-shaking, but it was apparently never saved either (0 annotations found for that question by this investigator)
MongoDB Evidence¶
Althea's 6 Affected Studies¶
All annotations currently have Stage 1 stageId provenance. Every study has a single annotation timestamp — all annotations were written in one atomic submission per study, 6–14 days after Stage 2 was completed.
| Study (abbreviated) | S1 Session Created | S2 Session Created | Annotations Rewritten | Ann Count |
|---|---|---|---|---|
| Contralesional angiotensin | 2026-02-05 12:04 completed | 2026-02-18 18:17 completed | 2026-02-25 16:11 | 55 root + 213 non-root = 268 |
| NLRP3 Inflammasome | 2026-02-03 16:50 completed | 2026-02-18 19:11 completed | 2026-02-25 17:31 | 47 root + 97 non-root = 144 |
| Low-dose nifedipine | 2026-02-03 16:55 completed | 2026-02-18 19:20 completed | 2026-02-25 17:51 | 52 root + 134 non-root = 186 |
| Huang-Lian-Jie-Du | 2026-02-05 15:57 completed | 2026-02-19 12:01 completed | 2026-03-05 10:29 | 48 root + 97 non-root = 145 |
| New alternative approaches | 2026-02-09 12:46 completed | 2026-02-18 19:01 completed | 2026-03-05 10:42 | 56 root + 225 non-root = 281 |
| DPP-4 Linagliptin | 2026-02-09 15:53 completed | 2026-02-18 19:31 completed | 2026-03-05 10:59 | 52 root + 193 non-root = 245 |
Example breakdown for "Contralesional angiotensin" (268 annotations by F5XW):
- 248 annotations use Stage 1-exclusive question IDs (correct)
- 20 annotations use shared question IDs (should be under Stage 2, but stored under Stage 1)
- 0 annotations use the Stage 2-exclusive question ID
Maria's 7 Studies¶
Maria has no Stage 2 sessions anywhere — her data is unaffected by Bug 2. All 7 studies are Stage 1 only, with consistent annotation timestamps matching normal usage patterns.
| Study (abbreviated) | S1 Session Created | Annotations Written | Ann Count |
|---|---|---|---|
| ESO-WSO 2020 | 2026-02-18 14:38 completed | 2026-02-18 14:38 | 31+1 = 32 (excluded study) |
| Carvedilol | 2026-02-19 15:32 in-progress | 2026-02-19 15:32 | 15+1 = 16 (partial) |
| Huang-Lian-Jie-Du | 2026-02-18 15:19 completed | 2026-02-24 12:19 | 46+88 = 134 |
| FGF21 | 2026-02-24 12:35 completed | 2026-02-24 12:36 | 39+1 = 40 (excluded study) |
| Lauric acid | 2026-02-24 12:50 completed | 2026-02-24 12:50 | 39+1 = 40 (excluded study) |
| Contralesional angiotensin | 2026-02-24 14:08 completed | 2026-02-25 14:55 | 45+88 = 133 |
| New alternative approaches | 2026-02-26 13:15 completed | 2026-02-26 14:38 | 56+216 = 272 |
Maria's report in #2339 is likely based on observing the corruption happening to the project (Althea's data) rather than experiencing it herself. She may have attempted a Stage 2 session that failed to persist, or she was reporting on behalf of the project.
Why Only This Project?¶
It requires shared questions between stages, which is a project configuration choice. This project assigned 17 RoB questions to both stages. Any project with question overlap between stages is equally vulnerable — re-saving the earlier stage will destroy the later stage's answers for shared questions.
Fix¶
Important: Adding a `StageId` check to the tree-shaking filter would be incorrect. Annotations are shared across stages — `StageId` is provenance metadata, not ownership. Filtering by `StageId` would create duplicate annotations for the same question by the same user, violating the current data model.
The tree-shaking itself is working as designed. The real problem is upstream of tree-shaking: why does the Stage 1 submission contain different values for the shared questions than what Stage 2 had set?
The Real Problem: Broken AA Relationships for Per-Outcome Sub-Questions¶
The Atlas backup comparison reveals a critical pattern in which annotations diverged:
| Question type | Location in tree | Round-trip result |
|---|---|---|
| Study-level shared (Items 1-5, 9, 10) | Root annotations | All identical — perfect round-trip |
| Per-outcome shared (Items 6, 7, 8) | Children of outcome assessment procedure instances | 8 changed, 3 lost |
Study-level root questions round-tripped perfectly. Only sub-questions nested inside procedural unit instances diverged. This is not a stageId filtering problem — the 17 shared questions ARE in Stage 1's question list, so the selector would include them. The issue is about procedural tree context and AA (answer-answer) relationships.
How Procedural Sub-Questions Work¶
Items 6, 7, 8 are sub-questions of outcome assessment procedures in the annotation tree:
```
├── Outcome assessment procedure label?
│   └── [children] ...
│       ├── Item 6 – Detection bias: Random outcome assessment   ← per-outcome instance
│       ├── Item 7 – Detection bias: Blinding                    ← per-outcome instance
│       └── Item 8 – Attrition bias: Incomplete outcome data     ← per-outcome instance
```
Each outcome assessment procedure instance gets its own copy of Items 6, 7, 8. These child annotations are linked to their parent procedural instance via AA (answer-answer) relationships. The annotation is identified not just by its QuestionId but by its position in the tree — which parent instance it belongs to.
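This identity model can be sketched as follows. The shapes and field names here are illustrative assumptions, not the actual SyRF schema:

```typescript
// Hypothetical shapes illustrating the AA (answer-answer) linkage described
// above -- field names are illustrative, not the real annotation schema.
interface Annotation {
  id: string;                  // annotation GUID
  questionId: string;          // which question this answers
  parentAnnotationId?: string; // AA link: set for per-outcome children, absent for roots
  answer: string;
}

// A study-level root annotation is identified by questionId alone; a
// per-outcome child is identified by (questionId, parent instance).
function annotationKey(a: Annotation): string {
  return a.parentAnnotationId !== undefined
    ? `${a.questionId}@${a.parentAnnotationId}`
    : a.questionId;
}
```

Under this model the same question (e.g. Item 6) can legitimately have several annotations per study — one per parent outcome instance — which is exactly why the parent link matters.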
Why Per-Outcome Questions Break Across Stages¶
Stage 2 (RoB) has only 18 questions — it does not include the outcome assessment procedure label or the full procedural tree. Those are Stage 1-exclusive questions. When Stage 2 presents Items 6, 7, 8, it must reference the procedural instances that Stage 1 created.
When Stage 1's form is re-opened and the procedural tree is reconstructed:
- The form rebuilds the procedural tree from Stage 1's 128 questions
- For each outcome assessment instance, it needs to find and populate the corresponding Items 6, 7, 8 child annotations
- These child annotations have AA relationships linking them to their parent procedural instance
- If the parent procedural instance annotations are re-created with new IDs during the Stage 1 re-save, the AA links to the existing Items 6, 7, 8 annotations break — the children become orphans that can't be placed in the form
- The form either populates those sub-questions with defaults (explaining "High" → "Unclear") or doesn't create them at all (explaining the 3 lost annotations)
Neither stage has full visibility of the shared questions' context: Stage 1 can see the full procedural tree but not Stage 2's RoB-specific presentation. Stage 2 can see Items 6, 7, 8 but not the procedural tree they're nested in. The form's validation cannot catch the AA relationship breakage because no single stage view shows the complete picture.
This explains why study-level shared questions were unaffected: Items 1-5, 9, 10 are root annotations with no parent dependency. Their identity is just (questionId, reviewer, study) — no AA relationship to break. They load and round-trip cleanly regardless of which stage created them.
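The orphaning mechanism hypothesised above can be simulated in miniature. The shapes are assumptions for illustration, not the production schema:

```typescript
// Illustrative simulation of the hypothesised failure mode (assumed shapes):
// a Stage 1 re-save re-creates parent procedural annotations with fresh GUIDs,
// leaving per-outcome children pointing at parent IDs that no longer exist.
interface Ann {
  id: string;
  questionId: string;
  parentAnnotationId?: string;
}

// A child whose parent ID resolves to nothing is an orphan: the form cannot
// place it under any procedural instance.
function findOrphans(all: Ann[]): Ann[] {
  const ids = new Set(all.map(a => a.id));
  return all.filter(
    a => a.parentAnnotationId !== undefined && !ids.has(a.parentAnnotationId)
  );
}
```

Before the re-save, a child of `proc-old` resolves cleanly; after the re-save replaces that parent with `proc-new`, the same child becomes an orphan.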
Investigation needed¶
- **Verify AA relationship breakage:** Compare the parent annotation IDs in the Atlas backup's Items 6/7/8 annotations against the current parent annotation IDs in production. If they differ, the AA link was broken by the Stage 1 re-submission.
- **Trace form initialization for sub-questions:** In `annotation-form.service.ts`, examine `initSubQuestions()` to understand how it maps saved child annotations to parent procedural instances. Does it use the AA relationship (parent annotation ID) to place children, or does it use position in the question tree?
- **Determine if procedural instance re-creation generates new annotation IDs:** When Stage 1's form re-saves, does the tree-shaking + re-insertion create new annotation GUIDs for the parent procedural instances? If so, any child annotations referencing the old parent IDs would be orphaned.
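The first verification step amounts to a per-question diff of parent IDs. A minimal sketch, assuming the comparison is keyed by question ID within one study and outcome instance:

```typescript
// Hypothetical check for the first investigation step: keyed by questionId,
// list the Items 6/7/8 child annotations whose parent annotation ID in
// production no longer matches the 21 Feb backup.
type ParentMap = Map<string, string>; // questionId -> parentAnnotationId

function diffParentIds(backup: ParentMap, prod: ParentMap): string[] {
  const broken: string[] = [];
  for (const [questionId, backupParent] of backup) {
    if (prod.get(questionId) !== backupParent) broken.push(questionId);
  }
  return broken;
}
```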
Possible fix approaches (require further investigation)¶
- **Preserve parent annotation IDs during re-save** — if the tree-shaking removes and re-inserts procedural instance annotations, ensure the same annotation IDs are reused so AA relationships to children are preserved. This is the most targeted fix if AA ID breakage is confirmed.
- **Re-link orphaned children during form loading** — when loading the form, if a child annotation's parent ID doesn't match any current parent instance, attempt to re-link it by matching on question position in the tree rather than by parent annotation ID.
- **Exclude procedural sub-questions from tree-shaking when their parent is not in the submission** — if Items 6, 7, 8 were created under a different stage's procedural context, don't remove them during tree-shaking unless the submission explicitly includes replacement values for those specific instances.
- **Frontend: don't re-submit sub-question annotations that haven't been modified** — track dirty state per annotation and only include changed annotations in the submission payload.
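The second approach (positional re-linking) can be sketched as follows. The `ParentInstance`/`ChildAnn` shapes and the idea of matching on ordinal position are assumptions for illustration:

```typescript
// Sketch of the re-linking approach (assumed shapes): re-point an orphaned
// child at whichever current parent instance occupies the same position in
// the question tree as its old parent did.
interface ParentInstance { id: string; position: number } // nth outcome instance
interface ChildAnn { id: string; parentAnnotationId: string }

function relink(
  orphan: ChildAnn,
  oldParents: ParentInstance[],
  newParents: ParentInstance[]
): ChildAnn {
  // Which tree position did the orphan's old parent occupy?
  const old = oldParents.find(p => p.id === orphan.parentAnnotationId);
  if (!old) return orphan; // nothing to match on -- leave untouched
  // Re-point the AA link at the new parent in that position, if one exists.
  const replacement = newParents.find(p => p.position === old.position);
  return replacement ? { ...orphan, parentAnnotationId: replacement.id } : orphan;
}
```

Positional matching is only safe if re-saving preserves the order of outcome instances; that assumption would itself need verification.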
Files likely involved (depends on chosen approach):
- `src/services/web/src/app/core/services/annotation-form/annotation-form.service.ts` — `initSubQuestions()`, form tree reconstruction
- `src/services/web/src/app/core/state/entities/annotation/annotation.selectors.ts` — annotation loading selectors
- `src/libs/project-management/SyRF.ProjectManagement.Core/Model/StudyAggregate/ExtractionInfo.cs` — tree-shaking logic
- `src/services/api/SyRF.API.Endpoint/Controllers/ReviewController.cs` — session submission endpoint
Data Repair¶
What the Current Database State Tells Us¶
The most significant finding is that every one of Althea's 6 affected studies has a single annotation timestamp — meaning ALL annotations (not just the 17–18 shared questions) were completely overwritten in a single Stage 1 submission per study, roughly 6–14 days after Stage 2 was completed.
This points to Althea manually going back into Stage 1 and re-submitting everything — not an auto-save. She did 3 studies in one session on 25 Feb (over ~1.5 hours), then the other 3 on 5 Mar (over ~30 minutes). She was clearly working through the list deliberately — this is how she responded to the missing annotations she reported on 19 Feb.
Maria's data is unaffected — she only ever did Stage 1, her 7 studies all have consistent annotation timestamps, and her data looks structurally complete for the studies she finished.
Atlas Backup Comparison (21 Feb 2026 snapshot vs current prod)¶
Method: Compared Althea's annotations created while in Stage 2 (110 total across 6 studies) from the 21 Feb Atlas backup against the same question IDs in current production. The backup predates both re-submission events (25 Feb, 5 Mar).
Key result: Current prod has 0 annotations with Stage 2 provenance — complete destruction confirmed. All shared-question annotations now carry Stage 1 provenance from Althea's re-submissions.
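The comparison method reduces to classifying each backed-up answer against production. A minimal sketch with illustrative types:

```typescript
// Sketch of the backup-vs-prod comparison: classify each backup annotation
// as unchanged, changed, or lost relative to current production.
type AnswerMap = Map<string, string>; // questionId -> answer

function classify(backup: AnswerMap, prod: AnswerMap) {
  const changed: string[] = [];
  const lost: string[] = [];
  let unchanged = 0;
  for (const [questionId, answer] of backup) {
    if (!prod.has(questionId)) lost.push(questionId);       // absent in prod entirely
    else if (prod.get(questionId) !== answer) changed.push(questionId);
    else unchanged++;
  }
  return { changed, lost, unchanged };
}
```

Applied to the 110 backed-up Stage 2 annotations, this yielded the 8 changed and 3 lost annotations tabulated below.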
Note on StageId: The `StageId` field on annotations is provenance metadata only — it records which stage the reviewer was in when the annotation was created. Annotations do not belong to stages. They are shared across all stages for a given (reviewer, study, project) tuple, with the system expecting exactly one annotation per (question, reviewer, study, project). Sessions belong to stages; annotations do not.
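The ownership model above can be expressed as a hypothetical identity key; the field names are illustrative:

```typescript
// The shared-annotation invariant as a hypothetical identity key: StageId is
// deliberately absent, so two saves of the same question by the same reviewer
// in different stages target the same annotation rather than creating two.
interface AnnotationRef {
  questionId: string;
  reviewerId: string;
  studyId: string;
  projectId: string;
  stageId: string; // provenance only -- not part of identity
}

function identityKey(a: AnnotationRef): string {
  return [a.questionId, a.reviewerId, a.studyId, a.projectId].join("|");
}
```

This is why a `StageId` filter in tree-shaking would violate the model: it would let two annotations coexist under one identity key.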
Changed RoB Judgments (8 answers differ)¶
| Study | RoB Item | Backup Answer | Current Answer | Impact |
|---|---|---|---|---|
| New alt. approaches | Item 6-SL (Random outcome assessment) | High | Unclear | Specific judgment lost |
| New alt. approaches | Item 7-SL (Blinding detection) | Low | Unclear | Specific judgment lost |
| New alt. approaches | Item 6-PO ×2 (per outcome) | High | Unclear | Both outcomes affected |
| New alt. approaches | Item 7-PO ×2 (per outcome) | Low | Unclear | Both outcomes affected |
| NLRP3 | Item 8-SL (Attrition) | Low | High | Risk rating flipped |
| Huang-Lian-Jie-Du | Item 8-PO (Attrition) | Low | High | Notes contradict: "YES"→"NO" |
Lost Annotations (3 annotations exist in backup but not in prod)¶
| Study | Items | Backup Answers |
|---|---|---|
| Huang-Lian-Jie-Du | Items 6-SL, 7-SL, 8-SL | Unclear, Unclear, High |
These 3 annotations were not re-created during Althea's Stage 1 re-submission — they are entirely absent from current prod.
Unchanged (all other RoB judgments match)¶
- Items 1–5, 9, 10 (study-level): identical across all 6 studies
- Contralesional angiotensin: all Items match perfectly
- Low-dose nifedipine: all Items match perfectly
- DPP-4 Linagliptin: all Items match perfectly
Interpretation¶
The changes concentrate on Items 6, 7, 8 — exactly the per-outcome items Althea reported as "disappeared" in #2337. Study-level items (1–5, 9, 10) were re-entered identically, suggesting Althea could recall broad study-level judgments but not the granular per-outcome assessments. The "New alternative approaches" study is the most affected — 6 judgments changed from specific (High/Low) to "Unclear", representing genuine data loss even before considering the repair strategy.
Conclusion: The backup must be used for recovery. Althea's re-entered values do NOT fully match her original Stage 2 work — at least 8 RoB judgments changed and 3 annotations were lost entirely.
Affected Studies¶
All in project ac35058c-4767-4f9a-9ddc-5ba9d484c289, investigator F5XWaUIex06j3KycXnpSUA==:
| Study | Title (abbreviated) | Annotations Rewritten |
|---|---|---|
| 1 | Contralesional angiotensin type 2 receptor... | 25 Feb 16:11 |
| 2 | NLRP3 Inflammasome: A Potential Target... | 25 Feb 17:31 |
| 3 | Low-dose nifedipine rescues... | 25 Feb 17:51 |
| 4 | Huang-Lian-Jie-Du decoction... | 5 Mar 10:29 |
| 5 | New alternative approaches to stroke treatment... | 5 Mar 10:42 |
| 6 | DPP-4 Inhibitor Linagliptin... | 5 Mar 10:59 |
Repair Strategy¶
1. For each affected study, identify annotations by investigator F5XW that:
   - are stored under Stage 1's stageId, AND
   - have a `QuestionId` matching one of the 17 shared questions (or the 1 exclusive Stage 2 question)
2. Update those annotations' `StageId` from Stage 1 to Stage 2
3. Verify Stage 2 sessions now have matching annotations
4. Re-test wide format export for the project
Complication: Since the user re-saved Stage 1 AFTER completing Stage 2, the shared-question annotations now contain Stage 1's answers, not Stage 2's. The original Stage 2 answers were overwritten and cannot be recovered from the current database state. However, MongoDB Atlas backups can recover the original annotations — see below.
Important: This operates on production data (syrftest database). Script must be reviewed and tested against a copy first.
Atlas Backup Recovery¶
Status: Original Stage 2 annotations are recoverable from Atlas backups.
Atlas Cluster0 (M20, GCP europe-west2) has the following backup policy:
| Snapshot Type | Frequency | Retention |
|---|---|---|
| Hourly | Every 6h | 2 days |
| Daily | Every 24h | 7 days |
| Weekly | Every Saturday | 4 weeks |
| Monthly | Last day of month | 12 months |
Available snapshots covering the corruption dates:
| Corruption Date | Studies Affected | Best Snapshot | Covers? |
|---|---|---|---|
| 25 Feb 2026 (3 studies) | Contralesional, NLRP3, Low-dose nifedipine | 21-02-2026 02:03 PM (Weekly, expires ~21 Mar) | Yes |
| 5 Mar 2026 (3 studies) | Huang-Lian-Jie-Du, New alternative, DPP-4 | 04-03-2026 02:04 PM (Daily, expires ~11 Mar) | Yes |
The 21 Feb weekly snapshot predates both corruption events and covers all 6 studies. It expires around 21 Mar 2026 — recovery must happen before then.
Recovery procedure:
1. Restore the 21 Feb snapshot to a temporary Atlas cluster (or download via the Atlas UI)
2. Connect to the restored `syrftest` database
3. For each of the 6 affected studies, extract annotations by investigator `F5XWaUIex06j3KycXnpSUA==` where `StageId = BHKTi2kyJ0y2MJsA0kfdcA==` (Stage 2)
4. In the live database, remove the current annotations for those studies by investigator F5XW that use shared question IDs and have Stage 1's stageId (the overwritten copies)
5. Insert the original Stage 2 annotations from the backup
6. Verify Stage 2 sessions now have matching annotations and re-test export
7. Tear down the temporary cluster
Why this works: The backup contains the original Stage 2 annotations with correct stageIds and the user's actual RoB answers — no re-annotation required.
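The extract and delete steps can be sketched as MongoDB filter documents. The field names (`InvestigatorId`, `StageId`, `StudyId`, `QuestionId`) are assumptions and must be verified against the real schema before any script runs; the stage and investigator IDs are taken from this document:

```typescript
// Illustrative filter documents for steps 3 and 4 of the recovery procedure.
// Field names are assumed -- verify against the real schema before running.
const STAGE1_ID = "4jYrw5Vwq0+u1chtjeKHLA==";
const STAGE2_ID = "BHKTi2kyJ0y2MJsA0kfdcA==";
const INVESTIGATOR = "F5XWaUIex06j3KycXnpSUA==";

// Step 3: pull the original Stage 2 annotations from the restored backup.
function backupExtractFilter(studyIds: string[]) {
  return { InvestigatorId: INVESTIGATOR, StageId: STAGE2_ID, StudyId: { $in: studyIds } };
}

// Step 4: target only the overwritten copies in live prod -- same investigator,
// Stage 1 provenance, and one of the shared question IDs.
function prodDeleteFilter(studyIds: string[], sharedQuestionIds: string[]) {
  return {
    InvestigatorId: INVESTIGATOR,
    StageId: STAGE1_ID,
    StudyId: { $in: studyIds },
    QuestionId: { $in: sharedQuestionIds },
  };
}
```

Keeping the delete filter narrowed by `QuestionId` ensures Stage 1-only extraction answers (the other ~110 questions) are never touched.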
Summary of Changes Required¶
Bug 1: Export Crash (ready to implement)¶
| Priority | Change | Files |
|---|---|---|
| P0 | Add null guards to array annotation `GetAnswer()` methods | `Annotation.cs` |
| P1 | Add null coalescing to `BoolAnnotation.GetAnswer()` | `Annotation.cs` |
| P2 | Defensive null filter in `StudyAnnotationsGroup._answerCache` | `StudyAnnotationsGroup.cs` |
Bug 2: Annotation Destruction (requires investigation)¶
| Priority | Change | Status |
|---|---|---|
| P0 | Investigate AA relationship breakage: compare parent annotation IDs in Atlas backup vs production for Items 6/7/8. Trace `initSubQuestions()` in the form service to understand how children are mapped to parent procedural instances | Root cause hypothesis — per-outcome sub-questions (Items 6, 7, 8) are orphaned when parent procedural annotations are re-created with new IDs during Stage 1 re-save |
| P1 | Restore original Stage 2 annotations from Atlas backup | Data recovery — MongoDB script against prod |
Key evidence: Study-level shared questions (Items 1-5, 9, 10) round-tripped perfectly — only per-outcome sub-questions nested inside procedural unit instances diverged. This rules out stageId filtering and points to AA relationship breakage as the root cause.
Rejected fix: Adding a `StageId` check to the tree-shaking filter was initially proposed but is incorrect — it would create duplicate annotations per question, violating the shared-annotation model where `StageId` is provenance metadata, not ownership. See "Annotation Ownership Model" section under Bug 2.