
Investigation - little DOMS Data Export Failure and Annotation Corruption

Project: little DOMS (ac35058c-4767-4f9a-9ddc-5ba9d484c289)
Issues: #2335, #2337, #2339
PR: #2377
Date: 2026-03-07
Status: Root cause identified, fixes proposed


TL;DR

Two separate bugs are affecting this project:

  1. Wide format data export crashes because one StringArrayAnnotation has a null Answer list, which causes string.Join(delimiter, null) to throw ArgumentNullException. Secondary issue: BoolAnnotation.GetAnswer() returns string? (nullable), violating the base class contract. Fix: null guards on all annotation GetAnswer() methods.
  2. Annotations are destroyed when a user re-saves Stage 1 after completing Stage 2. Annotations are shared across all stages (not scoped per stage), so tree-shaking correctly overwrites shared-question answers — this is by design. Study-level shared questions (Items 1-5, 9, 10) round-tripped perfectly. Only per-outcome sub-questions (Items 6, 7, 8) diverged or were lost — these are children of procedural unit instances (outcome assessment procedures) linked via AA (answer-answer) relationships. The likely cause is that re-saving Stage 1 breaks AA relationships when parent procedural annotations are re-created with new IDs, orphaning the child annotations. Fix requires investigating AA relationship preservation during form re-submission.

Both bugs are latent platform-wide issues — this project triggered them first due to its multi-stage workflow and one investigator's usage pattern.


Investigator Accounts

  • Althea Tomas: althea.tomas@gmail.com — CSUUID F5XWaUIex06j3KycXnpSUA==
  • Maria Arroyo Araujo: maria.arroyo@charite.de — CSUUID xr22lv0TQESE9BNC69axeA==

Project Configuration

Created: 2021-02-02

Stages

#  Name                     Extraction  Questions
1  Data extraction          true        128
2  Risk of bias assessment  false       18 (17 shared with Stage 1)
  • Stage 1 ID: 4jYrw5Vwq0+u1chtjeKHLA==
  • Stage 2 ID: BHKTi2kyJ0y2MJsA0kfdcA==

User Reports

#2335 — Althea Tomas: Data export error (12 Feb 2026)

TL;DR: Wide format annotation export fails with ArgumentNullException after processing 56 of 323 studies.

  • Reporter: althea.tomas@gmail.com
  • Action: Export annotations in wide format from "little DOMS" project
  • Result: Export fails with error: Critical export failure: Value cannot be null. (Parameter 'values')
  • Scope: Affects all project members attempting the same export
  • Stack trace root: WideFormatOptions.cs:line 76 → StudyAnnotationsGroup → String.Join receives null

#2337 — Althea Tomas: Annotations not showing (19 Feb 2026)

TL;DR: Completed outcome-level annotations have vanished from the UI for multiple studies.

  • Reporter: Althea Mara Balota Tomas
  • Action: Returned to view previously completed annotations
  • Result: Outcome assessment level inputs have disappeared across all annotated studies
  • Affected study IDs:
  • 2134c6d1-40a0-437e-bd4e-5a19fe303f62
  • b524fe36-d823-4b6f-96ff-cef2b7e4433f
  • f8288d5d-4c2c-4e67-86d6-3415b6bc7f6d
  • d95eccdf-da2f-4704-a124-2b0faaad283e
  • dbf0c84c-7db7-4dba-8377-f7bfa9419111

#2339 — Maria Arroyo Araujo: Export error & annotations corrupted (20 Feb 2026)

TL;DR: Same two problems — export crashes in wide format AND annotations get corrupted when finalising Risk of Bias stage.

  • Reporter: Arroyo Araujo, Maria
  • Action: (1) Export study-level annotations in wide format; (2) Finalise RoB assessment stage
  • Result: (1) Same ArgumentNullException crash; (2) Completing RoB stage corrupts outcome-level annotations in the Data Extraction stage
  • Additional affected study: 293f69be-f9d2-4955-b70d-e337b0e10873
  • Key observation: Corruption happens specifically when finalising annotations in the "Risk of bias assessment" stage — the already-completed annotations in the "Data extraction" stage lose their outcome-level data

Chronology

2021-02-02  Project "little DOMS" created

2026-02-03  Althea starts Stage 1 (Data Extraction) on NLRP3 and Low-dose nifedipine
2026-02-05  Althea starts Stage 1 on Contralesional angiotensin and Huang-Lian-Jie-Du
2026-02-09  Althea starts Stage 1 on New alternative approaches and DPP-4 Linagliptin
            -> At this point Althea has ~144-281 annotations per study saved under Stage 1

2026-02-12  * Althea files #2335 — wide format export crashes with ArgumentNullException
            (null StringArrayAnnotation — separate bug, latent since ~2021)

2026-02-18  Althea completes Stage 2 (RoB) for 5 studies within the same evening:
              18:17 — Contralesional angiotensin Stage 2 session created (completed)
              19:01 — New alternative approaches Stage 2 session created (completed)
              19:11 — NLRP3 Stage 2 session created (completed)
              19:20 — Low-dose nifedipine Stage 2 session created (completed)
              19:31 — DPP-4 Linagliptin Stage 2 session created (completed)
            -> Stage 2 submission tree-shakes Stage 1's 17 shared-question annotations
               and replaces them with Stage 2 RoB answers

            Maria starts Stage 1:
              14:38 — ESO-WSO 2020 (completed, excluded study, 32 annotations)
              15:19 — Huang-Lian-Jie-Du session created (annotations written later)

2026-02-19  Althea completes Stage 2 for Huang-Lian-Jie-Du (12:01)
            * Althea files #2337 — "Outcome assessment level inputs have disappeared"
              -> She completed Stage 2 yesterday/today and the outcome-level annotations
                 are now missing when she views Stage 2

            Maria starts Stage 1 on Carvedilol (15:32) — still in-progress today

2026-02-20  * Maria files #2339 — "same export crash + completing RoB corrupts outcome
              annotations in Data Extraction"
              -> Maria has no Stage 2 sessions: she is either reporting what she observed
                 happening to the project, or tried Stage 2 but it didn't save a session

2026-02-24  Maria annotates three more Stage 1 studies:
              12:19 — Huang-Lian-Jie-Du annotations written (session from 18 Feb)
              12:35 — FGF21 session created + 40 annotations (excluded)
              12:50 — Lauric acid session + 40 annotations (excluded)
              14:08 — Contralesional angiotensin session created

2026-02-25  Maria writes annotations for Contralesional angiotensin at 14:55 (133 total)

            ** Althea re-saves Stage 1 for 3 studies — ALL annotations completely
               rewritten in single submissions (Stage 1 form, all 128 questions):
              16:11 — Contralesional angiotensin: 268 annotations rewritten
              17:31 — NLRP3: 144 annotations rewritten
              17:51 — Low-dose nifedipine: 186 annotations rewritten

2026-02-26  Maria writes New alternative approaches annotations (272 total)

2026-03-05  ** Althea re-saves Stage 1 for remaining 3 studies:
              10:29 — Huang-Lian-Jie-Du: 145 annotations rewritten
              10:42 — New alternative approaches: 281 annotations rewritten
              10:59 — DPP-4 Linagliptin: 245 annotations rewritten

Bug 1: Wide Format Export Crash

TL;DR

One StringArrayAnnotation has a null Answer list. When the export calls string.Join(";", null) inside that annotation's GetAnswer() method, it throws ArgumentNullException. Secondary issue: BoolAnnotation.GetAnswer() returns string?, violating the base class string return type contract.

Symptoms

  • Wide format export fails after processing ~56 studies
  • Error: ArgumentNullException: Value cannot be null. (Parameter 'values')
  • Only affects wide format (long format and bibliographic exports work)

Stack Trace

at System.String.Join(String separator, IEnumerable`1 values)        ← CRASH: StringArrayAnnotation.GetAnswer() calls Join on null list
at System.Linq.Enumerable.SelectIListIterator`2.MoveNext()           ← iterating annotations in group
at System.String.Join(String separator, IEnumerable`1 values)        ← outer: answer cache builder
at System.Linq.Enumerable.ToDictionary(...)                          ← StudyAnnotationsGroup._answerCache
at WideFormatOptions.<>c__DisplayClass11_0.b__4(IGrouping grouping)  ← WideFormatOptions.cs:76

Root Cause

Direct crash: One StringArrayAnnotation for question "Pdf Graphs" on study "High-fructose diet during adolescent development..." has a null Answer list in MongoDB. When the export processes this study, StringArrayAnnotation.GetAnswer() calls string.Join(arrayDelimiter, Answer) where Answer is null, throwing ArgumentNullException("values").

File: src/libs/project-management/SyRF.ProjectManagement.Core/Model/StudyAggregate/Annotation.cs

Array annotation types have List<T> Answer { get; private set; } = new(); — the = new() initialiser is overridden by MongoDB deserialization when the BSON field is null:

// All array types — no null guard on Answer
public override string GetAnswer(string arrayDelimiter = ";") => string.Join(arrayDelimiter, Answer);
// If Answer is null (from MongoDB) → ArgumentNullException("values")

Secondary issue: BoolAnnotation.GetAnswer() (line 73) returns string? (nullable), unlike all other scalar annotation types:

// BoolAnnotation — CAN return null if Answer (bool?) is null
public override string? GetAnswer(string arrayDelimiter) => Answer?.ToString();

// Compare: DecimalAnnotation — correct pattern with null coalescing
public override string GetAnswer(string arrayDelimiter = ";") => Answer?.ToString() ?? string.Empty;

In this project, all 23,697 BoolAnnotations have non-null Answer values (false: 16,205, true: 7,492), so this doesn't trigger here. However, it's a contract violation (GetAnswer base class declares string, not string?) and could cause NullReferenceException in other code paths or projects where bool? Answer is null.

Crash site: StudyAnnotationsGroup._answerCache materialises all annotations:

File: src/libs/project-management/SyRF.ProjectManagement.Core/Services/DataExportServices/AnnotationUnitGroups/StudyAnnotationsGroup.cs (lines 10-12)

private readonly IReadOnlyDictionary<Guid, string> _answerCache = GroupAnnotations
    .GroupBy(a => a.QuestionId)
    .ToDictionary(g => g.Key, g => string.Join(";", g.Select(a => a.GetAnswer(";"))));

MongoDB Evidence

Annotation data across all 323 studies in this project:

Type                   Total   Null Answer  Notes
StringAnnotation       64,254  19           Null-coalesced to "" in GetAnswer() — safe
BoolAnnotation         23,697  0            All have values (false: 16,205, true: 7,492)
IntAnnotation          13,661  0            All have values
StringArrayAnnotation  9,246   1            Crash trigger — null list passed to String.Join
DecimalAnnotation      8,278   0            All have values

The single null StringArrayAnnotation is the direct and only crash trigger in this project.

How Did the Null Annotation Get Created?

The "Pdf Graphs" question (016278e8-7e60-40d4-9568-d7fa42670c32) is a hidden system question (_sysHidden = true) in the "Outcome Assessment" category. It stores which PDF graphs are associated with an outcome, and is programmatically populated by the graph-selector component — users never interact with it directly.

Only 2 of 323 studies have a "Pdf Graphs" annotation database-wide (not just this project — no other project has any "Pdf Graphs" annotations). Both are by the same annotator (Joachim Wahl, wahl@amgen.com), both are non-root outcome-level subquestions with Children: [], and both survived the frontend filter because the annotator added notes:

  • "Prolonged diet-induced obesity...": answer [] (empty array — safe); notes "# data extracted from Figures 1D and 1E"; annotation created 2021-06-07 13:51 UTC; session created 2021-02-07 17:10 UTC
  • "High-fructose diet..." (crash): answer null (crash trigger); notes "# extracted from Fig 6a"; annotation created 2021-10-05 11:44 UTC; session created 2021-06-06 09:17 UTC

The "Pdf Graphs" question is a runtime-injected system question — it does not appear in any project's AnnotationQuestions array (neither top-level nor as a subquestion). It is injected by the frontend's graph-selector component. The annotator used the notes field to record which figures to extract from but never actually selected any PDF graphs, leaving the answer as [] (which was then serialised to null for the second study).

How the form handles this question:

  1. The form always initialises "Pdf Graphs" with answer: [] as a subquestion under each outcome (since answerArray = true, default value is [])
  2. Before submission, filterEmptyFieldsWithNoChildren() removes answer groups with empty arrays (lines 758-760 of annotation-form.service.ts)
  3. For 321/323 studies, the filter correctly removes the empty "Pdf Graphs" — it never reaches the backend

Why it survived the filter for 2 studies: The empty-field filter was refactored in commit 248dcd47 (tagged v3.21.0, committed 2021-06-05). Cross-referencing with the jxc (Jenkins X cluster) production helmfile:

Date (UTC)        Production Version  Notes
2021-05-10        syrf-web v0.0.8     Old filter (removeEmptyFieldsWithNoChildren)
2021-06-07 19:57  syrf-web v3.21.6    New filter deployed (includes v3.21.0 commit)

The two "Pdf Graphs" annotations map to these code versions:

Annotation                           Created (UTC)     Answer  Code Version
"Prolonged diet-induced obesity..."  2021-06-07 13:51  []      Old code (v0.0.8) — created 6 hours before v3.21.6 deployed
"High-fructose diet..." (crash)      2021-10-05 11:44  null    New code (v3.21.6+) — created 4 months after deploy

The first annotation ([]) was created under the old removeEmptyFieldsWithNoChildren, which had an operator precedence bug in its removal logic: (hasNoChildren && answer === '') || answer === null — this unconditionally removed null answers but let empty arrays through since [] !== '' and [] !== null.

The second annotation (null) was created under the new filterEmptyFieldsWithNoChildren, which fixed the operator precedence but only removes empty answer groups when all three conditions are met: (1) no child questions with answer groups, (2) notes are empty/null/undefined, and (3) answer is empty/null/undefined/empty-array. Both "Pdf Graphs" annotations survived the filter because the annotator (Joachim Wahl) added non-empty notes: "# data extracted from Figures 1D and 1E" on the first study and "# extracted from Fig 6a" on the crash study. Since condition (2) failed, the filter correctly kept both entries.
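The two filter generations described above can be sketched in TypeScript. This is an illustrative reconstruction from the description — the type shape and function names are assumptions, not the actual annotation-form.service.ts code:

```typescript
// Hypothetical shape of a submitted answer group (assumption for illustration).
type Group = { answer: unknown; notes: string | null; children: Group[] };

// Old removeEmptyFieldsWithNoChildren: the precedence bug means the
// `|| answer === null` clause is not guarded by hasNoChildren, so null
// answers are always removed, while [] leaks through ([] !== '' and [] !== null).
function oldShouldRemove(g: Group): boolean {
  const hasNoChildren = g.children.length === 0;
  return (hasNoChildren && g.answer === '') || g.answer === null;
}

// New filterEmptyFieldsWithNoChildren: removes only when ALL three conditions
// hold — no children, no notes, and an empty answer.
function newShouldRemove(g: Group): boolean {
  const noChildren = g.children.length === 0;
  const noNotes = g.notes === undefined || g.notes === null || g.notes === '';
  const emptyAnswer =
    g.answer === undefined || g.answer === null || g.answer === '' ||
    (Array.isArray(g.answer) && g.answer.length === 0);
  return noChildren && noNotes && emptyAnswer;
}
```

Under these predicates, an empty-array "Pdf Graphs" entry with non-empty notes passes both filters, matching the two surviving annotations.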

The null vs empty distinction: The frontend sends [] for both cases (the default for answerArray questions). The difference is in backend serialisation — when the C# MongoDB driver deserialises a StringArrayAnnotation with an empty Answer array, it may store null rather than [] depending on how the BSON was written. The = new() property initialiser is overridden during deserialisation when the BSON field is explicitly null.
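The initialiser-override behaviour can be illustrated with a TypeScript analogue. This is only an analogy — the real mechanism is the C# MongoDB driver's BSON deserialisation, and the class and function here are hypothetical:

```typescript
// Illustrative analogue: the field default supplies [], but deserialisation
// assigns whatever the stored document holds afterwards, so a stored explicit
// null overrides the default — mirroring how `= new()` is overridden in C#.
class StringArrayAnnotationSketch {
  answer: string[] | null = []; // default, like `List<string> Answer { ... } = new()`
}

function deserializeSketch(doc: { answer?: string[] | null }): StringArrayAnnotationSketch {
  const ann = new StringArrayAnnotationSketch();
  if ('answer' in doc) ann.answer = doc.answer ?? null; // stored null wins over the default
  return ann;
}
```

A document with no answer field keeps the [] default; a document with an explicit null ends up with a null list, which is the state that later crashes string.Join.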

Why Only This Project?

It's not project-specific — any project with an array annotation whose Answer list is null would crash. This project triggered it because one "Pdf Graphs" StringArrayAnnotation was saved with a null answer list. Other projects with similar data patterns would be equally vulnerable.

Database-wide scan confirms this: 5 null StringArrayAnnotation records exist across 3 projects:

Project                            Null Annotations  Studies  Annotators
ReLiSyR-MND in vivo review         2                 2        2 (different)
Separating surgery from stroke...  2                 2        2 (different)
little DOMS                        1                 1        1

No null IntArrayAnnotation, BoolArrayAnnotation, or DecimalArrayAnnotation records were found — only StringArrayAnnotation. All 3 projects would crash on wide format export if the null annotation is encountered during processing.

Detailed breakdown of all 5 null annotations:

ReLiSyR-MND in vivo review (2 annotations)

Question: "What is the drug being tested?" — study-level, root, user-created AnswerArray question.

  • The AMPA receptor antagonist NBQX prolongs survival in a transgenic mouse model of ALS: annotator jennagregory488@gmail.com; annotation created 2017-06-26 21:14 UTC; session created 2017-06-26 21:14 UTC; no notes
  • Effect of physical exercise and anabolic steroid treatment on spinal motoneurons and surrounding glia of wild-type and ALS mice: annotator malcolm.macleod@ed.ac.uk; annotation created 2017-08-25 17:57 UTC; session created 2017-08-25 17:57 UTC; no notes

Separating surgery from stroke... (2 annotations)

Question: "Why have you excluded this study?" — study-level, root, user-created AnswerArray question.

  • Magnetic resonance tracking of transplanted stem cells in rat brain and spinal cord: annotator tyrafraser16@gmail.com; annotation created 2019-02-12 02:55 UTC; session created 2019-02-12 02:55 UTC; no notes
  • Compartmentation of acid-base balance in brain during complete ischemia: annotator ash.russell@utas.edu.au; annotation created 2019-05-01 05:15 UTC; session created 2019-05-01 05:15 UTC; no notes

little DOMS (1 annotation)

Question: "Pdf Graphs" — outcome-level subquestion, hidden system question.

  • High-fructose diet during adolescent development increases neuroinflammation and depressive-like behavior without exacerbating outcomes after stroke: annotator wahl@amgen.com (Joachim Wahl); annotation created 2021-10-05 11:44 UTC; session created 2021-06-06 09:17 UTC; notes "# extracted from Fig 6a"

Key observations across all 5 records:

  • The first 4 annotations (ReLiSyR-MND + surgery/stroke) use regular user-created AnswerArray questions, not system-hidden ones — the null answer path isn't unique to the graph-selector frontend component.
  • Annotation timestamps match session timestamps (within seconds) for the first 4, indicating annotations were created in the same submission as the session. The little DOMS annotation was written 4 months after its session was created.
  • All records are old (2017–2021) — this bug has been latent for 5–9 years but only triggers on wide format export when the specific study is processed.
  • No investigator records have name fields populated (only Email) — names shown above are inferred from email addresses where possible.

Fix

  1. P0 — Add null guards to array annotation types (direct crash fix):

    // StringArrayAnnotation, IntArrayAnnotation, BoolArrayAnnotation, DecimalArrayAnnotation
    public override string GetAnswer(string arrayDelimiter = ";") => string.Join(arrayDelimiter, Answer ?? []);
    

  2. P1 — Add ?? string.Empty to BoolAnnotation.GetAnswer() (contract violation fix):

    public override string GetAnswer(string arrayDelimiter) => Answer?.ToString() ?? string.Empty;
    

  3. P2 — Defensive null filter in StudyAnnotationsGroup._answerCache (belt-and-braces):

    g => string.Join(";", g.Select(a => a.GetAnswer(";") ?? string.Empty))
    

Files to modify:

  • src/libs/project-management/SyRF.ProjectManagement.Core/Model/StudyAggregate/Annotation.cs
  • src/libs/project-management/SyRF.ProjectManagement.Core/Services/DataExportServices/AnnotationUnitGroups/StudyAnnotationsGroup.cs
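The effect of the P0 guard can be illustrated with a TypeScript sketch. The actual fix is the C# `Answer ?? []` shown above; in JS/TS, calling .join on a null value throws a TypeError, analogous to the ArgumentNullException from string.Join(delimiter, null):

```typescript
// Unguarded: mirrors string.Join(arrayDelimiter, Answer) with Answer == null.
function getAnswerUnguarded(answer: string[] | null, delim = ';'): string {
  return (answer as string[]).join(delim); // throws TypeError when answer is null
}

// Guarded: mirrors the proposed string.Join(arrayDelimiter, Answer ?? []).
function getAnswerGuarded(answer: string[] | null, delim = ';'): string {
  return (answer ?? []).join(delim); // null collapses to "" instead of crashing
}
```

With the guard in place, the single null StringArrayAnnotation exports as an empty cell rather than aborting the whole wide-format export.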


Bug 2: Annotation Destruction via Shared-Question Tree-Shaking

Annotation Ownership Model (Critical Context)

Annotations do not belong to stages. They are shared across all stages for a given (reviewer, study, project) tuple. The StageId field on an annotation is provenance metadata only — it records which stage the reviewer was in when the annotation was created. It does not indicate ownership or scoping. The system expects exactly one annotation per (question, reviewer, study, project), regardless of how many stages contain that question.

There are plans to change this model in the future, but this is the current architecture.
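The ownership model can be made concrete with a small sketch. The interface and key function here are hypothetical illustrations, not the actual C# domain model:

```typescript
// One annotation per (project, study, annotator, question). StageId is
// provenance metadata and is deliberately excluded from the identity key.
interface AnnotationSketch {
  projectId: string;
  studyId: string;
  annotatorId: string;
  questionId: string;
  stageId: string; // which stage the reviewer was in when this was saved
}

function identityKey(a: AnnotationSketch): string {
  return [a.projectId, a.studyId, a.annotatorId, a.questionId].join('|');
}
```

Two saves of the same question by the same reviewer from different stages therefore target the same logical annotation, which is why last-write-wins across stages.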

TL;DR

When a user re-submits Stage 1 (Data Extraction) annotations — via manual save, auto-save, or completion — the backend's tree-shaking logic removes all of the user's annotations matching Stage 1's question IDs and replaces them with the submitted values. Because annotations are shared across stages (not scoped per stage), and 17 of 18 Stage 2 (RoB) questions are also assigned to Stage 1, this overwrites the Stage 2 answers. The tree-shaking is working as designed — study-level shared questions (Items 1-5, 9, 10) round-tripped perfectly. Only per-outcome sub-questions (Items 6, 7, 8) diverged or were lost — these are children of outcome assessment procedure instances, linked to their parents via AA (answer-answer) relationships. The likely cause is that re-saving Stage 1 breaks these AA links when parent procedural annotations are re-created with new IDs, orphaning the child annotations. The Stage 2 session remains marked "completed" but its per-outcome annotation values have been replaced or lost.

Symptoms

  • Outcome-level annotations "disappear" from Stage 2 (Risk of Bias)
  • Annotations are not deleted — they're overwritten with Stage 1 values (stageId provenance changes to Stage 1)
  • Affects studies where Stage 2 was completed BEFORE the user returned to edit/re-save Stage 1
  • Stage 2 sessions show as Status: 1 (completed) but have zero matching annotations

Previous Theory: Frontend Race Condition

The initial investigation hypothesised a frontend race condition where switching between stages caused a stale stageId to be sent in the request body. This theory is wrong. Evidence:

  1. Sessions have correct stageIds. All 6 affected studies have Stage 2 sessions with the correct Stage 2 stageId (BHKTi2kyJ0y2MJsA0kfdcA==). If the body had sent the wrong stageId, the session would also be wrong — session creation at ExtractionInfo.cs:103 uses the body's stageId.

  2. Annotations were rewritten days/weeks after the Stage 2 sessions were created, ruling out a race condition during stage switching. (See full study table in MongoDB Evidence below.)

  3. Each study's annotations have sub-millisecond time spread (e.g., all 268 annotations for "Contralesional angiotensin" written between 2026-02-25T16:11:39.469Z and 16:11:39.470Z), consistent with a single session submission, not a race condition.

  4. Human-paced gaps between studies (20–80 min) confirm the user was methodically re-saving Stage 1 for each study, not rapidly switching stages.

Could This Be Users Manually Constructing URLs?

No. The corruption is fully explained by normal UI usage — the user returns to Stage 1 (Data Extraction) to review or edit their work, and any save (including the 200ms-debounced auto-save on form change) triggers the destructive tree-shaking. No URL manipulation is needed.

Root Cause: Shared Questions + Tree-Shaking

The Question Overlap Problem

This project has 17 of 18 Stage 2 questions also assigned to Stage 1:

Stage                      Total Questions  Exclusive  Shared
Stage 1 (Data Extraction)  128              111        17
Stage 2 (Risk of Bias)     18               1          17

The 17 shared questions include RoB items ("Item 1 – Selection bias", "Item 5 – Performance bias", etc.) and outcome-related questions. Only "Item 2 – Selection bias: Baseline characteristics" is exclusive to Stage 2.

Annotation Question Tree Structure

The full tree, derived from annotation root/non-root relationships in the database:

ROOT (study-level)
├── [Metadata] English study, Is the study excluded?*, Conflict of interest,
│   Has a protocol been published?, How many animals were used...
├── [Stage 1 procedure labels — each has many children]
│   ├── Cohort label?
│   │   └── [children] Species, Strain, Animal Sex, Weight [g] (upper/lower),
│   │       Age of animals, Source of animals, Type/Was comorbidity confirmed,
│   │       How is diabetes mellitus induced?, Reported excluded animals...
│   │
│   ├── Experiment label?
│   │   └── [children] Disease Models, Cohorts, Treatments (with further nesting)
│   │       └── [within Treatments] Treatment name, Dose, Route, Intervention type,
│   │                               Number of interventions, Co-treatment...
│   │
│   ├── Outcome assessment procedure label?
│   │   └── [children] What is the name of the neurobehavioral outcome?,
│   │       Outcome measure, Outcome measure units, Type, average/error type,
│   │       Greater is worse, Reference group, Function addressed...
│   │       ├── Item 6 – Detection bias: Random outcome assessment  <- per outcome (shared)
│   │       ├── Item 7 – Detection bias: Blinding                  <- per outcome (shared)
│   │       └── Item 8 – Attrition bias: Incomplete outcome data    <- per outcome (shared)
│   │
│   └── Disease model induction procedure label?
│       └── [children] Model used to induce diabetes?, Type of diabetes modelled?,
│           Was timing of disease induction random?...
├── [Stage 1 reporting questions]
│   Reporting of excluded animals?, Were investigators blinded?,
│   Were the animals randomly housed?, Animal housing pre/post-stroke?,
│   Prespecification of in-/exclusion criteria?, etc.
└── [RoB items — study-level, shared with Stage 2]
    ├── Item 1 – Selection bias: Sequence generation
    ├── Item 2 – Selection bias: Baseline characteristics (two variants) <- Stage 2 exclusive
    ├── Item 3 – Selection bias: Allocation concealment
    ├── Item 4 – Performance bias: Random housing
    ├── Item 5 – Performance bias: Blinding
    ├── Item 9 – Reporting bias: Selective outcome reporting
    ├── Item 10 - Other: Other sources of bias
    └── Is this study reconciled

* "Is the study excluded?" root annotation has 1 non-root child (the exclusion reason).

Items 6, 7, and 8 appear as non-root children under each outcome — meaning they are outcome-level per-instance shared questions, while Items 1–5, 9, 10 are study-level shared questions.

The Destructive Tree-Shaking

File: src/libs/project-management/SyRF.ProjectManagement.Core/Model/StudyAggregate/ExtractionInfo.cs (lines 76-83)

When a session is submitted, AddAnnotations removes existing annotations before adding new ones:

// Tree Shaking — removes all annotations by this user matching ANY of the stage's question IDs
var updatedAnnotations = Annotations.Where(an => !(
    an.ProjectId == projectId &&
    (reconciliation || an.AnnotatorId == annotatorId) &&
    an.Reconciled == reconciliation &&
    stageQuestionIds.Contains(an.QuestionId) || newAnnotationIds.Contains(an.Id))
).ToList();

The stageQuestionIds come from the URL's stage (correct). The filter matches on QuestionId alone — it does not check an.StageId. This is consistent with the annotation ownership model: annotations are shared across stages, so the tree-shaking correctly removes the user's single annotation for each question, regardless of which stage created it.

The problem is not that tree-shaking crosses stage boundaries — that's by design. The tree-shaking correctly removes and replaces the user's annotations for all of the stage's question IDs. Study-level shared questions (Items 1-5, 9, 10) round-tripped perfectly through this process — proving the form loads them correctly regardless of stageId provenance. Only per-outcome sub-questions (Items 6, 7, 8) diverged, and these are children of outcome assessment procedure instances linked via AA relationships. The problem is upstream — in how the form reconstructs procedural sub-questions when the parent instance annotations may have changed IDs during re-submission. See "The Real Problem" below.

The Sequence of Events

  1. Investigator completes Stage 1 (Data Extraction) — 128 questions answered, annotations saved (stageId provenance = Stage 1)
  2. Investigator completes Stage 2 (RoB) — 18 questions answered. For the 17 shared questions, the Stage 2 answers overwrite the Stage 1 answers (since annotations are shared, there is only one annotation per question per user). The stageId provenance on those 17 annotations changes from Stage 1 to Stage 2
  3. Investigator returns to Stage 1 (to review, edit, or simply opens the form). The form loads existing annotations for all 128 questions — including the 17 shared questions whose current values reflect the Stage 2 RoB answers (since the form always populates from existing candidate annotations for the reviewer on this study, regardless of which stage created them)
  4. A save occurs (manual or auto-save on any form change). The submission includes all 128 questions' answers
  5. Tree-shaking removes the user's annotations matching Stage 1's 128 question IDs — including the 17 shared-question annotations whose values were set during Stage 2. This is correct behaviour under the shared-annotation model (one annotation per question, last-write-wins)
  6. New Stage 1 annotations are written, all with Stage 1 stageId provenance. Study-level shared questions (Items 1-5, 9, 10) round-trip correctly — their values are identical. But per-outcome sub-questions (Items 6, 7, 8) diverge — likely because the parent procedural instance annotations were re-created with new IDs, breaking the AA relationships that link Items 6/7/8 to their parent outcome instances. The orphaned children are either populated with defaults or lost entirely
  7. Result: Stage 2 session still shows Status: Completed but its 17 shared-question answers have been replaced by Stage 1 values. The 1 exclusive Stage 2 question ("Item 2 – Baseline characteristics") was not affected by tree-shaking, but it was apparently never saved either (0 annotations found for that question by this investigator)
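The remove-then-add step in the sequence above can be sketched in simplified TypeScript. The real logic is the C# filter in ExtractionInfo.cs; the reconciliation branch is omitted here, and the shapes are illustrative:

```typescript
interface Ann { id: string; projectId: string; annotatorId: string; questionId: string; }

// Tree-shake: remove the submitting user's annotations for ANY of the stage's
// question IDs (StageId is deliberately not checked), then append the new ones.
function treeShake(
  existing: Ann[], projectId: string, annotatorId: string,
  stageQuestionIds: Set<string>, incoming: Ann[],
): Ann[] {
  const incomingIds = new Set(incoming.map(a => a.id));
  const kept = existing.filter(an => !(
    (an.projectId === projectId &&
     an.annotatorId === annotatorId &&
     stageQuestionIds.has(an.questionId)) ||
    incomingIds.has(an.id)
  ));
  return [...kept, ...incoming];
}
```

Because the removal matches on question ID alone, a shared-question annotation whose values were set during Stage 2 is swept away by a Stage 1 re-submission, while other users' annotations survive.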

MongoDB Evidence

Althea's 6 Affected Studies

All annotations currently have Stage 1 stageId provenance. Every study has a single annotation timestamp — all annotations were written in one atomic submission per study, 6–14 days after Stage 2 was completed.

Study (abbreviated)         S1 Session Created          S2 Session Created          Annotations Rewritten  Ann Count
Contralesional angiotensin  2026-02-05 12:04 completed  2026-02-18 18:17 completed  2026-02-25 16:11       55 root + 213 non-root = 268
NLRP3 Inflammasome          2026-02-03 16:50 completed  2026-02-18 19:11 completed  2026-02-25 17:31       47 root + 97 non-root = 144
Low-dose nifedipine         2026-02-03 16:55 completed  2026-02-18 19:20 completed  2026-02-25 17:51       52 root + 134 non-root = 186
Huang-Lian-Jie-Du           2026-02-05 15:57 completed  2026-02-19 12:01 completed  2026-03-05 10:29       48 root + 97 non-root = 145
New alternative approaches  2026-02-09 12:46 completed  2026-02-18 19:01 completed  2026-03-05 10:42       56 root + 225 non-root = 281
DPP-4 Linagliptin           2026-02-09 15:53 completed  2026-02-18 19:31 completed  2026-03-05 10:59       52 root + 193 non-root = 245

Example breakdown for "Contralesional angiotensin" (268 annotations by F5XW):

  • 248 annotations use Stage 1-exclusive question IDs (correct)
  • 20 annotations use shared question IDs (should be under Stage 2, but stored under Stage 1)
  • 0 annotations use the Stage 2-exclusive question ID

Maria's 7 Studies

Maria has no Stage 2 sessions anywhere — her data is unaffected by Bug 2. All 7 studies are Stage 1 only, with consistent annotation timestamps matching normal usage patterns.

Study (abbreviated)         S1 Session Created            Annotations Written  Ann Count
ESO-WSO 2020                2026-02-18 14:38 completed    2026-02-18 14:38     31+1 = 32 (excluded study)
Carvedilol                  2026-02-19 15:32 in-progress  2026-02-19 15:32     15+1 = 16 (partial)
Huang-Lian-Jie-Du           2026-02-18 15:19 completed    2026-02-24 12:19     46+88 = 134
FGF21                       2026-02-24 12:35 completed    2026-02-24 12:36     39+1 = 40 (excluded study)
Lauric acid                 2026-02-24 12:50 completed    2026-02-24 12:50     39+1 = 40 (excluded study)
Contralesional angiotensin  2026-02-24 14:08 completed    2026-02-25 14:55     45+88 = 133
New alternative approaches  2026-02-26 13:15 completed    2026-02-26 14:38     56+216 = 272

Maria's report in #2339 is likely based on observing the corruption happening to the project (Althea's data) rather than experiencing it herself. She may have attempted a Stage 2 session that failed to persist, or she was reporting on behalf of the project.

Why Only This Project?

It requires shared questions between stages, which is a project configuration choice. This project assigned 17 RoB questions to both stages. Any project with question overlap between stages is equally vulnerable — re-saving the earlier stage will destroy the later stage's answers for shared questions.

Fix

Important: Adding a StageId check to the tree-shaking filter would be incorrect. Annotations are shared across stages — StageId is provenance metadata, not ownership. Filtering by StageId would create duplicate annotations for the same question by the same user, violating the current data model. The tree-shaking itself is working as designed.

The real problem is upstream of tree-shaking: why does the Stage 1 submission contain different values for the shared questions than what Stage 2 had set?
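To make the ownership model concrete, the sketch below models the identity key without stageId. All type and field names here are illustrative assumptions, not the actual SyRF schema:

```typescript
// Illustrative model only: field names are assumptions, not the real SyRF schema.
interface Annotation {
  questionId: string;
  reviewerId: string;
  studyId: string;
  projectId: string;
  stageId: string; // provenance: which stage the reviewer was in when saving
  answer: string;
}

// The identity key deliberately excludes stageId: exactly one annotation per
// (question, reviewer, study, project), shared across all stages.
const identityKey = (a: Annotation): string =>
  [a.questionId, a.reviewerId, a.studyId, a.projectId].join("|");

// Tree-shaking upsert: a re-save from any stage overwrites the shared copy.
function upsert(store: Map<string, Annotation>, incoming: Annotation): void {
  store.set(identityKey(incoming), incoming); // last writer wins, by design
}
```

Adding stageId to the key would give the same question answered in Stage 1 and Stage 2 two separate rows, which is exactly the duplicate-annotation violation described above.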

The Real Problem: Broken AA Relationships for Per-Outcome Sub-Questions

The Atlas backup comparison reveals a critical pattern in which annotations diverged:

| Question type | Location in tree | Round-trip result |
|---|---|---|
| Study-level shared (Items 1-5, 9, 10) | Root annotations | All identical — perfect round-trip |
| Per-outcome shared (Items 6, 7, 8) | Children of outcome assessment procedure instances | 8 changed, 3 lost |

Study-level root questions round-tripped perfectly. Only sub-questions nested inside procedural unit instances diverged. This is not a stageId filtering problem — the 17 shared questions ARE in Stage 1's question list, so the selector would include them. The issue is about procedural tree context and AA (answer-answer) relationships.

How Procedural Sub-Questions Work

Items 6, 7, 8 are sub-questions of outcome assessment procedures in the annotation tree:

```
├── Outcome assessment procedure label?
│   └── [children] ...
│       ├── Item 6 – Detection bias: Random outcome assessment  ← per-outcome instance
│       ├── Item 7 – Detection bias: Blinding                   ← per-outcome instance
│       └── Item 8 – Attrition bias: Incomplete outcome data    ← per-outcome instance
```

Each outcome assessment procedure instance gets its own copy of Items 6, 7, 8. These child annotations are linked to their parent procedural instance via AA (answer-answer) relationships. The annotation is identified not just by its QuestionId but by its position in the tree — which parent instance it belongs to.

Why Per-Outcome Questions Break Across Stages

Stage 2 (RoB) has only 18 questions — it does not include the outcome assessment procedure label or the full procedural tree. Those are Stage 1-exclusive questions. When Stage 2 presents Items 6, 7, 8, it must reference the procedural instances that Stage 1 created.

When Stage 1's form is re-opened and the procedural tree is reconstructed:

  1. The form rebuilds the procedural tree from Stage 1's 128 questions
  2. For each outcome assessment instance, it needs to find and populate the corresponding Items 6, 7, 8 child annotations
  3. These child annotations have AA relationships linking them to their parent procedural instance
  4. If the parent procedural instance annotations are re-created with new IDs during the Stage 1 re-save, the AA links to the existing Items 6, 7, 8 annotations break — the children become orphans that can't be placed in the form
  5. The form either populates those sub-questions with defaults (explaining "High" → "Unclear") or doesn't create them at all (explaining the 3 lost annotations)
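The steps above can be sketched as a minimal model of the suspected failure mode. All names here are hypothetical; the real form service and schema may differ:

```typescript
// Hypothetical model of the AA (answer-answer) parent/child link.
interface Ann {
  id: string;
  questionId: string;
  parentAnnotationId?: string; // AA relationship to a procedural instance
  answer: string;
}

let nextId = 0;
const newId = () => `ann-${nextId++}`;

// Stage 1 creates a procedural instance; Stage 2 nests Item 6 under it via AA.
const parentV1: Ann = { id: newId(), questionId: "outcome-proc", answer: "Infarct volume" };
const item6: Ann = { id: newId(), questionId: "item-6", parentAnnotationId: parentV1.id, answer: "High" };

// Stage 1 re-save: tree-shaking removes and re-inserts the parent with a fresh ID.
const parentV2: Ann = { id: newId(), questionId: "outcome-proc", answer: "Infarct volume" };

// Form loading: place each child under a current parent by following the AA link.
function placeChild(child: Ann, currentParents: Ann[]): Ann | undefined {
  return currentParents.find(p => p.id === child.parentAnnotationId);
}
```

Against the original parent, `placeChild` resolves; against the re-created parent it returns `undefined`, i.e. the child is orphaned and the form falls back to defaults ("Unclear") or drops it.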

Neither stage has full visibility of the shared questions' context: Stage 1 can see the full procedural tree but not Stage 2's RoB-specific presentation. Stage 2 can see Items 6, 7, 8 but not the procedural tree they're nested in. The form's validation cannot catch the AA relationship breakage because no single stage view shows the complete picture.

This explains why study-level shared questions were unaffected: Items 1-5, 9, 10 are root annotations with no parent dependency. Their identity is just (questionId, reviewer, study) — no AA relationship to break. They load and round-trip cleanly regardless of which stage created them.

Investigation needed

  1. Verify AA relationship breakage: Compare the parent annotation IDs in the Atlas backup's Items 6/7/8 annotations against the current parent annotation IDs in production. If they differ, the AA link was broken by the Stage 1 re-submission.

  2. Trace form initialization for sub-questions: In annotation-form.service.ts, examine initSubQuestions() to understand how it maps saved child annotations to parent procedural instances. Does it use the AA relationship (parent annotation ID) to place children, or does it use position in the question tree?

  3. Determine if procedural instance re-creation generates new annotation IDs: When Stage 1's form re-saves, does the tree-shaking + re-insertion create new annotation GUIDs for the parent procedural instances? If so, any child annotations referencing the old parent IDs would be orphaned.

Possible fix approaches (require further investigation)

  1. Preserve parent annotation IDs during re-save — if the tree-shaking removes and re-inserts procedural instance annotations, ensure the same annotation IDs are reused so AA relationships to children are preserved. This is the most targeted fix if AA ID breakage is confirmed.

  2. Re-link orphaned children during form loading — when loading the form, if a child annotation's parent ID doesn't match any current parent instance, attempt to re-link it by matching on question position in the tree rather than by parent annotation ID.

  3. Exclude procedural sub-questions from tree-shaking when their parent is not in the submission — if Items 6, 7, 8 were created under a different stage's procedural context, don't remove them during tree-shaking unless the submission explicitly includes replacement values for those specific instances.

  4. Frontend: don't re-submit sub-question annotations that haven't been modified — track dirty state per annotation and only include changed annotations in the submission payload.
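Approach 2 could be sketched roughly like this, under the assumption that a procedural instance's ordinal position in the tree is stable across re-saves. `instanceIndex` and the other names are hypothetical, not real SyRF fields:

```typescript
interface Ann {
  id: string;
  questionId: string;
  parentAnnotationId?: string;
  instanceIndex?: number; // hypothetical: ordinal of the procedural instance
}

// Fallback re-link: if the saved parent ID no longer exists, match the child
// to a current parent by (parent questionId, instance ordinal) instead.
function relink(child: Ann, parents: Ann[], parentQuestionId: string): Ann | undefined {
  const byId = parents.find(p => p.id === child.parentAnnotationId);
  if (byId) return byId; // AA link still valid, use it
  const candidates = parents.filter(p => p.questionId === parentQuestionId);
  return candidates[child.instanceIndex ?? 0]; // positional fallback
}
```

The trade-off: positional matching silently mis-links children if the user reorders or deletes procedural instances between saves, so it should only run as a fallback when the ID match fails.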

Files likely involved (depends on chosen approach):

  • src/services/web/src/app/core/services/annotation-form/annotation-form.service.ts — initSubQuestions(), form tree reconstruction
  • src/services/web/src/app/core/state/entities/annotation/annotation.selectors.ts — annotation loading selectors
  • src/libs/project-management/SyRF.ProjectManagement.Core/Model/StudyAggregate/ExtractionInfo.cs — tree-shaking logic
  • src/services/api/SyRF.API.Endpoint/Controllers/ReviewController.cs — session submission endpoint

Data Repair

What the Current Database State Tells Us

The most significant finding is that every one of Althea's 6 affected studies has a single annotation timestamp — meaning ALL annotations (not just the 17–18 shared questions) were completely overwritten in a single Stage 1 submission per study, roughly 6–14 days after Stage 2 was completed.

This points to Althea manually going back into Stage 1 and re-submitting everything — not an auto-save. She did 3 studies in one session on 25 Feb (over ~1.5 hours), then the other 3 on 5 Mar (over ~30 minutes). She was clearly working through the list deliberately — this is how she responded to the missing annotations she reported on 19 Feb.

Maria's data is unaffected — she only ever did Stage 1, her 7 studies all have consistent annotation timestamps, and her data looks structurally complete for the studies she finished.

Atlas Backup Comparison (21 Feb 2026 snapshot vs current prod)

Method: Compared Althea's annotations created while in Stage 2 (110 total across 6 studies) from the 21 Feb Atlas backup against the same question IDs in current production. The backup predates both re-submission events (25 Feb, 5 Mar).

Key result: Current prod has 0 annotations with Stage 2 provenance — complete destruction confirmed. All shared-question annotations now carry Stage 1 provenance from Althea's re-submissions.

Note on StageId: The StageId field on annotations is provenance metadata only — it records which stage the reviewer was in when the annotation was created. Annotations do not belong to stages. They are shared across all stages for a given (reviewer, study, project) tuple, with the system expecting exactly one annotation per (question, reviewer, study, project). Sessions belong to stages; annotations do not.

Changed RoB Judgments (8 answers differ)

| Study | RoB Item | Backup Answer | Current Answer | Impact |
|---|---|---|---|---|
| New alt. approaches | Item 6-SL (Random outcome assessment) | High | Unclear | Specific judgment lost |
| New alt. approaches | Item 7-SL (Blinding detection) | Low | Unclear | Specific judgment lost |
| New alt. approaches | Item 6-PO ×2 (per outcome) | High | Unclear | Both outcomes affected |
| New alt. approaches | Item 7-PO ×2 (per outcome) | Low | Unclear | Both outcomes affected |
| NLRP3 | Item 8-SL (Attrition) | Low | High | Risk rating flipped |
| Huang-Lian-Jie-Du | Item 8-PO (Attrition) | Low | High | Notes contradict: "YES"→"NO" |

Lost Annotations (3 annotations exist in backup but not in prod)

| Study | Items | Backup Answers |
|---|---|---|
| Huang-Lian-Jie-Du | Items 6-SL, 7-SL, 8-SL | Unclear, Unclear, High |

These 3 annotations were not re-created during Althea's Stage 1 re-submission — they are entirely absent from current prod.

Unchanged (all other RoB judgments match)

  • Items 1–5, 9, 10 (study-level): identical across all 6 studies
  • Contralesional angiotensin: all Items match perfectly
  • Low-dose nifedipine: all Items match perfectly
  • DPP-4 Linagliptin: all Items match perfectly

Interpretation

The changes concentrate on Items 6, 7, 8 — exactly the per-outcome items Althea reported as "disappeared" in #2337. Study-level items (1–5, 9, 10) were re-entered identically, suggesting Althea could recall broad study-level judgments but not the granular per-outcome assessments. The "New alternative approaches" study is the most affected — 6 judgments changed from specific (High/Low) to "Unclear", representing genuine data loss even before considering the repair strategy.

Conclusion: The backup must be used for recovery. Althea's re-entered values do NOT fully match her original Stage 2 work — at least 8 RoB judgments changed and 3 annotations were lost entirely.

Affected Studies

All in project ac35058c-4767-4f9a-9ddc-5ba9d484c289, investigator F5XWaUIex06j3KycXnpSUA==:

| # | Study Title (abbreviated) | Annotations Rewritten |
|---|---|---|
| 1 | Contralesional angiotensin type 2 receptor... | 25 Feb 16:11 |
| 2 | NLRP3 Inflammasome: A Potential Target... | 25 Feb 17:31 |
| 3 | Low-dose nifedipine rescues... | 25 Feb 17:51 |
| 4 | Huang-Lian-Jie-Du decoction... | 5 Mar 10:29 |
| 5 | New alternative approaches to stroke treatment... | 5 Mar 10:42 |
| 6 | DPP-4 Inhibitor Linagliptin... | 5 Mar 10:59 |

Repair Strategy

  1. For each affected study, identify annotations by investigator F5XW that:
     • are stored under Stage 1's stageId, AND
     • have a QuestionId matching one of the 17 shared questions (or the 1 exclusive Stage 2 question)
  2. Update those annotations' StageId from Stage 1 to Stage 2
  3. Verify Stage 2 sessions now have matching annotations
  4. Re-test wide format export for the project
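The annotation selection described above might be sketched as follows. The Stage 1 ID is taken from this report; the document field names are assumptions, not a verified collection schema:

```typescript
interface AnnotationDoc {
  investigatorId: string;
  stageId: string;
  questionId: string;
}

// Stage 1 ID from the project configuration in this report.
const STAGE1_ID = "4jYrw5Vwq0+u1chtjeKHLA==";

// Find the overwritten copies: the investigator's annotations with Stage 1
// provenance whose question is shared between the stages (or Stage 2-exclusive).
function overwrittenCopies(
  docs: AnnotationDoc[],
  investigatorId: string,
  sharedQuestionIds: Set<string>,
): AnnotationDoc[] {
  return docs.filter(
    d => d.investigatorId === investigatorId
      && d.stageId === STAGE1_ID
      && sharedQuestionIds.has(d.questionId),
  );
}
```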

Complication: Since the user re-saved Stage 1 AFTER completing Stage 2, the shared-question annotations now contain Stage 1's answers, not Stage 2's. The original Stage 2 answers were overwritten and cannot be recovered from the current database state. However, MongoDB Atlas backups can recover the original annotations — see below.

Important: This operates on production data (syrftest database). Script must be reviewed and tested against a copy first.

Atlas Backup Recovery

Status: Original Stage 2 annotations are recoverable from Atlas backups.

Atlas Cluster0 (M20, GCP europe-west2) has the following backup policy:

| Snapshot Type | Frequency | Retention |
|---|---|---|
| Hourly | Every 6h | 2 days |
| Daily | Every 24h | 7 days |
| Weekly | Every Saturday | 4 weeks |
| Monthly | Last day of month | 12 months |

Available snapshots covering the corruption dates:

| Corruption Date | Studies Affected | Best Snapshot | Covers? |
|---|---|---|---|
| 25 Feb 2026 (3 studies) | Contralesional, NLRP3, Low-dose nifedipine | 21-02-2026 02:03 PM (Weekly, expires ~21 Mar) | Yes |
| 5 Mar 2026 (3 studies) | Huang-Lian-Jie-Du, New alternative, DPP-4 | 04-03-2026 02:04 PM (Daily, expires ~11 Mar) | Yes |

The 21 Feb weekly snapshot predates both corruption events and covers all 6 studies. It expires around 21 Mar 2026 — recovery must happen before then.

Recovery procedure:

  1. Restore the 21 Feb snapshot to a temporary Atlas cluster (or download via Atlas UI)
  2. Connect to the restored syrftest database
  3. For each of the 6 affected studies, extract annotations by investigator F5XWaUIex06j3KycXnpSUA== where StageId = BHKTi2kyJ0y2MJsA0kfdcA== (Stage 2)
  4. In the live database, remove the current annotations for those studies by investigator F5XW that use shared question IDs and have Stage 1's stageId (the overwritten copies)
  5. Insert the original Stage 2 annotations from the backup
  6. Verify Stage 2 sessions now have matching annotations and re-test export
  7. Tear down the temporary cluster
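Steps 3–5 amount to a remove-then-insert merge. Below is a pure-function sketch with in-memory stand-ins for the Mongo collections; all names are hypothetical and the actual script would use Mongo queries:

```typescript
interface AnnDoc {
  id: string;
  investigatorId: string;
  stageId: string;
  questionId: string;
  answer: string;
}

// Replace the overwritten Stage 1 copies of shared questions with the
// original Stage 2 annotations extracted from the backup snapshot.
function restoreFromBackup(
  live: AnnDoc[],
  backupStage2: AnnDoc[],
  investigatorId: string,
  stage1Id: string,
  sharedQuestionIds: Set<string>,
): AnnDoc[] {
  // Step 4: drop the overwritten copies from the live set.
  const kept = live.filter(
    d => !(d.investigatorId === investigatorId
        && d.stageId === stage1Id
        && sharedQuestionIds.has(d.questionId)),
  );
  // Step 5: insert the originals from the backup.
  return [...kept, ...backupStage2];
}
```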

Why this works: The backup contains the original Stage 2 annotations with correct stageIds and the user's actual RoB answers — no re-annotation required.


Summary of Changes Required

Bug 1: Export Crash (ready to implement)

| Priority | Change | Files |
|---|---|---|
| P0 | Add null guards to array annotation GetAnswer() methods | Annotation.cs |
| P1 | Add null coalescing to BoolAnnotation.GetAnswer() | Annotation.cs |
| P2 | Defensive null filter in StudyAnnotationsGroup._answerCache | StudyAnnotationsGroup.cs |
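The P0/P1 changes are C# edits in Annotation.cs. The shape of the guard is sketched here in TypeScript for brevity: in C#, string.Join(delimiter, null) throws ArgumentNullException, so the fix coalesces a null Answer list to an empty one before joining:

```typescript
// Sketch of the null-guarded GetAnswer for array-style annotations.
// A null/undefined Answer list exports as an empty string instead of throwing.
function getAnswer(answers: string[] | null | undefined, delimiter: string): string {
  return (answers ?? []).join(delimiter);
}
```

The same coalescing pattern covers BoolAnnotation: return an empty string rather than null so the non-nullable base-class contract holds.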

Bug 2: Annotation Destruction (requires investigation)

| Priority | Change | Status |
|---|---|---|
| P0 | Investigate AA relationship breakage: compare parent annotation IDs in Atlas backup vs production for Items 6/7/8. Trace initSubQuestions() in the form service to understand how children are mapped to parent procedural instances | Root cause hypothesis — per-outcome sub-questions (Items 6, 7, 8) are orphaned when parent procedural annotations are re-created with new IDs during Stage 1 re-save |
| P1 | Restore original Stage 2 annotations from Atlas backup | Data recovery — MongoDB script against prod |

Key evidence: Study-level shared questions (Items 1-5, 9, 10) round-tripped perfectly — only per-outcome sub-questions nested inside procedural unit instances diverged. This rules out stageId filtering and points to AA relationship breakage as the root cause.

Rejected fix: Adding a StageId check to the tree-shaking filter was initially proposed but is incorrect — it would create duplicate annotations per question, violating the shared-annotation model where StageId is provenance metadata, not ownership. See "Annotation Ownership Model" section under Bug 2.