Skip to content

Bulk Study Update: Blank Field Behavior Inconsistency

Summary

A discrepancy exists between the documented behavior and actual implementation of the Bulk Study Update feature when handling blank/empty cells for customId and pdfRelativePath fields.

Field Documentation Says Code Actually Does
customId (blank cell) Erases existing value Preserves existing value
pdfRelativePath (blank cell) Erases existing value Preserves existing value
Screening columns (blank cell) Preserves existing value Preserves existing value ✓

Evidence

Documentation (User Guide)

File: /user-guide/manage-studies.md (lines 45, 67-69)

> **Warning:** Providing an empty/blank cell for `pdfRelativePath` or `customId`
> in the update file *will erase* any existing value for that field in SyRF for that study.

> **Screening Decisions:** ... Blank cells in screening columns are *ignored*
> and do not erase existing decisions.

Code Implementation

File: src/libs/project-management/SyRF.ProjectManagement.Core/Services/StudyReferenceFileParser.cs (lines 179-180)

if (!string.IsNullOrWhiteSpace(u.CustomId)) study.CustomId = u.CustomId;
if (!string.IsNullOrWhiteSpace(u.PdfRelativePath)) study.PdfRelativePath = u.PdfRelativePath;

File: src/libs/project-management/SyRF.ProjectManagement.Core/Services/ParserImplementations/NewParser/StudyUpdateRecordProcessor.cs (lines 172-186)

private string? GetValueFromColumn(IReadOnlyList<string> fields, string columnName)
{
    if (!_columnDictionary.TryGetValue(columnName.ToLowerInvariant(), out var columnIndex))
    {
        return null;  // Missing column → null
    }
    var val = fields[columnIndex];
    return string.IsNullOrWhiteSpace(val) ? "" : val;  // Blank field → empty string
}

Data Flow Analysis

CSV State GetValueFromColumn() Returns IsNullOrWhiteSpace() Update Behavior
Column missing entirely null true Preserved
Column present, cell empty "" true Preserved
Column present, cell has value "value" false Updated

Both missing columns AND blank fields result in preservation, contradicting documentation.


E2E Test Plan

Objective

Confirm the actual runtime behavior in a deployed environment to definitively verify the bug before implementing a fix.

Test Scenarios

Scenario 1: Blank CustomId Field

Setup:

  1. Create a project with test studies
  2. Ensure studies have existing customId values (e.g., "ORIGINAL-001", "ORIGINAL-002")

Test:

  1. Export studies to get studyId values
  2. Create CSV with columns: studyId, customId
  3. Leave customId cells blank for test studies
  4. Upload via Bulk Study Update

Expected (per docs): customId values should be erased (null/empty) Expected (per code): customId values should be preserved ("ORIGINAL-001", etc.)

Scenario 2: Blank PdfRelativePath Field

Setup:

  1. Use same project/studies
  2. Ensure studies have existing pdfRelativePath values

Test:

  1. Create CSV with columns: studyId, pdfRelativePath
  2. Leave pdfRelativePath cells blank
  3. Upload via Bulk Study Update

Expected (per docs): pdfRelativePath values should be erased Expected (per code): pdfRelativePath values should be preserved

Scenario 3: Missing Column vs Blank Field

Setup:

  1. Studies with both customId and pdfRelativePath set

Test A - Missing Column:

  1. CSV with only studyId and customId columns (no pdfRelativePath column)
  2. customId has new values
  3. Upload and verify pdfRelativePath is preserved

Test B - Blank Field:

  1. CSV with studyId, customId, pdfRelativePath columns
  2. customId has new values, pdfRelativePath is blank
  3. Upload and verify behavior matches Test A (both preserved)

Scenario 4: Screening Baseline (Control)

Test:

  1. CSV with screening columns containing blank cells
  2. Verify blank screening cells are ignored (preserved)
  3. This confirms screening behavior matches documentation

Test Environment

  • Recommended: Staging environment with syrftest database
  • Alternative: Local development with test database
  • Cleanup: Delete test project after verification

Verification Query

// MongoDB query to verify field values after bulk update
db.pmStudy.find(
  { _id: { $in: [CSUUID("study-id-1"), CSUUID("study-id-2")] } },
  { customId: 1, pdfRelativePath: 1 }
)

Options Analysis

Option A: Fix Code to Match Documentation (Blank = Erase)

Implementation:

// In StudyReferenceFileParser.cs
// Change conditional updates to unconditional for non-null StudyUpdate values
if (u.CustomId != null) study.CustomId = string.IsNullOrWhiteSpace(u.CustomId) ? null : u.CustomId;
if (u.PdfRelativePath != null) study.PdfRelativePath = string.IsNullOrWhiteSpace(u.PdfRelativePath) ? null : u.PdfRelativePath;

Also requires change in GetValueFromColumn():

// Return null for missing column, empty string for blank field
if (!_columnDictionary.TryGetValue(columnName.ToLowerInvariant(), out var columnIndex))
{
    return null;  // Missing column - don't touch field
}
return fields[columnIndex];  // Return actual value (including empty string)

Pros:

  • Matches existing user documentation and training materials
  • Allows users to intentionally clear field values
  • More powerful: can both update AND clear in single operation
  • Documentation was written with this behavior in mind (original intent)

Cons:

  • Breaking change for users who rely on current behavior
  • Risk of accidental data loss if users don't carefully prepare CSVs
  • Requires users to remove columns entirely to preserve values
  • More destructive default behavior

Risk Assessment: MEDIUM-HIGH

  • Users who have been using the feature with blank cells expecting preservation will suddenly lose data

Option B: Fix Documentation to Match Code (Blank = Preserve)

Implementation:

  • No code changes required
  • Update /user-guide/manage-studies.md to reflect actual behavior

Documentation Change:

> **Note:** Blank/empty cells for `pdfRelativePath` or `customId` are *ignored*
> during the update - they will NOT change existing values. Only provide values
> in cells where you want to update data.

Pros:

  • No code changes, minimal effort
  • Safer default - less risk of accidental data loss
  • Simpler mental model: "only non-empty values cause updates"
  • Consistent with screening column behavior
  • No breaking change for existing users

Cons:

  • No way to intentionally clear a field value via bulk update
  • Original documentation suggests erase behavior was intended
  • May have been an implementation bug that became the de facto standard
  • Reduces feature capability

Risk Assessment: LOW

  • Documentation-only change, no functional impact on existing workflows

Option C: Add Explicit Sentinel Value

Implementation:

private const string ClearSentinel = "[CLEAR]";

// In StudyReferenceFileParser.cs
if (!string.IsNullOrWhiteSpace(u.CustomId))
{
    study.CustomId = u.CustomId == ClearSentinel ? null : u.CustomId;
}
if (!string.IsNullOrWhiteSpace(u.PdfRelativePath))
{
    study.PdfRelativePath = u.PdfRelativePath == ClearSentinel ? null : u.PdfRelativePath;
}

Documentation:

- Blank cells: Existing values are preserved (no change)
- `[CLEAR]`: Explicitly removes/erases the existing value
- Any other value: Updates to that value

Pros:

  • Explicit user intent - no ambiguity
  • Backward compatible with current behavior
  • Provides full functionality (update, preserve, AND clear)
  • Best of both worlds
  • Clear audit trail in CSV files

Cons:

  • More complex for users to learn
  • Documentation and training materials need updates
  • Additional code complexity
  • Sentinel value could theoretically conflict with actual data (unlikely for "[CLEAR]")

Risk Assessment: LOW-MEDIUM

  • Backward compatible, but adds complexity

Option D: Distinguish Missing Column vs Blank Field

Implementation:

// GetValueFromColumn already distinguishes:
// - Missing column → null
// - Blank field → ""

// In StudyReferenceFileParser.cs
if (u.CustomId != null)  // Column exists in CSV
{
    study.CustomId = string.IsNullOrWhiteSpace(u.CustomId) ? null : u.CustomId;
}
// If u.CustomId is null (column missing), don't touch study.CustomId

Behavior: | CSV State | Result | |-----------|--------| | Column missing | Field preserved (untouched) | | Column present, cell blank | Field erased (set to null) | | Column present, cell has value | Field updated |

Pros:

  • Matches original documentation intent
  • Fine-grained control via CSV structure
  • Allows both preservation (omit column) and clearing (include blank column)
  • Leverages existing parsing infrastructure

Cons:

  • Subtle distinction users may misunderstand
  • Current behavior treats both cases identically (breaking change)
  • Requires careful CSV column management
  • Mental model complexity: "presence of column matters, not just value"

Risk Assessment: MEDIUM

  • Breaking change for blank field behavior, but preserves missing column behavior

Recommendation

Preferred Option: Option D (Distinguish Missing Column vs Blank Field)

Rationale:

  1. Aligns with Documentation Intent: The user guide explicitly states blank cells erase values. This suggests the original feature design intended this behavior. Option D restores that intent.

  2. Maintains Intuitive Default: Users who don't include a column don't affect that field - this is already true and remains unchanged.

  3. Enables Full Functionality: Users can:

  4. Preserve: Don't include the column in CSV
  5. Update: Include column with new value
  6. Clear: Include column with blank value

  7. Leverages Existing Infrastructure: The GetValueFromColumn() method already distinguishes null (missing column) from "" (blank field). The fix only requires changing the update logic.

  8. Matches User Expectations: Users reading the current documentation would expect blank cells to erase. The current behavior is the bug, not the documentation.

Implementation Complexity

Low - Single file change in StudyReferenceFileParser.cs:

// Current (buggy):
if (!string.IsNullOrWhiteSpace(u.CustomId)) study.CustomId = u.CustomId;
if (!string.IsNullOrWhiteSpace(u.PdfRelativePath)) study.PdfRelativePath = u.PdfRelativePath;

// Fixed (Option D):
if (u.CustomId != null) study.CustomId = string.IsNullOrWhiteSpace(u.CustomId) ? null : u.CustomId;
if (u.PdfRelativePath != null) study.PdfRelativePath = string.IsNullOrWhiteSpace(u.PdfRelativePath) ? null : u.PdfRelativePath;

Migration Considerations

  1. Announce in Release Notes: Clearly document the behavior change
  2. Update What's New: Add entry explaining the fix
  3. User Communication: Email users who have used bulk update feature
  4. Timing: Deploy during low-activity period

Alternative Consideration

If breaking change risk is deemed too high, Option C (sentinel value) provides equivalent functionality without changing existing behavior. It requires more documentation effort but is fully backward compatible.


Behavioral Consistency Analysis

The Inconsistency Question

The original documentation intentionally specifies different behavior for different column types:

Column Type Documented Behavior Rationale
Screening Blank = Preserved Screening decisions are binary (0/1). Blank means "no decision made" - preserving existing decisions is safer.
CustomId Blank = Erased Metadata fields. Blank could be intentional "clear this field".
PdfRelativePath Blank = Erased Metadata fields. Blank could be intentional "clear this field".

Option E: Consistent Behavior (All Preserve)

Make ALL fields behave the same way: blank = preserve.

Implementation:

  • No code changes (current behavior)
  • Update documentation to match code

Behavior: | Column Type | Blank Cell | |-------------|------------| | Screening | Preserved | | CustomId | Preserved | | PdfRelativePath | Preserved |

Pros:

  • Perfectly consistent - one rule for all columns
  • Safest - no accidental data loss
  • No code changes needed
  • Current behavior preserved

Cons:

  • No way to clear ANY field via bulk update
  • Reduces feature capability significantly
  • Original design intentionally had different behavior

Option F: Consistent Behavior (All Erase)

Make ALL fields behave the same way: blank = erase.

Implementation:

  • Change screening to also erase on blank
  • Change CustomId/PdfRelativePath to erase on blank

Behavior: | Column Type | Blank Cell | |-------------|------------| | Screening | Erased | | CustomId | Erased | | PdfRelativePath | Erased |

Pros:

  • Perfectly consistent - one rule for all columns
  • Maximum capability - can clear any field

Cons:

  • DANGEROUS for screening - blank could accidentally clear decisions
  • Breaking change for all column types
  • Screening "erase" semantically unclear (what does null screening mean?)
  • High risk of data loss

Risk Assessment: HIGH - Not recommended

Revisited Recommendation

After considering consistency, I still recommend Option D (different behavior per column type) because:

  1. Semantic Difference: Screening decisions and metadata fields ARE fundamentally different:
  2. Screening: Ternary state (Include/Exclude/No Decision)
  3. Metadata: String value or no value

  4. Original Design Intent: The documentation explicitly describes different behavior, suggesting this was a deliberate design choice, not an oversight.

  5. Risk Profile: Clearing a screening decision is more dangerous than clearing a PDF path. Different risk warrants different default behavior.

  6. User Mental Model: Users likely think of screening columns differently than metadata columns - they're mapped to users and stages, not simple text fields.

However, if perfect consistency is required, then Option E (all preserve) is the safest choice, accepting the trade-off that fields cannot be cleared via bulk update.

Consistency Decision Matrix

Priority Recommended Option
Feature capability + original intent Option D (different behavior)
Perfect consistency + safety Option E (all preserve)
Perfect consistency + power Option F (all erase) - NOT RECOMMENDED
Backward compatibility + flexibility Option C (sentinel value)

Post-E2E Testing Action Plan

If E2E Testing Confirms Bug

Once testing confirms the code diverges from documentation:

Immediate Actions

  1. Document Findings: Record exact test results with screenshots/logs
  2. Severity Assessment: Classify as medium priority (functional deviation, not data corruption)
  3. Stakeholder Notification: Brief Product Owner (Alex) on findings
  4. Decision Meeting: Schedule 30-min decision session

Decision Timeline

Day Action
0 E2E test confirms bug
1-2 Share Product Owner Report (below) with Alex
3-5 Decision meeting, select option
6-10 Implementation (if code change selected)
11-12 QA verification
13-14 Staging deployment + user communication
15+ Production deployment

Product Owner Report

To: Alex (Product Owner) From: Development Team Subject: Bulk Study Update - Behavior Discrepancy Requiring Decision

Executive Summary

We discovered that the Bulk Study Update feature behaves differently than documented in the user guide. This affects how blank cells in CSV uploads are processed for customId and pdfRelativePath fields.

Impact: Users following the documentation may have unexpected results. No data corruption has occurred, but the feature doesn't work as described.

Current Situation

What Documentation Says What Actually Happens
Blank cell erases existing customId Blank cell preserves existing customId
Blank cell erases existing pdfRelativePath Blank cell preserves existing pdfRelativePath
Blank screening cell preserves decision Blank screening cell preserves decision ✓

User Impact Assessment

Affected Users: Any project administrator who has used Bulk Study Update with blank cells expecting erasure.

Likelihood of Impact: LOW-MEDIUM

  • Feature is relatively new (March 2025)
  • Erasing fields is a less common use case than updating them
  • Users may not have noticed if they weren't specifically trying to clear fields

Data Risk: NONE

  • Current behavior preserves data (safer than documented)
  • No data has been lost; fields may not have been cleared when users expected

Decision Required

We need your input to select one of the following approaches:

Option A: Fix Code to Match Documentation

"Make it work as documented"

What changes: Blank cells will erase field values (as documented)

Pros:

  • Matches what users read in documentation
  • Enables users to clear fields via bulk update
  • No documentation changes needed

Cons:

  • Breaking change for any users relying on current behavior
  • Risk of accidental data erasure if users aren't careful with CSVs
  • Requires user communication about behavior change

Effort: 2-3 days development + testing Risk: Medium (behavior change)

Option B: Fix Documentation to Match Code

"Document what it actually does"

What changes: Update user guide to say blank cells preserve existing values

Pros:

  • No code changes, minimal effort
  • Safest option - preserves data by default
  • No risk of breaking existing workflows

Cons:

  • Users cannot clear fields via bulk update at all
  • Reduces feature capability
  • May frustrate users who read current docs and expected erasure

Effort: 1 day documentation update Risk: Low

Option C: Add Explicit Clear Syntax

"Best of both worlds with [CLEAR] keyword"

What changes:

  • Blank cells preserve (current behavior)
  • Special value [CLEAR] explicitly erases

Pros:

  • Backward compatible
  • Explicit user intent - no accidental erasure
  • Full feature capability
  • Clear audit trail in CSV files

Cons:

  • Users must learn new syntax
  • Slightly more complex to explain
  • Documentation update needed

Effort: 3-4 days development + testing + documentation Risk: Low (additive change)

"Match documented behavior precisely"

What changes:

  • Missing column = field preserved (don't touch what's not mentioned)
  • Blank cell in present column = field erased (explicit blank means clear)

Pros:

  • Matches original documentation exactly
  • Intuitive: "if you include the column, you're managing that field"
  • Preserves screening behavior (already works this way conceptually)

Cons:

  • Subtle distinction users must understand
  • Breaking change for blank cell behavior
  • Requires user communication

Effort: 2-3 days development + testing Risk: Medium (behavior change, but matches docs)

Recommendation

Development team recommends Option D because:

  1. It matches what the documentation already says
  2. The distinction (missing column vs blank cell) is intuitive
  3. It restores the original design intent

However, if minimal risk is priority, choose Option B (fix docs). If maximum flexibility is priority, choose Option C (sentinel value).

Questions for Decision

  1. How many users have actively used Bulk Study Update? (Helps assess impact)
  2. Have we received any support tickets about this behavior?
  3. Is clearing fields via bulk update a common/important use case?
  4. What is our appetite for user communication about a behavior change?

Technical Implementation Plan (Dev Team)

Prerequisites

  • E2E testing completed confirming bug
  • Product Owner decision on selected option
  • Sprint capacity allocated

Phase 1: Code Changes (1-2 days)

File: src/libs/project-management/SyRF.ProjectManagement.Core/Services/StudyReferenceFileParser.cs

Current Code (lines 179-180):

if (!string.IsNullOrWhiteSpace(u.CustomId)) study.CustomId = u.CustomId;
if (!string.IsNullOrWhiteSpace(u.PdfRelativePath)) study.PdfRelativePath = u.PdfRelativePath;

New Code:

// null = column missing (preserve), "" = blank cell (erase), "value" = update
if (u.CustomId != null)
{
    study.CustomId = string.IsNullOrWhiteSpace(u.CustomId) ? null : u.CustomId;
}
if (u.PdfRelativePath != null)
{
    study.PdfRelativePath = string.IsNullOrWhiteSpace(u.PdfRelativePath) ? null : u.PdfRelativePath;
}

File: src/libs/project-management/SyRF.ProjectManagement.Core/Services/ParserImplementations/NewParser/StudyUpdateRecordProcessor.cs

Current Code (line 185):

return string.IsNullOrWhiteSpace(val) ? "" : val;

Verify: This already returns "" for blank and null for missing column - no change needed here.

Phase 2: Unit Tests (0.5 days)

File: src/libs/project-management/SyRF.ProjectManagement.Core.Tests/StudyUpdateTests.cs (new file)

public class BulkStudyUpdateBehaviorTests
{
    [Fact]
    public void MissingCustomIdColumn_ShouldPreserveExistingValue()
    {
        // Arrange: Study with CustomId = "EXISTING"
        // Act: Apply update with CustomId = null (missing column)
        // Assert: CustomId still "EXISTING"
    }

    [Fact]
    public void BlankCustomIdCell_ShouldEraseExistingValue()
    {
        // Arrange: Study with CustomId = "EXISTING"
        // Act: Apply update with CustomId = "" (blank cell)
        // Assert: CustomId is null
    }

    [Fact]
    public void ValueInCustomIdCell_ShouldUpdateValue()
    {
        // Arrange: Study with CustomId = "EXISTING"
        // Act: Apply update with CustomId = "NEW"
        // Assert: CustomId is "NEW"
    }

    // Repeat for PdfRelativePath...

    [Fact]
    public void BlankScreeningCell_ShouldPreserveExistingDecision()
    {
        // Verify screening behavior unchanged
    }
}

Phase 3: Integration Tests (0.5 days)

File: src/services/project-management/SyRF.ProjectManagement.Endpoint.Tests/BulkStudyUpdateIntegrationTests.cs

Test full flow:

  1. Create test project with studies having CustomId/PdfRelativePath values
  2. Generate CSV with various blank/missing column scenarios
  3. Execute bulk update
  4. Verify database state matches expected behavior

Phase 4: Documentation Updates (0.5 days)

User Guide (/user-guide/manage-studies.md):

  • Clarify distinction between missing column and blank cell
  • Add examples showing both scenarios
  • Update warning box to be more precise

What's New (/user-guide/whats-new/):

  • Add entry for behavior clarification/fix

Phase 5: Release & Communication (0.5 days)

Release Notes:

### Bulk Study Update - Behavior Clarification

The Bulk Study Update feature now correctly distinguishes between:
- **Missing column**: Field is preserved (not modified)
- **Blank cell**: Field is cleared (set to empty)

This matches the documented behavior. If you were relying on blank cells
to preserve values, please adjust your CSV files to omit the column entirely.

User Communication (if deemed necessary):

  • Email to project administrators
  • In-app notification on bulk update page

Option C Implementation (Sentinel Value)

If Option C is selected instead:

Code Changes

File: StudyReferenceFileParser.cs

private const string ClearSentinel = "[CLEAR]";

// In update logic:
if (!string.IsNullOrWhiteSpace(u.CustomId))
{
    study.CustomId = u.CustomId.Equals(ClearSentinel, StringComparison.OrdinalIgnoreCase)
        ? null
        : u.CustomId;
}
if (!string.IsNullOrWhiteSpace(u.PdfRelativePath))
{
    study.PdfRelativePath = u.PdfRelativePath.Equals(ClearSentinel, StringComparison.OrdinalIgnoreCase)
        ? null
        : u.PdfRelativePath;
}

Documentation: Must clearly explain [CLEAR] syntax with examples.

Option B Implementation (Documentation Only)

If Option B is selected:

File: /user-guide/manage-studies.md

Change lines 44-45 from:

> **Warning:** Providing an empty/blank cell for `pdfRelativePath` or `customId`
> in the update file *will erase* any existing value

To:

> **Note:** Empty or blank cells for `pdfRelativePath` or `customId` are ignored
> during the update - they will NOT change existing values. To clear a field,
> you must use the SyRF web interface directly.

Implementation Checklist

Pre-Implementation

  • Run E2E tests to confirm current behavior
  • Share Product Owner Report with Alex
  • Decision meeting completed
  • Option selected: ______

Implementation (Option D)

  • Update StudyReferenceFileParser.cs with new logic
  • Create BulkStudyUpdateBehaviorTests.cs unit tests
  • Create integration tests
  • All tests passing locally
  • Code review completed
  • Update user guide documentation
  • Create What's New entry
  • Draft release notes
  • Draft user communication (if needed)

Deployment

  • Deploy to staging
  • QA verification on staging
  • Send user communication (if approved)
  • Deploy to production
  • Monitor for support tickets
  • Create release notes entry
  • Update What's New page
  • Plan user communication if breaking change

Completed: CustomId Accessibility Fix

Issue

The Study entity had inconsistent property accessibility:

Property Before After
PdfRelativePath public get, internal set public get, internal set
CustomId public get, public set public get, internal set

Change Made

File: src/libs/project-management/SyRF.ProjectManagement.Core/Model/StudyAggregate/Study.cs

// Before (line 93)
public string? CustomId { get; set; }

// After
public string? CustomId { get; internal set; }

Verification

Usages checked - Only 2 locations set CustomId:

  1. StudyReferenceFileParser.cs:179 (same assembly) ✓
  2. StudyShould.cs:121 (test project with InternalsVisibleTo) ✓

Tests: All 9 StudyTest tests pass after change.

Rationale

  • DDD Encapsulation: Domain entities should control their own state changes
  • Consistency: Both bulk-updatable metadata fields now have identical accessibility
  • Safety: Prevents external assemblies from bypassing domain logic
  • InternalsVisibleTo: Test assembly explicitly granted access via csproj configuration

File: src/libs/project-management/SyRF.ProjectManagement.Core/SyRF.ProjectManagement.Core.csproj

<ItemGroup>
  <InternalsVisibleTo Include="SyRF.ProjectManagement.Core.Tests" />
</ItemGroup>

This allows the test project to access internal members for testing purposes.

References

  • User Guide: Managing Studies (see user-guide/manage-studies.md in user-guide service)
  • What's New: Bulk Study Update (see user-guide/whats-new/whats-new-bulk-update-study.md)
  • Source: src/libs/project-management/SyRF.ProjectManagement.Core/Services/StudyReferenceFileParser.cs
  • Source: src/libs/project-management/SyRF.ProjectManagement.Core/Services/ParserImplementations/NewParser/StudyUpdateRecordProcessor.cs