Testing the CI/CD Workflow¶

This guide explains how to test the refactored CI/CD workflow to ensure it works correctly before replacing the original workflow.

Overview¶

The refactored CI/CD workflow introduces significant changes:

Reusable workflows for versioning and Docker builds
Matrix strategies for parallel execution
Independent service failure handling
Smart promotion logic

This testing strategy validates that all of these improvements work as expected.

Testing Tools¶

1. Manual Test Workflow¶

Location: .github/workflows/test-ci-cd.yml

A workflow_dispatch workflow that simulates different scenarios without affecting production.

Test Scenarios:

all-services-changed: Test all services building in parallel
single-service-api: Test single service isolation
single-service-web: Test web service with artifact handling
multiple-services: Test subset of services (API, PM, Web)
simulate-failure: Test failure isolation
version-only: Test version calculation without builds

Usage:

# Via GitHub UI
1. Go to Actions → Test CI/CD Refactored Workflow
2. Click "Run workflow"
3. Select test scenario
4. Optionally simulate failure for a service
5. Run workflow

# Via GitHub CLI
gh workflow run test-ci-cd.yml \
  -f test_scenario=simulate-failure \
  -f simulate_build_failure=api \
  -f skip_promotion=false
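
When running scenarios from a terminal, it can help to wait for the result instead of switching to the browser. The sketch below wraps the CLI call above in a hypothetical helper, `run_scenario` (not part of the repository), and assumes `gh` is authenticated against this repository. `RUN_LIST_DELAY` is likewise an illustrative variable.

```shell
# Sketch: trigger one test scenario, then block until the run finishes.
# run_scenario and RUN_LIST_DELAY are hypothetical names used for illustration.
run_scenario() {
  scenario="$1"
  gh workflow run test-ci-cd.yml -f test_scenario="$scenario" || return 1
  # A freshly dispatched run can take a few seconds to appear in the list.
  sleep "${RUN_LIST_DELAY:-5}"
  # Takes the most recent run of the workflow; if someone else dispatched
  # a run at the same moment, this may pick up theirs instead.
  run_id="$(gh run list --workflow test-ci-cd.yml --limit 1 \
    --json databaseId --jq '.[0].databaseId')" || return 1
  [ -n "$run_id" ] || return 1
  # --exit-status makes gh exit non-zero when the run fails.
  gh run watch "$run_id" --exit-status
}
```

For example, `run_scenario version-only` should return once the version-only run has completed.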

2. Validation Script¶

Location: .github/scripts/validate-workflows.sh

Validates workflow syntax and structure locally before committing.

What it checks:

YAML syntax validity
Workflow structure (name, on, jobs)
Reusable workflow references
Matrix strategy fail-fast settings
Input/output descriptions
services.json validity
Pre-validation in promotion jobs
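
To give a sense of what one of these checks might look like, here is a minimal sketch of the fail-fast check. It is not taken from validate-workflows.sh: the function name `check_fail_fast` and the counting heuristic are assumptions, and the real script may work differently.

```shell
# Hypothetical check, not copied from validate-workflows.sh.
# Heuristic: the number of `fail-fast: false` lines should equal the
# number of `strategy:` keys. Files without a strategy pass trivially.
check_fail_fast() {
  file="$1"
  strategies="$(grep -c '^[[:space:]]*strategy:' "$file" || true)"
  disabled="$(grep -c '^[[:space:]]*fail-fast:[[:space:]]*false' "$file" || true)"
  if [ "$strategies" -gt 0 ] && [ "$strategies" -ne "$disabled" ]; then
    echo "WARN: $file: $strategies strategy block(s), $disabled with fail-fast: false"
    return 1
  fi
  echo "OK: $file"
}
```

Because it only counts lines, this sketch can be fooled by comments or unusual indentation. A YAML-aware check would be more robust.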

Usage:

# Run from the repository root (the path below is one local checkout; adjust it to yours)
cd /home/chris/workspace/syrf
./.github/scripts/validate-workflows.sh

Expected Output:

========================================
GitHub Actions Workflow Validator
========================================

Checking for required tools...
✅ Required tools found

Validating YAML syntax...
✅ ci-cd.yml - Valid YAML
✅ ci-cd-refactored.yml - Valid YAML
✅ reusable-gitversion.yml - Valid YAML
✅ reusable-docker-build.yml - Valid YAML
✅ test-ci-cd.yml - Valid YAML

...

========================================
Validation Summary
========================================

Errors: 0
Warnings: 0

✅ All validations passed!

Testing Scenarios¶

Scenario 1: Single Service Change¶

Goal: Verify that only the changed service builds and the others are skipped

Steps:

Make a small change to API service:

echo "# Test comment" >> src/services/api/README.md
git add .
git commit -m "test: trigger API build only"
git push

Watch GitHub Actions:
detect-changes should show only api_changed: true
Only API version job should run
Only API build job should run
Tag created only for API
Promotion updates only API

Expected Result:

✅ Only API service processes
✅ Other services skipped
✅ Workflow completes successfully
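
Before pushing, you can predict locally which services a commit should trigger. The sketch below maps changed paths to service names. The path prefixes are taken from the examples in this guide and the short names (`api`, `pm`, `web`, `docs`) are assumptions, so the real detect-changes job may use different filters.

```shell
# Sketch: map changed file paths (one per line on stdin) to service names.
# services_for_paths is a hypothetical helper; prefixes and names are
# assumptions based on the paths used in this guide.
services_for_paths() {
  while IFS= read -r path; do
    case "$path" in
      src/services/api/*) echo api ;;
      src/services/project-management/*) echo pm ;;
      src/services/web/*) echo web ;;
      docs/*) echo docs ;;
    esac
  done | sort -u
}
```

A typical use would be `git diff --name-only HEAD~1 HEAD | services_for_paths`, compared against the outputs shown by detect-changes in the run.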

Scenario 2: Multiple Services in Parallel¶

Goal: Verify that services build in parallel and independently of one another

Steps:

Make changes to multiple services:

echo "# Test" >> src/services/api/README.md
echo "# Test" >> src/services/project-management/README.md
echo "# Test" >> src/services/web/README.md
git add .
git commit -m "test: trigger multiple services"
git push

Watch GitHub Actions:
Version jobs run in parallel (3 jobs simultaneously)
Build jobs run in parallel (3 jobs simultaneously)
All complete around same time (not sequential)

Expected Result:

✅ Jobs run in parallel
✅ Total time ≈ slowest job (not sum of all jobs)
✅ All 3 services tagged and promoted

Scenario 3: Failure Isolation¶

Goal: Verify that a failure in one service doesn't block the others

Method 1 - Manual Test Workflow:

gh workflow run test-ci-cd.yml \
  -f test_scenario=simulate-failure \
  -f simulate_build_failure=pm

Method 2 - Introduce Actual Failure:

Break PM Dockerfile temporarily:

# Append an invalid instruction to the PM Dockerfile so the Docker build fails
printf '\nINVALID_INSTRUCTION\n' >> src/services/project-management/SyRF.ProjectManagement.Endpoint/Dockerfile
git add .
git commit -m "test: introduce build failure in PM"
git push

Watch workflow:
PM build fails
API and Web builds continue
Workflow marked as failed (red X)
Successful services still tagged
Revert the breaking change:

git revert HEAD
git push

Expected Result:

✅ PM build fails
✅ API build succeeds
✅ Web build succeeds
✅ API and Web get tagged
✅ PM does NOT get tagged
✅ API and Web promoted to staging
✅ PM NOT promoted
❌ Workflow status = Failed (but all jobs completed)
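
The per-job outcome can also be checked from the command line rather than by eye. The sketch below assumes job names contain the service name (for example `build (pm)`), which may not match the actual job naming. `check_isolation` is a hypothetical helper.

```shell
# Sketch: read "job-name<TAB>conclusion" lines on stdin. Succeeds only if
# at least one job whose name contains $1 failed and no other job failed.
check_isolation() {
  failed_service="$1"
  awk -F '\t' -v svc="$failed_service" '
    index($1, svc) > 0 { if ($2 == "failure") hit = 1; next }
    $2 == "failure"    { leaked = 1 }
    END { if (hit && !leaked) exit 0; exit 1 }
  '
}
```

The input can be produced with `gh run view <run-id> --json jobs --jq '.jobs[] | "\(.name)\t\(.conclusion)"' | check_isolation pm`. A short service name can match unrelated job names, so adjust the match to the real naming scheme.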

Scenario 4: Web Service Special Handling¶

Goal: Verify web service artifact download works

Steps:

Change web service:

echo "// Test" >> src/services/web/src/app/app.component.ts
git add .
git commit -m "test: trigger web build with artifacts"
git push

Verify:
build-web-artifacts job runs first
Artifact uploaded successfully
build-docker job downloads artifact
Web build context prepared correctly
Docker build succeeds

Expected Result:

✅ Artifacts built and uploaded
✅ Docker build downloads artifacts
✅ Web image built successfully

Scenario 5: Docs Multi-Checkout¶

Goal: Verify docs service checkout logic works

Steps:

Change docs:

echo "# Test" >> docs/README.md
git add .
git commit -m "test: trigger docs build with multi-checkout"
git push

Verify:
GitHub App token generated
cluster-gitops checked out (sparse)
camarades-infrastructure checked out (sparse)
Docker build succeeds with all repos

Expected Result:

✅ All 3 repos available during build
✅ Docs build succeeds
✅ MkDocs monorepo plugin works

Scenario 6: Promotion Logic¶

Goal: Verify smart promotion only promotes successful services

Steps:

Use test workflow with simulated failure:

gh workflow run test-ci-cd.yml \
  -f test_scenario=simulate-failure \
  -f simulate_build_failure=api \
  -f skip_promotion=false

Check promotion PR:
Only successful services in PR
YAML validated before PR creation
PR auto-merges (staging only)

Expected Result:

✅ PR created with only successful services
✅ Failed service NOT in PR
✅ YAML validation passes
✅ PR auto-merges

Validation Checklist¶

Before replacing the original workflow, verify:

Workflow Structure¶

All workflows have valid YAML syntax
Reusable workflows have workflow_call trigger
Matrix strategies have fail-fast: false
All inputs have descriptions
All outputs are properly referenced

Service Independence¶

Single service change only builds that service
Multiple services build in parallel
One service failure doesn't block others
Failed services don't get tagged
Failed services don't get promoted

Special Cases¶

Web artifacts download correctly
Docs multi-repo checkout works
Lambda deployment still works
GitHub App token generation works

Promotion Logic¶

Only successful services promoted
YAML validated before PR creation
Staging promotion auto-merges
Production promotion requires manual review
Single PR per environment (no conflicts)

Performance¶

Parallel builds faster than sequential
No unnecessary job dependencies
Matrix jobs complete independently

Common Issues and Solutions¶

Issue: Matrix job doesn't run¶

Symptom: Version or build job shows "Job skipped"

Cause: Condition evaluated to false

Solution:

# Check the detect-changes job outputs in the run log
# Ensure each matrix entry's condition field references the matching detect-changes output
# Example: the API entry's condition should reference the api_changed output

Issue: Reusable workflow not found¶

Symptom: Error: .github/workflows/reusable-gitversion.yml not found

Cause: Path reference incorrect

Solution:

# Ensure uses: path starts with ./
uses: ./.github/workflows/reusable-gitversion.yml
# NOT: uses: .github/workflows/reusable-gitversion.yml
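
A quick way to catch this before pushing is to search for `uses:` lines that point at `.github/` without the leading `./`. The sketch below is an assumption about how such a check could look, and `find_bad_uses` is a hypothetical name.

```shell
# Sketch: print `uses:` lines that reference .github/ without a leading ./
# Returns 0 when at least one such line is found, 1 when none are.
find_bad_uses() {
  grep -nE '^[[:space:]]*(-[[:space:]]*)?uses:[[:space:]]*\.github/' "$@"
}
```

Running `find_bad_uses .github/workflows/*.yml` lists each offending line with its line number. It does not handle quoted paths.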

Issue: Outputs not accessible¶

Symptom: needs.version.outputs.api_version is empty

Cause: Matrix job outputs not aggregated

Solution:

# Add aggregation job to collect matrix outputs
# See version-outputs job in ci-cd-refactored.yml

Issue: All jobs run despite no changes¶

Symptom: All services build even when nothing changed

Cause: Condition logic incorrect

Solution:

# Ensure the if: condition compares the changed flag explicitly
if: matrix.service.condition == 'true'
# NOT: if: matrix.service.condition (a non-empty string such as 'false' is still truthy)

Migration Path¶

Once testing is complete:

Phase 1: Parallel Operation (Recommended)¶

Keep both workflows active
Monitor refactored workflow for 1-2 weeks
Compare results between workflows
Build confidence in refactored version

Phase 2: Gradual Migration¶

Disable original workflow triggers:

# In ci-cd.yml, change:
on:
  push:
    branches:
      - main-disabled  # Change to non-existent branch

Enable refactored workflow:

# In ci-cd-refactored.yml, ensure:
on:
  push:
    branches:
      - main  # Trigger on main branch

Monitor for issues
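
As an alternative to editing branch names, a workflow can be switched off and on with the GitHub CLI, which leaves the YAML untouched. This is a different technique from the trigger edits above, shown only as a sketch. `swap_workflows` is a hypothetical helper and assumes `gh` is authenticated with permission to manage workflows.

```shell
# Sketch: disable one workflow and enable another without committing changes.
# $1 = workflow file to disable, $2 = workflow file to enable.
swap_workflows() {
  gh workflow disable "$1" && gh workflow enable "$2"
}
```

For migration this would be `swap_workflows ci-cd.yml ci-cd-refactored.yml`. Swapping the arguments gives a quick rollback.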

Phase 3: Cleanup¶

Delete original workflow:

git rm .github/workflows/ci-cd.yml

Rename refactored workflow:

git mv .github/workflows/ci-cd-refactored.yml .github/workflows/ci-cd.yml

Update documentation references

Rollback Plan¶

If issues arise with the refactored workflow:

Quick Rollback¶

Re-enable original workflow:

# Revert trigger change in ci-cd.yml
on:
  push:
    branches:
      - main

Disable refactored workflow:

# In ci-cd-refactored.yml:
on:
  push:
    branches:
      - refactored-testing  # Change to different branch

Full Rollback¶

git revert <commit-hash-of-refactoring>
git push

Monitoring and Metrics¶

Track these metrics to validate improvements:

Performance Metrics¶

Build Time: Total workflow duration (should decrease)
Parallelization: Number of concurrent jobs (should increase)
Resource Usage: GitHub Actions minutes (may increase slightly)

Reliability Metrics¶

Success Rate: % of workflows that complete successfully
Partial Success: Services deployed despite other failures
Recovery Time: Time to fix and redeploy after failure
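
The success rate can be estimated from recent run history. The sketch below computes it from a list of run conclusions. `success_rate` is a hypothetical helper, and runs still in progress (empty conclusion) are ignored.

```shell
# Sketch: read one run conclusion per line on stdin and print the
# percentage that are "success". Blank lines are skipped.
success_rate() {
  awk '$0 != "" { total++ } $0 == "success" { ok++ }
       END { if (total == 0) { print "n/a"; exit }
             printf "%.1f%%\n", 100 * ok / total }'
}
```

The input can come from `gh run list --workflow ci-cd-refactored.yml --limit 50 --json conclusion --jq '.[].conclusion' | success_rate`. With failure isolation, a failed run may still have deployed some services, so this figure does not capture partial success.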

Maintainability Metrics¶

Workflow Size: Lines of YAML (should be ~43% fewer)
Time to Add Service: Minutes to add new service config
Review Time: Time to review workflow changes

Next Steps¶

After successful testing:

Update CLAUDE.md with migration status
Document any workflow customizations in team.md
Train team on new workflow structure
Archive test workflows (or keep for regression testing)
