
Testing the CI/CD Workflow

This guide explains how to test the refactored CI/CD workflow to ensure it works correctly before replacing the original workflow.

Overview

The refactored CI/CD workflow introduces significant changes:

  • Reusable workflows for versioning and Docker builds
  • Matrix strategies for parallel execution
  • Independent service failure handling
  • Promotion logic that promotes only services that built successfully

This testing strategy checks that each of these improvements works as expected.

Testing Tools

1. Manual Test Workflow

Location: .github/workflows/test-ci-cd.yml

A workflow_dispatch workflow that simulates different scenarios without affecting production.

Test Scenarios:

  • all-services-changed: all services build in parallel
  • single-service-api: a single service (API) builds in isolation
  • single-service-web: the web service builds, including artifact handling
  • multiple-services: a subset of services (API, PM, Web) builds
  • simulate-failure: one service fails without blocking the others
  • version-only: versions are calculated without running builds

Usage:

# Via GitHub UI
1. Go to Actions → Test CI/CD Refactored Workflow
2. Click "Run workflow"
3. Select test scenario
4. Optionally simulate failure for a service
5. Run workflow

# Via GitHub CLI
gh workflow run test-ci-cd.yml \
  -f test_scenario=simulate-failure \
  -f simulate_build_failure=api \
  -f skip_promotion=false
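To script the scenarios listed above, a small wrapper can check the scenario name before dispatching. This is a sketch, not part of the repository: run_test_scenario and the DRY_RUN switch are hypothetical, and it assumes test-ci-cd.yml accepts the test_scenario and simulate_build_failure inputs exactly as in the CLI example.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Scenario names accepted by test-ci-cd.yml (taken from the list above).
SCENARIOS=(all-services-changed single-service-api single-service-web
           multiple-services simulate-failure version-only)

# Hypothetical helper: validate the scenario name, then dispatch the workflow.
# Set DRY_RUN=1 to print the gh command instead of running it.
run_test_scenario() {
  local scenario=$1 failing_service=${2:-} s
  local -a cmd=(gh workflow run test-ci-cd.yml -f "test_scenario=$scenario")

  for s in "${SCENARIOS[@]}"; do
    if [[ $s == "$scenario" ]]; then
      # Only pass simulate_build_failure when a service was given
      if [[ -n $failing_service ]]; then
        cmd+=(-f "simulate_build_failure=$failing_service")
      fi
      if [[ ${DRY_RUN:-0} == 1 ]]; then
        echo "${cmd[*]}"
      else
        "${cmd[@]}"
      fi
      return 0
    fi
  done

  echo "unknown scenario: $scenario" >&2
  return 1
}
```

For example, DRY_RUN=1 run_test_scenario simulate-failure pm prints the command without running it. After a real dispatch, gh run list --workflow test-ci-cd.yml shows the new run and gh run watch follows it until it finishes.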

2. Validation Script

Location: .github/scripts/validate-workflows.sh

Validates workflow syntax and structure locally before committing.

What it checks:

  • YAML syntax validity
  • Workflow structure (name, on, jobs)
  • Reusable workflow references
  • Matrix strategy fail-fast settings
  • Input/output descriptions
  • services.json validity
  • Pre-validation in promotion jobs

Usage:

cd "$(git rev-parse --show-toplevel)"  # run from the repository root
./.github/scripts/validate-workflows.sh

Expected Output:

========================================
GitHub Actions Workflow Validator
========================================

Checking for required tools...
✅ Required tools found

Validating YAML syntax...
✅ ci-cd.yml - Valid YAML
✅ ci-cd-refactored.yml - Valid YAML
✅ reusable-gitversion.yml - Valid YAML
✅ reusable-docker-build.yml - Valid YAML
✅ test-ci-cd.yml - Valid YAML

...

========================================
Validation Summary
========================================

Errors: 0
Warnings: 0

✅ All validations passed!
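Two of the checks listed above can be approximated with grep. The sketch below is not the contents of validate-workflows.sh: the function names are hypothetical, and line-based matching only catches straightforward cases (a YAML-aware tool would be more robust). It covers local reusable workflow references that lack the leading ./ and matrix strategies without fail-fast: false.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical grep-based checks in the spirit of validate-workflows.sh.

# Fails if a `uses:` line points at .github/workflows/ without the leading ./
check_local_uses() {
  local file=$1
  if grep -nE '^[[:space:]]*(-[[:space:]]*)?uses:[[:space:]]*\.github/workflows/' "$file"; then
    echo "ERROR: $file: local reusable workflow paths must start with ./" >&2
    return 1
  fi
}

# Fails if the file defines a matrix but never sets fail-fast: false
check_fail_fast() {
  local file=$1
  if grep -qE '^[[:space:]]*matrix:' "$file" &&
     ! grep -qE '^[[:space:]]*fail-fast:[[:space:]]*false' "$file"; then
    echo "ERROR: $file: matrix strategy without fail-fast: false" >&2
    return 1
  fi
}

# Runs both checks over every workflow file and reports all problems
validate_all() {
  local status=0 f
  for f in .github/workflows/*.yml; do
    [[ -e $f ]] || continue
    check_local_uses "$f" || status=1
    check_fail_fast "$f" || status=1
  done
  return "$status"
}
```

Because validate_all keeps going after a failure, one run reports every offending file rather than stopping at the first.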

Testing Scenarios

Scenario 1: Single Service Change

Goal: Verify that only the changed service builds and the others are skipped

Steps:

  1. Make a small change to the API service:
echo "# Test comment" >> src/services/api/README.md
git add src/services/api/README.md
git commit -m "test: trigger API build only"
git push
  2. Watch GitHub Actions:
      • detect-changes should show only api_changed: true
      • Only the API version job should run
      • Only the API build job should run
      • A tag is created only for API
      • Promotion updates only API

Expected Result:

  • ✅ Only the API service is processed
  • ✅ Other services skipped
  • ✅ Workflow completes successfully
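Before pushing, you can predict which services detect-changes ought to flag by mapping the changed paths to services. This sketch assumes the directory layout used in this guide (src/services/api, src/services/project-management, src/services/web, docs) and uses hypothetical service names; the workflow's own change detection remains the source of truth.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: read changed paths on stdin, print affected services.
# ASSUMPTION: this path-to-service mapping mirrors the directories used in
# this guide; the workflow's detect-changes job is authoritative.
changed_services() {
  local path
  while IFS= read -r path; do
    case $path in
      src/services/api/*)                echo api ;;
      src/services/project-management/*) echo pm ;;
      src/services/web/*)                echo web ;;
      docs/*)                            echo docs ;;
    esac
  done | sort -u   # one line per service, no duplicates
}
```

For Scenario 1, git diff --name-only HEAD~1 HEAD | changed_services should print only api.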

Scenario 2: Multiple Services in Parallel

Goal: Verify that services build in parallel and independently of each other

Steps:

  1. Make changes to multiple services:
echo "# Test" >> src/services/api/README.md
echo "# Test" >> src/services/project-management/README.md
echo "# Test" >> src/services/web/README.md
git add src/services/api/README.md src/services/project-management/README.md src/services/web/README.md
git commit -m "test: trigger multiple services"
git push
  2. Watch GitHub Actions:
      • Version jobs run in parallel (3 jobs simultaneously)
      • Build jobs run in parallel (3 jobs simultaneously)
      • All finish at about the same time rather than one after another

Expected Result:

  • ✅ Jobs run in parallel
  • ✅ Total time ≈ duration of the slowest job (not the sum of all jobs)
  • ✅ All 3 services tagged and promoted
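The expected timing follows from simple arithmetic: run sequentially, jobs take the sum of their durations; run in parallel, the wall-clock time approaches the longest single job. This sketch ignores queueing and runner availability, and the function name is hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Given per-job durations in seconds, print the sequential total (sum)
# and the ideal parallel total (maximum). Queue time is ignored.
estimate_times() {
  local sum=0 max=0 d
  for d in "$@"; do
    sum=$(( sum + d ))
    if (( d > max )); then
      max=$d
    fi
  done
  echo "sequential=${sum}s parallel=${max}s"
}
```

With illustrative durations of 240, 420 and 180 seconds, estimate_times 240 420 180 prints sequential=840s parallel=420s. Real per-job durations can be read from the run summary in the Actions UI.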

Scenario 3: Failure Isolation

Goal: Verify that a failure in one service doesn't block the others

Method 1 - Manual Test Workflow:

gh workflow run test-ci-cd.yml \
  -f test_scenario=simulate-failure \
  -f simulate_build_failure=pm

Method 2 - Introduce Actual Failure:

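  1. Temporarily break the PM Dockerfile. Because only changed services build, the same push must also touch API and Web so that they build alongside PM:
# Break PM on purpose with an invalid Dockerfile instruction
echo "INVALID_INSTRUCTION" >> src/services/project-management/SyRF.ProjectManagement.Endpoint/Dockerfile
# Touch API and Web so they build in the same run
echo "# Test" >> src/services/api/README.md
echo "# Test" >> src/services/web/README.md
git add src/services/project-management/SyRF.ProjectManagement.Endpoint/Dockerfile src/services/api/README.md src/services/web/README.md
git commit -m "test: introduce build failure in PM"
git push
  2. Watch the workflow:
      • PM build fails
      • API and Web builds continue
      • Workflow is marked as failed (red X)
      • Successful services are still tagged
  3. Revert the breaking change:

git revert --no-edit HEAD
git push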

Expected Result:

  • ✅ PM build fails
  • ✅ API build succeeds
  • ✅ Web build succeeds
  • ✅ API and Web get tagged
  • ✅ PM does NOT get tagged
  • ✅ API and Web promoted to staging
  • ✅ PM NOT promoted
  • ❌ Workflow status = Failed (but all jobs completed)
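To check the failure-isolation result from the command line, the job conclusions of a run can be scanned for unexpected failures. This is a sketch under stated assumptions: check_failure_isolated is a hypothetical helper, and it assumes matrix job names contain the service name (for example "build (pm)"), so adjust the substring match to your actual job names.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Reads "job-name<TAB>conclusion" lines on stdin. Succeeds only if every
# failed job mentions the deliberately broken service and at least one did.
# ASSUMPTION: job names contain the service name; substring matching is crude.
check_failure_isolated() {
  local broken=$1 name conclusion status=0 saw_broken=0
  while IFS=$'\t' read -r name conclusion; do
    [[ $conclusion == failure ]] || continue
    if [[ $name == *"$broken"* ]]; then
      saw_broken=1
    else
      echo "unexpected failure: $name" >&2
      status=1
    fi
  done
  if (( saw_broken == 0 )); then
    echo "no failed job mentions '$broken'" >&2
    status=1
  fi
  return "$status"
}
```

One way to feed it, assuming RUN_ID holds the run's ID: gh run view "$RUN_ID" --json jobs --jq '.jobs[] | [.name, .conclusion] | @tsv' | check_failure_isolated pm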

Scenario 4: Web Service Special Handling

Goal: Verify that the web service artifact download works

Steps:

  1. Change the web service:
echo "// Test" >> src/services/web/src/app/app.component.ts
git add src/services/web/src/app/app.component.ts
git commit -m "test: trigger web build with artifacts"
git push
  2. Verify:
      • build-web-artifacts job runs first
      • Artifact is uploaded successfully
      • build-docker job downloads the artifact
      • Web build context is prepared correctly
      • Docker build succeeds

Expected Result:

  • ✅ Artifacts built and uploaded
  • ✅ Docker build downloads artifacts
  • ✅ Web image built successfully
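After the run, the uploaded artifact can also be inspected locally. The sketch assumes the web artifact contains the compiled Angular app (the app.component.ts path suggests Angular), so an index.html should exist inside it; the helper name is hypothetical and the artifact name in the usage line is a placeholder.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sanity check for a downloaded web build artifact.
# ASSUMPTION: the artifact holds a compiled Angular app with an index.html.
verify_web_artifact() {
  local dir=$1 index
  # First index.html anywhere under the directory (empty if none or no dir)
  index=$(find "$dir" -name index.html -type f -print -quit 2>/dev/null || true)
  if [[ -z $index ]]; then
    echo "no index.html found under $dir" >&2
    return 1
  fi
  if [[ ! -s $index ]]; then
    echo "$index is empty" >&2
    return 1
  fi
  echo "found $index"
}
```

Usage: gh run download "$RUN_ID" --name <artifact-name> --dir /tmp/web-artifact, then verify_web_artifact /tmp/web-artifact.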

Scenario 5: Docs Multi-Checkout

Goal: Verify that the docs service's multi-repository checkout works

Steps:

  1. Change the docs:
echo "# Test" >> docs/README.md
git add docs/README.md
git commit -m "test: trigger docs build with multi-checkout"
git push
  2. Verify:
      • GitHub App token is generated
      • cluster-gitops is checked out (sparse)
      • camarades-infrastructure is checked out (sparse)
      • Docker build succeeds with all repos

Expected Result:

  • ✅ All 3 repos available during build
  • ✅ Docs build succeeds
  • ✅ MkDocs monorepo plugin works
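To see locally what a sparse checkout gives the docs build, you can reproduce it with git clone --sparse and git sparse-checkout (Git 2.25 or later). This is a sketch: the function name is hypothetical, and the repository URL and directory list must come from the workflow's own checkout steps, which this guide does not reproduce. Private repositories also need credentials, which the workflow obtains through the GitHub App token.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: clone a repository with only the given directories
# checked out, to mimic a sparse checkout step.
sparse_clone() {
  local url=$1 dest=$2
  shift 2
  # --sparse starts with only the top-level files in the working tree
  git clone --quiet --sparse "$url" "$dest"
  # Add just the directories the build needs
  git -C "$dest" sparse-checkout set "$@"
}
```

Usage: sparse_clone <cluster-gitops-url> /tmp/cluster-gitops <dir>... In cone mode (the default in recent Git versions), files at the repository root are included alongside the listed directories.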

Scenario 6: Promotion Logic

Goal: Verify that promotion includes only services that built successfully

Steps:

  1. Use the test workflow with a simulated failure:
gh workflow run test-ci-cd.yml \
  -f test_scenario=simulate-failure \
  -f simulate_build_failure=api \
  -f skip_promotion=false
  2. Check the promotion PR:
      • Only successful services are in the PR
      • YAML is validated before the PR is created
      • PR auto-merges (staging only)

Expected Result:

  • ✅ PR created with only successful services
  • ✅ Failed service NOT in PR
  • ✅ YAML validation passes
  • ✅ PR auto-merges
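The promotion rule can be reduced to a filter over job results. This sketch assumes the results arrive as service=result pairs (a hypothetical format) whose values are GitHub's job results (success, failure, cancelled, skipped); only services whose result is success are kept.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical filter: print the services that should appear in the
# promotion PR, one per line. Anything other than "success" is excluded.
services_to_promote() {
  local pair svc result
  for pair in "$@"; do
    svc=${pair%%=*}      # text before the first "="
    result=${pair#*=}    # text after the first "="
    if [[ $result == success ]]; then
      echo "$svc"
    fi
  done
}
```

With the failure simulated in API, services_to_promote api=failure pm=success web=success prints pm and web. Treating skipped and cancelled as "do not promote" errs on the side of leaving the environment unchanged.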

Validation Checklist

Before replacing the original workflow, verify:

Workflow Structure

  • All workflows have valid YAML syntax
  • Reusable workflows have workflow_call trigger
  • Matrix strategies have fail-fast: false
  • All inputs have descriptions
  • All outputs are properly referenced

Service Independence

  • Single service change only builds that service
  • Multiple services build in parallel
  • One service failure doesn't block others
  • Failed services don't get tagged
  • Failed services don't get promoted

Special Cases

  • Web artifacts download correctly
  • Docs multi-repo checkout works
  • Lambda deployment still works
  • GitHub App token generation works

Promotion Logic

  • Only successful services promoted
  • YAML validated before PR creation
  • Staging promotion auto-merges
  • Production promotion requires manual review
  • Single PR per environment (no conflicts)

Performance

  • Parallel builds faster than sequential
  • No unnecessary job dependencies
  • Matrix jobs complete independently

Common Issues and Solutions

Issue: Matrix job doesn't run

Symptom: Version or build job shows "Job skipped"

Cause: Condition evaluated to false

Solution:

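# Check the outputs of the detect-changes job (for example api_changed)
# Ensure matrix.service.condition is built from an output name that detect-changes actually sets
# A misspelled output name resolves to an empty value, so the condition is false and the job is skipped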

Issue: Reusable workflow not found

Symptom: Error: .github/workflows/reusable-gitversion.yml not found

Cause: Path reference incorrect

Solution:

# Ensure uses: path starts with ./
uses: ./.github/workflows/reusable-gitversion.yml
# NOT: uses: .github/workflows/reusable-gitversion.yml

Issue: Outputs not accessible

Symptom: needs.version.outputs.api_version is empty

Cause: Matrix job outputs not aggregated

Solution:

# Add aggregation job to collect matrix outputs
# See version-outputs job in ci-cd-refactored.yml
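One common way to aggregate is to have each matrix leg write its result to a file, upload it as an artifact, and let a follow-up job turn every file into a named step output. This may differ from what the version-outputs job actually does; the <service>.version file naming and the helper name are hypothetical. Appending name=value lines to the file named by GITHUB_OUTPUT is the standard way for a step to set outputs.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical aggregation step: after downloading every <service>.version
# file into one directory, emit <service>_version=<value> step outputs.
aggregate_versions() {
  local dir=$1 f svc
  for f in "$dir"/*.version; do
    [[ -e $f ]] || continue   # no files: the glob stays unexpanded
    svc=$(basename "$f" .version)
    echo "${svc}_version=$(<"$f")" >> "$GITHUB_OUTPUT"
  done
}
```

The aggregation job then maps these to job outputs, for example api_version: ${{ steps.collect.outputs.api_version }} (collect being a hypothetical step id), after which later jobs can read needs.<job>.outputs.api_version.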

Issue: All jobs run despite no changes

Symptom: All services build even when nothing changed

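Cause: The condition relies on the truthiness of a string instead of comparing it with 'true'

Solution:

# Compare the changed flag explicitly with the string 'true'
if: matrix.service.condition == 'true'
# NOT: if: matrix.service.condition (any non-empty string, including 'false', is truthy)
# The matrix context is available in step-level if: conditions, not in a job-level if: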

Migration Path

Once testing is complete, migrate in three phases.

Phase 1: Parallel Running

  1. Keep both workflows active
  2. Monitor refactored workflow for 1-2 weeks
  3. Compare results between workflows
  4. Build confidence in refactored version

Phase 2: Gradual Migration

  1. Disable the original workflow's triggers:
# In ci-cd.yml, change:
on:
  push:
    branches:
      - main-disabled  # Change to non-existent branch
  2. Enable the refactored workflow:
# In ci-cd-refactored.yml, ensure:
on:
  push:
    branches:
      - main  # Trigger on main branch
  3. Monitor for issues

Phase 3: Cleanup

  1. Delete the original workflow:
git rm .github/workflows/ci-cd.yml
  2. Rename the refactored workflow:
git mv .github/workflows/ci-cd-refactored.yml .github/workflows/ci-cd.yml
  3. Update documentation references

Rollback Plan

If issues arise with refactored workflow:

Quick Rollback

  1. Re-enable the original workflow:
# Revert trigger change in ci-cd.yml
on:
  push:
    branches:
      - main
  2. Disable the refactored workflow:
# In ci-cd-refactored.yml:
on:
  push:
    branches:
      - refactored-testing  # Change to different branch
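If a rollback has to happen before a commit can be pushed, the GitHub CLI can also enable and disable workflows directly. This is a sketch, assuming gh accepts the workflow file names here; use_workflow and DRY_RUN are hypothetical. It only toggles whether each workflow runs at all and does not undo the trigger edits from Phase 2, so a workflow whose branches list points at main-disabled still won't run on main.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: make one CI workflow active and the other inactive.
# Set DRY_RUN=1 to print the gh commands instead of running them.
use_workflow() {
  local target=$1 to_enable to_disable
  case $target in
    original)   to_enable=ci-cd.yml;            to_disable=ci-cd-refactored.yml ;;
    refactored) to_enable=ci-cd-refactored.yml; to_disable=ci-cd.yml ;;
    *) echo "usage: use_workflow original|refactored" >&2; return 1 ;;
  esac
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    echo "gh workflow enable $to_enable"
    echo "gh workflow disable $to_disable"
  else
    gh workflow enable "$to_enable"
    gh workflow disable "$to_disable"
  fi
}
```

Enabling first means there is never a moment where neither workflow is active.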

Full Rollback

git revert <commit-hash-of-refactoring>
git push

Monitoring and Metrics

Track these metrics to validate improvements:

Performance Metrics

  • Build Time: Total workflow duration (should decrease)
  • Parallelization: Number of concurrent jobs (should increase)
  • Resource Usage: GitHub Actions minutes (may increase slightly)

Reliability Metrics

  • Success Rate: % of workflows that complete successfully
  • Partial Success: Services deployed despite other failures
  • Recovery Time: Time to fix and redeploy after failure

Maintainability Metrics

  • Workflow Size: Lines of YAML (should be ~43% less)
  • Time to Add Service: Minutes to add new service config
  • Review Time: Time to review workflow changes
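The workflow size metric is a simple percentage: (old lines - new lines) × 100 / old lines. A sketch using wc -l follows; the function name is hypothetical, and bash arithmetic truncates the result to a whole percent.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: percentage reduction in line count from file $1
# (old workflow) to file $2 (new workflow), rounded down.
yaml_reduction() {
  local old new
  old=$(wc -l < "$1")
  new=$(wc -l < "$2")
  if (( old == 0 )); then
    echo "first file is empty" >&2
    return 1
  fi
  echo $(( (old - new) * 100 / old ))
}
```

Whether the reusable workflows count toward the new size is a judgment call; to include them, concatenate the files first (for example cat ci-cd-refactored.yml reusable-*.yml > /tmp/new.yml) and pass the combined file as the second argument.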

Next Steps

After successful testing:

  1. Update CLAUDE.md with migration status
  2. Document any workflow customizations in team.md
  3. Train team on new workflow structure
  4. Archive test workflows (or keep for regression testing)

Resources