Testing the CI/CD Workflow¶
This guide explains how to test the refactored CI/CD workflow to ensure it works correctly before replacing the original workflow.
Overview¶
The refactored CI/CD workflow introduces significant changes:
- Reusable workflows for versioning and Docker builds
- Matrix strategies for parallel execution
- Independent service failure handling
- Smart promotion logic
The testing strategy in this guide validates that each of these improvements works as expected.
Testing Tools¶
1. Manual Test Workflow¶
Location: .github/workflows/test-ci-cd.yml
A workflow_dispatch workflow that simulates different scenarios without affecting production.
Test Scenarios:
- all-services-changed: Test all services building in parallel
- single-service-api: Test single service isolation
- single-service-web: Test web service with artifact handling
- multiple-services: Test subset of services (API, PM, Web)
- simulate-failure: Test failure isolation
- version-only: Test version calculation without builds
Usage:
# Via GitHub UI
1. Go to Actions → Test CI/CD Refactored Workflow
2. Click "Run workflow"
3. Select test scenario
4. Optionally simulate failure for a service
5. Run workflow
# Via GitHub CLI
gh workflow run test-ci-cd.yml \
-f test_scenario=simulate-failure \
-f simulate_build_failure=api \
-f skip_promotion=false
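To exercise every scenario in one sitting, the CLI call can be wrapped in a loop. The following is a minimal sketch and not part of the repository: the `run` helper and the `DRY_RUN` variable are hypothetical, and it assumes the workflow's other inputs have usable defaults. By default the commands are only printed, so the loop can be reviewed before anything is dispatched.

```shell
# Hypothetical helper: print the command when DRY_RUN=1 (the default),
# execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Scenario names as listed under "Test Scenarios".
SCENARIOS="all-services-changed single-service-api single-service-web multiple-services simulate-failure version-only"

# Dispatch the test workflow once per scenario.
run_all_scenarios() {
  for scenario in $SCENARIOS; do
    run gh workflow run test-ci-cd.yml -f test_scenario="$scenario"
  done
}
```

Calling `run_all_scenarios` prints the six `gh workflow run` commands; setting `DRY_RUN=0` first would actually dispatch them.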
2. Validation Script¶
Location: .github/scripts/validate-workflows.sh
Validates workflow syntax and structure locally before committing.
What it checks:
- YAML syntax validity
- Workflow structure (name, on, jobs)
- Reusable workflow references
- Matrix strategy fail-fast settings
- Input/output descriptions
- services.json validity
- Pre-validation in promotion jobs
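As an illustration of what one of these checks could look like, here is a minimal sketch of the fail-fast check. It is not the contents of validate-workflows.sh; the function name `check_fail_fast` is hypothetical, and the check is a text heuristic that does not parse YAML.

```shell
# Succeeds when a workflow file that uses "matrix:" also sets
# "fail-fast: false" somewhere; files without a matrix pass trivially.
# Text-based heuristic only: it does not tie the setting to a specific job.
check_fail_fast() {
  file="$1"
  if grep -Eq '^[[:space:]]*matrix:' "$file"; then
    grep -Eq '^[[:space:]]*fail-fast:[[:space:]]*false[[:space:]]*$' "$file"
  else
    return 0  # no matrix strategy, nothing to check
  fi
}
```

A possible use is `check_fail_fast .github/workflows/ci-cd-refactored.yml || echo "matrix without fail-fast: false"`.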
Usage: run the script locally before committing workflow changes.
Expected Output:
========================================
GitHub Actions Workflow Validator
========================================
Checking for required tools...
✅ Required tools found
Validating YAML syntax...
✅ ci-cd.yml - Valid YAML
✅ ci-cd-refactored.yml - Valid YAML
✅ reusable-gitversion.yml - Valid YAML
✅ reusable-docker-build.yml - Valid YAML
✅ test-ci-cd.yml - Valid YAML
...
========================================
Validation Summary
========================================
Errors: 0
Warnings: 0
✅ All validations passed!
Testing Scenarios¶
Scenario 1: Single Service Change¶
Goal: Verify that only the changed service builds and the others are skipped
Steps:
- Make a small change to the API service:
echo "# Test comment" >> src/services/api/README.md
git add .
git commit -m "test: trigger API build only"
git push
- Watch GitHub Actions:
- detect-changes should show only api_changed: true
- Only API version job should run
- Only API build job should run
- Tag created only for API
- Promotion updates only API
Expected Result:
- ✅ Only API service processes
- ✅ Other services skipped
- ✅ Workflow completes successfully
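To predict what detect-changes should report before pushing, the path-to-service mapping can be approximated locally. This is a sketch under assumptions: the function name `detect_changes` and the short service keys are hypothetical, the path prefixes are the directories used in this guide, and the real job may use different filters.

```shell
# Reads changed file paths on stdin (one per line) and prints one
# "<service>_changed=true|false" line per service.
# Prefixes mirror the directories used in this guide (assumption).
detect_changes() {
  paths="$(cat)"
  for pair in api:src/services/api/ pm:src/services/project-management/ web:src/services/web/ docs:docs/; do
    name="${pair%%:*}"     # text before the first colon
    prefix="${pair#*:}"    # text after the first colon
    if printf '%s\n' "$paths" | grep -q "^$prefix"; then
      echo "${name}_changed=true"
    else
      echo "${name}_changed=false"
    fi
  done
}
```

For the last commit this could be fed with `git diff --name-only HEAD~1 HEAD | detect_changes`; for the change in this scenario only `api_changed=true` should appear.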
Scenario 2: Multiple Services in Parallel¶
Goal: Verify services build in parallel independently
Steps:
- Make changes to multiple services:
echo "# Test" >> src/services/api/README.md
echo "# Test" >> src/services/project-management/README.md
echo "# Test" >> src/services/web/README.md
git add .
git commit -m "test: trigger multiple services"
git push
- Watch GitHub Actions:
- Version jobs run in parallel (3 jobs simultaneously)
- Build jobs run in parallel (3 jobs simultaneously)
- All complete around same time (not sequential)
Expected Result:
- ✅ Jobs run in parallel
- ✅ Total time ≈ slowest job (not sum of all jobs)
- ✅ All 3 services tagged and promoted
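The "slowest job, not sum" expectation can be checked with simple arithmetic on the job durations shown on the Actions run page. A minimal sketch, with hypothetical function names and durations given in seconds:

```shell
# Prints the sum of the given durations: the sequential estimate.
sum_durations() {
  total=0
  for d in "$@"; do
    total=$((total + d))
  done
  echo "$total"
}

# Prints the largest of the given durations: the parallel estimate.
max_duration() {
  max=0
  for d in "$@"; do
    if [ "$d" -gt "$max" ]; then
      max="$d"
    fi
  done
  echo "$max"
}
```

With illustrative durations of 240, 310 and 420 seconds, `sum_durations 240 310 420` prints 970 and `max_duration 240 310 420` prints 420; a parallel run should land near the second figure.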
Scenario 3: Failure Isolation¶
Goal: Verify one service failure doesn't block others
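Method 1 - Manual Test Workflow: run test-ci-cd.yml with test_scenario=simulate-failure and simulate_build_failure set to the service that should fail (see Usage under Manual Test Workflow).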
Method 2 - Introduce Actual Failure:
- Break PM Dockerfile temporarily:
# Edit src/services/project-management/SyRF.ProjectManagement.Endpoint/Dockerfile
# Add invalid instruction: INVALID_INSTRUCTION
git add .
git commit -m "test: introduce build failure in PM"
git push
- Watch workflow:
- PM build fails
- API and Web builds continue
- Workflow marked as failed (red X)
- Successful services still tagged
- Revert the breaking change and push once the run has been checked
Expected Result:
- ✅ PM build fails
- ✅ API build succeeds
- ✅ Web build succeeds
- ✅ API and Web get tagged
- ✅ PM does NOT get tagged
- ✅ API and Web promoted to staging
- ✅ PM NOT promoted
- ❌ Workflow status = Failed (but all jobs completed)
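The last expectation follows a simple rule: a single failed service marks the whole run as failed, even though the other services finished. A sketch of that rule, with a hypothetical function name and a `service=result` input format chosen only for illustration:

```shell
# Takes "service=result" pairs (result is "success" or "failure") and
# prints "failed" if any service failed, otherwise "passed".
overall_status() {
  for pair in "$@"; do
    case "${pair#*=}" in
      failure)
        echo "failed"
        return 0
        ;;
    esac
  done
  echo "passed"
}
```

For this scenario, `overall_status api=success pm=failure web=success` prints `failed`.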
Scenario 4: Web Service Special Handling¶
Goal: Verify web service artifact download works
Steps:
- Change web service:
echo "// Test" >> src/services/web/src/app/app.component.ts
git add .
git commit -m "test: trigger web build with artifacts"
git push
- Verify:
- build-web-artifacts job runs first
- Artifact uploaded successfully
- build-docker job downloads artifact
- Web build context prepared correctly
- Docker build succeeds
Expected Result:
- ✅ Artifacts built and uploaded
- ✅ Docker build downloads artifacts
- ✅ Web image built successfully
Scenario 5: Docs Multi-Checkout¶
Goal: Verify docs service checkout logic works
Steps:
- Change docs:
echo "# Test" >> docs/README.md
git add .
git commit -m "test: trigger docs build with multi-checkout"
git push
- Verify:
- GitHub App token generated
- cluster-gitops checked out (sparse)
- camarades-infrastructure checked out (sparse)
- Docker build succeeds with all repos
Expected Result:
- ✅ All 3 repos available during build
- ✅ Docs build succeeds
- ✅ MkDocs monorepo plugin works
Scenario 6: Promotion Logic¶
Goal: Verify smart promotion only promotes successful services
Steps:
- Use test workflow with simulated failure:
gh workflow run test-ci-cd.yml \
-f test_scenario=simulate-failure \
-f simulate_build_failure=api \
-f skip_promotion=false
- Check promotion PR:
- Only successful services in PR
- YAML validated before PR creation
- PR auto-merges (staging only)
Expected Result:
- ✅ PR created with only successful services
- ✅ Failed service NOT in PR
- ✅ YAML validation passes
- ✅ PR auto-merges
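The selection step behind "only successful services in PR" amounts to filtering build results. The sketch below illustrates the idea only; the function name and the `service=result` input format are hypothetical, and the real promotion job may gather results differently.

```shell
# Takes "service=result" pairs and prints the names of the services
# whose result is "success", separated by spaces.
promotable_services() {
  selected=""
  for pair in "$@"; do
    if [ "${pair#*=}" = "success" ]; then
      # Append the service name, adding a separating space when needed.
      selected="${selected:+$selected }${pair%%=*}"
    fi
  done
  echo "$selected"
}
```

With the simulated API failure, `promotable_services api=failure pm=success web=success` prints `pm web`, which is the set expected in the promotion PR.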
Validation Checklist¶
Before replacing the original workflow, verify:
Workflow Structure¶
- All workflows have valid YAML syntax
- Reusable workflows have workflow_call trigger
- Matrix strategies have
fail-fast: false - All inputs have descriptions
- All outputs are properly referenced
Service Independence¶
- Single service change only builds that service
- Multiple services build in parallel
- One service failure doesn't block others
- Failed services don't get tagged
- Failed services don't get promoted
Special Cases¶
- Web artifacts download correctly
- Docs multi-repo checkout works
- Lambda deployment still works
- GitHub App token generation works
Promotion Logic¶
- Only successful services promoted
- YAML validated before PR creation
- Staging promotion auto-merges
- Production promotion requires manual review
- Single PR per environment (no conflicts)
Performance¶
- Parallel builds faster than sequential
- No unnecessary job dependencies
- Matrix jobs complete independently
Common Issues and Solutions¶
Issue: Matrix job doesn't run¶
Symptom: Version or build job shows "Job skipped"
Cause: Condition evaluated to false
Solution:
# Check detect-changes outputs
# Ensure matrix.service.condition properly references output
# Example: matrix.service.enabled should match output name
Issue: Reusable workflow not found¶
Symptom: Error: .github/workflows/reusable-gitversion.yml not found
Cause: Path reference incorrect
Solution:
# Ensure uses: path starts with ./
uses: ./.github/workflows/reusable-gitversion.yml
# NOT: uses: .github/workflows/reusable-gitversion.yml
Issue: Outputs not accessible¶
Symptom: needs.version.outputs.api_version is empty
Cause: Matrix job outputs not aggregated
Solution:
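A likely cause, assuming standard GitHub Actions behaviour rather than anything confirmed for this repository: when every matrix leg writes the same output name, only one leg's value survives. One possible workaround is to have each leg save its version to a per-service file (shared as an artifact, for example) and let a follow-up job emit one output per service. The sketch below assumes a hypothetical directory of `<service>.version` files; the function name is also hypothetical.

```shell
# Reads every "<service>.version" file in a directory and prints
# "<service>_version=<value>" lines, mapping "-" in names to "_".
aggregate_versions() {
  dir="$1"
  for f in "$dir"/*.version; do
    [ -e "$f" ] || continue   # the directory had no matching files
    service="$(basename "$f" .version | tr '-' '_')"
    echo "${service}_version=$(cat "$f")"
  done
}
```

In an aggregation job step this could be used as `aggregate_versions versions >> "$GITHUB_OUTPUT"`, after which downstream jobs read the per-service outputs from that job instead of from the matrix job.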
Issue: All jobs run despite no changes¶
Symptom: All services build even when nothing changed
Cause: Condition logic incorrect
Solution:
# Ensure if: condition properly checks changed flags
if: matrix.service.condition == 'true'
# NOT: if: matrix.service.condition (a non-empty string, even 'false', is always truthy)
Migration Path¶
Once testing is complete:
Phase 1: Parallel Operation (Recommended)¶
- Keep both workflows active
- Monitor refactored workflow for 1-2 weeks
- Compare results between workflows
- Build confidence in refactored version
Phase 2: Gradual Migration¶
- Disable original workflow triggers:
- Enable refactored workflow:
- Monitor for issues
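One way to perform the switch without editing any YAML is the GitHub CLI's `gh workflow disable` and `gh workflow enable`. Note that this disables a whole workflow rather than changing its triggers, so it is a substitute for the steps above, not a transcription of them. The sketch assumes the file names ci-cd.yml (original) and ci-cd-refactored.yml (refactored); the `run` and `switch_workflow` helpers are hypothetical and only print the commands by default.

```shell
# Hypothetical helper: print the command when DRY_RUN=1 (the default),
# execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Disables the first workflow and enables the second.
switch_workflow() {
  from="$1"
  to="$2"
  run gh workflow disable "$from"
  run gh workflow enable "$to"
}
```

`switch_workflow ci-cd.yml ci-cd-refactored.yml` moves to the refactored workflow; calling it with the arguments swapped reverses the switch.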
Phase 3: Cleanup¶
- Delete original workflow:
- Rename refactored workflow:
- Update documentation references
Rollback Plan¶
If issues arise with the refactored workflow:
Quick Rollback¶
- Re-enable original workflow:
- Disable refactored workflow:
Full Rollback¶
Monitoring and Metrics¶
Track these metrics to validate improvements:
Performance Metrics¶
- Build Time: Total workflow duration (should decrease)
- Parallelization: Number of concurrent jobs (should increase)
- Resource Usage: GitHub Actions minutes (may increase slightly)
Reliability Metrics¶
- Success Rate: % of workflows that complete successfully
- Partial Success: Services deployed despite other failures
- Recovery Time: Time to fix and redeploy after failure
Maintainability Metrics¶
- Workflow Size: Lines of YAML (should be ~43% fewer)
- Time to Add Service: Minutes to add new service config
- Review Time: Time to review workflow changes
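The workflow size metric can be computed from line counts. A minimal sketch with a hypothetical function name; which files count towards each total (for instance, whether the reusable workflows are included) is left to the reader.

```shell
# Prints the integer percentage by which new_lines is smaller than
# old_lines (rounded down).
percent_reduction() {
  old_lines="$1"
  new_lines="$2"
  echo $(( (old_lines - new_lines) * 100 / old_lines ))
}
```

With illustrative counts, `percent_reduction 1000 570` prints 43. Real counts can be obtained with `wc -l < .github/workflows/ci-cd.yml` and the equivalent for the refactored files.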
Next Steps¶
After successful testing:
- Update CLAUDE.md with migration status
- Document any workflow customizations in team.md
- Train team on new workflow structure
- Archive test workflows (or keep for regression testing)