ADR-007: Tag-Based Change Detection Strategy
Status
Approved - Implemented
Context
The previous CI/CD change detection approach used dorny/paths-filter to compare changes between consecutive commits (N vs N-1). This approach had several limitations:
- Build failures not detected: If a build failed, the service stayed broken until code was touched again (tag was created but image never pushed)
- Dependency changes missed: Changes to shared libraries (e.g., SharedKernel) didn't automatically trigger rebuilds of dependent services
- Retag failures not recoverable: If retagging failed, there was no easy way to recover
- PR previews used latest: PR environments used the latest tag instead of a known-good version
Problems with Commit-to-Commit Comparison
Commit N-1: api-v8.20.0 (build succeeded, image exists)
Commit N: api-v8.21.0 (build FAILED, tag exists but NO image)
Commit N+1: Only docs changed, api not touched
→ paths-filter sees no API changes
→ API stays broken with orphaned tag
Decision
Implement tag-based change detection that compares against the last service-specific git tag, with awareness of build results and dependencies.
Key Changes
- Compare against last tag, not last commit: git describe --match "{service}-v*" finds the last tag for each service
- Context-aware tag filtering:
  - Main branch: Only stable tags (exclude prereleases like api-v1.0.0-beta.1)
  - PR branches: All reachable tags (includes prereleases for full coverage)
- Image existence verification: Before retagging or using existing, verify the image actually exists in GHCR
- Dependency tracking: Include dependency paths in change detection (SharedKernel changes trigger API/PM/Quartz rebuilds)
- Manual recovery: Workflow dispatch inputs allow forcing rebuilds per-service
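The first and fourth points can be sketched together: list the files changed since the last service tag, then check whether any of them falls under the service's own paths or its dependency paths. This is a minimal illustration, not the project's script. The helper name paths_touched and the src/Api and src/SharedKernel paths are assumptions made for the example.

```shell
# Hypothetical helper: reads changed file paths on stdin (one per line) and
# prints "true" if any of them lies under one of the given path prefixes,
# otherwise "false".
paths_touched() (
  while IFS= read -r file || [ -n "$file" ]; do
    for prefix in "$@"; do
      case "$file" in
        "$prefix"/*) echo true; exit 0 ;;
      esac
    done
  done
  echo false
)

# Usage sketch (paths are illustrative, not taken from the repository):
#   last_tag=$(git describe --abbrev=0 --match "api-v*")
#   git diff --name-only "$last_tag" HEAD | paths_touched src/Api src/SharedKernel
```

Passing the dependency directory as just another prefix is what makes a SharedKernel change count as a change to every service that lists it.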
Decision Matrix
| Scenario | Main Branch | PR Preview |
|---|---|---|
| Source changed | Build new image | Build new image |
| Deps changed | Build new image | Build new image |
| Chart only + image exists | Retag existing | Retag existing |
| Chart only + image missing | FAIL loudly | FAIL loudly |
| No changes | Skip entirely | Use last tag's image |
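The matrix reads naturally as a pure function of five inputs. The sketch below is one possible encoding, assuming each flag arrives as the string true or false. The function name decide_action and the argument order are hypothetical, and the "no previous tag" case is left out for brevity.

```shell
# Sketch of the decision matrix as a pure function.
# Arguments: context (main|pr), then four flags given as "true" or "false".
decide_action() (
  context=$1
  source_changed=$2
  deps_changed=$3
  chart_changed=$4
  image_exists=$5

  if [ "$source_changed" = true ] || [ "$deps_changed" = true ]; then
    # Any source or dependency change always produces a fresh image.
    echo build
  elif [ "$chart_changed" = true ]; then
    # Chart-only change: reuse the image only if it is really in the registry.
    if [ "$image_exists" = true ]; then echo retag; else echo fail; fi
  elif [ "$context" = pr ]; then
    # Nothing changed: a PR preview still needs a concrete version to deploy.
    echo use-existing
  else
    echo skip
  fi
)
```

For instance, decide_action main false false true false corresponds to the "Chart only + image missing" row.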
Tag Patterns
# Main context: stable tags only
git describe --match "${SERVICE}-v[0-9]*" --exclude "*-v*-*"
# Matches: api-v8.21.0
# Excludes: api-v8.21.0-beta.1, api-v8.21.0-PullRequest0123.1
# PR context: all reachable tags
git describe --match "${SERVICE}-v*"
# Matches: api-v8.21.0, api-v8.21.0-beta.1, api-v8.21.0-PullRequest0123.1
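A small wrapper can select between the two commands by context. The following is a sketch under assumptions: the name last_service_tag is hypothetical, and --abbrev=0 is added so that only the tag name is printed. Note that git describe considers only annotated tags unless --tags is passed, so a repository using lightweight tags would need that flag as well.

```shell
# Hypothetical wrapper: prints the nearest matching tag for a service, or
# nothing (with a non-zero status) if no matching tag is reachable from HEAD.
last_service_tag() (
  context=$1
  service=$2
  case "$context" in
    main)
      # Stable tags only: drop anything with a hyphen after the version.
      git describe --abbrev=0 --match "${service}-v[0-9]*" --exclude "*-v*-*" 2>/dev/null
      ;;
    pr)
      # Any reachable tag for the service, prereleases included.
      git describe --abbrev=0 --match "${service}-v*" 2>/dev/null
      ;;
    *)
      echo "unknown context: $context" >&2
      exit 2
      ;;
  esac
)
```

An empty result would correspond to the "no previous tag" case, which the action table maps to a build.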
Implementation
- Shared detection script: .github/scripts/detect-service-changes.sh
  - Takes --context main|pr and --service <name> parameters
  - Outputs JSON with: action, last_tag, last_version, source_changed, chart_changed, image_exists
- Reusable workflow: .github/workflows/_detect-changes.yml
  - Calls detection script for each service
  - Outputs per-service: action, last_tag, last_version, source_changed, chart_changed
- Workflow dispatch: Both ci-cd.yml and pr-preview.yml support manual triggers with force rebuild options
Action Outputs
| Action | Meaning | Trigger |
|---|---|---|
| build | Build new Docker image | Source or deps changed, or no previous tag |
| retag | Retag existing image | Only chart changed, image exists |
| skip | Skip this service (main) | No changes since last tag |
| use-existing | Use last version (PR) | No changes since last tag |
| fail | Cannot proceed | Chart changed but image doesn't exist |
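A consuming job might branch on the action value roughly as follows. This is a hedged sketch: handle_action and its messages are hypothetical, and the real workflows may act on these values quite differently. The ::error:: prefix is the GitHub Actions workflow command for an error annotation.

```shell
# Hypothetical dispatcher for the action values listed in the table.
# Prints the step a workflow would run next; "fail" and unexpected values
# emit a GitHub Actions error annotation and return non-zero.
handle_action() (
  action=$1
  service=$2
  case "$action" in
    build)        echo "build and push a new image for $service" ;;
    retag)        echo "retag the existing image for $service" ;;
    skip)         echo "nothing to do for $service" ;;
    use-existing) echo "deploy the last tagged image for $service" ;;
    fail)
      echo "::error::$service: chart changed but the image is missing; re-run with a forced rebuild"
      exit 1
      ;;
    *)
      echo "::error::$service: unexpected action '$action'"
      exit 1
      ;;
  esac
)
```

Returning non-zero for fail is what makes the job stop visibly instead of deploying a tag that has no image behind it.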
Consequences
Positive
- More accurate detection: Changes are compared against actual deployed state (tags), not just previous commit
- Dependency-aware: SharedKernel changes correctly trigger dependent service rebuilds
- Self-healing: Failed builds are detected and can be recovered with workflow dispatch
- Shared logic: Both workflows use same detection script, reducing maintenance
- PR isolation: PR previews use deterministic tag-based versions, not latest
- Explicit recovery: Clear error messages with links to force rebuild
Negative
- More complex: Tag-based detection is more complex than simple path filtering
- Requires tags: System depends on correct tagging (but this is already enforced)
- crane dependency: Requires crane CLI for image existence checks
Neutral
- Same UX for developers: Push to main still works the same way
- Backwards compatible: Existing tags work with new detection logic
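Known Limitations
Tag Pattern Edge Cases
The exclude pattern *-v*-* for main context is designed to exclude prereleases like api-v1.0.0-beta.1 by rejecting any tag that has a hyphen somewhere after -v. It can therefore also reject stable tags whose service name itself contains -v followed by a later hyphen (a hypothetical pm-viewer-v1.0.0, for example), and it cannot tell a prerelease suffix apart from any other hyphenated suffix. If issues arise, test with actual prerelease tags to verify correct filtering.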
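One way to exercise the patterns without touching a repository is to replay them with shell globbing, which behaves the same way for tag names that contain no slashes. The sketch below assumes that equivalence. The function name is_stable_tag is hypothetical, as is the pm-viewer service name, which was chosen because it contains -v followed by a later hyphen.

```shell
# Approximates the main-context filter with shell globbing: a tag counts as
# stable when it matches SERVICE-v[0-9]* and does not match *-v*-*.
# Returns 0 for stable tags and 1 otherwise.
is_stable_tag() (
  service=$1
  tag=$2
  case "$tag" in
    *-v*-*) exit 1 ;;               # same glob as --exclude
  esac
  case "$tag" in
    "$service"-v[0-9]*) exit 0 ;;   # same glob as --match
  esac
  exit 1
)
```

Calling is_stable_tag api api-v8.21.0 succeeds and is_stable_tag api api-v8.21.0-beta.1 fails, as intended. The call is_stable_tag pm-viewer pm-viewer-v1.0.0 also fails, which shows how a hyphenated service name containing -v would have its stable tags filtered out.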
Race Condition with Rapid Merges
If two merges happen quickly to main, both might calculate the same version before the first tag is pushed. This was an existing issue from the previous implementation and is not addressed by this change. Mitigation: GitVersion's branch-based versioning and commit message parsing should produce unique versions in most cases.
Image Existence Check Failures
The check_image_exists function uses crane to verify images exist in GHCR. Network errors or authentication issues may cause false "unknown" results, which default to allowing retag operations. In CI environments where crane is properly configured, this is rarely an issue. Warnings are logged to stderr when this occurs.
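The three outcomes (exists, missing, unknown) can be illustrated with a tri-state wrapper around crane manifest. This is a sketch, not the project's check_image_exists: the name image_state is hypothetical, and the error strings used to recognise a missing image are an assumption about how the registry reports it.

```shell
# Hypothetical tri-state check: prints "true", "false", or "unknown".
# Anything that is not clearly a "not found" answer is reported as unknown,
# with a warning on stderr.
image_state() (
  image=$1   # for example ghcr.io/<owner>/<service>:<version>
  # Capture stderr only; the manifest body itself is discarded.
  if err=$(crane manifest "$image" 2>&1 >/dev/null); then
    echo true
  else
    case "$err" in
      *MANIFEST_UNKNOWN*|*NAME_UNKNOWN*|*NOT_FOUND*)
        echo false
        ;;
      *)
        echo "warning: could not verify $image: $err" >&2
        echo unknown
        ;;
    esac
  fi
)
```

Keeping unknown distinct from false leaves the policy decision to the caller, which can either allow the retag (the behaviour described above) or stop and ask for a manual rerun.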
GitHub API Rate Limits
The PR preview's write-versions job makes multiple API calls to cluster-gitops. On busy days with many PRs, this could approach rate limits. Monitor GitHub Actions logs for rate limit warnings.
Related Decisions
- ADR-001: CI/CD Implementation Approach - Original CI/CD design
- ADR-002: GitVersion Configuration - How versions are calculated
- ADR-003: Cluster Architecture - Build optimization with retag