
ADR-007: Tag-Based Change Detection Strategy

Status

Approved - Implemented

Context

The previous CI/CD change detection approach used dorny/paths-filter to compare changes between consecutive commits (N vs N-1). This approach had several limitations:

  1. Build failures not detected: If a build failed after its tag was created, no image was pushed, and the service stayed broken until its code was touched again
  2. Dependency changes missed: Changes to shared libraries (e.g., SharedKernel) didn't automatically trigger rebuilds of dependent services
  3. Retag failures not recoverable: If retagging failed, there was no easy way to recover
  4. PR previews used latest: PR environments used the latest tag instead of a known-good version

Problems with Commit-to-Commit Comparison

Commit N-1: api-v8.20.0 (build succeeded, image exists)
Commit N:   api-v8.21.0 (build FAILED, tag exists but NO image)
Commit N+1: Only docs changed, api not touched
            → paths-filter sees no API changes
            → API stays broken with orphaned tag

Decision

Implement tag-based change detection that compares against the last service-specific git tag, with awareness of build results and dependencies.

Key Changes

  1. Compare against last tag, not last commit: git describe --match "{service}-v*" finds the last tag for each service
  2. Context-aware tag filtering:
       • Main branch: Only stable tags (prereleases such as api-v1.0.0-beta.1 are excluded)
       • PR branches: All reachable tags (prereleases included for full coverage)
  3. Image existence verification: Before retagging or reusing an existing image, verify that the image actually exists in GHCR
  4. Dependency tracking: Include dependency paths in change detection (SharedKernel changes trigger API/PM/Quartz rebuilds)
  5. Manual recovery: Workflow dispatch inputs allow forcing a rebuild per service
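Comparing against the last tag while folding in dependency paths could look roughly like the sketch below. It is a sketch under stated assumptions: the directory layout (src/Api, src/PM, src/Quartz, src/SharedKernel) and both helper names are hypothetical, and the real mapping in detect-service-changes.sh may differ. Only git diff --name-only is a real command here.

```bash
#!/usr/bin/env bash
# Sketch only: directory names and helper names are hypothetical.

# Print the source paths for a service, including shared dependencies
# whose changes should also trigger a rebuild.
service_source_paths() {
  case "$1" in
    api)    echo "src/Api src/SharedKernel" ;;
    pm)     echo "src/PM src/SharedKernel" ;;
    quartz) echo "src/Quartz src/SharedKernel" ;;
    *)      return 1 ;;
  esac
}

# Succeed (exit 0) if any changed file lies under one of the given path
# prefixes. Changed files are read from stdin, one per line, e.g. from:
#   git diff --name-only "$last_tag" HEAD
any_path_changed() {
  local file prefix
  while IFS= read -r file; do
    for prefix in "$@"; do
      # The trailing "/" keeps src/Api from matching src/ApiGateway.
      case "$file" in
        "$prefix"/*) return 0 ;;
      esac
    done
  done
  return 1
}

# Example inside a checkout with full history and tags:
#   last_tag=$(git describe --tags --abbrev=0 --match "api-v*")
#   # $(service_source_paths api) is left unquoted on purpose (word splitting)
#   if git diff --name-only "$last_tag" HEAD \
#        | any_path_changed $(service_source_paths api); then
#     echo "api: build"
#   fi
```

Because SharedKernel appears in every dependent service's path list, a change there alone is enough to mark API, PM, and Quartz for a rebuild.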

Decision Matrix

| Scenario | Main branch | PR preview |
|---|---|---|
| Source changed | Build new image | Build new image |
| Dependencies changed | Build new image | Build new image |
| Only chart changed, image exists | Retag existing image | Retag existing image |
| Only chart changed, image missing | FAIL loudly | FAIL loudly |
| No changes | Skip entirely | Use last tag's image |

Tag Patterns

```bash
# Main context: stable tags only
# (--tags also considers lightweight tags; --abbrev=0 prints just the tag name)
git describe --tags --abbrev=0 --match "${SERVICE}-v[0-9]*" --exclude "*-v*-*"
# Matches:  api-v8.21.0
# Excludes: api-v8.21.0-beta.1, api-v8.21.0-PullRequest0123.1

# PR context: all reachable tags are candidates
git describe --tags --abbrev=0 --match "${SERVICE}-v*"
# Matches: api-v8.21.0, api-v8.21.0-beta.1, api-v8.21.0-PullRequest0123.1
```
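To sanity-check the two patterns without a repository, they can be mirrored with Bash case patterns. For globs that use only * and [0-9] on tag names without "/", case matching behaves like the globs git describe applies. The helper below is hypothetical and is not part of the detection script.

```bash
#!/usr/bin/env bash
# Hypothetical test helper (not part of detect-service-changes.sh).
# Reports which contexts would accept a tag for a service, mirroring:
#   PR:   --match "${SERVICE}-v*"
#   main: --match "${SERVICE}-v[0-9]*" --exclude "*-v*-*"
# Prints "main pr", "pr", or "none".
contexts_for_tag() {
  local service=$1 tag=$2
  case "$tag" in
    "$service"-v*) ;;                    # accepted by the PR pattern
    *) echo none; return 0 ;;
  esac
  case "$tag" in
    *-v*-*) echo pr ;;                   # dropped by the main --exclude
    "$service"-v[0-9]*) echo "main pr" ;;
    *) echo pr ;;                        # e.g. api-vnext: no digit after -v
  esac
}

contexts_for_tag api api-v8.21.0          # main pr
contexts_for_tag api api-v8.21.0-beta.1   # pr
contexts_for_tag api pm-v1.0.0            # none
```

Running contexts_for_tag web-viewer web-viewer-v1.0.0 (web-viewer being a made-up service name) prints pr. This reproduces one kind of edge case mentioned under Known Limitations: a stable tag is hidden from the main context because the service name itself contains -v.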

Implementation

  1. Shared detection script: .github/scripts/detect-service-changes.sh
       • Takes --context main|pr and --service <name> parameters
       • Outputs JSON with: action, last_tag, last_version, source_changed, chart_changed, image_exists
  2. Reusable workflow: .github/workflows/_detect-changes.yml
       • Calls the detection script for each service
       • Outputs per service: action, last_tag, last_version, source_changed, chart_changed
  3. Workflow dispatch: Both ci-cd.yml and pr-preview.yml support manual triggers with force-rebuild options
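A skeleton of the script's interface might look like the sketch below. Only the flag names (--context, --service) and the output field names come from this ADR. The parse_args and emit_result helpers, the JSON types (booleans for the *_changed fields, a string for image_exists so it can hold "unknown"), and the use of printf instead of a JSON tool are assumptions. printf is only safe here because every value is a plain tag, version, or flag with no quotes or backslashes to escape. The detection logic itself is omitted.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Parse --context main|pr and --service <name> into CONTEXT and SERVICE.
# Returns 2 (with a message on stderr) on missing or invalid input.
parse_args() {
  CONTEXT="" SERVICE=""
  while [ $# -gt 0 ]; do
    case "$1" in
      --context|--service)
        if [ $# -lt 2 ]; then
          echo "missing value for $1" >&2
          return 2
        fi
        if [ "$1" = --context ]; then CONTEXT=$2; else SERVICE=$2; fi
        shift 2
        ;;
      *)
        echo "unknown argument: $1" >&2
        return 2
        ;;
    esac
  done
  case "$CONTEXT" in
    main|pr) ;;
    *) echo "--context must be 'main' or 'pr'" >&2; return 2 ;;
  esac
  if [ -z "$SERVICE" ]; then
    echo "--service is required" >&2
    return 2
  fi
}

# Print one JSON object using the field names listed in this ADR.
# Assumption: no value contains characters that need JSON escaping.
emit_result() {
  local action=$1 last_tag=$2 last_version=$3
  local source_changed=$4 chart_changed=$5 image_exists=$6
  printf '{"action":"%s","last_tag":"%s","last_version":"%s",' \
    "$action" "$last_tag" "$last_version"
  printf '"source_changed":%s,"chart_changed":%s,"image_exists":"%s"}\n' \
    "$source_changed" "$chart_changed" "$image_exists"
}

# Example flow inside the script:
#   parse_args "$@"
#   ... detection for "$SERVICE" in "$CONTEXT" ...
#   emit_result "$action" "$last_tag" "$last_version" \
#     "$source_changed" "$chart_changed" "$image_exists"
```

Keeping the JSON on a single stdout line, with all diagnostics on stderr, lets the reusable workflow capture the result with a plain command substitution.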

Action Outputs

| Action | Meaning | Trigger |
|---|---|---|
| build | Build a new Docker image | Source or dependencies changed, or no previous tag |
| retag | Retag the existing image | Only the chart changed and the image exists |
| skip | Skip this service (main) | No changes since the last tag |
| use-existing | Use the last version (PR) | No changes since the last tag |
| fail | Cannot proceed | Chart changed but the image doesn't exist |
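The decision matrix and the action table reduce to a small pure function. This is a hypothetical sketch: the function name and argument order are invented, and image_exists accepts true, false, or unknown, with unknown allowed through to retag as described under Known Limitations.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the decision matrix / action table.
# Args: context (main|pr), has_last_tag, source_changed, deps_changed,
#       chart_changed (each true|false), image_exists (true|false|unknown)
# Prints exactly one action on stdout; diagnostics go to stderr.
decide_action() {
  local context=$1 has_tag=$2 src=$3 deps=$4 chart=$5 image=$6
  if [ "$has_tag" != true ] || [ "$src" = true ] || [ "$deps" = true ]; then
    echo build
  elif [ "$chart" = true ]; then
    if [ "$image" = false ]; then
      echo "error: chart changed but the image for the last tag is missing" >&2
      echo fail
    else
      if [ "$image" = unknown ]; then
        echo "warning: image check inconclusive; allowing retag" >&2
      fi
      echo retag
    fi
  elif [ "$context" = main ]; then
    echo skip
  else
    echo use-existing
  fi
}

decide_action main true false false true true    # retag
decide_action pr   true false false false true   # use-existing
```

Checking source and dependency changes first means a commit that touches both code and chart always rebuilds; retag is reached only when the chart is the sole change.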

Consequences

Positive

  • More accurate detection: Changes are compared against actual deployed state (tags), not just the previous commit
  • Dependency-aware: SharedKernel changes correctly trigger dependent service rebuilds
  • Self-healing: Failed builds are detected and can be recovered with workflow dispatch
  • Shared logic: Both workflows use the same detection script, reducing maintenance
  • PR isolation: PR previews use deterministic, tag-based versions instead of latest
  • Explicit recovery: Clear error messages with links to force rebuild

Negative

  • More complex: Tag-based detection is more complex than simple path filtering
  • Requires tags: System depends on correct tagging (but this is already enforced)
  • crane dependency: Requires crane CLI for image existence checks

Neutral

  • Same UX for developers: Push to main still works the same way
  • Backwards compatible: Existing tags work with new detection logic

Known Limitations

Tag Pattern Edge Cases

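The exclude pattern *-v*-* for main context is designed to exclude prereleases like api-v1.0.0-beta.1. However, it matches any tag that has a hyphen somewhere after a "-v", so it also hides stable tags of a service whose own name contains "-v". For a hypothetical service web-viewer, the stable tag web-viewer-v1.0.0 matches the exclude (the glob splits it as web + -v + iewer + - + v1.0.0). If issues arise, test with actual prerelease tags to verify correct filtering.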

Race Condition with Rapid Merges

If two merges happen quickly to main, both might calculate the same version before the first tag is pushed. This was an existing issue from the previous implementation and is not addressed by this change. Mitigation: GitVersion's branch-based versioning and commit message parsing should produce unique versions in most cases.

Image Existence Check Failures

The check_image_exists function uses crane to verify that images exist in GHCR. Network errors or authentication failures can make it return "unknown" instead of a definite yes or no, and "unknown" defaults to allowing the retag, so a missing image is not caught at this stage. In CI environments where crane is properly configured, this is rarely an issue. Warnings are logged to stderr when this occurs.
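A tri-state check in this spirit is sketched below. It is not the script's actual check_image_exists. It uses crane digest, a real crane subcommand that resolves an image reference to its digest. It assumes that a missing tag or repository surfaces as a MANIFEST_UNKNOWN or NAME_UNKNOWN registry error in crane's stderr, and it treats anything else (network, auth) as unknown. The ghcr.io/OWNER/... reference format is only an example.

```bash
#!/usr/bin/env bash
# Sketch only: prints "true", "false", or "unknown" for an image reference
# such as ghcr.io/OWNER/api:8.21.0 (reference format is an example).
check_image_exists() {
  local ref=$1 err
  # Capture crane's stderr only; a zero exit means the reference resolved.
  if err=$(crane digest "$ref" 2>&1 >/dev/null); then
    echo true
  # Assumption: "not found" shows up as one of these registry error codes.
  elif printf '%s' "$err" | grep -qE 'MANIFEST_UNKNOWN|NAME_UNKNOWN'; then
    echo false
  else
    # Network or auth problem: warn and let the caller decide (retag allowed).
    echo "warning: could not verify $ref: $err" >&2
    echo unknown
  fi
}
```

Matching only specific "not found" codes keeps a transient network error from being misreported as a missing image, which would otherwise turn a retag into a hard failure.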

GitHub API Rate Limits

The PR preview's write-versions job makes multiple API calls to cluster-gitops. On busy days with many PRs, this could approach rate limits. Monitor GitHub Actions logs for rate limit warnings.
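One way to check headroom is to query the REST rate-limit endpoint before the job's API calls. This sketch assumes the GitHub CLI (gh) is installed and authenticated in the job, ideally with the same token the write-versions job uses. gh api rate_limit calls GET /rate_limit, which GitHub documents as not counting against the REST quota. The function name and the default threshold of 200 are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check: warn (and return 1) when the remaining
# core REST quota for gh's token is below a threshold (default 200).
check_rate_limit() {
  local threshold=${1:-200} remaining
  remaining=$(gh api rate_limit --jq '.resources.core.remaining') || {
    echo "warning: could not read rate limit" >&2
    return 0
  }
  if [ "$remaining" -lt "$threshold" ]; then
    # ::warning:: is a GitHub Actions workflow command shown in the run summary.
    echo "::warning::GitHub API quota low: $remaining requests remaining"
    return 1
  fi
  echo "GitHub API quota OK: $remaining requests remaining"
}
```

Returning non-zero on low quota leaves the choice to the caller: fail the step, wait for the reset, or continue with a logged warning.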

References