
Troubleshooting: External-DNS CrashLoopBackOff

Problem

The external-DNS pod was in CrashLoopBackOff with the following fatal error:

level=fatal msg="Failed to do run once: googleapi: Error 412: Precondition not met for 'entity.change.deletions[syrf.org.uk.][TXT]', conditionNotMet"

Root Cause

External-DNS was attempting to delete DNS records created by the legacy Jenkins X cluster (which is still running production apps). Those records carry different ownership metadata:

  • Legacy cluster: external-dns/owner=default
  • New cluster: external-dns/owner=camaradesuk

When external-DNS tried to clean up old records during sync, it encountered precondition failures because:

  1. The records were managed by a different owner ID
  2. Some records may have been modified outside of external-DNS
  3. The policy: sync configuration tried to delete records it could not manage
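External-DNS tracks ownership through TXT registry records whose value embeds the owner ID. A minimal sketch of pulling the owner out of such a value, to see which cluster claims a record (the record content below is illustrative, not copied from Cloud DNS):

```shell
# Illustrative TXT registry value in the format external-dns writes:
# heritage=external-dns,external-dns/owner=<id>,external-dns/resource=<ref>
txt='"heritage=external-dns,external-dns/owner=default,external-dns/resource=ingress/default/app"'

# Extract the owner id to see which cluster claims the record
owner=$(printf '%s' "$txt" | grep -o 'external-dns/owner=[^,"]*' | cut -d= -f2)
echo "$owner"   # prints: default
```

A record owned by `default` belongs to the legacy cluster; the new cluster's instance (owner `camaradesuk`) must not touch it.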

Solution

Changed the external-DNS policy from sync to upsert-only to prevent any deletion attempts:

File: infrastructure/external-dns/values.yaml

# Policy for existing DNS records
policy: upsert-only  # Changed from 'sync' to avoid deleting old records from previous cluster

What This Does

  • Creates new DNS records for Ingresses and Services in the new cluster
  • Updates existing records that external-DNS manages
  • Does NOT delete orphaned records from the legacy cluster
  • Prevents crashes from precondition failures
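To confirm the rendered deployment actually picked up the new policy, one way (assuming the chart wires the policy through as a container flag, as the common external-dns charts do) is:

```shell
# Inspect the container args of the deployed external-dns (sketch; label selector assumed)
kubectl get deployment -n external-dns -l app.kubernetes.io/name=external-dns \
  -o jsonpath='{.items[0].spec.template.spec.containers[0].args}' \
  | tr ',' '\n' | grep policy
# expect a line like: "--policy=upsert-only"
```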

Implementation Steps

  1. Update configuration in cluster-gitops:

    cd /home/chris/workspace/syrf/cluster-gitops
    # Edit infrastructure/external-dns/values.yaml
    # Change: policy: sync → policy: upsert-only
    

  2. Commit and push:

    git add infrastructure/external-dns/values.yaml
    git commit -m "fix(external-dns): change policy to upsert-only to avoid deletion errors"
    git push
    

  3. Trigger an ArgoCD sync (optional, since auto-sync is enabled):

    kubectl patch application external-dns -n argocd \
      --type merge -p '{"operation":{"sync":{"revision":"HEAD"}}}'
    

  4. Delete the pod to force a restart with the new config:

    kubectl delete pod -n external-dns -l app.kubernetes.io/name=external-dns
    

  5. Verify pod is running:

    kubectl get pods -n external-dns
    kubectl logs -n external-dns -l app.kubernetes.io/name=external-dns --tail=20
    

Expected Result

Pod should start successfully with logs showing:

level=info msg="config: ... Policy:upsert-only ..."
level=info msg="Google project auto-detected: camarades-net"
level=info msg="All records are already up to date"

Orphaned Records

The following orphaned TXT records from the legacy cluster remain in Cloud DNS:

syrf.org.uk zone:

  • a-app.syrf.org.uk (TXT)
  • a-syrf-api.syrf.org.uk (TXT)
  • a-syrf-projectmanagement.syrf.org.uk (TXT)
  • a-syrf-quartz.syrf.org.uk (TXT)
  • a-www.syrf.org.uk (TXT)

camarades.net zone:

  • a-*.camarades.net (multiple TXT records for Jenkins X services)
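To enumerate the remaining registry records, something like the following should work (sketch: the `--zone` arguments are Cloud DNS managed-zone IDs, which are assumed here and may differ from the DNS names; the `~` regex filter is gcloud filter syntax):

```shell
# List a-* TXT registry records in each managed zone (zone ids assumed)
gcloud dns record-sets list --zone=syrf-org-uk \
  --filter="type=TXT AND name~'^a-'" --format="table(name,type,ttl)"
gcloud dns record-sets list --zone=camarades-net \
  --filter="type=TXT AND name~'^a-'" --format="table(name,type,ttl)"
```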

Cleanup Plan

DO NOT delete these records until legacy cluster migration is complete.

The legacy Jenkins X cluster is still running production applications and needs these DNS records. Once migration is complete and the legacy cluster is decommissioned:

  1. Manually delete all a-* TXT records from both zones
  2. Optionally switch external-DNS back to policy: sync for automatic cleanup
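The manual cleanup in step 1 could look like this (destructive; zone IDs and record names are assumptions to verify against the zone listing first):

```shell
# Run ONLY after the legacy cluster is fully decommissioned
gcloud dns record-sets delete a-app.syrf.org.uk. --zone=syrf-org-uk --type=TXT
gcloud dns record-sets delete a-www.syrf.org.uk. --zone=syrf-org-uk --type=TXT
# ...repeat for the remaining a-* TXT records in both zones
```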

Prevention for Future Migrations

When migrating between clusters that share DNS zones:

  1. Option A: Use different owner IDs (--txt-owner-id) for each cluster
  2. Option B: Use upsert-only policy during migration period
  3. Option C: Manually clean up old records before starting new external-DNS
  4. Option D: Use separate DNS zones for each cluster (preferred for isolation)
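For Option A, a sketch of per-cluster owner IDs in the chart values (assuming the chart exposes a txtOwnerId value, as the Bitnami external-dns chart does):

```yaml
# infrastructure/external-dns/values.yaml (new cluster)
txtOwnerId: camaradesuk   # legacy cluster keeps its own id, e.g. "default"
policy: sync              # safe once owner ids differ: each instance only deletes records it owns
```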
References

  • External-DNS Config: infrastructure/external-dns/values.yaml
  • Cloud DNS Zones: syrf.org.uk, camarades.net
  • Legacy Cluster: Still running production apps (migration in progress)
  • ArgoCD Application: external-dns in the argocd namespace