Troubleshooting: External-DNS CrashLoopBackOff
Problem
The External-DNS pod was in a CrashLoopBackOff state, exiting with this fatal error:
level=fatal msg="Failed to do run once: googleapi: Error 412: Precondition not met for 'entity.change.deletions[syrf.org.uk.][TXT]', conditionNotMet"
Root Cause
External-DNS was attempting to delete DNS records belonging to the legacy Jenkins X cluster (which is still running production apps). Those records carried different ownership metadata from the new cluster's:
- Legacy cluster: external-dns/owner=default
- New cluster: external-dns/owner=camaradesuk
When External-DNS tried to clean up old records during sync, it encountered precondition failures because:
1. Records were managed by a different owner ID
2. Some records may have been modified outside of external-DNS
3. The policy: sync configuration tried to delete records it couldn't manage
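To confirm which cluster owns a given record, it helps to read the `external-dns/owner` field out of the registry TXT value. The following is a minimal sketch, assuming the TXT registry's usual `heritage=external-dns,external-dns/owner=<id>,...` value format. The function name `txt_owner` and the sample values are illustrative and were not copied from the affected zones.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the external-dns owner ID found in a TXT value.
# Assumes the value looks like:
#   "heritage=external-dns,external-dns/owner=default,external-dns/resource=..."
txt_owner() {
  # Strip quotes, split the value on commas, and keep only the owner field.
  printf '%s\n' "$1" | tr -d '"' | tr ',' '\n' | sed -n 's|^external-dns/owner=||p'
}

# Sample values (illustrative), one per cluster.
txt_owner '"heritage=external-dns,external-dns/owner=default,external-dns/resource=ingress/jx/app"'
txt_owner '"heritage=external-dns,external-dns/owner=camaradesuk"'
```

The TXT values themselves can be listed with `gcloud dns record-sets list --zone=<managed-zone> --filter="type=TXT"`, where `<managed-zone>` is a placeholder for the Cloud DNS managed zone name.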
Solution
Changed the External-DNS policy from sync to upsert-only to prevent deletion attempts:
File: infrastructure/external-dns/values.yaml
# Policy for existing DNS records
policy: upsert-only # Changed from 'sync' to avoid deleting old records from previous cluster
What This Does
- ✅ Creates new DNS records for Ingresses and Services in the new cluster
- ✅ Updates existing records that external-DNS manages
- ❌ Does NOT delete orphaned records from the legacy cluster
- ✅ Prevents crashes from precondition failures
Implementation Steps
- Update the configuration in cluster-gitops.
- Commit and push.
- Trigger an ArgoCD sync (optional; auto-sync is enabled).
- Delete the pod to force a restart with the new config.
- Verify the pod is running.
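The steps above can be sketched as the following shell function. The original commands are not preserved on this page, so most of this is an assumption: the `external-dns` namespace, the `app.kubernetes.io/name=external-dns` label selector, and the use of the `argocd` CLI are guesses to adjust for the real cluster. Only the values file path, the ArgoCD application name, and the commit message come from this document.

```shell
#!/usr/bin/env bash
# Sketch of the rollout; run from the root of the cluster-gitops checkout.
# ASSUMPTIONS: namespace and label selector are placeholders, not confirmed.
NAMESPACE="${NAMESPACE:-external-dns}"
SELECTOR="${SELECTOR:-app.kubernetes.io/name=external-dns}"

rollout_policy_change() {
  # Commit the edited values file and push it.
  git add infrastructure/external-dns/values.yaml &&
  git commit -m "fix(external-dns): change policy to upsert-only" &&
  git push &&
  # Optional, since auto-sync is enabled: ask ArgoCD to sync right away.
  argocd app sync external-dns &&
  # Delete the pod so its replacement starts with the new config.
  kubectl -n "$NAMESPACE" delete pod -l "$SELECTOR" &&
  # Confirm the replacement pod comes up.
  kubectl -n "$NAMESPACE" get pods -l "$SELECTOR"
}
```

Calling `rollout_policy_change` after editing the values file runs the steps in order and stops at the first command that fails.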
Expected Result
The pod should start successfully, with logs showing:
level=info msg="config: ... Policy:upsert-only ..."
level=info msg="Google project auto-detected: camarades-net"
level=info msg="All records are already up to date"
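One way to check for these lines is to scan the pod logs for the policy setting and for any fatal entries. This is a minimal sketch: the function name `check_external_dns_logs` is hypothetical, and it only looks for the `Policy:upsert-only` and `level=fatal` strings shown on this page.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read External-DNS logs on stdin and report health.
# Returns 0 only if the upsert-only policy is logged and no fatal line appears.
check_external_dns_logs() {
  logs="$(cat)"
  if printf '%s\n' "$logs" | grep -q 'level=fatal'; then
    echo "FAIL: fatal error in logs"
    return 1
  fi
  if ! printf '%s\n' "$logs" | grep -q 'Policy:upsert-only'; then
    echo "FAIL: upsert-only policy not found in startup config"
    return 1
  fi
  echo "OK: upsert-only active, no fatal errors"
}
```

It could be fed with something like `kubectl -n <namespace> logs deploy/external-dns | check_external_dns_logs`, where the namespace and the deployment name are assumptions to replace with the real ones.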
Orphaned Records
The following orphaned TXT records from the legacy cluster remain in Cloud DNS:
syrf.org.uk zone:
- a-app.syrf.org.uk (TXT)
- a-syrf-api.syrf.org.uk (TXT)
- a-syrf-projectmanagement.syrf.org.uk (TXT)
- a-syrf-quartz.syrf.org.uk (TXT)
- a-www.syrf.org.uk (TXT)
camarades.net zone:
- a-*.camarades.net (multiple TXT records for Jenkins X services)
Cleanup Plan
DO NOT delete these records until legacy cluster migration is complete.
The legacy Jenkins X cluster is still running production applications and needs these DNS records. Once migration is complete and the legacy cluster is decommissioned:
- Manually delete all a-* TXT records from both zones
- Optionally switch External-DNS back to policy: sync for automatic cleanup
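Once the legacy cluster has been decommissioned, the manual deletion could be prepared as a dry run that prints one delete command per record and executes nothing. This is a sketch under assumptions: the function name `plan_txt_cleanup` is hypothetical, the input is assumed to be one `name<TAB>TXT value` pair per line, and only records whose name starts with `a-` and whose value carries the legacy owner ID `external-dns/owner=default` are selected. The managed zone name is supplied by the caller; `example-zone` below is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: prints gcloud delete commands, runs nothing.
# stdin: lines of "<record name><TAB><TXT value>"; $1: Cloud DNS managed zone name.
plan_txt_cleanup() {
  zone="$1"
  while IFS="$(printf '\t')" read -r name value; do
    # Skip anything that is not an a-* record.
    case "$name" in a-*) ;; *) continue ;; esac
    # Keep only legacy-owned records (substring match; tighten it if
    # other owner IDs start with "default").
    case "$value" in
      *external-dns/owner=default*)
        echo "gcloud dns record-sets delete $name --type=TXT --zone=$zone" ;;
    esac
  done
}

# Illustrative input; real input would come from listing the zone's TXT records.
printf 'a-app.syrf.org.uk.\t"heritage=external-dns,external-dns/owner=default"\n' |
  plan_txt_cleanup example-zone
```

Reviewing the printed commands before running any of them keeps records owned by the new cluster out of the deletion. One possible source for the input is `gcloud dns record-sets list --zone=<managed-zone> --filter="type=TXT" --format="value(name,rrdatas)"`, though its exact output shape should be checked before piping it in.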
Prevention for Future Migrations
When migrating between clusters that share DNS zones:
- Option A: Use different owner IDs (--txt-owner-id) for each cluster
- Option B: Use upsert-only policy during the migration period
- Option C: Manually clean up old records before starting the new External-DNS
- Option D: Use separate DNS zones for each cluster (preferred for isolation)
Related Information
- External-DNS Config: infrastructure/external-dns/values.yaml
- Cloud DNS Zones: syrf.org.uk, camarades.net
- Legacy Cluster: Still running production apps (migration in progress)
- ArgoCD Application: external-dns in argocd namespace
References
- External-DNS GitHub
- External-DNS Policy Options
- Commit: 6c3de9d - fix(external-dns): change policy to upsert-only