Troubleshooting: DatabaseLifecycle Post-Script Job RBAC Failure
Problem
A DatabaseLifecycle CR is stuck in the Failed phase with the message "Post-script job execution failed". This affects PR preview environments that use post-script jobs (e.g., index-init, which creates MongoDB indexes).
Symptom:
kubectl get databaselifecycle -n pr-2285
# NAME DATABASE PHASE SOURCE SEEDED AGE
# pr-database syrf_pr_2285 Failed 52m
Error in operator logs:
jobs.batch "dbl-post-pr-database-1768848824" is forbidden:
User "system:serviceaccount:database-lifecycle-operator:database-lifecycle-operator"
cannot get resource "jobs" in API group "batch" in the namespace "pr-2285"
Root Cause
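The DatabaseLifecycle operator's ClusterRole grants no permissions on jobs in the batch API group. When the operator attempts to create or read a post-script Job in a dynamically created PR namespace (e.g., pr-2285), Kubernetes RBAC denies the request. The log above shows the get verb being denied; the same applies to every other verb on jobs, because no rule covers the resource.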
Current ClusterRole permissions (missing batch/jobs):
rules:
  - apiGroups: ["database.syrf.org.uk"]
    resources: ["databaselifecycles", "databaselifecycles/status"]
    verbs: ["get", "list", "watch", "update", "patch"]
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get"]
  - apiGroups: [""]
    resources: ["pods/exec"]
    verbs: ["create"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "patch"]
  # MISSING: batch/jobs permissions
Investigation Steps
1. Check DatabaseLifecycle status
Look for:
- Phase: Failed
- Conditions showing PostScriptFailed
- Message: Post-script job failed
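These fields can be read from the CR's describe output. The following is a minimal sketch, assuming the CR name pr-database and namespace pr-2285 from the Symptom listing. dbl_status is a hypothetical helper name, and the grep pattern assumes the status fields are labelled Phase, Reason, and Message in the describe output.

```shell
# Hypothetical helper: show only the status lines of a DatabaseLifecycle CR.
# Usage: dbl_status <namespace> <cr-name>
dbl_status() {
  # `kubectl describe` prints the whole object; keep only the lines of interest.
  kubectl describe databaselifecycle "$2" -n "$1" \
    | grep -E '^[[:space:]]*(Phase|Reason|Message):'
}

# Example (requires cluster access):
#   dbl_status pr-2285 pr-database
```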
2. Check operator logs for RBAC errors
kubectl logs -n database-lifecycle-operator deployment/database-lifecycle-operator --since=1h \
| grep -E "(Forbidden|cannot|RBAC|jobs.batch)"
Look for lines containing:
- jobs.batch ... is forbidden
- cannot get resource "jobs"
- Failed to create job
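When several PR namespaces are affected, the denials can be condensed to one line per namespace, verb, and resource. This is a minimal sketch: summarize_forbidden is a hypothetical helper, and it assumes each denial appears on a single log line with unescaped quotes, in the format shown under Problem.

```shell
# Hypothetical helper: read operator log lines on stdin and print each distinct
# RBAC denial as "<namespace> <verb> <resource>.<apiGroup>".
summarize_forbidden() {
  sed -n 's/.*cannot \([a-z]*\) resource "\([^"]*\)" in API group "\([^"]*\)" in the namespace "\([^"]*\)".*/\4 \1 \2.\3/p' \
    | sort -u
}

# Example (requires cluster access):
#   kubectl logs -n database-lifecycle-operator deployment/database-lifecycle-operator --since=1h \
#     | summarize_forbidden
```

If the operator emits JSON logs with escaped quotes, the pattern would need adjusting.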
3. Verify current ClusterRole permissions
Check if batch/jobs is listed in the rules.
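A minimal sketch of that check follows. The ClusterRole name is passed as an argument because the chart's naming is not shown here; database-lifecycle-operator in the example line is an assumption, and has_jobs_rule is a hypothetical helper. It relies on kubectl describe clusterrole listing each resource as <resource>.<group>.

```shell
# Hypothetical helper: succeed (and print the matching row) only if the
# ClusterRole has a rule for jobs in the batch API group.
# Usage: has_jobs_rule <clusterrole-name>
has_jobs_rule() {
  # Anchor on the resource column so that "cronjobs.batch" does not match.
  kubectl describe clusterrole "$1" \
    | grep -E '^[[:space:]]*jobs\.batch[[:space:]]'
}

# Example (the ClusterRole name here is an assumption):
#   has_jobs_rule database-lifecycle-operator || echo "batch/jobs rule is missing"
```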
4. Test permissions manually
kubectl auth can-i create jobs \
--as=system:serviceaccount:database-lifecycle-operator:database-lifecycle-operator \
-n pr-2285
# Expected (if broken): no
# Expected (if fixed): yes
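The denial in the operator log is for the get verb, so it may be worth checking every verb the operator needs rather than create alone. This is a minimal sketch: check_job_verbs is a hypothetical helper, and the verb list mirrors the rule proposed in the Solution section.

```shell
# Hypothetical helper: print "<verb>: yes|no" for each Job verb, impersonating
# the operator's service account in one namespace.
# Usage: check_job_verbs <namespace>
check_job_verbs() {
  sa="system:serviceaccount:database-lifecycle-operator:database-lifecycle-operator"
  for verb in get list watch create delete; do
    # `kubectl auth can-i` exits non-zero when the answer is "no", so capture
    # the printed answer instead of relying on the exit status.
    answer="$(kubectl auth can-i "$verb" jobs.batch --as="$sa" -n "$1" 2>/dev/null || true)"
    echo "$verb: ${answer:-unknown}"
  done
}

# Example (requires cluster access and permission to impersonate):
#   check_job_verbs pr-2285
```

A verb reported as unknown means kubectl printed nothing, for example because the caller is not allowed to impersonate the service account.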
Solution
Add batch/jobs permissions to the operator's ClusterRole.
File: charts/database-lifecycle-operator/templates/rbac.yaml
Add the following rule:
rules:
  # ... existing rules ...
  # Jobs for post-script and health-check execution
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["get", "list", "watch", "create", "delete"]
Implementation Steps
- Update the Helm chart in cluster-gitops:
cd /home/chris/workspace/cluster-gitops
# Edit charts/database-lifecycle-operator/templates/rbac.yaml
# Add batch/jobs permissions as shown above
- Commit and push:
git add charts/database-lifecycle-operator/templates/rbac.yaml
git commit -m "fix(dbl-operator): add batch/jobs RBAC for post-script jobs"
git push
- Wait for ArgoCD sync (or trigger manually):
# Check sync status
kubectl get application database-lifecycle-operator -n argocd
# Force sync if needed
argocd app sync database-lifecycle-operator
- Verify the ClusterRole was updated by repeating Investigation steps 3 and 4; the can-i check should now return yes.
Expected Result
After fixing RBAC:
- Re-trigger the DatabaseLifecycle by setting forceReseed, or by deleting and recreating the CR:
kubectl patch databaselifecycle pr-database -n pr-2285 \
--type merge -p '{"spec":{"forceReseed":true}}'
- Verify post-script job runs:
- Check DatabaseLifecycle reaches Ready phase:
kubectl get databaselifecycle -n pr-2285
# PHASE should become: Ready (or Seeded if seeding enabled)
- Operator logs should show success:
kubectl logs -n database-lifecycle-operator deployment/database-lifecycle-operator --since=5m \
| grep -E "(post-script|job)"
# Should see: "Running post-script job" followed by success messages
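To confirm that the post-script Job itself was created, the Jobs in the PR namespace can be listed. This is a minimal sketch: list_post_script_jobs is a hypothetical helper, and the dbl-post- prefix is inferred from the Job name in the error message under Problem, so other naming may apply.

```shell
# Hypothetical helper: list Jobs in a namespace whose names start with
# "dbl-post-" (the prefix seen in the operator's error message).
# Usage: list_post_script_jobs <namespace>
list_post_script_jobs() {
  kubectl get jobs -n "$1" --no-headers | grep '^dbl-post-'
}

# Example (requires cluster access):
#   list_post_script_jobs pr-2285
```

An empty result does not necessarily indicate failure: if the operator deletes finished Jobs, the operator logs are the more reliable signal.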
Related Information
- Operator Helm Chart: charts/database-lifecycle-operator/
- RBAC Template: charts/database-lifecycle-operator/templates/rbac.yaml
- Operator Namespace: database-lifecycle-operator
- ArgoCD Application: database-lifecycle-operator
- Affected Environments: PR preview namespaces (pr-*)
Why This Only Affects Preview Environments
The post-script job feature is primarily used in preview environments to run index initialization after database seeding. Staging and production environments don't use this feature because they connect to pre-existing databases with indexes already in place.