# Cluster Bootstrap Guide

## Overview
This guide documents the bootstrap process for setting up a new GKE cluster with ArgoCD-based GitOps deployment.
## Prerequisites
- GKE cluster provisioned via Terraform (see camarades-infrastructure repo)
- kubectl configured with cluster access
- Helm 3 installed
- Git access to cluster-gitops repository
## Bootstrap Order
The bootstrap follows a specific order due to dependencies:
1. Terraform (Infrastructure) → GKE Cluster
2. kubectl (Bootstrap) → ArgoCD Installation
3. kubectl (Bootstrap) → ArgoCD Repository Credentials
4. kubectl (Bootstrap) → ArgoCD Applications (app-of-apps pattern)
5. ArgoCD (GitOps) → All subsequent resources
## Step 1: Provision GKE Cluster (Terraform)

**Location:** `camarades-infrastructure/terraform/`

```bash
cd /path/to/camarades-infrastructure/terraform

# Initialize Terraform
terraform init

# Review plan
terraform plan

# Apply (creates GKE cluster)
terraform apply
```
**What gets created:**

- GKE cluster (camaradesuk, europe-west2-a, 3-6 nodes)
- Node pool with autoscaling
- Workload Identity configuration
- Service account for external-dns
- IAM bindings for DNS management
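Once Terraform finishes, point kubectl at the new cluster before moving on. A minimal sketch, using the cluster name, zone, and project stated in this guide (adjust if yours differ):

```shell
# Fetch cluster credentials so kubectl talks to the new cluster
gcloud container clusters get-credentials camaradesuk \
  --zone europe-west2-a \
  --project camarades-net

# Confirm the nodes are Ready
kubectl get nodes
```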
## Step 2: Install ArgoCD (Bootstrap Exception)

**Why kubectl here:** ArgoCD is the "bootstrap component"; it cannot install itself via GitOps initially.

```bash
# Add ArgoCD Helm repository
helm repo add argo https://argoproj.github.io/argo-helm
helm repo update

# Install ArgoCD in HA mode
helm install argocd argo/argo-cd \
  --namespace argocd \
  --create-namespace \
  --set server.replicas=2 \
  --set controller.replicas=1 \
  --set repoServer.replicas=2 \
  --set redis-ha.enabled=true
```
**Verify installation:**
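The original verification commands are not shown here; a minimal check, assuming the `argocd` namespace and Helm release name used above (the chart names the server deployment `argocd-server` under that release), might be:

```shell
# All ArgoCD pods should reach Running (HA mode: 2 server, 2 repo-server, redis-ha)
kubectl get pods -n argocd

# Block until the server deployment is available
kubectl wait deployment argocd-server -n argocd \
  --for=condition=Available --timeout=300s
```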
## Step 3: Configure Repository Access (Bootstrap Exception)

**Why kubectl here:** Chicken-and-egg problem - ArgoCD needs credentials to access the repo that contains its own configuration.
### Recommended: Use Credential Template (Simplest)
ArgoCD credential templates allow a single credential to apply to all repositories with matching URL prefixes. This is the recommended approach for the camaradesuk organization.
**Prerequisites:**

- GitHub Personal Access Token (PAT) with `repo` scope

**Steps:**

1. Create the credential file from the template.
2. Edit the file and replace the placeholders:
    - `<GITHUB_USERNAME>`: your GitHub username
    - `<GITHUB_PAT>`: your GitHub Personal Access Token
3. Apply the credential template.
4. Delete the file with credentials (important!).
5. Verify the credential was created.
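The exact commands for these steps are not shown above; a sketch, assuming the `github-repo-credentials.yaml` / `.template` filenames referenced in the Security note of this section:

```shell
# 1. Create the credential file from the template
cp github-repo-credentials.yaml.template github-repo-credentials.yaml

# 2. Edit the file and fill in <GITHUB_USERNAME> and <GITHUB_PAT>
vim github-repo-credentials.yaml

# 3. Apply the credential template
kubectl apply -f github-repo-credentials.yaml

# 4. Delete the file so the PAT never lands in Git or shell history
rm github-repo-credentials.yaml

# 5. Verify: credential templates carry the repo-creds secret-type label
kubectl get secrets -n argocd -l argocd.argoproj.io/secret-type=repo-creds
```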
**What this does:** this single credential template automatically applies to **all** repositories under `github.com/camaradesuk`, including:

- cluster-gitops
- syrf (monorepo)
- any future repositories under the organization

**Security:** the `.gitignore` file prevents `github-repo-credentials.yaml` from being committed. Only the `.template` file is version controlled.
### Alternative: Per-Repository Credentials

If you need different credentials for different repositories, you can create individual repository secrets:

```bash
kubectl create secret generic cluster-gitops-repo \
  --namespace argocd \
  --from-literal=type=git \
  --from-literal=url=https://github.com/camaradesuk/cluster-gitops.git \
  --from-literal=username=<github-username> \
  --from-literal=password=<github-pat>

kubectl label secret cluster-gitops-repo -n argocd \
  argocd.argoproj.io/secret-type=repository
```
### Alternative: SSH Key

For SSH-based authentication:

```bash
# Generate SSH key (if not already done)
ssh-keygen -t ed25519 -C "argocd@camaradesuk" -f ~/.ssh/argocd

# Add the public key to GitHub organization or repository Deploy Keys

# Create credential template for SSH
kubectl create secret generic github-camaradesuk-ssh \
  --namespace argocd \
  --from-literal=type=git \
  --from-literal=url=git@github.com:camaradesuk \
  --from-file=sshPrivateKey=$HOME/.ssh/argocd

kubectl label secret github-camaradesuk-ssh -n argocd \
  argocd.argoproj.io/secret-type=repo-creds
```
**Verify repository access:**
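The verification commands were not preserved here; one way to check, assuming the `argocd` CLI is installed and logged in (the kubectl fallback needs only cluster access):

```shell
# With the argocd CLI: connected repositories should show CONNECTION STATUS "Successful"
argocd repo list

# Without the CLI: confirm the credential secrets exist and are labelled
kubectl get secrets -n argocd -l argocd.argoproj.io/secret-type=repo-creds
kubectl get secrets -n argocd -l argocd.argoproj.io/secret-type=repository
```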
## Step 4: Deploy Root Application (App-of-Apps Bootstrap)

**Why kubectl here:** Initial deployment of the app-of-apps pattern. This is the LAST time you'll use kubectl for Applications. After this, everything is managed via GitOps.

### App-of-Apps Pattern
Instead of applying each Application individually, we use the App-of-Apps pattern:
- Apply one root Application (bootstrap/root.yaml)
- Root Application watches the apps/ directory
- ArgoCD automatically creates child Applications from YAML files in that directory
- Future Applications only need Git commits (no kubectl)
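A root Application of this kind typically looks like the following sketch. The field values are assumptions inferred from this guide (repo URL, `apps/` path, `prune: true`); the actual `bootstrap/root.yaml` in cluster-gitops is authoritative:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/camaradesuk/cluster-gitops.git
    targetRevision: main      # assumed branch name
    path: apps                # directory of child Application manifests
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true             # stated in "Important: App-of-Apps Pruning" below
      selfHeal: true          # assumed; keeps cluster state matching Git
```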
**Apply the root Application:**

```bash
cd /path/to/cluster-gitops

# This is the LAST kubectl apply for Applications!
kubectl apply -f bootstrap/root.yaml

# Verify root Application is created
kubectl get application root -n argocd

# Watch ArgoCD create child Applications (takes ~30 seconds)
kubectl get applications -n argocd -w
```
Expected output after ~30 seconds:

```
NAME                      SYNC STATUS   HEALTH STATUS
argocd-config             Synced        Healthy
cert-manager              Synced        Healthy
external-dns              Synced        Healthy
ingress-nginx             Synced        Healthy
rabbitmq                  Synced        Progressing
root                      Synced        Healthy
syrf-api                  OutOfSync     Missing
syrf-docs                 Synced        Degraded
syrf-project-management   OutOfSync     Missing
syrf-quartz               OutOfSync     Missing
syrf-user-guide           Synced        Degraded
syrf-web                  Synced        Degraded
```
### How App-of-Apps Works

- The root Application syncs from the `apps/` directory in Git
- For each `*.yaml` file (except `preview-*.yaml`), ArgoCD creates a child Application
- Changes to the `apps/` directory are detected automatically (every 3 minutes)
- New Applications: add a YAML file to `apps/` and commit/push
- Deleting Applications: remove the YAML file from `apps/` and commit/push (auto-pruned)

**No more `kubectl apply`!** All Application management is now via Git commits.
## Step 5: Verify GitOps is Working

At this point, ArgoCD is managing everything. Test the GitOps workflow:

```bash
# Make a change to a values file
cd /path/to/cluster-gitops
vim environments/staging/api.values.yaml
# (make a trivial change, like adding a comment)

# Commit and push
git add environments/staging/api.values.yaml
git commit -m "test: verify GitOps sync"
git push

# Watch ArgoCD sync the change (within 3 minutes)
kubectl get application syrf-api -n argocd -w
```
## Step 6: Access ArgoCD UI

The ArgoCD UI is accessible via the Ingress configured in the argocd-config Application.

**URL:** https://argocd.camarades.net

**Get initial admin password:**

```bash
kubectl -n argocd get secret argocd-initial-admin-secret \
  -o jsonpath="{.data.password}" | base64 -d && echo
```

**Login:**

- Username: `admin`
- Password: (from the command above)

Change the password immediately via the UI: User Info → Update Password.
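If you prefer the CLI over the UI, the password can also be rotated with the `argocd` CLI (assumed installed; it prompts for the current and new passwords interactively):

```shell
# Log in with the initial admin password, then set a new one
argocd login argocd.camarades.net --username admin
argocd account update-password
```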
## Bootstrap Exceptions Summary

The following operations require kubectl/helm (not GitOps):

- ArgoCD installation (`helm install`) - one-time bootstrap
- Repository credentials (`kubectl apply` of the secret) - one-time bootstrap
- Root Application deployment (`kubectl apply bootstrap/root.yaml`) - one-time bootstrap

Everything else must be managed via GitOps (Git commits → ArgoCD sync).
## Important: App-of-Apps Pruning

The root Application has `prune: true` enabled, which means:

- ✅ Adding Applications: create YAML in `apps/`, commit, push → ArgoCD auto-creates
- ✅ Deleting Applications: remove YAML from `apps/`, commit, push → ArgoCD auto-deletes
- ✅ Works for all Applications, even those created before the root Application

This ensures the cluster state always matches Git (true GitOps).
## Post-Bootstrap: GitOps Workflow

After bootstrap, the workflow is:

1. **Make changes:** edit files in the cluster-gitops or syrf repos
2. **Commit:** `git add . && git commit -m "..."`
3. **Push:** `git push`
4. **ArgoCD syncs automatically** (within 3 minutes by default)
5. **Verify:** check the ArgoCD UI or `kubectl get applications -n argocd`

**NEVER use `kubectl apply` or `helm install` after bootstrap. Use Git commits.**
## Troubleshooting

### Application shows "Unknown" sync status

**Cause:** ArgoCD cannot access the repository (authentication failed).

**Fix:** Check the repository credentials:
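The original command block is missing here; a sketch of what to check, assuming the `argocd` namespace and the standard deployment name from the Helm chart (`argocd-repo-server`):

```shell
# The credential secrets ArgoCD matches against repository URLs
kubectl get secrets -n argocd -l argocd.argoproj.io/secret-type=repo-creds
kubectl get secrets -n argocd -l argocd.argoproj.io/secret-type=repository

# Repo-server logs usually show the authentication failure
kubectl logs -n argocd deployment/argocd-repo-server --tail=100
```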
### Application shows "OutOfSync"

**Cause:** Cluster state differs from Git state.

**Fix:** Let ArgoCD sync, or trigger a sync manually:
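The manual-sync commands were not preserved; a sketch using the `argocd` CLI (assumed installed and logged in), with `syrf-api` as an example Application name:

```shell
# Inspect what differs between Git and the cluster
argocd app diff syrf-api

# Trigger a manual sync
argocd app sync syrf-api
```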
### Certificate issues for ArgoCD Ingress

**Cause:** Let's Encrypt certificate not issued yet, or the ACME challenge failed.

**Fix:** Check cert-manager logs and the certificate status:

```bash
kubectl get certificate -n argocd
kubectl describe certificate argocd-server-tls -n argocd
kubectl logs -n cert-manager deployment/cert-manager
```
### External-DNS in CrashLoopBackOff

**Cause:** External-DNS trying to delete DNS records from the legacy Jenkins X cluster.

**Symptom:** fatal error `Precondition not met for 'entity.change.deletions[syrf.org.uk.][TXT]'`

**Fix:** See the External-DNS Troubleshooting Guide.

**Status:** RESOLVED - changed the policy to `upsert-only` to prevent deletion attempts during the migration period.
## Current Cluster Status

**Cluster:** camaradesuk (europe-west2-a, camarades-net project)

**Infrastructure Components (Managed by ArgoCD):**

- ✅ cert-manager (v1.15.0) - TLS certificate automation
- ✅ ingress-nginx (4.11.1) - Ingress controller with LoadBalancer
- ✅ external-dns (1.14.5) - DNS automation for syrf.org.uk
- ✅ rabbitmq (14.6.6) - Message broker for microservices
- ✅ argocd-config (self-managed) - ArgoCD Ingress and ClusterIssuers

**SyRF Applications (Managed by ArgoCD):**

- syrf-api (staging namespace)
- syrf-project-management (staging namespace)
- syrf-quartz (staging namespace)
- syrf-web (staging namespace)
- syrf-user-guide (public docs)
- syrf-docs (team docs with OAuth2 Proxy)

**Workload Identity:**

- external-dns service account: `camaradesuk-external-dns@camarades-net.iam.gserviceaccount.com`
- Configured via Terraform with the DNS admin role
## Related Documentation
- GitOps Architecture - Architecture overview
- Deploying Services - How to deploy SyRF services
- Promotion Workflow - Staging → Production promotion
- Cluster Setup Guide - GKE cluster provisioning
- Terraform Guide - Infrastructure as Code