
Migration Runbook: Production Lambda to ACK

NOTE: This runbook covers Phase 4 (Production Cutover) of the ACK migration. It needs updating before use — see Technical Plan for the validated approach. Key changes: separate per-environment S3 buckets (e.g. syrfapp-uploads, syrfapp-uploads-staging), PostSync Job for credentials, corrected Lambda handler/runtime.

Overview

This runbook provides step-by-step instructions for migrating the production S3 Notifier Lambda from Terraform/CI-managed deployment to ACK (AWS Controllers for Kubernetes) GitOps management.

Risk Level: Medium (the production bucket contains user data)
Estimated Duration: 2-4 hours (with validation pauses)
Rollback Time: 15 minutes


Pre-Migration Checklist

1. Prerequisites Verified

  • ACK S3 Controller installed and healthy in ack-system namespace
  • ACK Lambda Controller installed and healthy in ack-system namespace
  • Cross-cloud IAM (GKE → AWS) tested with staging resources
  • Staging deployment completed and validated
  • Helm chart tested with helm template locally

2. Access Confirmed

  • AWS Console access (eu-west-1)
  • GKE cluster access (kubectl configured for camaradesuk)
  • ArgoCD admin access
  • GitHub write access (for cluster-gitops)

3. Backup Completed

# Create inventory of production bucket
aws s3 ls s3://syrfapp-uploads --recursive > ~/production-bucket-inventory-$(date +%Y%m%d).txt

# Record current Lambda configuration
aws lambda get-function --function-name syrfAppUploadS3Notifier > ~/production-lambda-config-$(date +%Y%m%d).json

# Record current bucket notification config
aws s3api get-bucket-notification-configuration --bucket syrfapp-uploads > ~/production-notification-config-$(date +%Y%m%d).json
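Before moving on, it is worth confirming that the three files above were actually written and are not empty. These files record an inventory and configuration; they are not a copy of the bucket's objects.

The sketch below checks them. It assumes the files sit in `$HOME` with today's date suffix, exactly as the commands above create them. `verify_backups` is a hypothetical helper name.

```bash
# Check that the pre-migration inventory/config files exist and are non-empty.
# Assumes the naming used above: ~/production-*-YYYYMMDD.{txt,json}
verify_backups() {
  local dir="${1:-$HOME}"
  local stamp="${2:-$(date +%Y%m%d)}"
  local missing=0 f
  for f in \
    "$dir/production-bucket-inventory-$stamp.txt" \
    "$dir/production-lambda-config-$stamp.json" \
    "$dir/production-notification-config-$stamp.json"; do
    if [ ! -s "$f" ]; then
      echo "MISSING OR EMPTY: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Run `verify_backups` with no arguments right after the backup commands. Pass a directory and date stamp to check an earlier set.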

Migration Steps

Step 1: Freeze Current Deployment (5 min)

Purpose: Prevent conflicts during migration

  1. Disable the CI/CD Lambda deployment:
# In the syrf repo, create a temporary branch that disables the Lambda deployment job
# Or: tell the team that Lambda deployments are frozen until the migration is complete
  2. Verify no deployments are in progress:
# Check GitHub Actions for running Lambda workflows
gh run list --workflow=ci-cd.yml --status=in_progress
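If a run is still in progress, you can poll until it finishes instead of re-running the command by hand. The following is a sketch, not part of the existing tooling. It assumes `gh` is authenticated against the syrf repo and that `ci-cd.yml` is the workflow to watch, as in the command above. It uses the `--json`/`--jq` output flags of `gh run list`.

```bash
# Poll until no runs of the given workflow are in progress, or time out.
# Returns 0 when idle, 1 on timeout, 2 if gh produced no result.
wait_for_idle_runs() {
  local workflow="${1:-ci-cd.yml}" timeout="${2:-600}" interval="${3:-15}"
  local waited=0 running
  while :; do
    running=$(gh run list --workflow="$workflow" --status=in_progress \
      --json databaseId --jq 'length')
    if [ -z "$running" ]; then
      echo "gh returned no result; check authentication and workflow name" >&2
      return 2
    fi
    if [ "$running" = "0" ]; then
      echo "No in-progress runs for $workflow"
      return 0
    fi
    if [ "$waited" -ge "$timeout" ]; then
      echo "Still $running run(s) in progress after ${timeout}s" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}
```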

Step 2: Verify Production State (10 min)

Purpose: Establish baseline for validation

  1. Record current Lambda version:
# For an unqualified call Version is "$LATEST"; CodeSha256 identifies the deployed package
aws lambda get-function --function-name syrfAppUploadS3Notifier \
  --query 'Configuration.[Version,CodeSha256,LastModified]' --output text
  2. Test current functionality:
# Upload a test file
echo "migration-test-$(date +%s)" > /tmp/migration-test.txt
aws s3 cp /tmp/migration-test.txt s3://syrfapp-uploads/migration-test/

# Give the Lambda 30 seconds, then check it was invoked
sleep 30
aws logs tail /aws/lambda/syrfAppUploadS3Notifier --since 2m | grep migration-test

# Clean up
aws s3 rm s3://syrfapp-uploads/migration-test/migration-test.txt
rm -f /tmp/migration-test.txt
  3. Record file count for validation in Step 5:
aws s3 ls s3://syrfapp-uploads --recursive --summarize | tail -2
# Note: Total Objects and Total Size
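To make the Step 5 comparison mechanical rather than by eye, the two summary lines can be reduced to a single "OBJECTS SIZE" pair and saved. This is a sketch that assumes the default (non `--human-readable`) output of `aws s3 ls --summarize`, whose summary lines read `Total Objects: N` and `Total Size: N`. `parse_s3_summary` is a hypothetical helper name.

```bash
# Read an `aws s3 ls --recursive --summarize` listing on stdin and print
# "OBJECTS SIZE" taken from its two summary lines.
parse_s3_summary() {
  awk '/Total Objects:/ {objs = $3}
       /Total Size:/    {size = $3}
       END              {print objs, size}'
}
```

For example, `aws s3 ls s3://syrfapp-uploads --recursive --summarize | parse_s3_summary > ~/production-bucket-baseline-$(date +%Y%m%d).txt`. The baseline file name here is illustrative.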

Step 3: Create Production Config in cluster-gitops (15 min)

Purpose: Prepare GitOps configuration with adoption flags

  1. Create the production service directory:
cd /path/to/cluster-gitops

mkdir -p syrf/environments/production/s3-notifier
  2. Create config.yaml:
# syrf/environments/production/s3-notifier/config.yaml
serviceName: s3-notifier
envName: production
chartTag: main  # Or specific commit SHA

lambda:
  version: "X.Y.Z"  # Current production version
  packageKey: "s3-notifier/X.Y.Z.zip"

gitVersion:
  sha: "abc123..."
  shortSha: "abc123"
  3. Create values.yaml with the adoption flag:
# syrf/environments/production/s3-notifier/values.yaml
envName: production
namespace: syrf-production

bucket:
  name: syrfapp-uploads
  adopt: true  # CRITICAL: Adopt existing bucket
  versioning: true
  tags:
    Environment: production
    CriticalData: "true"

function:
  name: syrfAppUploadS3Notifier

env:
  rabbitmqHost: "amqp://rabbitmq.camarades.net:5672"
  extra:
    LOG_LEVEL: "Information"
  4. Commit, but DO NOT push yet:
git add syrf/environments/production/s3-notifier/
git commit -m "feat(s3-notifier): add production config with adoption flag"
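Because the push in Step 4 is what triggers adoption, a last check before pushing can catch the two most damaging mistakes. The first is a values file without `adopt: true`. The second is unrelated changes riding along in the same push.

The sketch below assumes the directory layout above and that `origin/main` is the pushed branch. `prepush_check` is a hypothetical name. The YAML check is a plain text match, not a YAML parse.

```bash
# Refuse to push unless values.yaml sets adopt: true and the unpushed
# commits only touch the production s3-notifier directory.
prepush_check() {
  local dir="${1:-syrf/environments/production/s3-notifier}"
  local base="${2:-origin/main}"
  local bad

  # Crude text match for an indented "adopt: true" (comments after it are allowed)
  if ! grep -Eq '^[[:space:]]+adopt:[[:space:]]*true([[:space:]]|$)' "$dir/values.yaml"; then
    echo "values.yaml does not set adopt: true" >&2
    return 1
  fi

  # Any file changed outside $dir is unexpected for this push
  bad=$(git diff --name-only "$base"..HEAD | grep -v "^$dir/" || true)
  if [ -n "$bad" ]; then
    echo "Unexpected files in unpushed commits:" >&2
    echo "$bad" >&2
    return 1
  fi
  echo "OK to push"
}
```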

Step 4: Deploy to Production (20 min)

Purpose: Let ACK adopt existing resources

  1. Push the config:
git push origin main
  2. Monitor ArgoCD sync:
# Watch the Application appear and sync
argocd app list | grep s3-notifier
argocd app get production-s3-notifier
  3. Watch ACK controller logs during adoption (ideally start these before the push; stop them afterwards with kill %1 %2):
kubectl logs -n ack-system -l app.kubernetes.io/name=ack-s3-controller -f &
kubectl logs -n ack-system -l app.kubernetes.io/name=ack-lambda-controller -f &
  4. Verify ACK resources created:
kubectl get bucket -n syrf-production
kubectl get function -n syrf-production
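Seeing the resources listed does not yet mean ACK has reconciled them. One option is to block until every Bucket and Function in the namespace reports the `ACK.ResourceSynced` condition as True.

This is a sketch. It assumes the ACK controllers set that condition type, which they do in current releases, and it uses `--all` so the Helm-generated CR names do not need to be known. `wait_for_ack_synced` is a hypothetical name.

```bash
# Wait for all ACK Bucket and Function resources in the namespace to be synced.
wait_for_ack_synced() {
  local ns="${1:-syrf-production}" timeout="${2:-300s}" kind
  for kind in bucket function; do
    kubectl wait "$kind" --all -n "$ns" \
      --for=condition=ACK.ResourceSynced --timeout="$timeout" || return 1
  done
}
```

If this times out, the controller logs started in item 3 are the first place to look.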

Step 5: Validate Adoption (15 min)

Purpose: Confirm no data loss or recreation

  1. CRITICAL: Verify the bucket was NOT recreated:
# Check bucket creation date - should be original date, NOT today
aws s3api list-buckets --query "Buckets[?Name=='syrfapp-uploads'].CreationDate" --output text

# List bucket to confirm files exist
aws s3 ls s3://syrfapp-uploads --recursive --summarize | tail -2
# Compare with Step 2 - numbers should match (higher only if users uploaded files during the migration)
  2. Verify ACK shows adopted status:
kubectl describe bucket syrfapp-uploads -n syrf-production | grep -A5 Annotations
# Should show: services.k8s.aws/adopted: "true"
  3. Verify the Lambda configuration is unchanged (compare with ~/production-lambda-config-YYYYMMDD.json):
aws lambda get-function --function-name syrfAppUploadS3Notifier \
  --query 'Configuration.[FunctionName,Runtime,Handler,MemorySize,CodeSha256]'
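The comparisons in this step can be scripted with two small helpers. Both helper names are hypothetical.

- `compare_counts` takes "OBJECTS SIZE" strings, one from the Step 2 summary lines and one from the current listing. A drop in object count fails. Growth only warns, on the assumption that production uploads continue during the migration.
- `backup_field` pulls a string field such as `CodeSha256` out of the saved `get-function` JSON with `sed`. This is a crude text match; `jq` is more robust if it is installed.

```bash
# compare_counts "BEFORE_OBJS BEFORE_SIZE" "AFTER_OBJS AFTER_SIZE"
compare_counts() {
  local before_objs=${1%% *} after_objs=${2%% *}
  if [ "$after_objs" -lt "$before_objs" ]; then
    echo "FAIL: object count dropped ($before_objs -> $after_objs)" >&2
    return 1
  elif [ "$after_objs" -gt "$before_objs" ]; then
    echo "WARN: object count grew ($before_objs -> $after_objs); check for new uploads"
  else
    echo "OK: object count unchanged ($after_objs)"
  fi
}

# backup_field FILE FIELD: print the first "FIELD": "value" string found in FILE
backup_field() {
  sed -n "s/.*\"$2\": *\"\([^\"]*\)\".*/\1/p" "$1" | head -n 1
}
```

For the Lambda check, compare `backup_field ~/production-lambda-config-YYYYMMDD.json CodeSha256` with the output of `aws lambda get-function --function-name syrfAppUploadS3Notifier --query 'Configuration.CodeSha256' --output text`. Matching values mean the deployed package was not replaced.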

Step 6: Test End-to-End (15 min)

Purpose: Confirm full functionality

  1. Upload test file:
echo "post-migration-test-$(date +%s)" > /tmp/post-migration-test.txt
aws s3 cp /tmp/post-migration-test.txt s3://syrfapp-uploads/migration-test/
  2. Verify Lambda invocation:
# Wait 30 seconds
sleep 30

aws logs tail /aws/lambda/syrfAppUploadS3Notifier --since 2m | grep post-migration-test
  3. Verify RabbitMQ message received:
# Check API/PM service logs for file notification
kubectl logs -n syrf-production -l app=api --since=5m | grep -i "file\|upload"
  4. Clean up test files:
aws s3 rm s3://syrfapp-uploads/migration-test/post-migration-test.txt
rm -f /tmp/post-migration-test.txt
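A fixed `sleep 30` either wastes time or gives up too early when the Lambda has a cold start. An alternative is to poll the log group until the test marker appears.

This is a sketch. It assumes, as the `grep` above does, that the Lambda logs the uploaded object key. The log group name is taken from this runbook, and `wait_for_log` is a hypothetical name.

```bash
# Poll the notifier's log group until a line matches PATTERN or TIMEOUT seconds pass.
wait_for_log() {
  local pattern="$1" timeout="${2:-120}" interval="${3:-10}"
  local group="/aws/lambda/syrfAppUploadS3Notifier" waited=0
  while [ "$waited" -le "$timeout" ]; do
    if aws logs tail "$group" --since 5m | grep -q -- "$pattern"; then
      echo "Found '$pattern' after ${waited}s"
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "No log line matching '$pattern' within ${timeout}s" >&2
  return 1
}
```

Usage for this step: `wait_for_log post-migration-test 120`.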

Step 7: Decommission Old Management (30 min)

Purpose: Remove Terraform Lambda management

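  1. Remove Lambda resources from Terraform state, then from the configuration:
# In camarades-infrastructure repo
# Commenting out resources that are still in Terraform state makes the next
# `terraform apply` DESTROY them, so detach them from state first
terraform state list | grep -Ei 'lambda|notification'
terraform state rm <lambda-resource-address>   # repeat for each Lambda-related address
# Then delete the Lambda-related resource blocks
# Keep bucket management temporarily if using separate Terraform
  2. Archive old workflow:
# In syrf repo (GitHub Actions ignores workflow files in subdirectories)
mkdir -p .github/workflows/archived
git mv .github/workflows/pr-preview-lambda.yml .github/workflows/archived/
git commit -m "chore: archive pr-preview-lambda.yml - now managed by ACK"
  3. Update documentation:
     • Update CLAUDE.md with new architecture
     • Mark old docs as deprecated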

Rollback Procedure

If issues occur during migration:

Immediate Rollback (ACK just deployed)

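  1. Delete ACK resources. The bucket is retained only because of its deletion policy; the Function needs the same policy, or deleting it deletes the production Lambda. If production-s3-notifier has automated sync with self-heal enabled, disable it first so ArgoCD does not recreate the resources:
argocd app set production-s3-notifier --sync-policy none
kubectl delete bucket syrfapp-uploads -n syrf-production
# Kubernetes object names must be lowercase, so the Function CR cannot be named
# syrfAppUploadS3Notifier; look up its actual name first
kubectl get function -n syrf-production
kubectl delete function <function-cr-name> -n syrf-production
  2. Verify the bucket still exists and its object count matches the baseline:
aws s3 ls s3://syrfapp-uploads --recursive --summarize | tail -2
  3. Re-enable Terraform/CI management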
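Before deleting anything in this procedure, it is worth confirming that each ACK resource will leave its AWS counterpart in place.

The sketch below assumes the retain behaviour is set per resource through ACK's `services.k8s.aws/deletion-policy` annotation. If the policy was instead set controller-wide, this check will report it as unset, and the controller configuration needs checking separately. `check_retain` is a hypothetical name.

```bash
# Report the deletion policy of every ACK Bucket and Function in the namespace.
# Returns 1 if any resource is not annotated with deletion-policy=retain.
check_retain() {
  local ns="${1:-syrf-production}" kind name policy rc=0
  for kind in bucket function; do
    for name in $(kubectl get "$kind" -n "$ns" -o jsonpath='{.items[*].metadata.name}'); do
      policy=$(kubectl get "$kind" "$name" -n "$ns" \
        -o jsonpath='{.metadata.annotations.services\.k8s\.aws/deletion-policy}')
      if [ "$policy" = "retain" ]; then
        echo "$kind/$name: retain"
      else
        echo "$kind/$name: deletion-policy='${policy:-unset}' (AWS resource would be deleted)" >&2
        rc=1
      fi
    done
  done
  return "$rc"
}
```

Only proceed with the `kubectl delete` commands when `check_retain` exits 0.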

Post-Migration Rollback

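  1. Revert cluster-gitops changes (confirm the retain deletion policy first: if ArgoCD prunes the ACK resources, ACK otherwise deletes the AWS resources behind them):
# Use the SHA of the production config commit; it may no longer be HEAD
git revert <production-config-commit-sha>
git push origin main
  2. Manually restore notification configuration if needed:
aws s3api put-bucket-notification-configuration \
  --bucket syrfapp-uploads \
  --notification-configuration file://$HOME/production-notification-config-YYYYMMDD.json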
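Restoring a wrong or empty file would silently remove the bucket's notifications, so a quick sanity check of the backup is cheap insurance. This is a sketch. It assumes the file is the unmodified output of `get-bucket-notification-configuration` saved during the pre-migration backup. `check_notification_backup` is a hypothetical name.

```bash
# Verify the saved notification config exists and targets the notifier Lambda.
check_notification_backup() {
  local file="$1" fn="${2:-syrfAppUploadS3Notifier}"
  [ -s "$file" ] || { echo "missing or empty: $file" >&2; return 1; }
  grep -q '"LambdaFunctionConfigurations"' "$file" \
    || { echo "no LambdaFunctionConfigurations in $file" >&2; return 1; }
  # Match the function name at the end of the ARN, optionally followed by a qualifier
  grep -q "function:$fn[\":]" "$file" \
    || { echo "$fn not referenced in $file" >&2; return 1; }
  echo "OK: $file targets $fn"
}
```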

Post-Migration Monitoring

First 24 Hours

  • Monitor Lambda invocation metrics in CloudWatch
  • Check for errors in Lambda logs
  • Verify ArgoCD shows healthy sync status
  • Confirm file uploads working in SyRF application

First Week

  • Review ACK controller logs for any reconciliation issues
  • Confirm no drift between desired and actual state
  • Validate CI/CD promotion workflow works for new Lambda versions
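For the CloudWatch checks above, Lambda metrics such as `Invocations` and `Errors` can be summed from the CLI. This is a sketch using `aws cloudwatch get-metric-statistics` with hourly periods. It assumes GNU `date`; on macOS, replace the start-time expression with `date -u -v-"${hours}"H +%Y-%m-%dT%H:%M:%SZ`. `lambda_metric_sum` is a hypothetical name.

```bash
# Sum a Lambda metric over the last N hours (default: Errors-style usage, 24h).
lambda_metric_sum() {
  local metric="$1" hours="${2:-24}" fn="${3:-syrfAppUploadS3Notifier}"
  aws cloudwatch get-metric-statistics \
    --namespace AWS/Lambda --metric-name "$metric" \
    --dimensions Name=FunctionName,Value="$fn" \
    --start-time "$(date -u -d "$hours hours ago" +%Y-%m-%dT%H:%M:%SZ)" \
    --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    --period 3600 --statistics Sum \
    --query 'sum(Datapoints[].Sum)' --output text
}
```

For example, `lambda_metric_sum Errors 24` and `lambda_metric_sum Invocations 24` give the first-day error and invocation totals.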

Troubleshooting

ACK Tries to Recreate Bucket

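Symptom: ACK tries to create the bucket instead of adopting it (bucket names are globally unique, so this typically surfaces as a BucketAlreadyOwnedByYou error in the S3 controller logs)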

Fix:

# Manually add adoption annotation (check the adoption mechanism supported by the installed ACK runtime version first)
kubectl annotate bucket syrfapp-uploads \
  services.k8s.aws/adopted=true \
  -n syrf-production --overwrite

Lambda Not Triggering After Migration

Symptom: S3 uploads don't invoke Lambda

Check:

# Verify notification configuration
aws s3api get-bucket-notification-configuration --bucket syrfapp-uploads

# Verify Lambda permission
aws lambda get-policy --function-name syrfAppUploadS3Notifier

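Fix: If the ACK CRDs in use do not manage the bucket notification or the Lambda invoke permission, recreate them manually. The notification configuration can be restored from the pre-migration backup, as in Post-Migration Rollback.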
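If `get-policy` shows no statement allowing `s3.amazonaws.com` to invoke the function, the permission can be re-added with `aws lambda add-permission`.

In the sketch below, the statement id pattern `s3-invoke-<bucket>` is a hypothetical choice, and the AWS account id must be supplied. `add-permission` fails with a conflict error if a statement with the same id already exists. `grant_s3_invoke` is a hypothetical name.

```bash
# Allow S3 events from the bucket to invoke the notifier Lambda.
grant_s3_invoke() {
  local fn="${1:-syrfAppUploadS3Notifier}" bucket="${2:-syrfapp-uploads}" account="$3"
  [ -n "$account" ] || { echo "usage: grant_s3_invoke FUNCTION BUCKET ACCOUNT_ID" >&2; return 2; }
  aws lambda add-permission \
    --function-name "$fn" \
    --statement-id "s3-invoke-$bucket" \
    --action lambda:InvokeFunction \
    --principal s3.amazonaws.com \
    --source-arn "arn:aws:s3:::$bucket" \
    --source-account "$account"
}
```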

ArgoCD Shows OutOfSync

Symptom: ArgoCD keeps showing drift

Check:

argocd app diff production-s3-notifier

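Fix: This usually means the Helm values don't match the actual AWS state; update the values to match. Fields that the ACK controller fills in itself (late-initialized defaults) can also show up as drift.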
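When checking repeatedly, it helps to branch on the exit status of `argocd app diff`. Per its CLI help, it exits 0 when there is no diff, 1 when a diff is found, and 2 on errors.

The sketch below saves the diff to a file for review. The default output path is an arbitrary choice, and `argo_drift` is a hypothetical name.

```bash
# Classify drift for an ArgoCD application and keep the diff for inspection.
argo_drift() {
  local app="${1:-production-s3-notifier}" out="${2:-/tmp/argo-diff.txt}" rc=0
  argocd app diff "$app" > "$out" 2>&1 || rc=$?
  case "$rc" in
    0) echo "in sync" ;;
    1) echo "drift: $(grep -c . "$out") diff line(s) saved to $out" ;;
    *) echo "error running argocd app diff (exit $rc); see $out" >&2; return "$rc" ;;
  esac
}
```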


Success Criteria

Migration is successful when:

  • Production bucket exists with all original files
  • Lambda responds to S3 uploads within 30 seconds
  • ArgoCD shows synced status
  • ACK resources show services.k8s.aws/adopted: "true"
  • No errors in ACK controller logs
  • File upload functionality works in SyRF application
  • Old Terraform/CI Lambda management disabled