Lambda GitOps Versioning - Architecture Design¶
SUPERSEDED: This document has been superseded by Lambda GitOps Integration. The analysis and options in this document informed the final consolidated decision. This document is preserved for historical reference only.
Problem Statement¶
Currently, there's an architectural asymmetry in how service versions are managed:
| Service Type | Source of Truth | Deployment Mechanism | Reconciliation |
|---|---|---|---|
| Kubernetes services | cluster-gitops/syrf/environments/{env}/{svc}/config.yaml | ArgoCD | Continuous |
| Lambda (S3 Notifier) | CI/CD workflow builds and deploys directly | GitHub Actions + Terraform | One-shot |
Goal: Make cluster-gitops the single source of truth for Lambda versions, with deployment triggered by version changes in that repo.
Current Lambda Deployment Flow¶
┌──────────────────────────────────────────────────────────────────────────┐
│ CURRENT STATE (Direct Deploy) │
└──────────────────────────────────────────────────────────────────────────┘
1. Push to main branch (src/services/s3-notifier/)
│
▼
2. ci-cd.yml workflow triggers
│
├─► GitVersion calculates version (e.g., 0.1.5)
│
├─► dotnet publish → production.zip
│
├─► Upload to S3: lambda-packages/production.zip
│
├─► Terraform apply (camarades-infrastructure/terraform/lambda/)
│
├─► Create git tag: s3-notifier-v0.1.5
│
└─► Create GitHub Release with .zip attachment
No version declaration exists in cluster-gitops: the version is computed and deployed in a single step.
Current Deployment Mechanism Comparison¶
Production Lambda (ci-cd.yml)¶
Trigger: Push to main affecting src/services/s3-notifier/
Key Characteristics:
┌─────────────────────────────────────────────────────────────────────────────┐
│ PRODUCTION DEPLOYMENT FLOW │
└─────────────────────────────────────────────────────────────────────────────┘
1. Build & Package
└─► dotnet lambda package → production.zip
2. Upload to S3
└─► aws s3 cp production.zip s3://camarades-terraform-state-aws/lambda-packages/production.zip
(FIXED path - always overwrites)
3. Terraform Apply
├─► Variables:
│ - TF_VAR_production_commit_sha = "abc123..."
│ - TF_VAR_production_source_code_hash = base64(sha256(production.zip))
└─► Resources:
- aws_lambda_function.production (syrfAppUploadS3Notifier)
- S3 notification on prefix: Projects/
4. Concurrency Control
└─► concurrency group: production-lambda-terraform (serialized deploys)
Terraform Resources (single function, fixed config):
- Function: syrfAppUploadS3Notifier
- S3 Source: lambda-packages/production.zip
- S3 Trigger Prefix: Projects/
- Environment Variables: Production RabbitMQ host, etc.
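The `production_source_code_hash` variable in step 3 can be reproduced outside Terraform; it is the base64-encoded SHA-256 digest that `filebase64sha256()` computes. A minimal sketch (the stand-in file replaces the real `dotnet lambda package` output):

```shell
# Equivalent of Terraform's filebase64sha256(): SHA-256 of the zip, base64-encoded.
# A stand-in file is created here so the snippet is self-contained; in the real
# workflow production.zip comes from `dotnet lambda package`.
printf 'stand-in payload' > production.zip
TF_VAR_production_source_code_hash=$(openssl dgst -sha256 -binary production.zip | base64)
export TF_VAR_production_source_code_hash
echo "$TF_VAR_production_source_code_hash"
```

Terraform compares this value against the hash of the deployed package to decide whether the function code needs updating.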
Preview Lambdas (pr-preview-lambda.yml)¶
Trigger: labeled, synchronize, or close events on PRs carrying the preview label
Key Characteristics:
┌─────────────────────────────────────────────────────────────────────────────┐
│ PREVIEW DEPLOYMENT FLOW │
└─────────────────────────────────────────────────────────────────────────────┘
1. Collect Active PRs
└─► GitHub API: List all PRs with 'preview' label
Output: ["123", "456", "789"]
2. Build Per-PR Package
└─► dotnet lambda package → pr-{number}.zip
3. Upload Per-PR Package
└─► aws s3 cp pr-{number}.zip s3://camarades-terraform-state-aws/lambda-packages/pr-{number}.zip
(DYNAMIC path per PR)
4. Terraform Apply with for_each
├─► Variables:
│ - TF_VAR_preview_prs = ["123", "456", "789"]
│ - TF_VAR_preview_commit_shas = {"123": "abc...", "456": "def..."}
│ - TF_VAR_preview_versions = {"123": "0.1.5-pr123", "456": "0.1.5-pr456"}
└─► Resources (via for_each):
- aws_lambda_function.preview["123"] → syrfAppUploadS3Notifier-pr-123
- aws_lambda_function.preview["456"] → syrfAppUploadS3Notifier-pr-456
- S3 notifications per prefix: preview/pr-{n}/
5. Cleanup on PR Close
└─► Remove PR from preview_prs set → Terraform destroys Lambda
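Step 1's shaping of PR numbers into the `preview_prs` set can be sketched in shell; the sample numbers stand in for the GitHub API response (e.g. from `gh pr list --label preview`):

```shell
# Sample PR numbers as the GitHub API query would return them; in the real
# workflow these come from listing open PRs carrying the 'preview' label.
numbers='123 456 789'
# Shape them into the JSON set Terraform expects for var.preview_prs.
PREVIEW_PRS="[$(printf '"%s",' $numbers | sed 's/,$//')]"
echo "$PREVIEW_PRS"
```

Because the full set is recomputed on every run, a PR dropping out of the list is what lets Terraform destroy its Lambda on close.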
Terraform Pattern (for_each for dynamic resources):
# Preview Lambda functions - one per PR
resource "aws_lambda_function" "preview" {
for_each = var.preview_prs # set(string): ["123", "456"]
function_name = "syrfAppUploadS3Notifier-pr-${each.key}"
s3_bucket = var.lambda_bucket
s3_key = "lambda-packages/pr-${each.key}.zip"
# ...
}
# S3 notifications with dynamic block
resource "aws_s3_bucket_notification" "bucket_notification" {
# Production notification (always)
lambda_function {
lambda_function_arn = aws_lambda_function.production.arn
filter_prefix = "Projects/"
}
# Preview notifications (dynamic per PR)
dynamic "lambda_function" {
for_each = var.preview_prs
content {
lambda_function_arn = aws_lambda_function.preview[lambda_function.value].arn
filter_prefix = "preview/pr-${lambda_function.value}/"
}
}
}
Critical Finding: No Staging Lambda Exists¶
Current State Analysis:
| Environment | Lambda Function | S3 Prefix | Status |
|---|---|---|---|
| Production | syrfAppUploadS3Notifier | Projects/ | ✅ Exists |
| Preview | syrfAppUploadS3Notifier-pr-{n} | preview/pr-{n}/ | ✅ Exists (dynamic) |
| Staging | None | None | ❌ Does NOT exist |
Implication: Staging environment currently shares the production Lambda, meaning:

- Staging deployments don't test Lambda changes before production
- No isolation between staging and production file processing
- Adding a staging Lambda is a prerequisite for GitOps versioning
Architectural Differences Summary¶
| Aspect | Production (ci-cd.yml) | Preview (pr-preview-lambda.yml) |
|---|---|---|
| Lifecycle | Permanent | Ephemeral (PR lifecycle) |
| Function Count | 1 | N (one per PR) |
| Terraform Pattern | Single resource | for_each over set |
| S3 Package Path | production.zip (fixed) | pr-{n}.zip (dynamic) |
| Version Tracking | GitHub tag + Release | PR commit SHA |
| Cleanup | Never | Auto on PR close |
| Trigger | Push to main | PR label events |
Key Variables in Terraform¶
# Production
variable "production_commit_sha" {
description = "Git commit SHA for production deployment"
type = string
}
variable "production_source_code_hash" {
description = "Base64-encoded SHA256 hash of production.zip"
type = string
}
# Preview (for_each pattern)
variable "preview_prs" {
description = "Set of PR numbers with preview label"
type = set(string)
default = []
}
variable "preview_commit_shas" {
description = "Map of PR number to commit SHA"
type = map(string)
default = {}
}
variable "preview_versions" {
description = "Map of PR number to semantic version"
type = map(string)
default = {}
}
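Terraform picks these up from `TF_VAR_`-prefixed environment variables, which is presumably how the workflows pass them; a sketch with illustrative values:

```shell
# Terraform reads any TF_VAR_<name> environment variable as input variable <name>.
# The values below mirror the examples used elsewhere in this document.
export TF_VAR_preview_prs='["123", "456", "789"]'
export TF_VAR_preview_commit_shas='{"123": "abc", "456": "def", "789": "ghi"}'
export TF_VAR_preview_versions='{"123": "0.1.5-pr123", "456": "0.1.5-pr456"}'
echo "$TF_VAR_preview_prs"
```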
Desired State¶
┌──────────────────────────────────────────────────────────────────────────┐
│ DESIRED STATE (GitOps Versioning) │
└──────────────────────────────────────────────────────────────────────────┘
cluster-gitops/syrf/environments/
├── staging/
│ └── s3-notifier/
│ └── config.yaml ◄─── lambdaVersion: 0.1.5
│ (single source of truth)
└── production/
└── s3-notifier/
└── config.yaml ◄─── lambdaVersion: 0.1.4
(manually promoted)
Change to config.yaml → triggers → Lambda deployment
Design Options¶
Option 1: GitOps-Triggered Terraform (Recommended)¶
Architecture: cluster-gitops PR merge → GitHub Actions → Terraform apply
┌─────────────────────────────────────────────────────────────────────────────┐
│ OPTION 1: GitOps-Triggered Terraform │
└─────────────────────────────────────────────────────────────────────────────┘
Phase A: Build & Release (unchanged from current)
─────────────────────────────────────────────────
1. Push to main (s3-notifier code changes)
2. ci-cd.yml workflow:
├─► GitVersion → 0.1.6
├─► Build Lambda package (.zip)
├─► Create git tag: s3-notifier-v0.1.6
└─► Create GitHub Release with .zip attachment
❌ Does NOT deploy to Lambda
❌ Does NOT upload to S3 lambda-packages/
✓ Only creates versioned artifact
Phase B: Promote to Staging (GitOps)
────────────────────────────────────
3. ci-cd.yml creates PR to cluster-gitops:
- Updates: syrf/environments/staging/s3-notifier/config.yaml
- Sets: lambdaVersion: "0.1.6"
4. PR auto-merges (staging)
5. cluster-gitops webhook → lambda-deploy.yml workflow
├─► Detect which environment changed (staging)
├─► Download .zip from GitHub Release (s3-notifier-v0.1.6)
├─► Upload to S3: lambda-packages/staging.zip
└─► Terraform apply with:
- TF_VAR_staging_version=0.1.6
- TF_VAR_staging_source_code_hash=<calculated>
Phase C: Promote to Production (Manual Gate)
────────────────────────────────────────────
6. After staging verification, trigger production promotion
7. Creates PR to cluster-gitops:
- Updates: syrf/environments/production/s3-notifier/config.yaml
- Sets: lambdaVersion: "0.1.6"
8. Manual review & merge
9. Same workflow as step 5, but for production environment
File Structure in cluster-gitops:
# syrf/environments/staging/s3-notifier/config.yaml
serviceName: s3-notifier
envName: staging
lambda:
version: "0.1.6"
functionName: "syrfAppUploadS3Notifier" # or staging-specific name
s3Prefix: "Projects/"
gitVersion:
sha: "abc123"
shortSha: "abc123"
Terraform Changes Required:

- Refactor to support staging/production as separate Lambda functions (or the same function with a version alias)
- Accept version and S3 package path as variables
- Support Lambda aliases for blue/green deployment
Pros:

- ✅ Leverages existing Terraform Lambda management
- ✅ Uses pre-built packages from GitHub Releases
- ✅ Consistent promotion PR pattern with K8s services
- ✅ No new operators or controllers
- ✅ Clear audit trail (version change → PR → deployment)

Cons:

- ⚠️ Not continuous reconciliation (event-driven, not polling)
- ⚠️ Cross-repo workflow triggers add complexity
- ⚠️ Lambda drift won't be auto-corrected (manual intervention needed)
Option 2: AWS Controllers for Kubernetes (ACK)¶
Architecture: Lambda managed as Kubernetes CRD, reconciled by ACK operator
┌─────────────────────────────────────────────────────────────────────────────┐
│ OPTION 2: AWS Controllers for Kubernetes │
└─────────────────────────────────────────────────────────────────────────────┘
1. Install ACK Lambda Controller in GKE cluster
- Requires IAM role for service account (IRSA) or GKE Workload Identity → AWS
- Controller watches for Lambda CRDs
2. Define Lambda as Kubernetes manifest in cluster-gitops:
# syrf/environments/staging/s3-notifier/lambda.yaml
apiVersion: lambda.services.k8s.aws/v1alpha1
kind: Function
metadata:
name: syrf-s3-notifier-staging
namespace: syrf-staging
spec:
name: syrfAppUploadS3Notifier-staging
runtime: dotnet8
handler: "SyRF.S3FileSavedNotifier.Endpoint::..."
code:
s3Bucket: camarades-terraform-state-aws
s3Key: lambda-packages/staging-0.1.6.zip # ← Version in filename
environment:
variables:
RabbitMqHost: "amqp://..."
memorySize: 512
timeout: 30
3. ArgoCD syncs Lambda CRD to cluster
4. ACK Lambda Controller reconciles:
- Creates/updates AWS Lambda function to match spec
- Continuous reconciliation (drift correction)
- Status reflected back to K8s resource
5. S3 bucket notifications still managed separately (Terraform or ACK S3 controller)
Pros:

- ✅ True GitOps with continuous reconciliation
- ✅ Automatic drift correction
- ✅ Consistent with K8s patterns (everything is a CRD)
- ✅ ArgoCD manages Lambda just like other resources

Cons:

- ⚠️ Requires ACK Lambda controller installation and maintenance
- ⚠️ Complex cross-cloud IAM (GKE → AWS)
- ⚠️ Different from current Terraform approach (migration effort)
- ⚠️ S3 bucket notifications need separate management
- ⚠️ ACK is AWS-specific; adds AWS dependency to K8s cluster
Option 3: Flux Terraform Controller¶
Architecture: Terraform execution managed as Kubernetes resource
┌─────────────────────────────────────────────────────────────────────────────┐
│ OPTION 3: Flux Terraform Controller │
└─────────────────────────────────────────────────────────────────────────────┘
1. Install Flux Terraform Controller (tf-controller)
- Runs Terraform inside K8s pods
- Manages Terraform state
2. Define Terraform resource in cluster-gitops:
# syrf/environments/staging/s3-notifier/terraform.yaml
apiVersion: infra.contrib.fluxcd.io/v1alpha2
kind: Terraform
metadata:
name: s3-notifier-staging
namespace: flux-system
spec:
approvePlan: auto
interval: 10m
path: ./terraform/lambda
sourceRef:
kind: GitRepository
name: camarades-infrastructure
vars:
- name: environment
value: staging
- name: lambda_version
value: "0.1.6" # ← Version declared here
varsFrom:
- kind: Secret
name: lambda-terraform-vars
3. ArgoCD (or Flux) syncs Terraform CRD
4. Terraform Controller:
- Clones camarades-infrastructure
- Runs terraform plan/apply
- Reconciles on interval (10m)
Pros:

- ✅ Kubernetes-native Terraform execution
- ✅ Drift detection via interval reconciliation
- ✅ Keeps Terraform as deployment mechanism
- ✅ Secrets management via K8s secrets

Cons:

- ⚠️ Another controller to install and maintain
- ⚠️ Terraform runs in cluster (security implications)
- ⚠️ Mixing GitOps tools (ArgoCD + Flux component)
- ⚠️ More complex than GitHub Actions approach
Option 4: Lambda Versioning with Aliases¶
Architecture: Single Lambda function with version aliases, managed via GitOps
┌─────────────────────────────────────────────────────────────────────────────┐
│ OPTION 4: Lambda Aliases (Traffic Shifting) │
└─────────────────────────────────────────────────────────────────────────────┘
Instead of separate staging/production functions, use Lambda aliases:
Lambda Function: syrfAppUploadS3Notifier
├── $LATEST (always latest deployed code)
├── Version 42 (published version for v0.1.5)
├── Version 43 (published version for v0.1.6)
├── Alias: staging → points to Version 43
└── Alias: production → points to Version 42
cluster-gitops declares which version each alias points to:
# syrf/environments/staging/s3-notifier/config.yaml
lambda:
alias: staging
version: 43 # Lambda published version number
S3 bucket notifications route by prefix:
- Projects/staging/* → staging alias
- Projects/* → production alias
Pros:

- ✅ Single function with multiple "environments"
- ✅ Instant rollback (just change alias)
- ✅ Traffic shifting possible (gradual rollout)
- ✅ AWS-native blue/green deployment

Cons:

- ⚠️ S3 notifications routing becomes complex
- ⚠️ Current architecture assumes separate prefixes per PR, not aliases
- ⚠️ Requires rethinking S3 trigger architecture
Recommendation: Option 1 (GitOps-Triggered Terraform)¶
Rationale:
- Minimal disruption: Uses existing Terraform, GitHub Releases, S3 backend
- Familiar patterns: Matches current K8s promotion workflow
- No new operators: Avoids ACK/Flux complexity
- Sufficient for use case: Lambda versions change infrequently (releases, not continuous)
- Clear separation of concerns:
- cluster-gitops: Declares desired version
- GitHub Actions: Orchestrates deployment
- Terraform: Manages Lambda infrastructure
- GitHub Releases: Stores versioned artifacts
Key Insight: Lambda doesn't need continuous reconciliation like Kubernetes pods. Version changes are discrete events (releases), not continuous drift. Event-driven deployment (PR merge → workflow) is appropriate.
Implementation Plan for Option 1¶
Phase 1: Refactor CI/CD to Separate Build from Deploy¶
Current: ci-cd.yml builds AND deploys Lambda in one workflow
Target: Build creates artifact only; deploy triggered by cluster-gitops change
Changes to ci-cd.yml:

1. Keep: GitVersion, build, create tag, create GitHub Release
2. Remove: Upload to S3, Terraform apply
3. Add: Update cluster-gitops staging config with new version
Phase 2: Create Lambda Deploy Workflow in cluster-gitops¶
New workflow: cluster-gitops/.github/workflows/lambda-deploy.yml
Triggers on:
- Push to main affecting syrf/environments/*/s3-notifier/config.yaml
Steps:

1. Determine which environment(s) changed
2. Read version from config.yaml
3. Download .zip from GitHub Release (s3-notifier-v{version})
4. Upload to S3 (environment-specific path)
5. Checkout camarades-infrastructure
6. Run Terraform with environment and version variables
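Steps 1 and 2 might be sketched as follows; the changed path is hard-coded here (the real workflow would derive it from the push's diff), and the follow-on commands are shown only as comments since they need release and AWS access:

```shell
# A sample changed path; lambda-deploy.yml would derive this from the diff of
# the push that touched syrf/environments/*/s3-notifier/config.yaml.
changed_file='syrf/environments/staging/s3-notifier/config.yaml'
# The environment name is the third path segment.
env_name=$(echo "$changed_file" | cut -d/ -f3)
echo "$env_name"
# Remaining steps (not runnable here): read lambda.version from the file, then:
#   gh release download "s3-notifier-v${version}" --pattern '*.zip'
#   aws s3 cp ... "s3://camarades-terraform-state-aws/lambda-packages/${env_name}.zip"
```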
Phase 3: Refactor Terraform for Multi-Environment¶
Current: Single production Lambda; preview Lambdas dynamic

Target: Staging + Production as separate functions (or aliases)
Options:
- A) Separate functions: syrfAppUploadS3Notifier-staging, syrfAppUploadS3Notifier-production
- B) Single function with aliases: staging and production aliases pointing to versions
Option A is recommended initially for simplicity; a migration to aliases remains possible later.
Phase 4: Add config.yaml for Lambda in cluster-gitops¶
Create files:
syrf/environments/staging/s3-notifier/config.yaml
syrf/environments/production/s3-notifier/config.yaml
Schema:
serviceName: s3-notifier
envName: staging
lambda:
version: "0.1.6"
functionName: "syrfAppUploadS3Notifier-staging"
s3TriggerPrefix: "Projects/"
gitVersion:
sha: "abc123def456..."
shortSha: "abc123"
deploymentNotification:
commitSha: "abc123def456..."
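Reading the declared version back out (step 2 of the deploy workflow) could look like this; `yq` would be the natural tool on a runner, but a dependency-free `sed` extraction over a stand-in file is shown so the sketch is self-contained:

```shell
# Stand-in for syrf/environments/staging/s3-notifier/config.yaml
cat > /tmp/s3-notifier-config.yaml <<'EOF'
lambda:
  version: "0.1.6"
EOF
# Extract lambda.version (a real workflow would likely use yq instead of sed).
VERSION=$(sed -n 's/^ *version: *"\(.*\)".*/\1/p' /tmp/s3-notifier-config.yaml)
echo "$VERSION"
```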
Design Decisions (User Input)¶
| Question | Decision | Rationale |
|---|---|---|
| Environment Isolation | Separate functions | syrfAppUploadS3Notifier-staging and syrfAppUploadS3Notifier (production) |
| AWS Accounts | Same account | Simpler credential management |
| Preview Lambdas | Full GitOps alignment | Match K8s preview pattern for consistency |
Preview Lambda Strategy (Revised)¶
After analyzing K8s preview patterns, preview Lambdas should follow the same GitOps pattern as K8s services:
┌─────────────────────────────────────────────────────────────────────────────┐
│ PREVIEW LAMBDA GITOPS PATTERN │
│ │
│ 1. pr-preview-lambda.yml builds Lambda package │
│ │
│ 2. pr-preview-lambda.yml writes config to cluster-gitops: │
│ syrf/environments/preview/pr-{n}/services/s3-notifier.values.yaml │
│ │
│ 3. cluster-gitops lambda-deploy.yml triggers on file change │
│ │
│ 4. Terraform applies with preview_prs set derived from files │
│ │
│ 5. Cleanup: delete pr-{n}/ folder → triggers Lambda destruction │
└─────────────────────────────────────────────────────────────────────────────┘
Why this changed: K8s previews use the exact same pattern (workflow creates files → ArgoCD deploys). Lambda should be no different for architectural consistency. See "Kubernetes Preview Environment Pattern" and "Revised Recommendation" sections for detailed analysis.
Kubernetes Preview Environment Pattern (For Comparison)¶
Understanding how K8s previews work is essential for deciding Lambda preview strategy.
K8s Preview Deployment Architecture¶
┌─────────────────────────────────────────────────────────────────────────────┐
│ K8S PREVIEW DEPLOYMENT FLOW │
└─────────────────────────────────────────────────────────────────────────────┘
1. PR labeled with "preview"
│
▼
2. pr-preview.yml workflow triggers
│
├─► Build Docker images with pr-{number} tag
│
└─► write-versions job creates files in cluster-gitops:
│
└─► syrf/environments/preview/pr-{N}/
├── pr.yaml ◄─── ApplicationSet trigger
├── namespace.yaml ◄─── K8s Namespace definition
├── mongodb-user.yaml ◄─── Database isolation
├── db-reset-job.yaml ◄─── PreSync data cleanup
└── services/
├── api.values.yaml ◄─── Service image tags
├── project-management.values.yaml
├── quartz.values.yaml
└── web.values.yaml
│
▼
3. ArgoCD ApplicationSet detects pr.yaml
│
├─► Uses matrix generator: PR × services
│
└─► Generates Applications:
├── pr-{N}-namespace
├── pr-{N}-api
├── pr-{N}-project-management
├── pr-{N}-quartz
└── pr-{N}-web
│
▼
4. ArgoCD syncs Applications (auto-sync enabled)
│
└─► Deploys via Helm charts at PR commit SHA
│
▼
5. Cleanup on PR close
│
└─► workflow deletes pr-{N}/ folder → ArgoCD deletes Apps → resources cleaned up
Key K8s Preview Characteristics¶
| Aspect | K8s Implementation |
|---|---|
| Trigger | pr-preview.yml creates files in cluster-gitops |
| Orchestration | ArgoCD ApplicationSet (declarative) |
| Config Location | cluster-gitops/syrf/environments/preview/pr-{N}/ |
| Deployment | ArgoCD syncs Helm charts from monorepo |
| Namespace Isolation | Per-PR K8s namespace |
| Database Isolation | Per-PR MongoDB database + AtlasDatabaseUser |
| Version Tracking | GitVersion in {service}.values.yaml |
| Cleanup | Delete pr-{N}/ folder → cascading deletion |
ApplicationSet Configuration¶
# argocd/applicationsets/syrf-previews.yaml (simplified)
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
spec:
generators:
- matrix:
generators:
# Trigger: pr-*/pr.yaml existence
- git:
files:
- path: "syrf/environments/preview/pr-*/pr.yaml"
# Services to deploy per PR
- merge:
generators:
- git:
files:
- path: "syrf/services/*/config.yaml"
- git:
files:
- path: "syrf/environments/preview/services/*/config.yaml"
Values File Hierarchy (Priority Order)¶
1. syrf/global.values.yaml # Universal defaults
2. syrf/services/{svc}/values.yaml # Service base
3. syrf/environments/preview/preview.values.yaml # Preview defaults (ALL)
4. syrf/environments/preview/services/{svc}/values.yaml # Preview per-service
5. syrf/environments/preview/pr-{N}/services/{svc}.values.yaml # PR-SPECIFIC (HIGHEST)
Critical Insight: K8s Previews ARE GitOps¶
The K8s preview pattern is fully GitOps-driven:

1. Workflow creates config files in cluster-gitops
2. ArgoCD detects file changes via Git generators
3. ApplicationSet generates Applications automatically
4. ArgoCD handles deployment, sync, and cleanup
The workflow triggers GitOps; it doesn't deploy directly.
Lambda vs K8s Preview Pattern Comparison¶
| Aspect | K8s Previews | Lambda Previews (Current) |
|---|---|---|
| Config in cluster-gitops | ✅ Yes - pr.yaml, values files | ❌ No - managed in workflow |
| Deployment orchestrator | ArgoCD ApplicationSet | GitHub Actions + Terraform |
| Trigger mechanism | File existence → ArgoCD | Workflow → Terraform directly |
| State visibility | Full visibility in cluster-gitops | No visibility (ephemeral) |
| Cleanup mechanism | Delete folder → ArgoCD cascades | Workflow removes from TF set |
| Pattern consistency | GitOps-first | Imperative-first |
The Inconsistency Problem¶
Currently, Lambda previews operate differently from K8s previews:
┌─────────────────────────────────────────────────────────────────────────────┐
│ CURRENT INCONSISTENCY │
└─────────────────────────────────────────────────────────────────────────────┘
K8s Preview:
pr-preview.yml → creates files in cluster-gitops → ArgoCD deploys
↑ GitOps is the deployment mechanism
Lambda Preview:
pr-preview-lambda.yml → Terraform apply directly → Lambda deployed
↑ Workflow is the deployment mechanism
(cluster-gitops not involved)
Revised Analysis: Should Lambda Previews Follow K8s Pattern?¶
Option A: Full GitOps Alignment (K8s Pattern)¶
Make Lambda previews follow the same pattern as K8s:
┌─────────────────────────────────────────────────────────────────────────────┐
│ OPTION A: FULL GITOPS ALIGNMENT │
└─────────────────────────────────────────────────────────────────────────────┘
1. pr-preview-lambda.yml creates:
cluster-gitops/syrf/environments/preview/pr-{N}/services/s3-notifier.values.yaml
2. cluster-gitops workflow (lambda-deploy.yml) triggers on file change
3. Workflow runs Terraform with values from config file
4. Cleanup: delete s3-notifier.values.yaml → triggers Lambda destruction
Pros:

- ✅ Consistent pattern with K8s previews
- ✅ Full visibility in cluster-gitops
- ✅ Lambda preview state matches K8s preview state
- ✅ Single source of truth for ALL preview resources

Cons:

- ⚠️ Adds ~3-5 min latency (cross-repo workflow)
- ⚠️ More complex coordination (two workflows)
- ⚠️ Still imperative deployment (Terraform), just triggered differently
Option B: Hybrid Pattern (Current + Visibility)¶
Keep imperative deployment but add cluster-gitops visibility:
┌─────────────────────────────────────────────────────────────────────────────┐
│ OPTION B: HYBRID (VISIBILITY ONLY) │
└─────────────────────────────────────────────────────────────────────────────┘
1. pr-preview-lambda.yml deploys Lambda directly (current behavior)
2. AFTER successful deploy, update cluster-gitops:
cluster-gitops/syrf/environments/preview/pr-{N}/services/s3-notifier.values.yaml
3. cluster-gitops is READ-ONLY for Lambda previews (not a trigger)
4. Cleanup: workflow deletes Lambda AND updates cluster-gitops
Pros:

- ✅ Fast deployment (no cross-repo wait)
- ✅ Visibility in cluster-gitops
- ✅ Simple failure recovery (single workflow)

Cons:

- ⚠️ Inconsistent pattern (K8s: GitOps-triggers-deploy, Lambda: deploy-then-update)
- ⚠️ cluster-gitops may drift if workflow fails after deploy
- ⚠️ Two sources of truth (workflow state + cluster-gitops)
Option C: No Change (Status Quo)¶
Keep Lambda previews completely separate:
Pros:

- ✅ Simplest approach
- ✅ Already working

Cons:

- ❌ No visibility into Lambda preview state
- ❌ Inconsistent with K8s pattern
- ❌ Can't see complete preview environment in cluster-gitops
Recommendation: Option A (Full GitOps Alignment)¶
After deeper analysis, Option A is recommended for Lambda previews because:
1. Consistency Principle¶
K8s previews already accept the "workflow creates files → GitOps deploys" pattern. Lambda should follow the same pattern for architectural consistency.
2. The Latency Concern is Overstated¶
K8s Preview Latency Analysis:
pr-preview.yml workflow:
├── detect-changes: ~1 min
├── version-*: ~2 min
├── build-and-push: ~5-10 min
├── write-versions: ~1 min (creates cluster-gitops files)
└── ArgoCD sync: ~2-3 min
Total: ~12-17 minutes
Lambda with GitOps Latency Analysis:
pr-preview-lambda.yml workflow:
├── detect-changes: ~1 min
├── version: ~1 min
├── build Lambda package: ~2 min
├── write config to cluster-gitops: ~1 min
└── cluster-gitops workflow triggers: ~2 min
└── Terraform apply: ~3-5 min
Total: ~10-12 minutes
Conclusion: Lambda with GitOps is actually FASTER than K8s previews because Lambda builds are faster than Docker builds. The latency concern from earlier analysis was comparing against an idealized "immediate deploy" that K8s also doesn't achieve.
3. Single Source of Truth¶
With Option A, cluster-gitops/syrf/environments/preview/pr-{N}/ contains EVERYTHING about a preview:
- K8s namespace configuration
- MongoDB user configuration
- ALL service image tags
- Lambda version and configuration
This enables:

- Complete preview state at a glance
- Easier debugging (one place to look)
- Consistent cleanup (delete folder = delete everything)
4. Terraform Pattern Adaptation¶
The for_each concern can be addressed:
# Current: Workflow passes set
variable "preview_prs" {
type = set(string)
default = []
}
# GitOps: Workflow discovers set from cluster-gitops files
# In lambda-deploy.yml workflow:
PREVIEW_PRS=$(find syrf/environments/preview -path '*/pr-*/services/s3-notifier.values.yaml' \
  | sed -E 's|.*/pr-([0-9]+)/.*|\1|')
# Pass to Terraform:
TF_VAR_preview_prs='["123", "456"]'
Considerations: GitOps for Preview Lambdas¶
This section documents the trade-offs of applying GitOps to preview Lambdas. After analyzing the K8s preview pattern, Option A (Full GitOps Alignment) is now recommended, but these considerations should inform implementation decisions.
1. Feedback Loop Latency¶
Current Preview Flow (~5-8 minutes):
Push to PR → Build Lambda → Upload to S3 → Terraform apply → Lambda ready
└────────────── Single workflow, linear execution ──────────────┘
Hypothetical GitOps Flow (~15-25 minutes):
Push to PR → Build Lambda → Create PR to cluster-gitops → Wait for merge
│
▼
cluster-gitops PR merged
│
▼
lambda-deploy.yml triggers
│
▼
Download artifact → Terraform apply → Lambda ready
Impact: Developers expect rapid feedback on PR changes. The additional cross-repo coordination adds 10-15 minutes to each iteration cycle, significantly degrading developer experience.
2. Git History Pollution¶
Problem: Preview environments are high-churn by nature.
cluster-gitops commit history (hypothetical GitOps previews):
─────────────────────────────────────────────────────────────
abc1234 - Update pr-456 s3-notifier to sha def789
bcd2345 - Update pr-123 s3-notifier to sha abc012
cde3456 - Update pr-456 s3-notifier to sha ghi345
def4567 - Remove pr-789 s3-notifier (PR closed)
efg5678 - Update pr-123 s3-notifier to sha jkl678
fgh6789 - Add pr-901 s3-notifier config
...
(dozens of commits per day for active development)
Impact:

- Meaningful staging/production changes buried in preview noise
- Git blame becomes useless for understanding intentional changes
- Repository size grows rapidly with ephemeral config churn
3. Cross-Repository Coordination Complexity¶
Current Architecture (self-contained):
┌─────────────────────────────────────────────────────────────┐
│ pr-preview-lambda.yml │
│ │
│ 1. Get list of preview PRs │
│ 2. Build Lambda for current PR │
│ 3. Upload to S3 │
│ 4. Terraform apply with preview_prs set │
│ 5. On PR close: remove from set, Terraform destroys │
│ │
│ └─► All state managed within single workflow │
└─────────────────────────────────────────────────────────────┘
GitOps Architecture (distributed state):
┌─────────────────────────────────────────────────────────────┐
│ syrf repo cluster-gitops repo │
│ ────────── ─────────────────── │
│ pr-preview.yml ──creates PR──► preview/pr-{n}/config.yaml │
│ │ │ │
│ │ ▼ │
│ │ lambda-deploy.yml │
│ │ │ │
│ │ ▼ │
│ │ Terraform apply │
│ │ │
│ On PR close: │
│ cleanup.yml ──creates PR──► Remove preview/pr-{n}/ │
│ │ │ │
│ │ ▼ │
│ │ lambda-deploy.yml │
│ │ │ │
│ │ ▼ │
│ └──────────────────── Terraform destroy │
│ │
│ Failure modes: │
│ - cluster-gitops PR fails to merge → orphaned state │
│ - cleanup PR fails → orphaned Lambda resources │
│ - Race conditions between multiple PRs │
│ - Partial failures (K8s deployed, Lambda failed) │
└─────────────────────────────────────────────────────────────┘
Impact: Distributed state across repositories creates multiple failure modes that don't exist with the current self-contained approach.
4. Terraform for_each Pattern Incompatibility¶
Current Pattern (works well):
# Workflow passes complete set of active PRs
variable "preview_prs" {
type = set(string)
default = [] # e.g., ["123", "456", "789"]
}
resource "aws_lambda_function" "preview" {
for_each = var.preview_prs
# Terraform manages full lifecycle based on set membership
}
GitOps Challenge:
# How would this work?
Option A: Aggregate configs at deploy time
─────────────────────────────────────────
cluster-gitops/
└── syrf/environments/preview/
├── pr-123/s3-notifier/config.yaml
├── pr-456/s3-notifier/config.yaml
└── pr-789/s3-notifier/config.yaml
lambda-deploy.yml would need to:
1. List all pr-* directories
2. Build preview_prs set from directory names
3. Pass to Terraform
Problem: What if a PR closes while workflow is running?
Directory deleted mid-execution → race condition
Option B: Separate Terraform per preview
──────────────────────────────────────────
Each preview has isolated Terraform state
Problems:
- Expensive (separate state file per PR)
- S3 bucket notifications can't be split (single resource)
- Lose the elegance of for_each pattern
Impact: The current for_each pattern is designed for workflow-managed state, not file-based discovery. Retrofitting GitOps would require significant Terraform refactoring.
5. Ephemeral Resources Don't Need GitOps Benefits¶
| GitOps Benefit | Value for Staging/Prod | Value for Previews |
|---|---|---|
| Single source of truth | ✅ Critical - must know what's deployed | ❌ PR commit IS the source of truth |
| Audit trail | ✅ Critical - compliance, debugging | ❌ Previews are throwaway |
| Manual gates | ✅ Critical - prod approval | ❌ No approval needed for preview |
| Rollback via git revert | ✅ Useful - revert prod issues | ❌ Just push new commit to PR |
| Drift detection | ✅ Useful - ensure consistency | ❌ Preview will be deleted anyway |
Conclusion: GitOps overhead provides no meaningful benefit for ephemeral preview resources.
6. Circular Dependency Problem¶
┌─────────────────────────────────────────────────────────────┐
│ CHICKEN-AND-EGG PROBLEM │
└─────────────────────────────────────────────────────────────┘
Q: When should cluster-gitops preview config be created?
Option A: Before PR exists
─────────────────────────
Can't create config without PR number
PR number assigned by GitHub when PR is created
→ Impossible
Option B: After PR created, before first build
─────────────────────────────────────────────
PR created → workflow creates cluster-gitops config → triggers deploy
│
└─► But deploy needs the artifact
Artifact created by same workflow
→ Circular dependency
Option C: After artifact built
──────────────────────────────
PR created → build artifact → create cluster-gitops PR → deploy
│
└─► Adds latency (Option B problem)
Every push updates cluster-gitops
→ Git pollution problem
Impact: There's no clean way to bootstrap preview configs without introducing either latency or complexity.
7. Failure Recovery Complexity¶
Current Approach (idempotent):
# If preview deployment fails, just re-run the workflow
# Workflow has complete state, can retry everything
gh workflow run pr-preview-lambda.yml
GitOps Approach (distributed state):
# If deployment fails, need to diagnose where:
# 1. Did syrf workflow fail to create cluster-gitops PR?
# 2. Did cluster-gitops PR fail to merge?
# 3. Did lambda-deploy.yml fail to trigger?
# 4. Did Terraform fail?
# Recovery requires:
# - Check cluster-gitops for pending PRs
# - Check if config exists but Lambda doesn't
# - Manually reconcile state between repos
Impact: Debugging and recovery become significantly more complex with distributed state.
Advanced Option: Unified Preview Workflow (Lambda + K8s)¶
Concept: Lambda as "Just Another Service" in pr-preview.yml¶
Instead of having separate workflows (pr-preview.yml for K8s, pr-preview-lambda.yml for Lambda), Lambda could be integrated into the same pr-preview.yml workflow as K8s services.
┌─────────────────────────────────────────────────────────────────────────────┐
│ UNIFIED PREVIEW WORKFLOW (pr-preview.yml) │
│ │
│ Current (Separate): │
│ ─────────────────── │
│ pr-preview.yml ──────► K8s services (Docker build → cluster-gitops → ArgoCD)
│ pr-preview-lambda.yml ► Lambda (dotnet build → Terraform directly) │
│ │
│ Proposed (Unified): │
│ ────────────────────── │
│ pr-preview.yml ──────► ALL services: │
│ ├── K8s services (existing) │
│ │ └── writes: pr-{n}/services/{svc}.values.yaml │
│ │ │
│ └── Lambda (new) │
│ ├── Build Lambda package (.zip) │
│ ├── Upload to S3: lambda-packages/pr-{n}.zip │
│ └── writes: pr-{n}/services/s3-notifier.values.yaml
│ │
│ ArgoCD ApplicationSet generates Application for Lambda (like K8s services)│
│ Lambda Application syncs Terraform Job or ACK resource │
└─────────────────────────────────────────────────────────────────────────────┘
Benefits:¶
- Single Workflow: One pr-preview.yml handles ALL preview resources
- Consistent Pattern: Lambda config written to cluster-gitops same as K8s
- Unified Timing: Lambda and K8s services deploy in coordinated sync waves
- Single Cleanup: Delete pr-{n}/ folder cleans up EVERYTHING
Implementation in pr-preview.yml:¶
jobs:
# Existing K8s jobs...
build-and-push-images:
# ... existing Docker builds
# NEW: Add Lambda build alongside K8s builds
build-lambda:
if: needs.detect-changes.outputs.s3_notifier_changed == 'true'
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Setup .NET
uses: actions/setup-dotnet@v4
with:
dotnet-version: '8.0.x'
- name: Install Lambda Tools
run: dotnet tool install -g Amazon.Lambda.Tools
- name: Build Lambda Package
run: |
cd src/services/s3-notifier/SyRF.S3FileSavedNotifier.Endpoint
dotnet lambda package -o pr-${{ github.event.pull_request.number }}.zip
- name: Upload to S3
run: |
aws s3 cp pr-${{ github.event.pull_request.number }}.zip \
s3://camarades-terraform-state-aws/lambda-packages/pr-${{ github.event.pull_request.number }}.zip
# MODIFIED: write-versions now includes Lambda
write-versions:
runs-on: ubuntu-latest
steps:
# ... existing K8s service values ...
- name: Write Lambda values
if: needs.build-lambda.result == 'success' || needs.retag-lambda.result == 'success'
run: |
cat > syrf/environments/preview/pr-${{ env.PR_NUMBER }}/services/s3-notifier.values.yaml <<EOF
serviceName: s3-notifier
deploymentType: lambda
lambda:
version: "${{ needs.version-s3-notifier.outputs.version }}"
commitSha: "${{ needs.version-s3-notifier.outputs.sha }}"
s3Key: "lambda-packages/pr-${{ env.PR_NUMBER }}.zip"
gitVersion:
sha: "${{ needs.version-s3-notifier.outputs.sha }}"
shortSha: "${{ needs.version-s3-notifier.outputs.shortSha }}"
EOF
Advanced Option: ArgoCD-Orchestrated Lambda Deployment¶
Concept: ArgoCD as Unified Orchestrator for K8s AND Lambda¶
Instead of GitHub Actions triggering Terraform directly, ArgoCD could coordinate Lambda deployment using one of these patterns:
Option A: ACK (AWS Controllers for Kubernetes)¶
┌─────────────────────────────────────────────────────────────────────────────┐
│ OPTION A: ACK LAMBDA CONTROLLER │
│ │
│ cluster-gitops/syrf/environments/preview/pr-{n}/lambda/function.yaml │
│ ────────────────────────────────────────────────────────────────────────── │
│ apiVersion: lambda.services.k8s.aws/v1alpha1 │
│ kind: Function │
│ metadata: │
│ name: s3-notifier-pr-123 │
│ namespace: pr-123 │
│ spec: │
│ name: syrfAppUploadS3Notifier-pr-123 │
│ runtime: dotnet8 │
│ handler: "SyRF.S3FileSavedNotifier..." │
│ code: │
│ s3Bucket: camarades-terraform-state-aws │
│ s3Key: lambda-packages/pr-123.zip │
│ │
│ Flow: │
│ pr-preview.yml → writes function.yaml → ArgoCD syncs → ACK creates Lambda │
│ │
│ Pros: │
│ ✓ True GitOps (ArgoCD is orchestrator) │
│ ✓ Continuous reconciliation (drift correction) │
│ ✓ Lambda is just another K8s resource │
│ │
│ Cons: │
│ ⚠️ Requires ACK Lambda Controller installation │
│ ⚠️ Cross-cloud IAM (GKE Workload Identity → AWS IAM) │
│ ⚠️ S3 bucket notifications need separate controller │
│ ⚠️ Migration from Terraform │
└─────────────────────────────────────────────────────────────────────────────┘
Option B: ArgoCD Sync Hook + Terraform Job¶
┌─────────────────────────────────────────────────────────────────────────────┐
│ OPTION B: ARGOCD SYNC HOOK + TERRAFORM JOB │
│ │
│ cluster-gitops/syrf/environments/preview/pr-{n}/lambda/deploy-job.yaml │
│ ────────────────────────────────────────────────────────────────────────── │
│ apiVersion: batch/v1 │
│ kind: Job │
│ metadata: │
│ name: deploy-lambda-pr-123 │
│ annotations: │
│ argocd.argoproj.io/hook: Sync │
│ argocd.argoproj.io/sync-wave: "5" # After K8s services │
│ spec: │
│ template: │
│ spec: │
│ serviceAccountName: terraform-runner │
│ containers: │
│ - name: terraform │
│ image: hashicorp/terraform:1.6 │
│ env: │
│ - name: PR_NUMBER │
│ value: "123" │
│ - name: AWS_REGION │
│ value: eu-west-1 │
│ command: ["/bin/sh", "-c"] │
│ args: │
│ - | │
│ git clone https://github.com/camaradesuk/camarades-infrastructure
│ cd camarades-infrastructure/terraform/lambda │
│ terraform init │
│ terraform apply -var="preview_prs=[\"$PR_NUMBER\"]" -auto-approve
│ │
│ Flow: │
│ pr-preview.yml → writes deploy-job.yaml → ArgoCD syncs → Job runs Terraform│
│ │
│ Sync Wave Ordering: │
│ Wave -10: ExternalSecret (Atlas API credentials) │
│ Wave -5: AtlasDatabaseUser (creates MongoDB user) │
│ Wave -1: db-reset Job (drops collections) │
│ Wave 0: K8s Deployments (API, PM, Web, etc.) │
│ Wave 5: Lambda deploy Job (Terraform) ← NEW │
│ │
│ Pros: │
│ ✓ ArgoCD orchestrates ALL resources │
│ ✓ Uses existing Terraform │
│ ✓ Sync waves ensure correct ordering │
│ ✓ No additional controllers │
│ │
│ Cons: │
│ ⚠️ Terraform runs as K8s Job (state management complexity) │
│ ⚠️ AWS credentials in K8s secrets │
│ ⚠️ Job cleanup and retry handling │
└─────────────────────────────────────────────────────────────────────────────┘
Option C: Flamingo (ArgoCD + Flux Terraform Controller)¶
┌─────────────────────────────────────────────────────────────────────────────┐
│ OPTION C: FLAMINGO / FLUX TF CONTROLLER │
│ │
│ cluster-gitops/syrf/environments/preview/pr-{n}/lambda/terraform.yaml │
│ ────────────────────────────────────────────────────────────────────────── │
│ apiVersion: infra.contrib.fluxcd.io/v1alpha2 │
│ kind: Terraform │
│ metadata: │
│ name: s3-notifier-pr-123 │
│ spec: │
│ approvePlan: auto │
│ interval: 5m │
│ path: ./terraform/lambda │
│ sourceRef: │
│ kind: GitRepository │
│ name: camarades-infrastructure │
│ vars: │
│ - name: preview_prs │
│ value: '["123"]' │
│ │
│ Flow: │
│ pr-preview.yml → writes terraform.yaml → ArgoCD syncs → Flux TF applies │
│ │
│ Pros: │
│ ✓ ArgoCD orchestrates │
│ ✓ Proper Terraform state management │
│ ✓ Continuous reconciliation │
│ │
│ Cons: │
│ ⚠️ Requires Flux Terraform Controller installation │
│ ⚠️ Mixing GitOps tools (ArgoCD + Flux) │
│ ⚠️ More infrastructure complexity │
└─────────────────────────────────────────────────────────────────────────────┘
Comparison: Lambda Deployment Approaches¶
| Approach | Orchestrator | Lambda Deploy | State Management | Complexity | Recommendation |
|---|---|---|---|---|---|
| Current | GitHub Actions | Terraform (workflow) | Terraform state | Low | Status quo |
| GitOps Workflow Trigger | GitHub Actions | Terraform (workflow) | Terraform state | Medium | ✅ Practical |
| ACK Controller | ArgoCD | ACK (K8s CRD) | ACK Controller | High | Future option |
| Sync Hook + TF Job | ArgoCD | Terraform (K8s Job) | Terraform state | Medium-High | ✅ Best unification |
| Flamingo/Flux TF | ArgoCD | Flux TF Controller | Flux TF | High | Over-engineered |
Recommended Evolution Path:¶
┌─────────────────────────────────────────────────────────────────────────────┐
│ RECOMMENDED EVOLUTION PATH │
│ │
│ Phase 1 (Immediate): GitOps Workflow Trigger │
│ ────────────────────────────────────────────── │
│ - Integrate Lambda into pr-preview.yml │
│ - Write s3-notifier.values.yaml to cluster-gitops │
│ - cluster-gitops workflow triggers Terraform │
│ - Minimal changes to existing infrastructure │
│ │
│ Phase 2 (Future): ArgoCD Sync Hook Integration │
│ ──────────────────────────────────────────────── │
│ - Replace workflow trigger with ArgoCD sync hook │
│ - Terraform runs as K8s Job coordinated by ArgoCD │
│ - Unified sync wave ordering (K8s + Lambda) │
│ - Single orchestrator (ArgoCD) for all resources │
│ │
│ Phase 3 (Long-term): ACK Controller │
│ ───────────────────────────────────────── │
│ - Migrate from Terraform to ACK Lambda Controller │
│ - Lambda becomes native K8s resource │
│ - True GitOps with continuous reconciliation │
│ - Only if AWS becomes more central to infrastructure │
└─────────────────────────────────────────────────────────────────────────────┘
Revised Recommendation: Full GitOps Alignment¶
After analysis of how K8s previews actually work, the hybrid approach is no longer recommended. Instead, Lambda previews should follow the same pattern as K8s previews:
┌─────────────────────────────────────────────────────────────────────────────┐
│ RECOMMENDED: FULL GITOPS ALIGNMENT FOR ALL PREVIEWS │
│ │
│ K8s Previews (current): │
│ ─────────────────────── │
│ pr-preview.yml → writes files to cluster-gitops → ArgoCD deploys │
│ │
│ Lambda Previews (proposed): │
│ ────────────────────────── │
│ pr-preview-lambda.yml → writes s3-notifier.values.yaml to cluster-gitops │
│ → cluster-gitops workflow triggers │
│ → Terraform deploys │
│ │
│ Benefits: │
│ ✓ Consistent pattern across ALL preview resources │
│ ✓ Single source of truth (cluster-gitops/syrf/environments/preview/pr-N/) │
│ ✓ Complete preview visibility (K8s + Lambda in one place) │
│ ✓ Unified cleanup (delete pr-N/ folder = delete everything) │
│ ✓ Acceptable latency (~10-12 min, faster than K8s ~15 min) │
│ │
│ Trade-offs (acceptable): │
│ ⚠️ Cross-repo coordination (same as K8s previews) │
│ ⚠️ More git commits for preview changes (same as K8s previews) │
│ ⚠️ Distributed state (same complexity as K8s previews) │
└─────────────────────────────────────────────────────────────────────────────┘
Key insight: The considerations documented above apply equally to K8s previews, yet we accept them for K8s. Lambda should be no different.
Historical Context: Why the Analysis Changed¶
The original "Hybrid Approach" was based on comparing Lambda GitOps against an idealized "direct deploy" baseline. After understanding how K8s previews actually work, the comparison baseline changed:
| Metric | Lambda Direct | Lambda GitOps | K8s GitOps (current) |
|---|---|---|---|
| Latency | ~5-8 min | ~10-12 min | ~15-17 min |
| Git commits | 0 | ~2/PR | ~2/PR |
| Complexity | Simple | Moderate | Moderate |
| Consistency | ❌ Inconsistent | ✅ Consistent | ✅ Consistent |
| Visibility | ❌ None | ✅ Full | ✅ Full |
Lambda with GitOps is actually faster than K8s previews and achieves consistency.
Summary¶
| Aspect | Current | Proposed (Option 1) |
|---|---|---|
| Version source of truth | CI/CD workflow | cluster-gitops config.yaml |
| Build trigger | Push to main | Push to main (unchanged) |
| Deploy trigger | Same workflow | cluster-gitops PR merge |
| Artifact storage | GitHub Releases | GitHub Releases (unchanged) |
| Deployment mechanism | GitHub Actions + Terraform | GitHub Actions + Terraform (unchanged) |
| Reconciliation | None (one-shot) | Event-driven (PR merge) |
| Drift correction | None | None (acceptable for Lambda) |
The key change is decoupling build from deploy and making cluster-gitops the trigger point for Lambda deployments, consistent with the Kubernetes service pattern.
Detailed Implementation Plan¶
Step 1: Create Lambda Config Files in cluster-gitops¶
Files to create:
# cluster-gitops/syrf/services/s3-notifier/config.yaml
serviceName: s3-notifier
deploymentType: lambda # Distinguishes from K8s services
chartPath: null
chartRepo: null
# cluster-gitops/syrf/environments/staging/s3-notifier/config.yaml
serviceName: s3-notifier
envName: staging
lambda:
version: "0.1.5"
functionName: "syrfAppUploadS3Notifier-staging"
s3TriggerPrefix: "staging/"
gitVersion:
sha: "..."
shortSha: "..."
# cluster-gitops/syrf/environments/production/s3-notifier/config.yaml
serviceName: s3-notifier
envName: production
lambda:
version: "0.1.4"
functionName: "syrfAppUploadS3Notifier"
s3TriggerPrefix: "Projects/"
gitVersion:
sha: "..."
shortSha: "..."
Step 2: Create Lambda Deploy Workflow in cluster-gitops¶
File: cluster-gitops/.github/workflows/lambda-deploy.yml
Triggers on changes to syrf/environments/*/s3-notifier/config.yaml.
Workflow steps:
1. Detect which environment(s) changed (staging, production)
2. Read Lambda version from config.yaml
3. Download .zip from GitHub Release (s3-notifier-v{version})
4. Upload to S3 (lambda-packages/staging.zip or production.zip)
5. Checkout camarades-infrastructure repo
6. Run Terraform with environment-specific variables
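As a rough sketch, the workflow steps above might translate to the following shell. The release tag format and S3 paths come from this document; the helper names, CLI flag wiring, and the Terraform variable passed are assumptions (a real workflow would also compute and pass `source_code_hash`).

```shell
# Hedged sketch of lambda-deploy.yml's deploy logic (steps 2-6 above).

# Step 2: read the declared version from the environment's config.yaml.
# sed-based to avoid a yq dependency; assumes the indented `version: "X.Y.Z"`
# layout shown in this document.
read_lambda_version() {
  sed -n 's/^[[:space:]]*version:[[:space:]]*"\(.*\)"/\1/p' "$1" | head -n1
}

# Steps 3-6: download the release asset, stage it in S3, run Terraform.
deploy_lambda_env() {   # usage: deploy_lambda_env staging|production
  env="$1"
  version=$(read_lambda_version "syrf/environments/$env/s3-notifier/config.yaml")

  # Step 3: download the .zip attached to the GitHub Release for this version
  gh release download "s3-notifier-v$version" --pattern '*.zip' --output lambda.zip

  # Step 4: upload to the environment-specific S3 key
  aws s3 cp lambda.zip "s3://camarades-terraform-state-aws/lambda-packages/$env.zip"

  # Steps 5-6: checkout infrastructure repo and apply (variable name illustrative)
  git clone https://github.com/camaradesuk/camarades-infrastructure
  cd camarades-infrastructure/terraform/lambda
  terraform init
  terraform apply -var="${env}_version=$version" -auto-approve
}
```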
Step 3: Create Staging Lambda in Terraform (PREREQUISITE)¶
⚠️ Critical: Staging Lambda does NOT currently exist. This step must be completed FIRST.
Modify: camarades-infrastructure/terraform/lambda/main.tf
Add new staging Lambda function:
# Staging Lambda function
resource "aws_lambda_function" "staging" {
function_name = "syrfAppUploadS3Notifier-staging"
role = aws_iam_role.lambda_role.arn
runtime = "dotnet8"
handler = "SyRF.S3FileSavedNotifier.Endpoint::SyRF.S3FileSavedNotifier.Endpoint.Function::FunctionHandler"
s3_bucket = var.lambda_bucket
s3_key = "lambda-packages/staging.zip"
source_code_hash = var.staging_source_code_hash
memory_size = 512
timeout = 30
environment {
variables = {
RabbitMqHost = var.staging_rabbitmq_host # Staging-specific
# ... other staging env vars
}
}
}
# S3 trigger permission for staging
resource "aws_lambda_permission" "staging_s3" {
statement_id = "AllowS3InvokeStaging"
action = "lambda:InvokeFunction"
function_name = aws_lambda_function.staging.function_name
principal = "s3.amazonaws.com"
source_arn = aws_s3_bucket.uploads.arn
}
Update S3 notifications:
resource "aws_s3_bucket_notification" "bucket_notification" {
bucket = aws_s3_bucket.uploads.id
# Production notification (existing)
lambda_function {
lambda_function_arn = aws_lambda_function.production.arn
events = ["s3:ObjectCreated:*"]
filter_prefix = "Projects/"
}
# Staging notification (NEW)
lambda_function {
lambda_function_arn = aws_lambda_function.staging.arn
events = ["s3:ObjectCreated:*"]
filter_prefix = "staging/"
}
# Preview notifications (existing, via dynamic block)
dynamic "lambda_function" {
for_each = var.preview_prs
content {
lambda_function_arn = aws_lambda_function.preview[lambda_function.value].arn
events = ["s3:ObjectCreated:*"]
filter_prefix = "preview/pr-${lambda_function.value}/"
}
}
}
Add staging variables to variables.tf:
variable "staging_version" {
description = "Semantic version for staging Lambda"
type = string
default = ""
}
variable "staging_commit_sha" {
description = "Git commit SHA for staging deployment"
type = string
default = ""
}
variable "staging_source_code_hash" {
description = "Base64-encoded SHA256 hash of staging.zip"
type = string
default = ""
}
variable "staging_rabbitmq_host" {
description = "RabbitMQ host for staging environment"
type = string
}
Production Lambda remains unchanged:
- Function name: syrfAppUploadS3Notifier
- S3 package: lambda-packages/production.zip
- S3 trigger prefix: Projects/
Step 4: Modify ci-cd.yml to Decouple Build from Deploy¶
Changes to syrf/.github/workflows/ci-cd.yml:
Remove from deploy-lambda job:
- Upload to S3 lambda-packages/production.zip
- Terraform init/plan/apply
Keep:
- Build Lambda package (.zip)
- Create GitHub artifact
- Create git tag (separate job)
- Create GitHub Release with .zip attachment
Add to promote-to-staging job:
- Update syrf/environments/staging/s3-notifier/config.yaml with new version
- Include in same PR as K8s service promotions
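A minimal sketch of that promotion step, assuming the config.yaml layout shown in Step 1. The helper name and sed-based approach are illustrative; a real workflow might use yq instead.

```shell
# Hypothetical helper for the promote-to-staging job: bump lambda.version
# in the staging config before committing it to the promotion PR branch.
bump_lambda_version() {   # usage: bump_lambda_version <config.yaml> <new-version>
  file="$1"; new="$2"
  # Rewrite the quoted version value in place (BSD/GNU-portable -i usage)
  sed -i.bak "s/^\([[:space:]]*version:[[:space:]]*\)\"[^\"]*\"/\1\"$new\"/" "$file" \
    && rm -f "$file.bak"
}

# Usage (illustrative):
#   bump_lambda_version syrf/environments/staging/s3-notifier/config.yaml "0.1.6"
```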
Step 5: S3 Bucket Prefix Structure¶
Current:
syrfapp-uploads/
├── Projects/        ← Production
└── preview/pr-{n}/  ← PR previews
New:
syrfapp-uploads/
├── Projects/ ← Production (unchanged)
├── staging/ ← Staging (new)
└── preview/pr-{n}/ ← PR previews (unchanged)
Note: API/Web may need configuration to upload to correct prefix per environment.
Implementation Order (Dependencies)¶
┌─────────────────────────────────────────────────────────────────────────────┐
│ IMPLEMENTATION PHASES │
└─────────────────────────────────────────────────────────────────────────────┘
Phase 1: Infrastructure Preparation (camarades-infrastructure)
─────────────────────────────────────────────────────────────
1. Add staging Lambda function to Terraform
2. Add staging S3 notification trigger
3. Add staging variables
4. Apply Terraform to create staging Lambda
└─► Staging Lambda now exists but is empty (no package yet)
Phase 2: GitOps Structure (cluster-gitops)
───────────────────────────────────────────
5. Create s3-notifier service config
6. Create staging/production config.yaml files
7. Create lambda-deploy.yml workflow
└─► Workflow ready to deploy on config changes
Phase 3: CI/CD Decoupling (syrf)
────────────────────────────────
8. Modify ci-cd.yml to remove direct deploy
9. Add Lambda version to staging promotion PR
└─► Build creates artifact, GitOps triggers deploy
Phase 4: Verification
─────────────────────
10. Test end-to-end: code change → build → promotion → deploy
11. Verify rollback capability
Files to Modify¶
Phase 1: Infrastructure (camarades-infrastructure repo)¶
| File | Changes Required |
|---|---|
| `terraform/lambda/main.tf` | Add `aws_lambda_function.staging` resource; add staging S3 trigger to `aws_s3_bucket_notification`; add `aws_lambda_permission.staging_s3` |
| `terraform/lambda/variables.tf` | Add `staging_version`, `staging_commit_sha`, `staging_source_code_hash`, `staging_rabbitmq_host` |
| `terraform/lambda/outputs.tf` | Add staging Lambda ARN output |
Phase 2: GitOps Structure (cluster-gitops repo)¶
| File | Changes Required |
|---|---|
| `syrf/services/s3-notifier/config.yaml` | Create - service metadata with `deploymentType: lambda` |
| `syrf/environments/staging/s3-notifier/config.yaml` | Create - staging version declaration |
| `syrf/environments/production/s3-notifier/config.yaml` | Create - production version declaration |
| `.github/workflows/lambda-deploy.yml` | Create - GitOps-triggered deployment workflow |
Phase 3: CI/CD Changes (syrf repo)¶
| File | Changes Required |
|---|---|
| `.github/workflows/ci-cd.yml` | Remove from `deploy-lambda`: S3 upload, Terraform apply; keep in `deploy-lambda`: build package, create artifact; modify `promote-to-staging` and `promote-to-production` to include the Lambda config update |
Verification Steps¶
Pre-Implementation Checks¶
- Verify current state:
# List existing Lambda functions
aws lambda list-functions \
  --query "Functions[?starts_with(FunctionName, 'syrfApp')].[FunctionName,LastModified]" \
  --output table

# Confirm staging Lambda does NOT exist
aws lambda get-function --function-name syrfAppUploadS3Notifier-staging 2>&1 \
  | grep -q "ResourceNotFoundException" \
  && echo "Confirmed: Staging Lambda does not exist"
Phase 1: Infrastructure Verification¶
- After Terraform apply:
Phase 3: End-to-End Tests¶
- Test build-only flow:
  - Push s3-notifier change to main
  - Verify: GitHub Release created with .zip
  - Verify: No direct Lambda deployment (Terraform step skipped)
  - Verify: PR created to cluster-gitops with version update in staging config
- Test GitOps deployment:
  - Merge promotion PR to cluster-gitops
  - Verify: lambda-deploy.yml triggers
  - Verify: Staging Lambda updated in AWS
- Test production promotion:
  - Create PR updating production config.yaml
  - Manual review and merge
  - Verify: Production Lambda updated
- Test rollback:
  - Update staging config.yaml to previous version (e.g., 0.1.4 → 0.1.3)
  - Verify: Lambda reverts to that version
  - Verify: GitHub Release for target version still exists (artifact available)
- Test file upload triggers:
  - Upload test file to staging/ prefix
  - Verify: Staging Lambda invoked
  - Upload test file to Projects/ prefix
  - Verify: Production Lambda invoked (unchanged behavior)
Implementation Status¶
This document is DOCUMENTATION ONLY for a future PR.
No implementation will be done in the current session. This document serves as:
1. Architectural design and rationale
2. Analysis of deployment mechanism differences (ci-cd.yml vs pr-preview-lambda.yml)
3. Detailed reasoning for why GitOps is appropriate for staging/production but NOT for previews
4. Implementation roadmap for when this work is prioritized
Future PR Checklist¶
When implementing this feature, create PRs in this order:
- PR 1 (camarades-infrastructure): Add staging Lambda to Terraform
  - Add `aws_lambda_function.staging` resource
  - Add staging S3 notification trigger
  - Add staging variables
  - Test with `terraform plan`
- PR 2 (cluster-gitops): Add GitOps structure for Lambda
  - Create `syrf/services/s3-notifier/config.yaml`
  - Create `syrf/environments/staging/s3-notifier/config.yaml`
  - Create `syrf/environments/production/s3-notifier/config.yaml`
  - Create `.github/workflows/lambda-deploy.yml`
- PR 3 (syrf): Decouple build from deploy in ci-cd.yml
  - Remove Terraform apply from deploy-lambda job
  - Add Lambda version to staging promotion PR
  - Update promote-to-production to include Lambda
- PR 4 (all repos): End-to-end verification
  - Test build-only flow
  - Test GitOps deployment to staging
  - Test production promotion
  - Test rollback capability