Lambda GitOps Versioning - Architecture Design

SUPERSEDED: This document has been superseded by Lambda GitOps Integration. The analysis and options in this document informed the final consolidated decision. This document is preserved for historical reference only.

Problem Statement

Currently, there's an architectural asymmetry in how service versions are managed:

| Service Type | Source of Truth | Deployment Mechanism | Reconciliation |
|---|---|---|---|
| Kubernetes services | cluster-gitops/syrf/environments/{env}/{svc}/config.yaml | ArgoCD | Continuous |
| Lambda (S3 Notifier) | CI/CD workflow builds and deploys directly | GitHub Actions + Terraform | One-shot |

Goal: Make cluster-gitops the single source of truth for Lambda versions, with deployment triggered by version changes in that repo.


Current Lambda Deployment Flow

┌──────────────────────────────────────────────────────────────────────────┐
│                        CURRENT STATE (Direct Deploy)                      │
└──────────────────────────────────────────────────────────────────────────┘

1. Push to main branch (src/services/s3-notifier/)
2. ci-cd.yml workflow triggers
        ├─► GitVersion calculates version (e.g., 0.1.5)
        ├─► dotnet publish → production.zip
        ├─► Upload to S3: lambda-packages/production.zip
        ├─► Terraform apply (camarades-infrastructure/terraform/lambda/)
        ├─► Create git tag: s3-notifier-v0.1.5
        └─► Create GitHub Release with .zip attachment

No version declaration in cluster-gitops - version is computed and deployed in one step.

Current Deployment Mechanism Comparison

Production Lambda (ci-cd.yml)

Trigger: Push to main affecting src/services/s3-notifier/

Key Characteristics:

┌─────────────────────────────────────────────────────────────────────────────┐
│                         PRODUCTION DEPLOYMENT FLOW                           │
└─────────────────────────────────────────────────────────────────────────────┘

1. Build & Package
   └─► dotnet lambda package → production.zip

2. Upload to S3
   └─► aws s3 cp production.zip s3://camarades-terraform-state-aws/lambda-packages/production.zip
       (FIXED path - always overwrites)

3. Terraform Apply
   ├─► Variables:
   │   - TF_VAR_production_commit_sha = "abc123..."
   │   - TF_VAR_production_source_code_hash = base64(sha256(production.zip))
   └─► Resources:
       - aws_lambda_function.production (syrfAppUploadS3Notifier)
       - S3 notification on prefix: Projects/

4. Concurrency Control
   └─► concurrency group: production-lambda-terraform (serialized deploys)

Terraform Resources (single function, fixed config):
- Function: syrfAppUploadS3Notifier
- S3 Source: lambda-packages/production.zip
- S3 Trigger Prefix: Projects/
- Environment Variables: Production RabbitMQ host, etc.
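The `production_source_code_hash` variable in the flow above is base64 over the raw SHA-256 digest of the package, not over its hex string. A minimal Python sketch of that calculation follows; the function name and the stand-in bytes are illustrative assumptions, and a real workflow would read production.zip from disk.

```python
import base64
import hashlib


def source_code_hash(package_bytes: bytes) -> str:
    """Return base64(sha256(package)) computed over the raw digest bytes."""
    digest = hashlib.sha256(package_bytes).digest()  # raw bytes, not hexdigest()
    return base64.b64encode(digest).decode("ascii")


# Stand-in bytes (assumption); a real workflow would read production.zip.
example_hash = source_code_hash(b"example package contents")
print(example_hash)
```

Encoding the hex digest instead of the raw digest produces a different, longer string, which would make Terraform see a hash change on every apply.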

Preview Lambdas (pr-preview-lambda.yml)

Trigger: PR events (labeled with preview, synchronize, or closed)

Key Characteristics:

┌─────────────────────────────────────────────────────────────────────────────┐
│                          PREVIEW DEPLOYMENT FLOW                             │
└─────────────────────────────────────────────────────────────────────────────┘

1. Collect Active PRs
   └─► GitHub API: List all PRs with 'preview' label
       Output: ["123", "456", "789"]

2. Build Per-PR Package
   └─► dotnet lambda package → pr-{number}.zip

3. Upload Per-PR Package
   └─► aws s3 cp pr-{number}.zip s3://camarades-terraform-state-aws/lambda-packages/pr-{number}.zip
       (DYNAMIC path per PR)

4. Terraform Apply with for_each
   ├─► Variables:
   │   - TF_VAR_preview_prs = ["123", "456", "789"]
   │   - TF_VAR_preview_commit_shas = {"123": "abc...", "456": "def..."}
   │   - TF_VAR_preview_versions = {"123": "0.1.5-pr123", "456": "0.1.5-pr456"}
   └─► Resources (via for_each):
       - aws_lambda_function.preview["123"] → syrfAppUploadS3Notifier-pr-123
       - aws_lambda_function.preview["456"] → syrfAppUploadS3Notifier-pr-456
       - S3 notifications per prefix: preview/pr-{n}/

5. Cleanup on PR Close
   └─► Remove PR from preview_prs set → Terraform destroys Lambda

Terraform Pattern (for_each for dynamic resources):

# Preview Lambda functions - one per PR
resource "aws_lambda_function" "preview" {
  for_each      = var.preview_prs  # set(string): ["123", "456"]
  function_name = "syrfAppUploadS3Notifier-pr-${each.key}"
  s3_bucket     = var.lambda_bucket
  s3_key        = "lambda-packages/pr-${each.key}.zip"
  # ...
}

# S3 notifications with dynamic block
resource "aws_s3_bucket_notification" "bucket_notification" {
  # Production notification (always)
  lambda_function {
    lambda_function_arn = aws_lambda_function.production.arn
    filter_prefix       = "Projects/"
  }

  # Preview notifications (dynamic per PR)
  dynamic "lambda_function" {
    for_each = var.preview_prs
    content {
      lambda_function_arn = aws_lambda_function.preview[lambda_function.value].arn
      filter_prefix       = "preview/pr-${lambda_function.value}/"
    }
  }
}

Critical Finding: No Staging Lambda Exists

Current State Analysis:

| Environment | Lambda Function | S3 Prefix | Status |
|---|---|---|---|
| Production | syrfAppUploadS3Notifier | Projects/ | ✅ Exists |
| Preview | syrfAppUploadS3Notifier-pr-{n} | preview/pr-{n}/ | ✅ Exists (dynamic) |
| Staging | None | None | ❌ Does NOT exist |

Implication: Staging environment currently shares production Lambda, meaning:
- Staging deployments don't test Lambda changes before production
- No isolation between staging and production file processing
- Adding staging Lambda is a prerequisite for GitOps versioning

Architectural Differences Summary

| Aspect | Production (ci-cd.yml) | Preview (pr-preview-lambda.yml) |
|---|---|---|
| Lifecycle | Permanent | Ephemeral (PR lifecycle) |
| Function Count | 1 | N (one per PR) |
| Terraform Pattern | Single resource | for_each over set |
| S3 Package Path | production.zip (fixed) | pr-{n}.zip (dynamic) |
| Version Tracking | GitHub tag + Release | PR commit SHA |
| Cleanup | Never | Auto on PR close |
| Trigger | Push to main | PR label events |

Key Variables in Terraform

# Production
variable "production_commit_sha" {
  description = "Git commit SHA for production deployment"
  type        = string
}

variable "production_source_code_hash" {
  description = "Base64-encoded SHA256 hash of production.zip"
  type        = string
}

# Preview (for_each pattern)
variable "preview_prs" {
  description = "Set of PR numbers with preview label"
  type        = set(string)
  default     = []
}

variable "preview_commit_shas" {
  description = "Map of PR number to commit SHA"
  type        = map(string)
  default     = {}
}

variable "preview_versions" {
  description = "Map of PR number to semantic version"
  type        = map(string)
  default     = {}
}
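As a rough illustration of how a workflow might populate these variables, the sketch below serializes the PR set and the two maps into `TF_VAR_*` values using the same JSON shapes shown in the preview flow. The helper name and sample values are assumptions, not taken from the actual workflow.

```python
import json


def terraform_env(preview_prs, commit_shas, versions):
    """Build TF_VAR_* entries for the preview variables (hypothetical helper)."""
    prs = sorted(preview_prs, key=int)
    # Every PR in the set needs both a commit SHA and a version.
    missing = [pr for pr in prs if pr not in commit_shas or pr not in versions]
    if missing:
        raise ValueError(f"missing SHA or version for PRs: {missing}")
    return {
        "TF_VAR_preview_prs": json.dumps(prs),
        "TF_VAR_preview_commit_shas": json.dumps({pr: commit_shas[pr] for pr in prs}),
        "TF_VAR_preview_versions": json.dumps({pr: versions[pr] for pr in prs}),
    }


# Sample values are placeholders.
env = terraform_env(
    {"456", "123"},
    {"123": "abc", "456": "def"},
    {"123": "0.1.5-pr123", "456": "0.1.5-pr456"},
)
print(env["TF_VAR_preview_prs"])
```

Checking that the set and both maps agree before calling Terraform turns a confusing plan-time lookup error into an early, explicit failure.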

Desired State

┌──────────────────────────────────────────────────────────────────────────┐
│                     DESIRED STATE (GitOps Versioning)                     │
└──────────────────────────────────────────────────────────────────────────┘

cluster-gitops/syrf/environments/
├── staging/
│   └── s3-notifier/
│       └── config.yaml          ◄─── lambdaVersion: 0.1.5
│                                      (single source of truth)
└── production/
    └── s3-notifier/
        └── config.yaml          ◄─── lambdaVersion: 0.1.4
                                       (manually promoted)

Change to config.yaml → triggers → Lambda deployment

Design Options

Option 1: GitOps-Triggered Terraform

Architecture: cluster-gitops PR merge → GitHub Actions → Terraform apply

┌─────────────────────────────────────────────────────────────────────────────┐
│                            OPTION 1: GitOps-Triggered Terraform             │
└─────────────────────────────────────────────────────────────────────────────┘

Phase A: Build & Release (unchanged from current)
─────────────────────────────────────────────────
1. Push to main (s3-notifier code changes)
2. ci-cd.yml workflow:
   ├─► GitVersion → 0.1.6
   ├─► Build Lambda package (.zip)
   ├─► Create git tag: s3-notifier-v0.1.6
   └─► Create GitHub Release with .zip attachment

   ❌ Does NOT deploy to Lambda
   ❌ Does NOT upload to S3 lambda-packages/
   ✓  Only creates versioned artifact

Phase B: Promote to Staging (GitOps)
────────────────────────────────────
3. ci-cd.yml creates PR to cluster-gitops:
   - Updates: syrf/environments/staging/s3-notifier/config.yaml
   - Sets: lambdaVersion: "0.1.6"

4. PR auto-merges (staging)

5. cluster-gitops webhook → lambda-deploy.yml workflow
   ├─► Detect which environment changed (staging)
   ├─► Download .zip from GitHub Release (s3-notifier-v0.1.6)
   ├─► Upload to S3: lambda-packages/staging.zip
   └─► Terraform apply with:
       - TF_VAR_staging_version=0.1.6
       - TF_VAR_staging_source_code_hash=<calculated>

Phase C: Promote to Production (Manual Gate)
────────────────────────────────────────────
6. After staging verification, trigger production promotion
7. Creates PR to cluster-gitops:
   - Updates: syrf/environments/production/s3-notifier/config.yaml
   - Sets: lambdaVersion: "0.1.6"

8. Manual review & merge

9. Same workflow as step 5, but for production environment

File Structure in cluster-gitops:

# syrf/environments/staging/s3-notifier/config.yaml
serviceName: s3-notifier
envName: staging
lambda:
  version: "0.1.6"
  functionName: "syrfAppUploadS3Notifier"  # or staging-specific name
  s3Prefix: "Projects/"
gitVersion:
  sha: "abc123"
  shortSha: "abc123"

Terraform Changes Required:
- Refactor to support staging/production as separate Lambda functions (or same function with version alias)
- Accept version and S3 package path as variables
- Support Lambda aliases for blue/green deployment

Pros:
- ✅ Leverages existing Terraform Lambda management
- ✅ Uses pre-built packages from GitHub Releases
- ✅ Consistent promotion PR pattern with K8s services
- ✅ No new operators or controllers
- ✅ Clear audit trail (version change → PR → deployment)

Cons:
- ⚠️ Not continuous reconciliation (event-driven, not polling)
- ⚠️ Cross-repo workflow triggers add complexity
- ⚠️ Lambda drift won't be auto-corrected (manual intervention needed)


Option 2: AWS Controllers for Kubernetes (ACK)

Architecture: Lambda managed as Kubernetes CRD, reconciled by ACK operator

┌─────────────────────────────────────────────────────────────────────────────┐
│                    OPTION 2: AWS Controllers for Kubernetes                  │
└─────────────────────────────────────────────────────────────────────────────┘

1. Install ACK Lambda Controller in GKE cluster
   - Requires IAM role for service account (IRSA) or GKE Workload Identity → AWS
   - Controller watches for Lambda CRDs

2. Define Lambda as Kubernetes manifest in cluster-gitops:

   # syrf/environments/staging/s3-notifier/lambda.yaml
   apiVersion: lambda.services.k8s.aws/v1alpha1
   kind: Function
   metadata:
     name: syrf-s3-notifier-staging
     namespace: syrf-staging
   spec:
     name: syrfAppUploadS3Notifier-staging
     runtime: dotnet8
     handler: "SyRF.S3FileSavedNotifier.Endpoint::..."
     code:
       s3Bucket: camarades-terraform-state-aws
       s3Key: lambda-packages/staging-0.1.6.zip  # ← Version in filename
     environment:
       variables:
         RabbitMqHost: "amqp://..."
     memorySize: 512
     timeout: 30

3. ArgoCD syncs Lambda CRD to cluster

4. ACK Lambda Controller reconciles:
   - Creates/updates AWS Lambda function to match spec
   - Continuous reconciliation (drift correction)
   - Status reflected back to K8s resource

5. S3 bucket notifications still managed separately (Terraform or ACK S3 controller)

Pros:
- ✅ True GitOps with continuous reconciliation
- ✅ Automatic drift correction
- ✅ Consistent with K8s patterns (everything is a CRD)
- ✅ ArgoCD manages Lambda just like other resources

Cons:
- ⚠️ Requires ACK Lambda controller installation and maintenance
- ⚠️ Complex cross-cloud IAM (GKE → AWS)
- ⚠️ Different from current Terraform approach (migration effort)
- ⚠️ S3 bucket notifications need separate management
- ⚠️ ACK is AWS-specific; adds AWS dependency to K8s cluster


Option 3: Flux Terraform Controller

Architecture: Terraform execution managed as Kubernetes resource

┌─────────────────────────────────────────────────────────────────────────────┐
│                      OPTION 3: Flux Terraform Controller                     │
└─────────────────────────────────────────────────────────────────────────────┘

1. Install Flux Terraform Controller (tf-controller)
   - Runs Terraform inside K8s pods
   - Manages Terraform state

2. Define Terraform resource in cluster-gitops:

   # syrf/environments/staging/s3-notifier/terraform.yaml
   apiVersion: infra.contrib.fluxcd.io/v1alpha2
   kind: Terraform
   metadata:
     name: s3-notifier-staging
     namespace: flux-system
   spec:
     approvePlan: auto
     interval: 10m
     path: ./terraform/lambda
     sourceRef:
       kind: GitRepository
       name: camarades-infrastructure
     vars:
       - name: environment
         value: staging
       - name: lambda_version
         value: "0.1.6"  # ← Version declared here
     varsFrom:
       - kind: Secret
         name: lambda-terraform-vars

3. ArgoCD (or Flux) syncs Terraform CRD

4. Terraform Controller:
   - Clones camarades-infrastructure
   - Runs terraform plan/apply
   - Reconciles on interval (10m)

Pros:
- ✅ Kubernetes-native Terraform execution
- ✅ Drift detection via interval reconciliation
- ✅ Keeps Terraform as deployment mechanism
- ✅ Secrets management via K8s secrets

Cons:
- ⚠️ Another controller to install and maintain
- ⚠️ Terraform runs in cluster (security implications)
- ⚠️ Mixing GitOps tools (ArgoCD + Flux component)
- ⚠️ More complex than GitHub Actions approach


Option 4: Lambda Versioning with Aliases

Architecture: Single Lambda function with version aliases, managed via GitOps

┌─────────────────────────────────────────────────────────────────────────────┐
│                     OPTION 4: Lambda Aliases (Traffic Shifting)              │
└─────────────────────────────────────────────────────────────────────────────┘

Instead of separate staging/production functions, use Lambda aliases:

Lambda Function: syrfAppUploadS3Notifier
├── $LATEST (always latest deployed code)
├── Version 42 (published version for v0.1.5)
├── Version 43 (published version for v0.1.6)
├── Alias: staging  → points to Version 43
└── Alias: production → points to Version 42

cluster-gitops declares which version each alias points to:

# syrf/environments/staging/s3-notifier/config.yaml
lambda:
  alias: staging
  version: 43  # Lambda published version number

S3 bucket notifications route by prefix:
- Projects/staging/* → staging alias
- Projects/* → production alias

Pros:
- ✅ Single function with multiple "environments"
- ✅ Instant rollback (just change alias)
- ✅ Traffic shifting possible (gradual rollout)
- ✅ AWS-native blue/green deployment

Cons:
- ⚠️ S3 notifications routing becomes complex
- ⚠️ Current architecture assumes separate prefixes per PR, not aliases
- ⚠️ Requires rethinking S3 trigger architecture
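One concrete source of the routing complexity: S3 rejects a notification configuration in which two rules for the same event type have overlapping prefixes (unless their suffix filters are disjoint), and Projects/staging/ sits inside Projects/. The following sketch, with a hypothetical helper name, flags such pairs before they reach Terraform.

```python
from itertools import combinations


def overlapping_prefixes(prefixes):
    """Return pairs of prefixes where one is a leading substring of the other."""
    return [
        (a, b)
        for a, b in combinations(prefixes, 2)
        if a.startswith(b) or b.startswith(a)
    ]


alias_layout = overlapping_prefixes(["Projects/staging/", "Projects/"])
current_layout = overlapping_prefixes(["Projects/", "preview/pr-12/", "preview/pr-123/"])
print(alias_layout)
print(current_layout)
```

The trailing slash matters here: `preview/pr-12/` and `preview/pr-123/` do not overlap, whereas `preview/pr-12` and `preview/pr-123` would.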


Recommendation: Option 1 (GitOps-Triggered Terraform)

Rationale:

  1. Minimal disruption: Uses existing Terraform, GitHub Releases, S3 backend
  2. Familiar patterns: Matches current K8s promotion workflow
  3. No new operators: Avoids ACK/Flux complexity
  4. Sufficient for use case: Lambda versions change infrequently (releases, not continuous)
  5. Clear separation of concerns:
     - cluster-gitops: Declares desired version
     - GitHub Actions: Orchestrates deployment
     - Terraform: Manages Lambda infrastructure
     - GitHub Releases: Stores versioned artifacts

Key Insight: Lambda doesn't need continuous reconciliation like Kubernetes pods. Version changes are discrete events (releases), not continuous drift. Event-driven deployment (PR merge → workflow) is appropriate.


Implementation Plan for Option 1

Phase 1: Refactor CI/CD to Separate Build from Deploy

Current: ci-cd.yml builds AND deploys Lambda in one workflow
Target: Build creates artifact only; deploy triggered by cluster-gitops change

Changes to ci-cd.yml:
1. Keep: GitVersion, build, create tag, create GitHub Release
2. Remove: Upload to S3, Terraform apply
3. Add: Update cluster-gitops staging config with new version

Phase 2: Create Lambda Deploy Workflow in cluster-gitops

New workflow: cluster-gitops/.github/workflows/lambda-deploy.yml

Triggers on:
- Push to main affecting syrf/environments/*/s3-notifier/config.yaml

Steps:
1. Determine which environment(s) changed
2. Read version from config.yaml
3. Download .zip from GitHub Release (s3-notifier-v{version})
4. Upload to S3 (environment-specific path)
5. Checkout camarades-infrastructure
6. Run Terraform with environment and version variables
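Steps 1 and 3 above can be sketched as follows: map the changed file paths of a push to environment names, and derive the release tag from a version. This is a minimal sketch assuming the workflow already has the list of changed paths; both function names are hypothetical.

```python
import re

# Matches only the s3-notifier config for any environment.
CONFIG_PATH = re.compile(r"^syrf/environments/(?P<env>[^/]+)/s3-notifier/config\.yaml$")


def changed_environments(changed_paths):
    """Return the sorted environment names whose s3-notifier config changed."""
    envs = set()
    for path in changed_paths:
        match = CONFIG_PATH.match(path)
        if match:
            envs.add(match.group("env"))
    return sorted(envs)


def release_tag(version):
    """Tag naming follows the s3-notifier-v{version} convention used in this document."""
    return f"s3-notifier-v{version}"


paths = [
    "syrf/environments/staging/s3-notifier/config.yaml",
    "syrf/environments/staging/api/config.yaml",
    "README.md",
]
print(changed_environments(paths), release_tag("0.1.6"))
```

Returning a list rather than a single name covers the case where one push touches both staging and production configs.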

Phase 3: Refactor Terraform for Multi-Environment

Current: Single production Lambda, preview Lambdas dynamic
Target: Staging + Production as separate functions (or aliases)

Options:
- A) Separate functions: syrfAppUploadS3Notifier-staging, syrfAppUploadS3Notifier-production
- B) Single function with aliases: staging and production aliases pointing to versions

Recommend Option A initially for simplicity; migrating to aliases remains possible later.

Phase 4: Add config.yaml for Lambda in cluster-gitops

Create files:

syrf/environments/staging/s3-notifier/config.yaml
syrf/environments/production/s3-notifier/config.yaml

Schema:

serviceName: s3-notifier
envName: staging
lambda:
  version: "0.1.6"
  functionName: "syrfAppUploadS3Notifier-staging"
  s3TriggerPrefix: "Projects/"
gitVersion:
  sha: "abc123def456..."
  shortSha: "abc123"
deploymentNotification:
  commitSha: "abc123def456..."


Design Decisions (User Input)

| Question | Decision | Rationale |
|---|---|---|
| Environment Isolation | Separate functions | syrfAppUploadS3Notifier-staging and syrfAppUploadS3Notifier (production) |
| AWS Accounts | Same account | Simpler credential management |
| Preview Lambdas | Full GitOps alignment | Match K8s preview pattern for consistency |

Preview Lambda Strategy (Revised)

After analyzing K8s preview patterns, preview Lambdas should follow the same GitOps pattern as K8s services:

┌─────────────────────────────────────────────────────────────────────────────┐
│                    PREVIEW LAMBDA GITOPS PATTERN                            │
│                                                                             │
│  1. pr-preview-lambda.yml builds Lambda package                             │
│                                                                             │
│  2. pr-preview-lambda.yml writes config to cluster-gitops:                  │
│     syrf/environments/preview/pr-{n}/services/s3-notifier.values.yaml       │
│                                                                             │
│  3. cluster-gitops lambda-deploy.yml triggers on file change                │
│                                                                             │
│  4. Terraform applies with preview_prs set derived from files               │
│                                                                             │
│  5. Cleanup: delete pr-{n}/ folder → triggers Lambda destruction            │
└─────────────────────────────────────────────────────────────────────────────┘

Why this changed: K8s previews use the exact same pattern (workflow creates files → ArgoCD deploys). Lambda should be no different for architectural consistency. See "Kubernetes Preview Environment Pattern" and "Revised Recommendation" sections for detailed analysis.


Kubernetes Preview Environment Pattern (For Comparison)

Understanding how K8s previews work is essential for deciding Lambda preview strategy.

K8s Preview Deployment Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                     K8S PREVIEW DEPLOYMENT FLOW                              │
└─────────────────────────────────────────────────────────────────────────────┘

1. PR labeled with "preview"
2. pr-preview.yml workflow triggers
         ├─► Build Docker images with pr-{number} tag
         └─► write-versions job creates files in cluster-gitops:
             └─► syrf/environments/preview/pr-{N}/
                 ├── pr.yaml                    ◄─── ApplicationSet trigger
                 ├── namespace.yaml             ◄─── K8s Namespace definition
                 ├── mongodb-user.yaml          ◄─── Database isolation
                 ├── db-reset-job.yaml          ◄─── PreSync data cleanup
                 └── services/
                     ├── api.values.yaml        ◄─── Service image tags
                     ├── project-management.values.yaml
                     ├── quartz.values.yaml
                     └── web.values.yaml
3. ArgoCD ApplicationSet detects pr.yaml
         ├─► Uses matrix generator: PR × services
         └─► Generates Applications:
             ├── pr-{N}-namespace
             ├── pr-{N}-api
             ├── pr-{N}-project-management
             ├── pr-{N}-quartz
             └── pr-{N}-web
4. ArgoCD syncs Applications (auto-sync enabled)
         └─► Deploys via Helm charts at PR commit SHA
5. Cleanup on PR close
         └─► workflow deletes pr-{N}/ folder → ArgoCD deletes Apps → resources cleaned up

Key K8s Preview Characteristics

| Aspect | K8s Implementation |
|---|---|
| Trigger | pr-preview.yml creates files in cluster-gitops |
| Orchestration | ArgoCD ApplicationSet (declarative) |
| Config Location | cluster-gitops/syrf/environments/preview/pr-{N}/ |
| Deployment | ArgoCD syncs Helm charts from monorepo |
| Namespace Isolation | Per-PR K8s namespace |
| Database Isolation | Per-PR MongoDB database + AtlasDatabaseUser |
| Version Tracking | GitVersion in {service}.values.yaml |
| Cleanup | Delete pr-{N}/ folder → cascading deletion |

ApplicationSet Configuration

# argocd/applicationsets/syrf-previews.yaml (simplified)
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
spec:
  generators:
    - matrix:
        generators:
          # Trigger: pr-*/pr.yaml existence
          - git:
              files:
                - path: "syrf/environments/preview/pr-*/pr.yaml"
          # Services to deploy per PR
          - merge:
              generators:
                - git:
                    files:
                      - path: "syrf/services/*/config.yaml"
                - git:
                    files:
                      - path: "syrf/environments/preview/services/*/config.yaml"
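The matrix generator combines every discovered PR with every service. As a rough model of that expansion (not ArgoCD's implementation), the sketch below builds the pr-{N}-{service} Application names with a Cartesian product; the function name is hypothetical.

```python
from itertools import product


def preview_applications(pr_numbers, services):
    """Cartesian product of PRs and services, named pr-{N}-{service}."""
    return [f"pr-{pr}-{svc}" for pr, svc in product(pr_numbers, services)]


apps = preview_applications(["123", "456"], ["api", "web"])
print(apps)
```

Adding a PR or a service grows the output by a full row or column, which is why deleting a single pr-{N}/ folder removes every Application for that PR.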

Values File Hierarchy (Priority Order)

1. syrf/global.values.yaml                                     # Universal defaults
2. syrf/services/{svc}/values.yaml                             # Service base
3. syrf/environments/preview/preview.values.yaml               # Preview defaults (ALL)
4. syrf/environments/preview/services/{svc}/values.yaml        # Preview per-service
5. syrf/environments/preview/pr-{N}/services/{svc}.values.yaml # PR-SPECIFIC (HIGHEST)
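The priority order above behaves like a layered merge: later files override earlier ones, nested mappings are merged key by key, and scalars and lists are replaced wholesale. The sketch below illustrates that rule; the layer contents are made-up placeholders rather than values from the real files.

```python
def deep_merge(base, override):
    """Merge override into base: nested mappings merge, other values replace."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve_values(layers):
    """Apply layers in order, so the last layer has the highest priority."""
    result = {}
    for layer in layers:
        result = deep_merge(result, layer)
    return result


# Placeholder layers, lowest priority first.
layers = [
    {"replicas": 1, "image": {"repository": "example/api", "tag": "latest"}},
    {"image": {"tag": "0.1.5"}},
    {"replicas": 2},
    {"image": {"tag": "pr-123"}},
]
resolved = resolve_values(layers)
print(resolved)
```

The PR-specific layer only has to state the image tag; everything else is inherited from the lower layers.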

Critical Insight: K8s Previews ARE GitOps

The K8s preview pattern is fully GitOps-driven:
1. Workflow creates config files in cluster-gitops
2. ArgoCD detects file changes via Git generators
3. ApplicationSet generates Applications automatically
4. ArgoCD handles deployment, sync, and cleanup

The workflow triggers GitOps; it doesn't deploy directly.


Lambda vs K8s Preview Pattern Comparison

| Aspect | K8s Previews | Lambda Previews (Current) |
|---|---|---|
| Config in cluster-gitops | ✅ Yes - pr.yaml, values files | ❌ No - managed in workflow |
| Deployment orchestrator | ArgoCD ApplicationSet | GitHub Actions + Terraform |
| Trigger mechanism | File existence → ArgoCD | Workflow → Terraform directly |
| State visibility | Full visibility in cluster-gitops | No visibility (ephemeral) |
| Cleanup mechanism | Delete folder → ArgoCD cascades | Workflow removes from TF set |
| Pattern consistency | GitOps-first | Imperative-first |

The Inconsistency Problem

Currently, Lambda previews operate differently from K8s previews:

┌─────────────────────────────────────────────────────────────────────────────┐
│                        CURRENT INCONSISTENCY                                 │
└─────────────────────────────────────────────────────────────────────────────┘

K8s Preview:
  pr-preview.yml → creates files in cluster-gitops → ArgoCD deploys
                   ↑ GitOps is the deployment mechanism

Lambda Preview:
  pr-preview-lambda.yml → Terraform apply directly → Lambda deployed
                          ↑ Workflow is the deployment mechanism
                          (cluster-gitops not involved)

Revised Analysis: Should Lambda Previews Follow K8s Pattern?

Option A: Full GitOps Alignment (K8s Pattern)

Make Lambda previews follow the same pattern as K8s:

┌─────────────────────────────────────────────────────────────────────────────┐
│                    OPTION A: FULL GITOPS ALIGNMENT                          │
└─────────────────────────────────────────────────────────────────────────────┘

1. pr-preview-lambda.yml creates:
   cluster-gitops/syrf/environments/preview/pr-{N}/services/s3-notifier.values.yaml

2. cluster-gitops workflow (lambda-deploy.yml) triggers on file change

3. Workflow runs Terraform with values from config file

4. Cleanup: delete s3-notifier.values.yaml → triggers Lambda destruction

Pros:
- ✅ Consistent pattern with K8s previews
- ✅ Full visibility in cluster-gitops
- ✅ Lambda preview state matches K8s preview state
- ✅ Single source of truth for ALL preview resources

Cons:
- ⚠️ Adds ~3-5 min latency (cross-repo workflow)
- ⚠️ More complex coordination (two workflows)
- ⚠️ Still imperative deployment (Terraform), just triggered differently

Option B: Hybrid Pattern (Current + Visibility)

Keep imperative deployment but add cluster-gitops visibility:

┌─────────────────────────────────────────────────────────────────────────────┐
│                    OPTION B: HYBRID (VISIBILITY ONLY)                       │
└─────────────────────────────────────────────────────────────────────────────┘

1. pr-preview-lambda.yml deploys Lambda directly (current behavior)

2. AFTER successful deploy, update cluster-gitops:
   cluster-gitops/syrf/environments/preview/pr-{N}/services/s3-notifier.values.yaml

3. cluster-gitops is READ-ONLY for Lambda previews (not a trigger)

4. Cleanup: workflow deletes Lambda AND updates cluster-gitops

Pros:
- ✅ Fast deployment (no cross-repo wait)
- ✅ Visibility in cluster-gitops
- ✅ Simple failure recovery (single workflow)

Cons:
- ⚠️ Inconsistent pattern (K8s: GitOps-triggers-deploy, Lambda: deploy-then-update)
- ⚠️ cluster-gitops may drift if workflow fails after deploy
- ⚠️ Two sources of truth (workflow state + cluster-gitops)

Option C: No Change (Status Quo)

Keep Lambda previews completely separate:

Pros:
- ✅ Simplest approach
- ✅ Already working

Cons:
- ❌ No visibility into Lambda preview state
- ❌ Inconsistent with K8s pattern
- ❌ Can't see complete preview environment in cluster-gitops


Recommendation: Option A (Full GitOps Alignment)

After deeper analysis, Option A is recommended for Lambda previews because:

1. Consistency Principle

K8s previews already accept the "workflow creates files → GitOps deploys" pattern. Lambda should follow the same pattern for architectural consistency.

2. The Latency Concern is Overstated

K8s Preview Latency Analysis:

pr-preview.yml workflow:
├── detect-changes: ~1 min
├── version-*: ~2 min
├── build-and-push: ~5-10 min
├── write-versions: ~1 min (creates cluster-gitops files)
└── ArgoCD sync: ~2-3 min

Total: ~11-17 minutes

Lambda with GitOps Latency Analysis:

pr-preview-lambda.yml workflow:
├── detect-changes: ~1 min
├── version: ~1 min
├── build Lambda package: ~2 min
├── write config to cluster-gitops: ~1 min
└── cluster-gitops workflow triggers: ~2 min
    └── Terraform apply: ~3-5 min

Total: ~10-12 minutes

Conclusion: Lambda with GitOps is actually FASTER than K8s previews because Lambda builds are faster than Docker builds. The latency concern from earlier analysis was comparing against an idealized "immediate deploy" that K8s also doesn't achieve.
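The two totals can be checked by summing the per-stage estimates as (low, high) ranges. The sketch below does only that arithmetic, using the stage figures listed above; the function name is hypothetical.

```python
def total_range(stages):
    """Sum (low, high) minute estimates across sequential stages."""
    low = sum(lo for lo, _ in stages.values())
    high = sum(hi for _, hi in stages.values())
    return low, high


k8s = {
    "detect-changes": (1, 1),
    "version-*": (2, 2),
    "build-and-push": (5, 10),
    "write-versions": (1, 1),
    "ArgoCD sync": (2, 3),
}
lambda_gitops = {
    "detect-changes": (1, 1),
    "version": (1, 1),
    "build Lambda package": (2, 2),
    "write config to cluster-gitops": (1, 1),
    "cluster-gitops workflow triggers": (2, 2),
    "Terraform apply": (3, 5),
}
print(total_range(k8s), total_range(lambda_gitops))
```

Summed this way, the K8s stages give 11 to 17 minutes and the Lambda stages give 10 to 12 minutes, so the comparison holds at both ends of the range.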

3. Single Source of Truth

With Option A, cluster-gitops/syrf/environments/preview/pr-{N}/ contains EVERYTHING about a preview:
- K8s namespace configuration
- MongoDB user configuration
- ALL service image tags
- Lambda version and configuration

This enables:
- Complete preview state at a glance
- Easier debugging (one place to look)
- Consistent cleanup (delete folder = delete everything)

4. Terraform Pattern Adaptation

The for_each concern can be addressed:

# Current: Workflow passes set
variable "preview_prs" {
  type    = set(string)
  default = []
}

# GitOps: Workflow discovers set from cluster-gitops files
# In lambda-deploy.yml workflow (assumes jq is available on the runner):
# extract N from .../pr-N/services/s3-notifier.values.yaml and build a JSON list
PREVIEW_PRS=$(find syrf/environments/preview \
  -path '*/pr-*/services/s3-notifier.values.yaml' \
  | sed -E 's|.*/pr-([^/]+)/services/.*|\1|' \
  | jq -R . | jq -sc .)

# Pass to Terraform as a JSON list, e.g. ["123","456"]:
export TF_VAR_preview_prs="$PREVIEW_PRS"
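The same discovery step, sketched in Python against a throwaway directory tree. The PR number comes from the directory two levels above the values file (the file's immediate parent is services/). The function name and the temporary layout are illustrative assumptions.

```python
import json
import tempfile
from pathlib import Path


def discover_preview_prs(repo_root):
    """Return PR numbers that have an s3-notifier values file, sorted numerically."""
    preview_dir = Path(repo_root) / "syrf" / "environments" / "preview"
    prs = []
    for values_file in preview_dir.glob("pr-*/services/s3-notifier.values.yaml"):
        pr_dir = values_file.parent.parent  # .../pr-123/services -> .../pr-123
        number = pr_dir.name[len("pr-"):]
        if number.isdigit():  # ignore directories that are not pr-<number>
            prs.append(number)
    return sorted(prs, key=int)


# Build a small fake checkout: PRs 123 and 456 have a Lambda config, 789 does not.
with tempfile.TemporaryDirectory() as root:
    for pr, has_lambda in (("456", True), ("123", True), ("789", False)):
        services = Path(root) / "syrf/environments/preview" / f"pr-{pr}" / "services"
        services.mkdir(parents=True)
        (services / "api.values.yaml").write_text("image: {}\n")
        if has_lambda:
            (services / "s3-notifier.values.yaml").write_text("lambda: {}\n")
    discovered = discover_preview_prs(root)

print(json.dumps(discovered))
```

A PR without an s3-notifier values file is skipped, so K8s-only previews do not create a Lambda.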

Considerations: GitOps for Preview Lambdas

This section documents the trade-offs of applying GitOps to preview Lambdas. After analyzing the K8s preview pattern, Option A (Full GitOps Alignment) is now recommended, but these considerations should inform implementation decisions.

1. Feedback Loop Latency

Current Preview Flow (~5-8 minutes):

Push to PR → Build Lambda → Upload to S3 → Terraform apply → Lambda ready
            └────────────── Single workflow, linear execution ──────────────┘

Hypothetical GitOps Flow (~15-25 minutes):

Push to PR → Build Lambda → Create PR to cluster-gitops → Wait for merge
                            cluster-gitops PR merged
                            lambda-deploy.yml triggers
                            Download artifact → Terraform apply → Lambda ready

Impact: Developers expect rapid feedback on PR changes. The additional cross-repo coordination adds 10-15 minutes to each iteration cycle, significantly degrading developer experience.

2. Git History Pollution

Problem: Preview environments are high-churn by nature.

cluster-gitops commit history (hypothetical GitOps previews):
─────────────────────────────────────────────────────────────
abc1234 - Update pr-456 s3-notifier to sha def789
bcd2345 - Update pr-123 s3-notifier to sha abc012
cde3456 - Update pr-456 s3-notifier to sha ghi345
def4567 - Remove pr-789 s3-notifier (PR closed)
efg5678 - Update pr-123 s3-notifier to sha jkl678
fgh6789 - Add pr-901 s3-notifier config
...
(dozens of commits per day for active development)

Impact:
- Meaningful staging/production changes buried in preview noise
- Git blame becomes useless for understanding intentional changes
- Repository size grows rapidly with ephemeral config churn

3. Cross-Repository Coordination Complexity

Current Architecture (self-contained):

┌─────────────────────────────────────────────────────────────┐
│                    pr-preview-lambda.yml                     │
│                                                              │
│  1. Get list of preview PRs                                 │
│  2. Build Lambda for current PR                             │
│  3. Upload to S3                                            │
│  4. Terraform apply with preview_prs set                    │
│  5. On PR close: remove from set, Terraform destroys        │
│                                                              │
│  └─► All state managed within single workflow               │
└─────────────────────────────────────────────────────────────┘

GitOps Architecture (distributed state):

┌─────────────────────────────────────────────────────────────┐
│  syrf repo                    cluster-gitops repo            │
│  ──────────                   ───────────────────            │
│  pr-preview.yml ──creates PR──► preview/pr-{n}/config.yaml  │
│        │                              │                      │
│        │                              ▼                      │
│        │                      lambda-deploy.yml              │
│        │                              │                      │
│        │                              ▼                      │
│        │                      Terraform apply                │
│        │                                                     │
│  On PR close:                                               │
│  cleanup.yml ──creates PR──► Remove preview/pr-{n}/         │
│        │                              │                      │
│        │                              ▼                      │
│        │                      lambda-deploy.yml              │
│        │                              │                      │
│        │                              ▼                      │
│        └──────────────────── Terraform destroy               │
│                                                              │
│  Failure modes:                                             │
│  - cluster-gitops PR fails to merge → orphaned state        │
│  - cleanup PR fails → orphaned Lambda resources             │
│  - Race conditions between multiple PRs                     │
│  - Partial failures (K8s deployed, Lambda failed)           │
└─────────────────────────────────────────────────────────────┘

Impact: Distributed state across repositories creates multiple failure modes that don't exist with the current self-contained approach.

4. Terraform for_each Pattern Incompatibility

Current Pattern (works well):

# Workflow passes complete set of active PRs
variable "preview_prs" {
  type    = set(string)
  default = []  # e.g., ["123", "456", "789"]
}

resource "aws_lambda_function" "preview" {
  for_each = var.preview_prs
  # Terraform manages full lifecycle based on set membership
}

GitOps Challenge:

# How would this work?

Option A: Aggregate configs at deploy time
─────────────────────────────────────────
cluster-gitops/
└── syrf/environments/preview/
    ├── pr-123/s3-notifier/config.yaml
    ├── pr-456/s3-notifier/config.yaml
    └── pr-789/s3-notifier/config.yaml

lambda-deploy.yml would need to:
1. List all pr-* directories
2. Build preview_prs set from directory names
3. Pass to Terraform

Problem: What if a PR closes while workflow is running?
         Directory deleted mid-execution → race condition

Option B: Separate Terraform per preview
──────────────────────────────────────────
Each preview has isolated Terraform state

Problems:
- Expensive (separate state file per PR)
- S3 bucket notifications can't be split (single resource)
- Lose the elegance of for_each pattern

Impact: The current for_each pattern is designed for workflow-managed state, not file-based discovery. Retrofitting GitOps would require significant Terraform refactoring.
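To make Option A concrete, the directory-to-set step could look roughly like the sketch below. The function name `build_preview_prs` is hypothetical and nothing like it exists in the current workflows; it only shows how pr-* directory names would become the JSON list handed to `preview_prs`. It reads whatever is on disk at the moment it runs, so it does not solve the race described above: a PR that closes after the list is built still needs a follow-up run.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the current repos): turn pr-* directory names
# under a preview root into the JSON list expected by var.preview_prs.
build_preview_prs() {
  local root="$1" dir pr out=""
  for dir in "$root"/pr-*/; do
    [ -d "$dir" ] || continue            # glob matched nothing
    pr="${dir%/}"                        # strip trailing slash
    pr="${pr##*/pr-}"                    # keep the part after "pr-"
    case "$pr" in
      ''|*[!0-9]*) continue ;;           # ignore non-numeric names
    esac
    out="${out:+$out,}\"$pr\""
  done
  printf '[%s]\n' "$out"
}

# Example call (path assumed from the layout above):
#   terraform apply -var="preview_prs=$(build_preview_prs syrf/environments/preview)"
```

An empty preview directory yields `[]`, which matches the variable's default and would let Terraform destroy all preview Lambdas.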

5. Ephemeral Resources Don't Need GitOps Benefits

| GitOps Benefit | Value for Staging/Prod | Value for Previews |
|---|---|---|
| Single source of truth | ✅ Critical - must know what's deployed | ❌ PR commit IS the source of truth |
| Audit trail | ✅ Critical - compliance, debugging | ❌ Previews are throwaway |
| Manual gates | ✅ Critical - prod approval | ❌ No approval needed for preview |
| Rollback via git revert | ✅ Useful - revert prod issues | ❌ Just push new commit to PR |
| Drift detection | ✅ Useful - ensure consistency | ❌ Preview will be deleted anyway |

Conclusion: GitOps overhead provides no meaningful benefit for ephemeral preview resources.

6. Circular Dependency Problem

┌─────────────────────────────────────────────────────────────┐
│                    CHICKEN-AND-EGG PROBLEM                   │
└─────────────────────────────────────────────────────────────┘

Q: When should cluster-gitops preview config be created?

Option A: Before PR exists
─────────────────────────
Can't create config without PR number
PR number assigned by GitHub when PR is created
→ Impossible

Option B: After PR created, before first build
─────────────────────────────────────────────
PR created → workflow creates cluster-gitops config → triggers deploy
                                        └─► But deploy needs the artifact
                                            Artifact created by same workflow
                                            → Circular dependency

Option C: After artifact built
──────────────────────────────
PR created → build artifact → create cluster-gitops PR → deploy
                                        └─► Adds latency (Option B problem)
                                            Every push updates cluster-gitops
                                            → Git pollution problem

Impact: There's no clean way to bootstrap preview configs without introducing either latency or complexity.

7. Failure Recovery Complexity

Current Approach (idempotent):

# If preview deployment fails, just re-run the workflow
# Workflow has complete state, can retry everything
gh workflow run pr-preview-lambda.yml

GitOps Approach (distributed state):

# If deployment fails, need to diagnose where:
# 1. Did syrf workflow fail to create cluster-gitops PR?
# 2. Did cluster-gitops PR fail to merge?
# 3. Did lambda-deploy.yml fail to trigger?
# 4. Did Terraform fail?

# Recovery requires:
# - Check cluster-gitops for pending PRs
# - Check if config exists but Lambda doesn't
# - Manually reconcile state between repos

Impact: Debugging and recovery become significantly more complex with distributed state.
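The manual reconciliation step can be pictured as a set difference between what cluster-gitops declares and what exists in AWS. The sketch below is illustrative only: `reconcile_previews` is a hypothetical name, and it assumes the caller has already collected the two lists of PR numbers (for example from directory names and from Lambda function names).

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare PR numbers declared in cluster-gitops with PR
# numbers that actually have a preview Lambda. Both arguments are
# space-separated lists; how they are gathered is left to the caller.
reconcile_previews() {
  local configured deployed
  configured=$(mktemp)
  deployed=$(mktemp)
  printf '%s\n' "$1" | tr ' ' '\n' | sed '/^$/d' | sort -u > "$configured"
  printf '%s\n' "$2" | tr ' ' '\n' | sed '/^$/d' | sort -u > "$deployed"
  # Config exists but no Lambda: the deploy never ran or failed.
  comm -23 "$configured" "$deployed" | sed 's/^/missing-lambda /'
  # Lambda exists but no config: the cleanup never ran or failed.
  comm -13 "$configured" "$deployed" | sed 's/^/orphaned-lambda /'
  rm -f "$configured" "$deployed"
}

# Example: reconcile_previews "123 456" "456 789"
```

No output means both sides agree. Any line printed is a PR whose state has to be repaired by hand, which is exactly the overhead the self-contained workflow avoids.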


Advanced Option: Unified Preview Workflow (Lambda + K8s)

Concept: Lambda as "Just Another Service" in pr-preview.yml

Instead of having separate workflows (pr-preview.yml for K8s, pr-preview-lambda.yml for Lambda), Lambda could be integrated into the same pr-preview.yml workflow as K8s services.

┌─────────────────────────────────────────────────────────────────────────────┐
│                  UNIFIED PREVIEW WORKFLOW (pr-preview.yml)                  │
│                                                                             │
│  Current (Separate):                                                        │
│  ───────────────────                                                        │
│  pr-preview.yml ──────► K8s services (Docker build → cluster-gitops → ArgoCD)
│  pr-preview-lambda.yml ► Lambda (dotnet build → Terraform directly)        │
│                                                                             │
│  Proposed (Unified):                                                        │
│  ──────────────────────                                                     │
│  pr-preview.yml ──────► ALL services:                                       │
│                         ├── K8s services (existing)                         │
│                         │   └── writes: pr-{n}/services/{svc}.values.yaml   │
│                         │                                                   │
│                         └── Lambda (new)                                    │
│                             ├── Build Lambda package (.zip)                 │
│                             ├── Upload to S3: lambda-packages/pr-{n}.zip    │
│                             └── writes: pr-{n}/services/s3-notifier.values.yaml
│                                                                             │
│  ArgoCD ApplicationSet generates Application for Lambda (like K8s services)│
│  Lambda Application syncs Terraform Job or ACK resource                     │
└─────────────────────────────────────────────────────────────────────────────┘

Benefits:

  1. Single Workflow: One pr-preview.yml handles ALL preview resources
  2. Consistent Pattern: Lambda config written to cluster-gitops same as K8s
  3. Unified Timing: Lambda and K8s services deploy in coordinated sync waves
  4. Single Cleanup: Delete pr-{n}/ folder cleans up EVERYTHING

Implementation in pr-preview.yml:

jobs:
  # Existing K8s jobs...
  build-and-push-images:
    # ... existing Docker builds

  # NEW: Add Lambda build alongside K8s builds
  build-lambda:
    needs: detect-changes  # required for the needs.* expression below to resolve
    if: needs.detect-changes.outputs.s3_notifier_changed == 'true'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup .NET
        uses: actions/setup-dotnet@v4
        with:
          dotnet-version: '8.0.x'

      - name: Install Lambda Tools
        run: dotnet tool install -g Amazon.Lambda.Tools

      - name: Build Lambda Package
        run: |
          cd src/services/s3-notifier/SyRF.S3FileSavedNotifier.Endpoint
          dotnet lambda package -o pr-${{ github.event.pull_request.number }}.zip

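      - name: Upload to S3
        # The cd in the previous step does not carry over, so set the directory explicitly
        working-directory: src/services/s3-notifier/SyRF.S3FileSavedNotifier.Endpoint
        run: |
          aws s3 cp pr-${{ github.event.pull_request.number }}.zip \
            s3://camarades-terraform-state-aws/lambda-packages/pr-${{ github.event.pull_request.number }}.zip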

  # MODIFIED: write-versions now includes Lambda
  write-versions:
    runs-on: ubuntu-latest
    steps:
      # ... existing K8s service values ...

      - name: Write Lambda values
        if: needs.build-lambda.result == 'success' || needs.retag-lambda.result == 'success'
        run: |
          cat > syrf/environments/preview/pr-${{ env.PR_NUMBER }}/services/s3-notifier.values.yaml <<EOF
          serviceName: s3-notifier
          deploymentType: lambda
          lambda:
            version: "${{ needs.version-s3-notifier.outputs.version }}"
            commitSha: "${{ needs.version-s3-notifier.outputs.sha }}"
            s3Key: "lambda-packages/pr-${{ env.PR_NUMBER }}.zip"
          gitVersion:
            sha: "${{ needs.version-s3-notifier.outputs.sha }}"
            shortSha: "${{ needs.version-s3-notifier.outputs.shortSha }}"
          EOF

Advanced Option: ArgoCD-Orchestrated Lambda Deployment

Concept: ArgoCD as Unified Orchestrator for K8s AND Lambda

Instead of GitHub Actions triggering Terraform directly, ArgoCD could coordinate Lambda deployment using one of these patterns:

Option A: ACK (AWS Controllers for Kubernetes)

┌─────────────────────────────────────────────────────────────────────────────┐
│                    OPTION A: ACK LAMBDA CONTROLLER                          │
│                                                                             │
│  cluster-gitops/syrf/environments/preview/pr-{n}/lambda/function.yaml       │
│  ────────────────────────────────────────────────────────────────────────── │
│  apiVersion: lambda.services.k8s.aws/v1alpha1                               │
│  kind: Function                                                             │
│  metadata:                                                                  │
│    name: s3-notifier-pr-123                                                 │
│    namespace: pr-123                                                        │
│  spec:                                                                      │
│    name: syrfAppUploadS3Notifier-pr-123                                     │
│    runtime: dotnet8                                                         │
│    handler: "SyRF.S3FileSavedNotifier..."                                   │
│    code:                                                                    │
│      s3Bucket: camarades-terraform-state-aws                                │
│      s3Key: lambda-packages/pr-123.zip                                      │
│                                                                             │
│  Flow:                                                                      │
│  pr-preview.yml → writes function.yaml → ArgoCD syncs → ACK creates Lambda  │
│                                                                             │
│  Pros:                                                                      │
│  ✓ True GitOps (ArgoCD is orchestrator)                                     │
│  ✓ Continuous reconciliation (drift correction)                             │
│  ✓ Lambda is just another K8s resource                                      │
│                                                                             │
│  Cons:                                                                      │
│  ⚠️ Requires ACK Lambda Controller installation                              │
│  ⚠️ Cross-cloud IAM (GKE Workload Identity → AWS IAM)                        │
│  ⚠️ S3 bucket notifications need separate controller                        │
│  ⚠️ Migration from Terraform                                                 │
└─────────────────────────────────────────────────────────────────────────────┘

Option B: ArgoCD Sync Hook + Terraform Job

┌─────────────────────────────────────────────────────────────────────────────┐
│                    OPTION B: ARGOCD SYNC HOOK + TERRAFORM JOB               │
│                                                                             │
│  cluster-gitops/syrf/environments/preview/pr-{n}/lambda/deploy-job.yaml     │
│  ────────────────────────────────────────────────────────────────────────── │
│  apiVersion: batch/v1                                                       │
│  kind: Job                                                                  │
│  metadata:                                                                  │
│    name: deploy-lambda-pr-123                                               │
│    annotations:                                                             │
│      argocd.argoproj.io/hook: Sync                                          │
│      argocd.argoproj.io/sync-wave: "5"  # After K8s services                │
│  spec:                                                                      │
│    template:                                                                │
│      spec:                                                                  │
│        serviceAccountName: terraform-runner                                 │
│        containers:                                                          │
│          - name: terraform                                                  │
│            image: hashicorp/terraform:1.6                                   │
│            env:                                                             │
│              - name: PR_NUMBER                                              │
│                value: "123"                                                 │
│              - name: AWS_REGION                                             │
│                value: eu-west-1                                             │
│            command: ["/bin/sh", "-c"]                                       │
│            args:                                                            │
│              - |                                                            │
│                git clone https://github.com/camaradesuk/camarades-infrastructure
│                cd camarades-infrastructure/terraform/lambda                 │
│                terraform init                                               │
│                terraform apply -var="preview_prs=[\"$PR_NUMBER\"]" -auto-approve
│                                                                             │
│  Flow:                                                                      │
│  pr-preview.yml → writes deploy-job.yaml → ArgoCD syncs → Job runs Terraform│
│                                                                             │
│  Sync Wave Ordering:                                                        │
│  Wave -10: ExternalSecret (Atlas API credentials)                           │
│  Wave -5:  AtlasDatabaseUser (creates MongoDB user)                         │
│  Wave -1:  db-reset Job (drops collections)                                 │
│  Wave 0:   K8s Deployments (API, PM, Web, etc.)                             │
│  Wave 5:   Lambda deploy Job (Terraform) ← NEW                              │
│                                                                             │
│  Pros:                                                                      │
│  ✓ ArgoCD orchestrates ALL resources                                        │
│  ✓ Uses existing Terraform                                                  │
│  ✓ Sync waves ensure correct ordering                                       │
│  ✓ No additional controllers                                                │
│                                                                             │
│  Cons:                                                                      │
│  ⚠️ Terraform runs as K8s Job (state management complexity)                  │
│  ⚠️ AWS credentials in K8s secrets                                           │
│  ⚠️ Job cleanup and retry handling                                           │
└─────────────────────────────────────────────────────────────────────────────┘

Option C: Flamingo (ArgoCD + Flux Terraform Controller)

┌─────────────────────────────────────────────────────────────────────────────┐
│                    OPTION C: FLAMINGO / FLUX TF CONTROLLER                  │
│                                                                             │
│  cluster-gitops/syrf/environments/preview/pr-{n}/lambda/terraform.yaml      │
│  ────────────────────────────────────────────────────────────────────────── │
│  apiVersion: infra.contrib.fluxcd.io/v1alpha2                               │
│  kind: Terraform                                                            │
│  metadata:                                                                  │
│    name: s3-notifier-pr-123                                                 │
│  spec:                                                                      │
│    approvePlan: auto                                                        │
│    interval: 5m                                                             │
│    path: ./terraform/lambda                                                 │
│    sourceRef:                                                               │
│      kind: GitRepository                                                    │
│      name: camarades-infrastructure                                         │
│    vars:                                                                    │
│      - name: preview_prs                                                    │
│        value: '["123"]'                                                     │
│                                                                             │
│  Flow:                                                                      │
│  pr-preview.yml → writes terraform.yaml → ArgoCD syncs → Flux TF applies    │
│                                                                             │
│  Pros:                                                                      │
│  ✓ ArgoCD orchestrates                                                      │
│  ✓ Proper Terraform state management                                        │
│  ✓ Continuous reconciliation                                                │
│                                                                             │
│  Cons:                                                                      │
│  ⚠️ Requires Flux Terraform Controller installation                          │
│  ⚠️ Mixing GitOps tools (ArgoCD + Flux)                                      │
│  ⚠️ More infrastructure complexity                                           │
└─────────────────────────────────────────────────────────────────────────────┘

Comparison: Lambda Deployment Approaches

| Approach | Orchestrator | Lambda Deploy | State Management | Complexity | Recommendation |
|---|---|---|---|---|---|
| Current | GitHub Actions | Terraform (workflow) | Terraform state | Low | Status quo |
| GitOps Workflow Trigger | GitHub Actions | Terraform (workflow) | Terraform state | Medium | ✅ Practical |
| ACK Controller | ArgoCD | ACK (K8s CRD) | ACK Controller | High | Future option |
| Sync Hook + TF Job | ArgoCD | Terraform (K8s Job) | Terraform state | Medium-High | ✅ Best unification |
| Flamingo/Flux TF | ArgoCD | Flux TF Controller | Flux TF | High | Over-engineered |

┌─────────────────────────────────────────────────────────────────────────────┐
│                         RECOMMENDED EVOLUTION PATH                          │
│                                                                             │
│  Phase 1 (Immediate): GitOps Workflow Trigger                               │
│  ──────────────────────────────────────────────                             │
│  - Integrate Lambda into pr-preview.yml                                     │
│  - Write s3-notifier.values.yaml to cluster-gitops                          │
│  - cluster-gitops workflow triggers Terraform                               │
│  - Minimal changes to existing infrastructure                               │
│                                                                             │
│  Phase 2 (Future): ArgoCD Sync Hook Integration                             │
│  ────────────────────────────────────────────────                           │
│  - Replace workflow trigger with ArgoCD sync hook                           │
│  - Terraform runs as K8s Job coordinated by ArgoCD                          │
│  - Unified sync wave ordering (K8s + Lambda)                                │
│  - Single orchestrator (ArgoCD) for all resources                           │
│                                                                             │
│  Phase 3 (Long-term): ACK Controller                                        │
│  ─────────────────────────────────────────                                  │
│  - Migrate from Terraform to ACK Lambda Controller                          │
│  - Lambda becomes native K8s resource                                       │
│  - True GitOps with continuous reconciliation                               │
│  - Only if AWS becomes more central to infrastructure                       │
└─────────────────────────────────────────────────────────────────────────────┘

Revised Recommendation: Full GitOps Alignment

After analyzing how K8s previews work, the hybrid approach is no longer recommended. Instead, Lambda previews should follow the same pattern as K8s previews:

┌─────────────────────────────────────────────────────────────────────────────┐
│              RECOMMENDED: FULL GITOPS ALIGNMENT FOR ALL PREVIEWS            │
│                                                                             │
│  K8s Previews (current):                                                    │
│  ───────────────────────                                                    │
│  pr-preview.yml → writes files to cluster-gitops → ArgoCD deploys           │
│                                                                             │
│  Lambda Previews (proposed):                                                │
│  ──────────────────────────                                                 │
│  pr-preview-lambda.yml → writes s3-notifier.values.yaml to cluster-gitops   │
│                       → cluster-gitops workflow triggers                    │
│                       → Terraform deploys                                   │
│                                                                             │
│  Benefits:                                                                  │
│  ✓ Consistent pattern across ALL preview resources                          │
│  ✓ Single source of truth (cluster-gitops/syrf/environments/preview/pr-N/) │
│  ✓ Complete preview visibility (K8s + Lambda in one place)                  │
│  ✓ Unified cleanup (delete pr-N/ folder = delete everything)               │
│  ✓ Acceptable latency (~10-12 min, faster than K8s ~15 min)                 │
│                                                                             │
│  Trade-offs (acceptable):                                                   │
│  ⚠️ Cross-repo coordination (same as K8s previews)                          │
│  ⚠️ More git commits for preview changes (same as K8s previews)             │
│  ⚠️ Distributed state (same complexity as K8s previews)                     │
└─────────────────────────────────────────────────────────────────────────────┘

Key insight: The considerations documented above apply equally to K8s previews, yet we accept them for K8s. Lambda should be no different.

Historical Context: Why the Analysis Changed

The original "Hybrid Approach" was based on comparing Lambda GitOps against an idealized "direct deploy" baseline. After understanding how K8s previews actually work, the comparison baseline changed:

| Metric | Lambda Direct | Lambda GitOps | K8s GitOps (current) |
|---|---|---|---|
| Latency | ~5-8 min | ~10-12 min | ~15-17 min |
| Git commits | 0 | ~2/PR | ~2/PR |
| Complexity | Simple | Moderate | Moderate |
| Consistency | ❌ Inconsistent | ✅ Consistent | ✅ Consistent |
| Visibility | ❌ None | ✅ Full | ✅ Full |

Lambda with GitOps is actually faster than K8s previews and achieves consistency.


Summary

| Aspect | Current | Proposed (Option 1) |
|---|---|---|
| Version source of truth | CI/CD workflow | cluster-gitops config.yaml |
| Build trigger | Push to main | Push to main (unchanged) |
| Deploy trigger | Same workflow | cluster-gitops PR merge |
| Artifact storage | GitHub Releases | GitHub Releases (unchanged) |
| Deployment mechanism | GitHub Actions + Terraform | GitHub Actions + Terraform (unchanged) |
| Reconciliation | None (one-shot) | Event-driven (PR merge) |
| Drift correction | None | None (acceptable for Lambda) |

The key change is decoupling build from deploy and making cluster-gitops the trigger point for Lambda deployments, consistent with the Kubernetes service pattern.


Detailed Implementation Plan

Step 1: Create Lambda Config Files in cluster-gitops

Files to create:

# cluster-gitops/syrf/services/s3-notifier/config.yaml
serviceName: s3-notifier
deploymentType: lambda  # Distinguishes from K8s services
chartPath: null
chartRepo: null

# cluster-gitops/syrf/environments/staging/s3-notifier/config.yaml
serviceName: s3-notifier
envName: staging
lambda:
  version: "0.1.5"
  functionName: "syrfAppUploadS3Notifier-staging"
  s3TriggerPrefix: "staging/"
gitVersion:
  sha: "..."
  shortSha: "..."

# cluster-gitops/syrf/environments/production/s3-notifier/config.yaml
serviceName: s3-notifier
envName: production
lambda:
  version: "0.1.4"
  functionName: "syrfAppUploadS3Notifier"
  s3TriggerPrefix: "Projects/"
gitVersion:
  sha: "..."
  shortSha: "..."

Step 2: Create Lambda Deploy Workflow in cluster-gitops

File: cluster-gitops/.github/workflows/lambda-deploy.yml

Triggers on changes to syrf/environments/*/s3-notifier/config.yaml.

Workflow steps:

  1. Detect which environment(s) changed (staging, production)
  2. Read Lambda version from config.yaml
  3. Download .zip from GitHub Release (s3-notifier-v{version})
  4. Upload to S3 (lambda-packages/staging.zip or production.zip)
  5. Checkout camarades-infrastructure repo
  6. Run Terraform with environment-specific variables
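The first three workflow steps are mostly string handling, and a rough sketch helps show what lambda-deploy.yml would need to compute. All three function names below are hypothetical. The version reader assumes the two-space config.yaml layout shown in Step 1 instead of using a real YAML parser, so treat it as an illustration, not a hardened implementation.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for workflow steps 1-3; names are illustrative only.

# Step 1: read changed paths on stdin, print each affected environment once.
changed_envs() {
  sed -n 's|^syrf/environments/\([^/]*\)/s3-notifier/config\.yaml$|\1|p' | sort -u
}

# Step 2: print lambda.version from a config.yaml (assumes the Step 1 layout).
read_lambda_version() {
  awk '
    /^lambda:/ { in_lambda = 1; next }
    /^[^ ]/    { in_lambda = 0 }
    in_lambda && $1 == "version:" { gsub(/"/, "", $2); print $2 }
  ' "$1"
}

# Step 3: name of the release tag that carries the .zip for a version.
release_tag() {
  printf 's3-notifier-v%s\n' "$1"
}

# Example wiring (assumed, not taken from an existing workflow):
#   git diff --name-only HEAD~1 HEAD | changed_envs
#   release_tag "$(read_lambda_version syrf/environments/staging/s3-notifier/config.yaml)"
```

Keeping these as small pure functions means the path filter and version parsing can be checked locally before any AWS or Terraform call is involved.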

Step 3: Create Staging Lambda in Terraform (PREREQUISITE)

⚠️ Critical: Staging Lambda does NOT currently exist. This step must be completed FIRST.

Modify: camarades-infrastructure/terraform/lambda/main.tf

Add new staging Lambda function:

# Staging Lambda function
resource "aws_lambda_function" "staging" {
  function_name = "syrfAppUploadS3Notifier-staging"
  role          = aws_iam_role.lambda_role.arn
  runtime       = "dotnet8"
  handler       = "SyRF.S3FileSavedNotifier.Endpoint::SyRF.S3FileSavedNotifier.Endpoint.Function::FunctionHandler"

  s3_bucket         = var.lambda_bucket
  s3_key            = "lambda-packages/staging.zip"
  source_code_hash  = var.staging_source_code_hash

  memory_size = 512
  timeout     = 30

  environment {
    variables = {
      RabbitMqHost = var.staging_rabbitmq_host  # Staging-specific
      # ... other staging env vars
    }
  }
}

# S3 trigger permission for staging
resource "aws_lambda_permission" "staging_s3" {
  statement_id  = "AllowS3InvokeStaging"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.staging.function_name
  principal     = "s3.amazonaws.com"
  source_arn    = aws_s3_bucket.uploads.arn
}

Update S3 notifications:

resource "aws_s3_bucket_notification" "bucket_notification" {
  bucket = aws_s3_bucket.uploads.id

  # Production notification (existing)
  lambda_function {
    lambda_function_arn = aws_lambda_function.production.arn
    events              = ["s3:ObjectCreated:*"]
    filter_prefix       = "Projects/"
  }

  # Staging notification (NEW)
  lambda_function {
    lambda_function_arn = aws_lambda_function.staging.arn
    events              = ["s3:ObjectCreated:*"]
    filter_prefix       = "staging/"
  }

  # Preview notifications (existing, via dynamic block)
  dynamic "lambda_function" {
    for_each = var.preview_prs
    content {
      lambda_function_arn = aws_lambda_function.preview[lambda_function.value].arn
      events              = ["s3:ObjectCreated:*"]
      filter_prefix       = "preview/pr-${lambda_function.value}/"
    }
  }
}
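As a way to reason about the three prefix filters, the sketch below mimics which function a given object key would reach. It is a mental model only, not how S3 evaluates notifications. The helper name `route_key` is hypothetical, and the preview function name pattern is assumed from the `syrfAppUploadS3Notifier-pr-123` example used elsewhere in this document.

```shell
#!/usr/bin/env bash
# Illustrative only: map an object key to the Lambda that the prefix filters
# above would invoke. Preview keys only trigger a function in practice when
# the PR number is present in var.preview_prs.
route_key() {
  local pr
  case "$1" in
    Projects/*)
      echo "syrfAppUploadS3Notifier" ;;
    staging/*)
      echo "syrfAppUploadS3Notifier-staging" ;;
    preview/pr-*/*)
      pr="${1#preview/pr-}"              # drop the leading "preview/pr-"
      echo "syrfAppUploadS3Notifier-pr-${pr%%/*}" ;;
    *)
      echo "none" ;;
  esac
}

# Example: route_key "staging/uploads/file.csv"
```

The prefixes do not overlap, so each key matches at most one branch, which is the property the combined notification resource relies on.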

Add staging variables to variables.tf:

variable "staging_version" {
  description = "Semantic version for staging Lambda"
  type        = string
  default     = ""
}

variable "staging_commit_sha" {
  description = "Git commit SHA for staging deployment"
  type        = string
  default     = ""
}

variable "staging_source_code_hash" {
  description = "Base64-encoded SHA256 hash of staging.zip"
  type        = string
  default     = ""
}

variable "staging_rabbitmq_host" {
  description = "RabbitMQ host for staging environment"
  type        = string
}
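The `staging_source_code_hash` value has to be computed by whatever calls Terraform. A minimal sketch, assuming the `openssl` CLI is available on the runner (the helper name `source_code_hash` is hypothetical):

```shell
#!/usr/bin/env bash
# Print the Base64-encoded SHA-256 digest of a file, the format described
# for staging_source_code_hash. Assumes the openssl CLI is installed.
source_code_hash() {
  openssl dgst -sha256 -binary "$1" | openssl base64
}

# Example call (file name assumed):
#   terraform apply -var="staging_source_code_hash=$(source_code_hash staging.zip)"
```

Note that the digest is taken over the raw bytes and then Base64-encoded; encoding the hex string instead produces a different value and a hash that changes on every plan.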

Production Lambda remains unchanged:
- Function name: syrfAppUploadS3Notifier
- S3 package: lambda-packages/production.zip
- S3 trigger prefix: Projects/

Step 4: Modify ci-cd.yml to Decouple Build from Deploy

Changes to syrf/.github/workflows/ci-cd.yml:

Remove from deploy-lambda job:
- Upload to S3 lambda-packages/production.zip
- Terraform init/plan/apply

Keep:
- Build Lambda package (.zip)
- Create GitHub artifact
- Create git tag (separate job)
- Create GitHub Release with .zip attachment

Add to promote-to-staging job:
- Update syrf/environments/staging/s3-notifier/config.yaml with new version
- Include in same PR as K8s service promotions
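The config update in promote-to-staging amounts to rewriting one line. The sketch below shows one way to do that without a YAML tool; `set_lambda_version` is a hypothetical name and the function assumes the config.yaml layout from Step 1 (a top-level `lambda:` key with `version:` indented beneath it).

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace lambda.version in a config.yaml in place,
# leaving every other line untouched. Assumes the layout shown in Step 1.
set_lambda_version() {
  local file="$1" version="$2" tmp
  tmp=$(mktemp)
  awk -v v="$version" '
    /^lambda:/ { in_lambda = 1; print; next }
    /^[^ ]/    { in_lambda = 0 }
    in_lambda && $1 == "version:" { print "  version: \"" v "\""; next }
    { print }
  ' "$file" > "$tmp"
  cat "$tmp" > "$file"                   # overwrite but keep file permissions
  rm -f "$tmp"
}

# Example call (path from Step 1):
#   set_lambda_version syrf/environments/staging/s3-notifier/config.yaml 0.1.6
```

Only the version under `lambda:` is touched, so a `version` key added elsewhere in the file later would not be rewritten by accident.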

Step 5: S3 Bucket Prefix Structure

Current:

syrfapp-uploads/
├── Projects/           ← Production (current)
└── preview/pr-{n}/     ← PR previews (current)

New:

syrfapp-uploads/
├── Projects/           ← Production (unchanged)
├── staging/            ← Staging (new)
└── preview/pr-{n}/     ← PR previews (unchanged)

Note: API/Web may need configuration to upload to correct prefix per environment.


Implementation Order (Dependencies)

┌─────────────────────────────────────────────────────────────────────────────┐
│                           IMPLEMENTATION PHASES                              │
└─────────────────────────────────────────────────────────────────────────────┘

Phase 1: Infrastructure Preparation (camarades-infrastructure)
─────────────────────────────────────────────────────────────
1. Add staging Lambda function to Terraform
2. Add staging S3 notification trigger
3. Add staging variables
4. Apply Terraform to create staging Lambda
   └─► Staging Lambda now exists but is empty (no package yet)

Phase 2: GitOps Structure (cluster-gitops)
───────────────────────────────────────────
5. Create s3-notifier service config
6. Create staging/production config.yaml files
7. Create lambda-deploy.yml workflow
   └─► Workflow ready to deploy on config changes

Phase 3: CI/CD Decoupling (syrf)
────────────────────────────────
8. Modify ci-cd.yml to remove direct deploy
9. Add Lambda version to staging promotion PR
   └─► Build creates artifact, GitOps triggers deploy

Phase 4: Verification
─────────────────────
10. Test end-to-end: code change → build → promotion → deploy
11. Verify rollback capability

Files to Modify

Phase 1: Infrastructure (camarades-infrastructure repo)

| File | Changes Required |
|---|---|
| terraform/lambda/main.tf | Add aws_lambda_function.staging resource; add staging S3 trigger to aws_s3_bucket_notification; add aws_lambda_permission.staging_s3 |
| terraform/lambda/variables.tf | Add staging_version, staging_commit_sha, staging_source_code_hash, staging_rabbitmq_host |
| terraform/lambda/outputs.tf | Add staging Lambda ARN output |

Phase 2: GitOps Structure (cluster-gitops repo)

| File | Changes Required |
|------|------------------|
| syrf/services/s3-notifier/config.yaml | Create: service metadata with deploymentType: lambda |
| syrf/environments/staging/s3-notifier/config.yaml | Create: staging version declaration |
| syrf/environments/production/s3-notifier/config.yaml | Create: production version declaration |
| .github/workflows/lambda-deploy.yml | Create: GitOps-triggered deployment workflow |
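
As a rough illustration of the first thing lambda-deploy.yml would need to do when an environment config.yaml changes, the sketch below reads the declared version and derives the matching release tag. It assumes a top-level `version:` key, which is only a guess at the config schema, and both helper names are hypothetical. The tag format follows the existing s3-notifier-v0.1.5 convention.

```shell
#!/bin/sh
# Hypothetical helper: print the version declared in an environment config.yaml.
# ASSUMPTION: the file has a top-level "version:" key; the real schema may differ.
read_declared_version() {
  # Keep only the "version:" line, drop the key and any trailing comment,
  # then strip quotes, spaces, and carriage returns.
  sed -n '/^version:/{s/^version:[[:space:]]*//;s/[[:space:]]*#.*$//;p;}' "$1" \
    | head -n 1 | tr -d "\"' \r"
}

# Hypothetical helper: map a version to the git tag / GitHub Release name.
release_tag_for_version() {
  printf 's3-notifier-v%s\n' "$1"
}

# Local demo with a throwaway file (no AWS or GitHub access needed).
demo_config="$(mktemp)"
printf 'version: "0.1.5"\n' > "$demo_config"
declared="$(read_declared_version "$demo_config")"
echo "declared version: $declared"
echo "release tag: $(release_tag_for_version "$declared")"
rm -f "$demo_config"
```

Keeping the lookup to plain sed avoids a YAML parser dependency in the workflow, at the cost of only handling a simple single-line scalar.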

Phase 3: CI/CD Changes (syrf repo)

| File | Changes Required |
|------|------------------|
| .github/workflows/ci-cd.yml | Remove from deploy-lambda: S3 upload, Terraform apply |
| | Keep in deploy-lambda: Build package, create artifact |
| | Modify promote-to-staging: Include Lambda config update |
| | Modify promote-to-production: Include Lambda config update |

Verification Steps

Pre-Implementation Checks

  1. Verify current state:
    # List existing Lambda functions
    aws lambda list-functions --query "Functions[?starts_with(FunctionName, 'syrfApp')].[FunctionName,LastModified]" --output table
    
    # Confirm staging Lambda does NOT exist
    aws lambda get-function --function-name syrfAppUploadS3Notifier-staging 2>&1 | grep -q "ResourceNotFoundException" && echo "Confirmed: Staging Lambda does not exist"
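
The one-liner above prints nothing both when the staging Lambda already exists and when the call fails for an unrelated reason such as missing credentials. A minimal sketch that separates the three outcomes might look like the following; the function name `classify_lambda_lookup` and the sample error text are illustrative assumptions, not captured output.

```shell
#!/bin/sh
# Hypothetical helper: classify the result of "aws lambda get-function".
# $1 = exit code of the aws command, $2 = its combined stdout/stderr.
classify_lambda_lookup() {
  if [ "$1" -eq 0 ]; then
    echo "exists"
  elif printf '%s' "$2" | grep -q "ResourceNotFoundException"; then
    echo "missing"
  else
    echo "error"
  fi
}

# Intended use against AWS (commented out so the sketch has no side effects):
#   output="$(aws lambda get-function --function-name syrfAppUploadS3Notifier-staging 2>&1)"
#   classify_lambda_lookup "$?" "$output"

# Local demo with an illustrative, shortened error string.
classify_lambda_lookup 254 "An error occurred (ResourceNotFoundException): Function not found"
```

Only the "missing" result confirms the precondition for Phase 1; "error" means the check itself should be repeated.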
    

Phase 1: Infrastructure Verification

  1. After Terraform apply:
    # Verify staging Lambda created
    aws lambda get-function --function-name syrfAppUploadS3Notifier-staging
    
    # Verify S3 notifications include staging
    aws s3api get-bucket-notification-configuration --bucket syrfappuploads
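
To turn the notification check into a pass/fail result, one option is to extract the Lambda ARNs and test for the staging function by name. The sketch below assumes the ARNs are fetched with `--query 'LambdaFunctionConfigurations[].LambdaFunctionArn' --output text`; the helper name and the sample ARNs (region and account ID) are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a whitespace-separated ARN list contains
# a Lambda function with exactly the given name (optionally with a qualifier).
# $1 = ARN list, $2 = function name.
notification_includes_function() {
  printf '%s\n' "$1" | tr '\t ' '\n\n' | grep -Eq ":function:$2(:|\$)"
}

# Intended use against AWS (commented out so the sketch has no side effects):
#   arns="$(aws s3api get-bucket-notification-configuration --bucket syrfappuploads \
#     --query 'LambdaFunctionConfigurations[].LambdaFunctionArn' --output text)"
#   notification_includes_function "$arns" syrfAppUploadS3Notifier-staging

# Local demo with a made-up ARN.
sample_arns="arn:aws:lambda:eu-west-1:123456789012:function:syrfAppUploadS3Notifier-staging"
if notification_includes_function "$sample_arns" syrfAppUploadS3Notifier-staging; then
  echo "staging trigger present"
fi
```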
    

Phase 3: End-to-End Tests

  1. Test build-only flow:
     • Push s3-notifier change to main
     • Verify: GitHub Release created with .zip
     • Verify: No direct Lambda deployment (Terraform step skipped)
     • Verify: PR created to cluster-gitops with version update in staging config

  2. Test GitOps deployment:
     • Merge promotion PR to cluster-gitops
     • Verify: lambda-deploy.yml triggers
     • Verify: Staging Lambda updated in AWS

    aws lambda get-function-configuration --function-name syrfAppUploadS3Notifier-staging \
      --query '{Version: Environment.Variables.VERSION, LastModified: LastModified}'

  3. Test production promotion:
     • Create PR updating production config.yaml
     • Manual review and merge
     • Verify: Production Lambda updated

  4. Test rollback:
     • Update staging config.yaml to the previous version (e.g., 0.1.4 → 0.1.3)
     • Verify: Lambda reverts to that version
     • Verify: GitHub Release for target version still exists (artifact available)

  5. Test file upload triggers:
     • Upload test file to staging/ prefix
     • Verify: Staging Lambda invoked
     • Upload test file to Projects/ prefix
     • Verify: Production Lambda invoked (unchanged behavior)
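
The "Lambda updated" and "Lambda reverts" checks above both reduce to comparing the version declared in config.yaml with the VERSION environment variable reported by the function. A minimal sketch of that comparison, with a hypothetical helper name, could be:

```shell
#!/bin/sh
# Hypothetical helper: compare the version declared in GitOps with the one
# reported by the deployed Lambda. $1 = declared version, $2 = deployed version.
check_deployed_version() {
  if [ "$1" = "$2" ]; then
    echo "OK: deployed version $2 matches declared version"
  else
    echo "MISMATCH: declared $1, deployed $2"
    return 1
  fi
}

# Intended use against AWS (commented out so the sketch has no side effects):
#   deployed="$(aws lambda get-function-configuration \
#     --function-name syrfAppUploadS3Notifier-staging \
#     --query 'Environment.Variables.VERSION' --output text)"
#   check_deployed_version "0.1.3" "$deployed"

# Local demo: a rollback to 0.1.3 that has not been applied yet.
check_deployed_version 0.1.3 0.1.4 || echo "rollback not yet applied"
```

The non-zero return on mismatch lets a workflow step fail or retry until the deployment has converged.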

Implementation Status

This document is DOCUMENTATION ONLY for a future PR.

No implementation will be done in the current session. This document serves as:

  1. Architectural design and rationale
  2. Analysis of deployment mechanism differences (ci-cd.yml vs pr-preview-lambda.yml)
  3. Detailed reasoning for why GitOps is appropriate for staging/production but NOT for previews
  4. Implementation roadmap for when this work is prioritized


Future PR Checklist

When implementing this feature, create PRs in this order:

  • PR 1 (camarades-infrastructure): Add staging Lambda to Terraform
     • Add aws_lambda_function.staging resource
     • Add staging S3 notification trigger
     • Add staging variables
     • Test with terraform plan

  • PR 2 (cluster-gitops): Add GitOps structure for Lambda
     • Create syrf/services/s3-notifier/config.yaml
     • Create syrf/environments/staging/s3-notifier/config.yaml
     • Create syrf/environments/production/s3-notifier/config.yaml
     • Create .github/workflows/lambda-deploy.yml

  • PR 3 (syrf): Decouple build from deploy in ci-cd.yml
     • Remove Terraform apply from deploy-lambda job
     • Add Lambda version to staging promotion PR
     • Update promote-to-production to include Lambda

  • PR 4 (all repos): End-to-end verification
     • Test build-only flow
     • Test GitOps deployment to staging
     • Test production promotion
     • Test rollback capability