Lambda GitOps Versioning - Architecture Design

SUPERSEDED: This document has been superseded by Lambda GitOps Integration. The analysis and options in this document informed the final consolidated decision. This document is preserved for historical reference only.

Problem Statement

Currently, there's an architectural asymmetry in how service versions are managed:

Service Type	Source of Truth	Deployment Mechanism	Reconciliation
Kubernetes services	`cluster-gitops/syrf/environments/{env}/{svc}/config.yaml`	ArgoCD	Continuous
Lambda (S3 Notifier)	CI/CD workflow builds and deploys directly	GitHub Actions + Terraform	One-shot

Goal: Make cluster-gitops the single source of truth for Lambda versions, with deployment triggered by version changes in that repo.

Current Lambda Deployment Flow

┌──────────────────────────────────────────────────────────────────────────┐
│                        CURRENT STATE (Direct Deploy)                      │
└──────────────────────────────────────────────────────────────────────────┘

1. Push to main branch (src/services/s3-notifier/)
        │
        ▼
2. ci-cd.yml workflow triggers
        │
        ├─► GitVersion calculates version (e.g., 0.1.5)
        │
        ├─► dotnet publish → production.zip
        │
        ├─► Upload to S3: lambda-packages/production.zip
        │
        ├─► Terraform apply (camarades-infrastructure/terraform/lambda/)
        │
        ├─► Create git tag: s3-notifier-v0.1.5
        │
        └─► Create GitHub Release with .zip attachment

No version declaration in cluster-gitops - version is computed and deployed in one step.

Current Deployment Mechanism Comparison

Production Lambda (ci-cd.yml)

Trigger: Push to main affecting src/services/s3-notifier/

Key Characteristics:

┌─────────────────────────────────────────────────────────────────────────────┐
│                         PRODUCTION DEPLOYMENT FLOW                           │
└─────────────────────────────────────────────────────────────────────────────┘

1. Build & Package
   └─► dotnet lambda package → production.zip

2. Upload to S3
   └─► aws s3 cp production.zip s3://camarades-terraform-state-aws/lambda-packages/production.zip
       (FIXED path - always overwrites)

3. Terraform Apply
   ├─► Variables:
   │   - TF_VAR_production_commit_sha = "abc123..."
   │   - TF_VAR_production_source_code_hash = base64(sha256(production.zip))
   └─► Resources:
       - aws_lambda_function.production (syrfAppUploadS3Notifier)
       - S3 notification on prefix: Projects/

4. Concurrency Control
   └─► concurrency group: production-lambda-terraform (serialized deploys)

Terraform Resources (single function, fixed config):

- Function: syrfAppUploadS3Notifier
- S3 Source: lambda-packages/production.zip
- S3 Trigger Prefix: Projects/
- Environment Variables: Production RabbitMQ host, etc.
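
The `TF_VAR_production_source_code_hash` value in step 3 is described as base64(sha256(production.zip)). A minimal sketch of that calculation follows; the function name `source_code_hash` and the sample bytes are illustrative assumptions, not taken from the actual workflow.

```python
import base64
import hashlib


def source_code_hash(package_bytes: bytes) -> str:
    """Return base64(sha256(package)) for the given package contents."""
    digest = hashlib.sha256(package_bytes).digest()  # raw 32-byte digest, not hex
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    # Hypothetical stand-in for the bytes of production.zip.
    print(source_code_hash(b"example package contents"))
```

Because the S3 path is fixed and always overwritten, a hash like this is what lets Terraform notice that the package contents changed even though the key did not.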

Preview Lambdas (pr-preview-lambda.yml)

Trigger: PR events (labeled with preview, synchronize, or closed)

Key Characteristics:

┌─────────────────────────────────────────────────────────────────────────────┐
│                          PREVIEW DEPLOYMENT FLOW                             │
└─────────────────────────────────────────────────────────────────────────────┘

1. Collect Active PRs
   └─► GitHub API: List all PRs with 'preview' label
       Output: ["123", "456", "789"]

2. Build Per-PR Package
   └─► dotnet lambda package → pr-{number}.zip

3. Upload Per-PR Package
   └─► aws s3 cp pr-{number}.zip s3://camarades-terraform-state-aws/lambda-packages/pr-{number}.zip
       (DYNAMIC path per PR)

4. Terraform Apply with for_each
   ├─► Variables:
   │   - TF_VAR_preview_prs = ["123", "456", "789"]
   │   - TF_VAR_preview_commit_shas = {"123": "abc...", "456": "def..."}
   │   - TF_VAR_preview_versions = {"123": "0.1.5-pr123", "456": "0.1.5-pr456"}
   └─► Resources (via for_each):
       - aws_lambda_function.preview["123"] → syrfAppUploadS3Notifier-pr-123
       - aws_lambda_function.preview["456"] → syrfAppUploadS3Notifier-pr-456
       - S3 notifications per prefix: preview/pr-{n}/

5. Cleanup on PR Close
   └─► Remove PR from preview_prs set → Terraform destroys Lambda

Terraform Pattern (for_each for dynamic resources):

# Preview Lambda functions - one per PR
resource "aws_lambda_function" "preview" {
  for_each      = var.preview_prs  # set(string): ["123", "456"]
  function_name = "syrfAppUploadS3Notifier-pr-${each.key}"
  s3_bucket     = var.lambda_bucket
  s3_key        = "lambda-packages/pr-${each.key}.zip"
  # ...
}

# S3 notifications with dynamic block
resource "aws_s3_bucket_notification" "bucket_notification" {
  # Production notification (always)
  lambda_function {
    lambda_function_arn = aws_lambda_function.production.arn
    filter_prefix       = "Projects/"
  }

  # Preview notifications (dynamic per PR)
  dynamic "lambda_function" {
    for_each = var.preview_prs
    content {
      lambda_function_arn = aws_lambda_function.preview[lambda_function.value].arn
      filter_prefix       = "preview/pr-${lambda_function.value}/"
    }
  }
}
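
To make the variable hand-off in step 4 concrete, the following sketch builds the three `TF_VAR_*` values from a per-PR record. The input shape (`{"sha": ..., "version": ...}`) and the function name `preview_tf_vars` are assumptions for illustration; the real workflow may assemble these values differently.

```python
import json


def preview_tf_vars(prs: dict[str, dict[str, str]]) -> dict[str, str]:
    """Build TF_VAR_* strings from {pr_number: {"sha": ..., "version": ...}}."""
    numbers = sorted(prs, key=int)  # stable, numeric ordering of PR numbers
    return {
        "TF_VAR_preview_prs": json.dumps(numbers),
        "TF_VAR_preview_commit_shas": json.dumps({n: prs[n]["sha"] for n in numbers}),
        "TF_VAR_preview_versions": json.dumps({n: prs[n]["version"] for n in numbers}),
    }
```

Dropping a PR from the input removes it from all three values, which is what leads Terraform to destroy the matching `aws_lambda_function.preview` instance on the next apply.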

Critical Finding: No Staging Lambda Exists

Current State Analysis:

Environment	Lambda Function	S3 Prefix	Status
Production	`syrfAppUploadS3Notifier`	`Projects/`	✅ Exists
Preview	`syrfAppUploadS3Notifier-pr-{n}`	`preview/pr-{n}/`	✅ Exists (dynamic)
Staging	None	None	❌ Does NOT exist

Implication: The staging environment currently shares the production Lambda, which means:

- Staging deployments don't test Lambda changes before production
- No isolation between staging and production file processing
- Adding a staging Lambda is a prerequisite for GitOps versioning

Architectural Differences Summary

Aspect	Production (ci-cd.yml)	Preview (pr-preview-lambda.yml)
Lifecycle	Permanent	Ephemeral (PR lifecycle)
Function Count	1	N (one per PR)
Terraform Pattern	Single resource	`for_each` over set
S3 Package Path	`production.zip` (fixed)	`pr-{n}.zip` (dynamic)
Version Tracking	GitHub tag + Release	PR commit SHA
Cleanup	Never	Auto on PR close
Trigger	Push to main	PR label events

Key Variables in Terraform

# Production
variable "production_commit_sha" {
  description = "Git commit SHA for production deployment"
  type        = string
}

variable "production_source_code_hash" {
  description = "Base64-encoded SHA256 hash of production.zip"
  type        = string
}

# Preview (for_each pattern)
variable "preview_prs" {
  description = "Set of PR numbers with preview label"
  type        = set(string)
  default     = []
}

variable "preview_commit_shas" {
  description = "Map of PR number to commit SHA"
  type        = map(string)
  default     = {}
}

variable "preview_versions" {
  description = "Map of PR number to semantic version"
  type        = map(string)
  default     = {}
}

Desired State

┌──────────────────────────────────────────────────────────────────────────┐
│                     DESIRED STATE (GitOps Versioning)                     │
└──────────────────────────────────────────────────────────────────────────┘

cluster-gitops/syrf/environments/
├── staging/
│   └── s3-notifier/
│       └── config.yaml          ◄─── lambdaVersion: 0.1.5
│                                      (single source of truth)
└── production/
    └── s3-notifier/
        └── config.yaml          ◄─── lambdaVersion: 0.1.4
                                       (manually promoted)

Change to config.yaml → triggers → Lambda deployment

Design Options

Option 1: GitOps-Triggered Terraform (Recommended)

Architecture: cluster-gitops PR merge → GitHub Actions → Terraform apply

┌─────────────────────────────────────────────────────────────────────────────┐
│                            OPTION 1: GitOps-Triggered Terraform             │
└─────────────────────────────────────────────────────────────────────────────┘

Phase A: Build & Release (unchanged from current)
─────────────────────────────────────────────────
1. Push to main (s3-notifier code changes)
2. ci-cd.yml workflow:
   ├─► GitVersion → 0.1.6
   ├─► Build Lambda package (.zip)
   ├─► Create git tag: s3-notifier-v0.1.6
   └─► Create GitHub Release with .zip attachment

   ❌ Does NOT deploy to Lambda
   ❌ Does NOT upload to S3 lambda-packages/
   ✓  Only creates versioned artifact

Phase B: Promote to Staging (GitOps)
────────────────────────────────────
3. ci-cd.yml creates PR to cluster-gitops:
   - Updates: syrf/environments/staging/s3-notifier/config.yaml
   - Sets: lambdaVersion: "0.1.6"

4. PR auto-merges (staging)

5. cluster-gitops webhook → lambda-deploy.yml workflow
   ├─► Detect which environment changed (staging)
   ├─► Download .zip from GitHub Release (s3-notifier-v0.1.6)
   ├─► Upload to S3: lambda-packages/staging.zip
   └─► Terraform apply with:
       - TF_VAR_staging_version=0.1.6
       - TF_VAR_staging_source_code_hash=<calculated>

Phase C: Promote to Production (Manual Gate)
────────────────────────────────────────────
6. After staging verification, trigger production promotion
7. Creates PR to cluster-gitops:
   - Updates: syrf/environments/production/s3-notifier/config.yaml
   - Sets: lambdaVersion: "0.1.6"

8. Manual review & merge

9. Same workflow as step 5, but for production environment

File Structure in cluster-gitops:

# syrf/environments/staging/s3-notifier/config.yaml
serviceName: s3-notifier
envName: staging
lambda:
  version: "0.1.6"
  functionName: "syrfAppUploadS3Notifier"  # or staging-specific name
  s3Prefix: "Projects/"
gitVersion:
  sha: "abc123def456..."
  shortSha: "abc123"
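
As a rough sketch of how step 5 could turn this config into deployment inputs, the function below derives the release tag, S3 key, and version variable from an already-parsed config. The names `DeployInputs` and `deploy_inputs` are hypothetical, and the config is passed as a plain dict so the example needs no YAML library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeployInputs:
    release_tag: str  # GitHub Release tag to download the .zip from
    s3_key: str       # environment-specific package path in S3
    tf_vars: dict     # version variable handed to Terraform


def deploy_inputs(env_name: str, config: dict) -> DeployInputs:
    """Derive deployment inputs from a parsed s3-notifier config.yaml."""
    version = config["lambda"]["version"]
    return DeployInputs(
        release_tag=f"s3-notifier-v{version}",
        s3_key=f"lambda-packages/{env_name}.zip",
        tf_vars={f"TF_VAR_{env_name}_version": version},
    )
```

The tag and path formats mirror the flow above (`s3-notifier-v0.1.6`, `lambda-packages/staging.zip`); the source code hash is left out because it depends on the downloaded package.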

Terraform Changes Required:

- Refactor to support staging/production as separate Lambda functions (or same function with version alias)
- Accept version and S3 package path as variables
- Support Lambda aliases for blue/green deployment

Pros:

- ✅ Leverages existing Terraform Lambda management
- ✅ Uses pre-built packages from GitHub Releases
- ✅ Consistent promotion PR pattern with K8s services
- ✅ No new operators or controllers
- ✅ Clear audit trail (version change → PR → deployment)

Cons:

- ⚠️ Not continuous reconciliation (event-driven, not polling)
- ⚠️ Cross-repo workflow triggers add complexity
- ⚠️ Lambda drift won't be auto-corrected (manual intervention needed)

Option 2: AWS Controllers for Kubernetes (ACK)

Architecture: Lambda managed as Kubernetes CRD, reconciled by ACK operator

┌─────────────────────────────────────────────────────────────────────────────┐
│                    OPTION 2: AWS Controllers for Kubernetes                  │
└─────────────────────────────────────────────────────────────────────────────┘

1. Install ACK Lambda Controller in GKE cluster
   - Requires IAM role for service account (IRSA) or GKE Workload Identity → AWS
   - Controller watches for Lambda CRDs

2. Define Lambda as Kubernetes manifest in cluster-gitops:

   # syrf/environments/staging/s3-notifier/lambda.yaml
   apiVersion: lambda.services.k8s.aws/v1alpha1
   kind: Function
   metadata:
     name: syrf-s3-notifier-staging
     namespace: syrf-staging
   spec:
     name: syrfAppUploadS3Notifier-staging
     runtime: dotnet8
     handler: "SyRF.S3FileSavedNotifier.Endpoint::..."
     code:
       s3Bucket: camarades-terraform-state-aws
       s3Key: lambda-packages/staging-0.1.6.zip  # ← Version in filename
     environment:
       variables:
         RabbitMqHost: "amqp://..."
     memorySize: 512
     timeout: 30

3. ArgoCD syncs Lambda CRD to cluster

4. ACK Lambda Controller reconciles:
   - Creates/updates AWS Lambda function to match spec
   - Continuous reconciliation (drift correction)
   - Status reflected back to K8s resource

5. S3 bucket notifications still managed separately (Terraform or ACK S3 controller)

Pros:

- ✅ True GitOps with continuous reconciliation
- ✅ Automatic drift correction
- ✅ Consistent with K8s patterns (everything is a CRD)
- ✅ ArgoCD manages Lambda just like other resources

Cons:

- ⚠️ Requires ACK Lambda controller installation and maintenance
- ⚠️ Complex cross-cloud IAM (GKE → AWS)
- ⚠️ Different from current Terraform approach (migration effort)
- ⚠️ S3 bucket notifications need separate management
- ⚠️ ACK is AWS-specific; adds AWS dependency to K8s cluster

Option 3: Flux Terraform Controller

Architecture: Terraform execution managed as Kubernetes resource

┌─────────────────────────────────────────────────────────────────────────────┐
│                      OPTION 3: Flux Terraform Controller                     │
└─────────────────────────────────────────────────────────────────────────────┘

1. Install Flux Terraform Controller (tf-controller)
   - Runs Terraform inside K8s pods
   - Manages Terraform state

2. Define Terraform resource in cluster-gitops:

   # syrf/environments/staging/s3-notifier/terraform.yaml
   apiVersion: infra.contrib.fluxcd.io/v1alpha2
   kind: Terraform
   metadata:
     name: s3-notifier-staging
     namespace: flux-system
   spec:
     approvePlan: auto
     interval: 10m
     path: ./terraform/lambda
     sourceRef:
       kind: GitRepository
       name: camarades-infrastructure
     vars:
       - name: environment
         value: staging
       - name: lambda_version
         value: "0.1.6"  # ← Version declared here
     varsFrom:
       - kind: Secret
         name: lambda-terraform-vars

3. ArgoCD (or Flux) syncs Terraform CRD

4. Terraform Controller:
   - Clones camarades-infrastructure
   - Runs terraform plan/apply
   - Reconciles on interval (10m)

Pros:

- ✅ Kubernetes-native Terraform execution
- ✅ Drift detection via interval reconciliation
- ✅ Keeps Terraform as deployment mechanism
- ✅ Secrets management via K8s secrets

Cons:

- ⚠️ Another controller to install and maintain
- ⚠️ Terraform runs in cluster (security implications)
- ⚠️ Mixing GitOps tools (ArgoCD + Flux component)
- ⚠️ More complex than GitHub Actions approach

Option 4: Lambda Versioning with Aliases

Architecture: Single Lambda function with version aliases, managed via GitOps

┌─────────────────────────────────────────────────────────────────────────────┐
│                     OPTION 4: Lambda Aliases (Traffic Shifting)              │
└─────────────────────────────────────────────────────────────────────────────┘

Instead of separate staging/production functions, use Lambda aliases:

Lambda Function: syrfAppUploadS3Notifier
├── $LATEST (always latest deployed code)
├── Version 42 (published version for v0.1.5)
├── Version 43 (published version for v0.1.6)
├── Alias: staging  → points to Version 43
└── Alias: production → points to Version 42

cluster-gitops declares which version each alias points to:

# syrf/environments/staging/s3-notifier/config.yaml
lambda:
  alias: staging
  version: 43  # Lambda published version number

S3 bucket notifications route by prefix:
- Projects/staging/* → staging alias
- Projects/* → production alias

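Pros:

- ✅ Single function with multiple "environments"
- ✅ Instant rollback (just change alias)
- ✅ Traffic shifting possible (gradual rollout)
- ✅ AWS-native blue/green deployment

Cons:

- ⚠️ S3 notification routing becomes complex: the prefixes above overlap (Projects/staging/ falls under Projects/), and S3 does not accept overlapping prefix filters for the same event type, so the prefix layout would have to change
- ⚠️ Current architecture assumes separate prefixes per PR, not aliases
- ⚠️ Requires rethinking S3 trigger architecture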
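
One concrete part of the routing problem is prefix overlap. To my knowledge, S3 does not accept a notification configuration whose prefix filters overlap for the same event type, so a pre-flight check along these lines could catch the `Projects/staging/` versus `Projects/` case before Terraform runs. The function name `overlapping_prefixes` is an assumption for illustration.

```python
from itertools import combinations


def overlapping_prefixes(prefixes: list[str]) -> list[tuple[str, str]]:
    """Return (shorter, longer) pairs where one prefix contains the other."""
    overlaps = []
    # Sorting by length guarantees the first element of each pair is the shorter one.
    for shorter, longer in combinations(sorted(set(prefixes), key=len), 2):
        if longer.startswith(shorter):
            overlaps.append((shorter, longer))
    return overlaps
```

The per-PR preview prefixes do not trip this check because of their trailing slash: `preview/pr-1/` is not a prefix of `preview/pr-12/`.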

Recommendation: Option 1 (GitOps-Triggered Terraform)

Rationale:

- Minimal disruption: Uses existing Terraform, GitHub Releases, S3 backend
- Familiar patterns: Matches current K8s promotion workflow
- No new operators: Avoids ACK/Flux complexity
- Sufficient for use case: Lambda versions change infrequently (releases, not continuous)
- Clear separation of concerns:
  - cluster-gitops: Declares desired version
  - GitHub Actions: Orchestrates deployment
  - Terraform: Manages Lambda infrastructure
  - GitHub Releases: Stores versioned artifacts

Key Insight: Lambda doesn't need continuous reconciliation like Kubernetes pods. Version changes are discrete events (releases), not continuous drift. Event-driven deployment (PR merge → workflow) is appropriate.

Implementation Plan for Option 1

Phase 1: Refactor CI/CD to Separate Build from Deploy

Current: ci-cd.yml builds AND deploys Lambda in one workflow

Target: Build creates artifact only; deploy triggered by cluster-gitops change

Changes to ci-cd.yml:

1. Keep: GitVersion, build, create tag, create GitHub Release
2. Remove: Upload to S3, Terraform apply
3. Add: Update cluster-gitops staging config with new version

Phase 2: Create Lambda Deploy Workflow in cluster-gitops

New workflow: cluster-gitops/.github/workflows/lambda-deploy.yml

Triggers on:

- Push to main affecting syrf/environments/*/s3-notifier/config.yaml

Steps:

1. Determine which environment(s) changed
2. Read version from config.yaml
3. Download .zip from GitHub Release (s3-notifier-v{version})
4. Upload to S3 (environment-specific path)
5. Checkout camarades-infrastructure
6. Run Terraform with environment and version variables
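
Step 1 can be sketched as a pure function over the list of files changed by the push. How that list is obtained (for example from a git diff) is left out, and the name `changed_environments` is a hypothetical helper, not part of any existing workflow.

```python
import re

# Matches only the s3-notifier config for a single environment directory.
CONFIG_PATH = re.compile(r"^syrf/environments/([^/]+)/s3-notifier/config\.yaml$")


def changed_environments(changed_files: list[str]) -> list[str]:
    """Return the sorted environment names whose s3-notifier config changed."""
    envs = {m.group(1) for f in changed_files if (m := CONFIG_PATH.match(f))}
    return sorted(envs)
```

Returning a list rather than a single name covers the case where one push touches both staging and production, so the workflow can fan out per environment.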

Phase 3: Refactor Terraform for Multi-Environment

Current: Single production Lambda, preview Lambdas dynamic

Target: Staging + Production as separate functions (or aliases)

Options:

- A) Separate functions: syrfAppUploadS3Notifier-staging, syrfAppUploadS3Notifier-production
- B) Single function with aliases: staging and production aliases pointing to versions

Recommend Option A initially for simplicity, can migrate to aliases later.

Phase 4: Add config.yaml for Lambda in cluster-gitops

Create files:

syrf/environments/staging/s3-notifier/config.yaml
syrf/environments/production/s3-notifier/config.yaml

Schema:

serviceName: s3-notifier
envName: staging
lambda:
  version: "0.1.6"
  functionName: "syrfAppUploadS3Notifier-staging"
  s3TriggerPrefix: "Projects/"
gitVersion:
  sha: "abc123def456..."
  shortSha: "abc123"
deploymentNotification:
  commitSha: "abc123def456..."

Design Decisions (User Input)

Question	Decision	Rationale
Environment Isolation	Separate functions	`syrfAppUploadS3Notifier-staging` and `syrfAppUploadS3Notifier` (production)
AWS Accounts	Same account	Simpler credential management
Preview Lambdas	Full GitOps alignment	Match K8s preview pattern for consistency

Preview Lambda Strategy (Revised)

After analyzing K8s preview patterns, preview Lambdas should follow the same GitOps pattern as K8s services:

┌─────────────────────────────────────────────────────────────────────────────┐
│                    PREVIEW LAMBDA GITOPS PATTERN                            │
│                                                                             │
│  1. pr-preview-lambda.yml builds Lambda package                             │
│                                                                             │
│  2. pr-preview-lambda.yml writes config to cluster-gitops:                  │
│     syrf/environments/preview/pr-{n}/services/s3-notifier.values.yaml       │
│                                                                             │
│  3. cluster-gitops lambda-deploy.yml triggers on file change                │
│                                                                             │
│  4. Terraform applies with preview_prs set derived from files               │
│                                                                             │
│  5. Cleanup: delete pr-{n}/ folder → triggers Lambda destruction            │
└─────────────────────────────────────────────────────────────────────────────┘

Why this changed: K8s previews use the exact same pattern (workflow creates files → ArgoCD deploys). Lambda should be no different for architectural consistency. See the "Kubernetes Preview Environment Pattern" and "Recommendation: Option A (Full GitOps Alignment)" sections for detailed analysis.

Kubernetes Preview Environment Pattern (For Comparison)

Understanding how K8s previews work is essential for deciding Lambda preview strategy.

K8s Preview Deployment Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                     K8S PREVIEW DEPLOYMENT FLOW                              │
└─────────────────────────────────────────────────────────────────────────────┘

1. PR labeled with "preview"
         │
         ▼
2. pr-preview.yml workflow triggers
         │
         ├─► Build Docker images with pr-{number} tag
         │
         └─► write-versions job creates files in cluster-gitops:
             │
             └─► syrf/environments/preview/pr-{N}/
                 ├── pr.yaml                    ◄─── ApplicationSet trigger
                 ├── namespace.yaml             ◄─── K8s Namespace definition
                 ├── mongodb-user.yaml          ◄─── Database isolation
                 ├── db-reset-job.yaml          ◄─── PreSync data cleanup
                 └── services/
                     ├── api.values.yaml        ◄─── Service image tags
                     ├── project-management.values.yaml
                     ├── quartz.values.yaml
                     └── web.values.yaml
                             │
                             ▼
3. ArgoCD ApplicationSet detects pr.yaml
         │
         ├─► Uses matrix generator: PR × services
         │
         └─► Generates Applications:
             ├── pr-{N}-namespace
             ├── pr-{N}-api
             ├── pr-{N}-project-management
             ├── pr-{N}-quartz
             └── pr-{N}-web
                     │
                     ▼
4. ArgoCD syncs Applications (auto-sync enabled)
         │
         └─► Deploys via Helm charts at PR commit SHA
                     │
                     ▼
5. Cleanup on PR close
         │
         └─► workflow deletes pr-{N}/ folder → ArgoCD deletes Apps → resources cleaned up

Key K8s Preview Characteristics

Aspect	K8s Implementation
Trigger	pr-preview.yml creates files in cluster-gitops
Orchestration	ArgoCD ApplicationSet (declarative)
Config Location	`cluster-gitops/syrf/environments/preview/pr-{N}/`
Deployment	ArgoCD syncs Helm charts from monorepo
Namespace Isolation	Per-PR K8s namespace
Database Isolation	Per-PR MongoDB database + AtlasDatabaseUser
Version Tracking	GitVersion in `{service}.values.yaml`
Cleanup	Delete pr-{N}/ folder → cascading deletion

ApplicationSet Configuration

# argocd/applicationsets/syrf-previews.yaml (simplified)
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
spec:
  generators:
    - matrix:
        generators:
          # Trigger: pr-*/pr.yaml existence
          - git:
              files:
                - path: "syrf/environments/preview/pr-*/pr.yaml"
          # Services to deploy per PR
          - merge:
              generators:
                - git:
                    files:
                      - path: "syrf/services/*/config.yaml"
                - git:
                    files:
                      - path: "syrf/environments/preview/services/*/config.yaml"
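
The matrix generator's effect (PR × services) can be illustrated with a small Cartesian product. This is only a sketch of the naming outcome described in the flow above, not of ArgoCD's actual template evaluation; `preview_application_names` is a hypothetical helper, and the separate `pr-{N}-namespace` Application is not modeled.

```python
from itertools import product


def preview_application_names(pr_numbers: list[str], services: list[str]) -> list[str]:
    """Return one Application name per (PR, service) pair, in input order."""
    return [f"pr-{pr}-{svc}" for pr, svc in product(pr_numbers, services)]
```

Removing a PR from the first list removes every Application generated for it, which matches the cascading cleanup when the pr-{N}/ folder is deleted.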

Values File Hierarchy (Priority Order)

1. syrf/global.values.yaml                                     # Universal defaults
2. syrf/services/{svc}/values.yaml                             # Service base
3. syrf/environments/preview/preview.values.yaml               # Preview defaults (ALL)
4. syrf/environments/preview/services/{svc}/values.yaml        # Preview per-service
5. syrf/environments/preview/pr-{N}/services/{svc}.values.yaml # PR-SPECIFIC (HIGHEST)
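
The layering above can be sketched as a fold over parsed values files, where each later layer overrides the earlier ones. This assumes Helm-style merging (nested maps merged key by key, everything else replaced); `deep_merge` and `effective_values` are illustrative names and the sample values are invented.

```python
from functools import reduce


def deep_merge(base: dict, override: dict) -> dict:
    """Merge override into a copy of base; nested dicts merge, other values replace."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def effective_values(layers: list[dict]) -> dict:
    """Fold layers from lowest to highest priority into one values dict."""
    return reduce(deep_merge, layers, {})
```

With the layers passed in the order listed, a PR-specific image tag wins while untouched defaults from earlier layers survive.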

Critical Insight: K8s Previews ARE GitOps

The K8s preview pattern is fully GitOps-driven:

1. Workflow creates config files in cluster-gitops
2. ArgoCD detects file changes via Git generators
3. ApplicationSet generates Applications automatically
4. ArgoCD handles deployment, sync, and cleanup

The workflow triggers GitOps; it doesn't deploy directly.

Lambda vs K8s Preview Pattern Comparison

Aspect	K8s Previews	Lambda Previews (Current)
Config in cluster-gitops	✅ Yes - pr.yaml, values files	❌ No - managed in workflow
Deployment orchestrator	ArgoCD ApplicationSet	GitHub Actions + Terraform
Trigger mechanism	File existence → ArgoCD	Workflow → Terraform directly
State visibility	Full visibility in cluster-gitops	No visibility (ephemeral)
Cleanup mechanism	Delete folder → ArgoCD cascades	Workflow removes from TF set
Pattern consistency	GitOps-first	Imperative-first

The Inconsistency Problem

Currently, Lambda previews operate differently from K8s previews:

┌─────────────────────────────────────────────────────────────────────────────┐
│                        CURRENT INCONSISTENCY                                 │
└─────────────────────────────────────────────────────────────────────────────┘

K8s Preview:
  pr-preview.yml → creates files in cluster-gitops → ArgoCD deploys
                   ↑ GitOps is the deployment mechanism

Lambda Preview:
  pr-preview-lambda.yml → Terraform apply directly → Lambda deployed
                          ↑ Workflow is the deployment mechanism
                          (cluster-gitops not involved)

Revised Analysis: Should Lambda Previews Follow K8s Pattern?

Option A: Full GitOps Alignment (K8s Pattern)

Make Lambda previews follow the same pattern as K8s:

┌─────────────────────────────────────────────────────────────────────────────┐
│                    OPTION A: FULL GITOPS ALIGNMENT                          │
└─────────────────────────────────────────────────────────────────────────────┘

1. pr-preview-lambda.yml creates:
   cluster-gitops/syrf/environments/preview/pr-{N}/services/s3-notifier.values.yaml

2. cluster-gitops workflow (lambda-deploy.yml) triggers on file change

3. Workflow runs Terraform with values from config file

4. Cleanup: delete s3-notifier.values.yaml → triggers Lambda destruction

Pros:

- ✅ Consistent pattern with K8s previews
- ✅ Full visibility in cluster-gitops
- ✅ Lambda preview state matches K8s preview state
- ✅ Single source of truth for ALL preview resources

Cons:

- ⚠️ Adds ~3-5 min latency (cross-repo workflow)
- ⚠️ More complex coordination (two workflows)
- ⚠️ Still imperative deployment (Terraform), just triggered differently

Option B: Hybrid Pattern (Current + Visibility)

Keep imperative deployment but add cluster-gitops visibility:

┌─────────────────────────────────────────────────────────────────────────────┐
│                    OPTION B: HYBRID (VISIBILITY ONLY)                       │
└─────────────────────────────────────────────────────────────────────────────┘

1. pr-preview-lambda.yml deploys Lambda directly (current behavior)

2. AFTER successful deploy, update cluster-gitops:
   cluster-gitops/syrf/environments/preview/pr-{N}/services/s3-notifier.values.yaml

3. cluster-gitops is READ-ONLY for Lambda previews (not a trigger)

4. Cleanup: workflow deletes Lambda AND updates cluster-gitops

Pros:

- ✅ Fast deployment (no cross-repo wait)
- ✅ Visibility in cluster-gitops
- ✅ Simple failure recovery (single workflow)

Cons:

- ⚠️ Inconsistent pattern (K8s: GitOps-triggers-deploy, Lambda: deploy-then-update)
- ⚠️ cluster-gitops may drift if workflow fails after deploy
- ⚠️ Two sources of truth (workflow state + cluster-gitops)

Option C: No Change (Status Quo)

Keep Lambda previews completely separate:

Pros:

- ✅ Simplest approach
- ✅ Already working

Cons:

- ❌ No visibility into Lambda preview state
- ❌ Inconsistent with K8s pattern
- ❌ Can't see complete preview environment in cluster-gitops

Recommendation: Option A (Full GitOps Alignment)

After deeper analysis, Option A is recommended for Lambda previews because:

1. Consistency Principle

K8s previews already accept the "workflow creates files → GitOps deploys" pattern. Lambda should follow the same pattern for architectural consistency.

2. The Latency Concern is Overstated

K8s Preview Latency Analysis:

pr-preview.yml workflow:
├── detect-changes: ~1 min
├── version-*: ~2 min
├── build-and-push: ~5-10 min
├── write-versions: ~1 min (creates cluster-gitops files)
└── ArgoCD sync: ~2-3 min

Total: ~11-17 minutes

Lambda with GitOps Latency Analysis:

pr-preview-lambda.yml workflow:
├── detect-changes: ~1 min
├── version: ~1 min
├── build Lambda package: ~2 min
├── write config to cluster-gitops: ~1 min
└── cluster-gitops workflow triggers: ~2 min
    └── Terraform apply: ~3-5 min

Total: ~10-12 minutes

Conclusion: Lambda with GitOps is actually FASTER than K8s previews because Lambda builds are faster than Docker builds. The latency concern from earlier analysis was comparing against an idealized "immediate deploy" that K8s also doesn't achieve.
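
The totals can be checked by summing the per-stage estimates. The stage durations below are copied from the two breakdowns above; the dictionary layout and the helper name `total_range` are only a convenient way to do the arithmetic.

```python
def total_range(stages: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum (min, max) minute estimates across all stages."""
    return (
        sum(low for low, _ in stages.values()),
        sum(high for _, high in stages.values()),
    )


K8S_PREVIEW = {
    "detect-changes": (1, 1),
    "version-*": (2, 2),
    "build-and-push": (5, 10),
    "write-versions": (1, 1),
    "ArgoCD sync": (2, 3),
}

LAMBDA_GITOPS = {
    "detect-changes": (1, 1),
    "version": (1, 1),
    "build Lambda package": (2, 2),
    "write config to cluster-gitops": (1, 1),
    "cluster-gitops workflow triggers": (2, 2),
    "Terraform apply": (3, 5),
}
```

Summed this way, the K8s preview path comes to 11-17 minutes and the Lambda GitOps path to 10-12 minutes. The gap is about a minute in the best case and up to five in the worst, driven mostly by the Docker build range.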

3. Single Source of Truth

With Option A, cluster-gitops/syrf/environments/preview/pr-{N}/ contains EVERYTHING about a preview:

- K8s namespace configuration
- MongoDB user configuration
- ALL service image tags
- Lambda version and configuration

This enables:

- Complete preview state at a glance
- Easier debugging (one place to look)
- Consistent cleanup (delete folder = delete everything)

4. Terraform Pattern Adaptation

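The for_each concern (Terraform needs the complete set of active preview PRs on every apply) can be addressed by deriving that set from the files in cluster-gitops: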

# Current: Workflow passes set
variable "preview_prs" {
  type    = set(string)
  default = []
}

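# GitOps: Workflow discovers set from cluster-gitops files
# In lambda-deploy.yml workflow (shell step; assumes jq is available on the runner):
# extract N from each .../pr-N/services/s3-notifier.values.yaml path, then
# emit a JSON list such as ["123","456"] (or [] when no previews exist)
PREVIEW_PRS=$(find syrf/environments/preview -path '*/pr-*/services/s3-notifier.values.yaml' \
  | sed -E 's|.*/pr-([0-9]+)/services/.*|\1|' | sort -n | jq -R . | jq -sc .)

# Pass to Terraform:
export TF_VAR_preview_prs="$PREVIEW_PRS"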
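
The same discovery step can be sketched in Python, which makes the path-to-PR-number extraction easier to test in isolation. The function operates on a list of repository-relative paths (how they are listed is left out), and `preview_prs_from_paths` is a hypothetical name.

```python
import json
import re

# Captures N from .../preview/pr-N/services/s3-notifier.values.yaml
PREVIEW_CONFIG = re.compile(
    r"(?:^|/)syrf/environments/preview/pr-(\d+)/services/s3-notifier\.values\.yaml$"
)


def preview_prs_from_paths(paths: list[str]) -> str:
    """Return the JSON list of PR numbers that declare an s3-notifier preview."""
    numbers = {m.group(1) for p in paths if (m := PREVIEW_CONFIG.search(p))}
    return json.dumps(sorted(numbers, key=int))
```

Only PRs that actually contain an s3-notifier values file end up in the set, so a preview that deploys K8s services without the Lambda does not create a function.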

Considerations: GitOps for Preview Lambdas

This section documents the trade-offs of applying GitOps to preview Lambdas. After analyzing the K8s preview pattern, Option A (Full GitOps Alignment) is now recommended, but these considerations should inform implementation decisions.

1. Feedback Loop Latency

Current Preview Flow (~5-8 minutes):

Push to PR → Build Lambda → Upload to S3 → Terraform apply → Lambda ready
            └────────────── Single workflow, linear execution ──────────────┘

Hypothetical GitOps Flow (~15-25 minutes):

Push to PR → Build Lambda → Create PR to cluster-gitops → Wait for merge
                                        │
                                        ▼
                            cluster-gitops PR merged
                                        │
                                        ▼
                            lambda-deploy.yml triggers
                                        │
                                        ▼
                            Download artifact → Terraform apply → Lambda ready

Impact: Developers expect rapid feedback on PR changes. The additional cross-repo coordination adds 10-15 minutes to each iteration cycle, significantly degrading developer experience.

2. Git History Pollution

Problem: Preview environments are high-churn by nature.

cluster-gitops commit history (hypothetical GitOps previews):
─────────────────────────────────────────────────────────────
abc1234 - Update pr-456 s3-notifier to sha def789
bcd2345 - Update pr-123 s3-notifier to sha abc012
cde3456 - Update pr-456 s3-notifier to sha ghi345
def4567 - Remove pr-789 s3-notifier (PR closed)
efg5678 - Update pr-123 s3-notifier to sha jkl678
fgh6789 - Add pr-901 s3-notifier config
...
(dozens of commits per day for active development)

Impact:

- Meaningful staging/production changes buried in preview noise
- Git blame becomes useless for understanding intentional changes
- Repository size grows rapidly with ephemeral config churn

3. Cross-Repository Coordination Complexity

Current Architecture (self-contained):

┌─────────────────────────────────────────────────────────────┐
│                    pr-preview-lambda.yml                     │
│                                                              │
│  1. Get list of preview PRs                                 │
│  2. Build Lambda for current PR                             │
│  3. Upload to S3                                            │
│  4. Terraform apply with preview_prs set                    │
│  5. On PR close: remove from set, Terraform destroys        │
│                                                              │
│  └─► All state managed within single workflow               │
└─────────────────────────────────────────────────────────────┘

GitOps Architecture (distributed state):

┌─────────────────────────────────────────────────────────────┐
│  syrf repo                    cluster-gitops repo            │
│  ──────────                   ───────────────────            │
│  pr-preview.yml ──creates PR──► preview/pr-{n}/config.yaml  │
│        │                              │                      │
│        │                              ▼                      │
│        │                      lambda-deploy.yml              │
│        │                              │                      │
│        │                              ▼                      │
│        │                      Terraform apply                │
│        │                                                     │
│  On PR close:                                               │
│  cleanup.yml ──creates PR──► Remove preview/pr-{n}/         │
│        │                              │                      │
│        │                              ▼                      │
│        │                      lambda-deploy.yml              │
│        │                              │                      │
│        │                              ▼                      │
│        └──────────────────── Terraform destroy               │
│                                                              │
│  Failure modes:                                             │
│  - cluster-gitops PR fails to merge → orphaned state        │
│  - cleanup PR fails → orphaned Lambda resources             │
│  - Race conditions between multiple PRs                     │
│  - Partial failures (K8s deployed, Lambda failed)           │
└─────────────────────────────────────────────────────────────┘

Impact: Distributed state across repositories creates multiple failure modes that don't exist with the current self-contained approach.
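
Several of the failure modes in the diagram come down to the two repositories disagreeing about which previews exist. A minimal sketch of a drift check follows, assuming the caller has already collected the PR numbers that have a config directory in cluster-gitops and the PR numbers that have a deployed Lambda. The `Drift` type and `find_drift` function are hypothetical names, not part of any existing workflow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Drift:
    """PR numbers on which the two sources of state disagree."""
    missing_lambda: frozenset   # config exists, no Lambda deployed
    orphaned_lambda: frozenset  # Lambda deployed, config already removed


def find_drift(config_prs, deployed_prs):
    """Compare preview configs in cluster-gitops with deployed preview Lambdas."""
    config, deployed = frozenset(config_prs), frozenset(deployed_prs)
    return Drift(
        missing_lambda=config - deployed,
        orphaned_lambda=deployed - config,
    )


if __name__ == "__main__":
    # Hypothetical inputs: PR numbers gathered from each side
    drift = find_drift(config_prs={"123", "456"}, deployed_prs={"456", "789"})
    print("missing:", sorted(drift.missing_lambda))
    print("orphaned:", sorted(drift.orphaned_lambda))
```

A check like this only reports disagreement; deciding whether to redeploy or destroy is still a manual step in the distributed design.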

4. Terraform `for_each` Pattern Incompatibility¶

Current Pattern (works well):

# Workflow passes complete set of active PRs
variable "preview_prs" {
  type    = set(string)
  default = []  # e.g., ["123", "456", "789"]
}

resource "aws_lambda_function" "preview" {
  for_each = var.preview_prs
  # Terraform manages full lifecycle based on set membership
}

GitOps Challenge:

# How would this work?

Option A: Aggregate configs at deploy time
─────────────────────────────────────────
cluster-gitops/
└── syrf/environments/preview/
    ├── pr-123/s3-notifier/config.yaml
    ├── pr-456/s3-notifier/config.yaml
    └── pr-789/s3-notifier/config.yaml

lambda-deploy.yml would need to:
1. List all pr-* directories
2. Build preview_prs set from directory names
3. Pass to Terraform

Problem: What if a PR closes while workflow is running?
         Directory deleted mid-execution → race condition

Option B: Separate Terraform per preview
──────────────────────────────────────────
Each preview has isolated Terraform state

Problems:
- Expensive (separate state file per PR)
- S3 bucket notifications can't be split (single resource)
- Lose the elegance of for_each pattern

Impact: The current for_each pattern is designed for workflow-managed state, not file-based discovery. Retrofitting GitOps would require significant Terraform refactoring.
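
Option A's aggregation step could look roughly like the sketch below. It assumes the directory layout shown above and that a preview counts only when its `s3-notifier/config.yaml` exists; `discover_preview_prs` and `terraform_var` are illustrative names. Running it against a single checked-out commit gives a consistent snapshot, but it does not remove the race: a PR closed after that commit is only picked up by the next run.

```python
import json
import re
import tempfile
from pathlib import Path

PR_DIR = re.compile(r"^pr-(\d+)$")


def discover_preview_prs(preview_root):
    """Return PR numbers (as strings) that have an s3-notifier config."""
    prs = []
    for entry in Path(preview_root).iterdir():
        match = PR_DIR.match(entry.name)
        if match and (entry / "s3-notifier" / "config.yaml").is_file():
            prs.append(match.group(1))
    return sorted(prs, key=int)


def terraform_var(prs):
    """Format the set as the value of a `-var` argument for terraform apply."""
    return "preview_prs=" + json.dumps(prs)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for number in ("456", "123"):
            config_dir = Path(root, f"pr-{number}", "s3-notifier")
            config_dir.mkdir(parents=True)
            (config_dir / "config.yaml").write_text("serviceName: s3-notifier\n")
        Path(root, "pr-789").mkdir()  # no config file, so it is ignored
        print(terraform_var(discover_preview_prs(root)))
```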

5. Ephemeral Resources Don't Need GitOps Benefits¶

GitOps Benefit	Value for Staging/Prod	Value for Previews
Single source of truth	✅ Critical - must know what's deployed	❌ PR commit IS the source of truth
Audit trail	✅ Critical - compliance, debugging	❌ Previews are throwaway
Manual gates	✅ Critical - prod approval	❌ No approval needed for preview
Rollback via git revert	✅ Useful - revert prod issues	❌ Just push new commit to PR
Drift detection	✅ Useful - ensure consistency	❌ Preview will be deleted anyway

Conclusion: GitOps overhead provides no meaningful benefit for ephemeral preview resources.

6. Circular Dependency Problem¶

┌─────────────────────────────────────────────────────────────┐
│                    CHICKEN-AND-EGG PROBLEM                   │
└─────────────────────────────────────────────────────────────┘

Q: When should cluster-gitops preview config be created?

Option A: Before PR exists
─────────────────────────
Can't create config without PR number
PR number assigned by GitHub when PR is created
→ Impossible

Option B: After PR created, before first build
─────────────────────────────────────────────
PR created → workflow creates cluster-gitops config → triggers deploy
                                        │
                                        └─► But deploy needs the artifact
                                            Artifact created by same workflow
                                            → Circular dependency

Option C: After artifact built
──────────────────────────────
PR created → build artifact → create cluster-gitops PR → deploy
                                        │
                                        └─► Adds latency (Option B problem)
                                            Every push updates cluster-gitops
                                            → Git pollution problem

Impact: There's no clean way to bootstrap preview configs without introducing either latency or complexity.

7. Failure Recovery Complexity¶

Current Approach (idempotent):

# If preview deployment fails, just re-run the workflow
# Workflow has complete state, can retry everything
gh workflow run pr-preview-lambda.yml

GitOps Approach (distributed state):

# If deployment fails, need to diagnose where:
# 1. Did syrf workflow fail to create cluster-gitops PR?
# 2. Did cluster-gitops PR fail to merge?
# 3. Did lambda-deploy.yml fail to trigger?
# 4. Did Terraform fail?

# Recovery requires:
# - Check cluster-gitops for pending PRs
# - Check if config exists but Lambda doesn't
# - Manually reconcile state between repos

Impact: Debugging and recovery become significantly more complex with distributed state.
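
The diagnosis order above can be written down as a small checklist walker. This is only a sketch: the checkpoint keys are hypothetical, and gathering each boolean (for example by querying both repositories) is left to the caller.

```python
# Checkpoints in the order a GitOps preview deployment passes through them.
CHECKPOINTS = [
    ("gitops_pr_created", "syrf workflow did not create the cluster-gitops PR"),
    ("gitops_pr_merged", "cluster-gitops PR was not merged"),
    ("deploy_triggered", "lambda-deploy.yml did not trigger"),
    ("terraform_applied", "Terraform apply failed"),
]


def first_failure(status):
    """Return the description of the earliest failed checkpoint, or None."""
    for key, description in CHECKPOINTS:
        if not status.get(key, False):
            return description
    return None


if __name__ == "__main__":
    observed = {"gitops_pr_created": True, "gitops_pr_merged": False}
    print(first_failure(observed))
```

A missing key is treated as a failure, since an unknown state has to be investigated anyway.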

Advanced Option: Unified Preview Workflow (Lambda + K8s)¶

Concept: Lambda as "Just Another Service" in pr-preview.yml¶

Instead of having separate workflows (pr-preview.yml for K8s, pr-preview-lambda.yml for Lambda), Lambda could be integrated into the same pr-preview.yml workflow as K8s services.

┌─────────────────────────────────────────────────────────────────────────────┐
│                  UNIFIED PREVIEW WORKFLOW (pr-preview.yml)                  │
│                                                                             │
│  Current (Separate):                                                        │
│  ───────────────────                                                        │
│  pr-preview.yml ──────► K8s services (Docker build → cluster-gitops → ArgoCD)
│  pr-preview-lambda.yml ► Lambda (dotnet build → Terraform directly)        │
│                                                                             │
│  Proposed (Unified):                                                        │
│  ──────────────────────                                                     │
│  pr-preview.yml ──────► ALL services:                                       │
│                         ├── K8s services (existing)                         │
│                         │   └── writes: pr-{n}/services/{svc}.values.yaml   │
│                         │                                                   │
│                         └── Lambda (new)                                    │
│                             ├── Build Lambda package (.zip)                 │
│                             ├── Upload to S3: lambda-packages/pr-{n}.zip    │
│                             └── writes: pr-{n}/services/s3-notifier.values.yaml
│                                                                             │
│  ArgoCD ApplicationSet generates Application for Lambda (like K8s services)│
│  Lambda Application syncs Terraform Job or ACK resource                     │
└─────────────────────────────────────────────────────────────────────────────┘

Benefits:¶

Single Workflow: One pr-preview.yml handles ALL preview resources
Consistent Pattern: Lambda config written to cluster-gitops same as K8s
Unified Timing: Lambda and K8s services deploy in coordinated sync waves
Single Cleanup: Delete pr-{n}/ folder cleans up EVERYTHING

Implementation in pr-preview.yml:¶

jobs:
  # Existing K8s jobs...
  build-and-push-images:
    # ... existing Docker builds

  # NEW: Add Lambda build alongside K8s builds
  build-lambda:
    needs: detect-changes  # required so needs.detect-changes.outputs resolves
    if: needs.detect-changes.outputs.s3_notifier_changed == 'true'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup .NET
        uses: actions/setup-dotnet@v4
        with:
          dotnet-version: '8.0.x'

      - name: Install Lambda Tools
        run: dotnet tool install -g Amazon.Lambda.Tools

      - name: Build Lambda Package
        run: |
          cd src/services/s3-notifier/SyRF.S3FileSavedNotifier.Endpoint
          dotnet lambda package -o pr-${{ github.event.pull_request.number }}.zip

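      - name: Upload to S3
        # Steps do not share the earlier `cd`, so run from the package directory
        working-directory: src/services/s3-notifier/SyRF.S3FileSavedNotifier.Endpoint
        run: |
          aws s3 cp pr-${{ github.event.pull_request.number }}.zip \
            s3://camarades-terraform-state-aws/lambda-packages/pr-${{ github.event.pull_request.number }}.zip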

  # MODIFIED: write-versions now includes Lambda
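  write-versions:
    # Jobs read through `needs.<job>` below must be listed here; always()
    # keeps this job running when build-lambda is skipped
    needs: [build-lambda, retag-lambda, version-s3-notifier]
    if: always()
    runs-on: ubuntu-latest
    steps: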
      # ... existing K8s service values ...

      - name: Write Lambda values
        if: needs.build-lambda.result == 'success' || needs.retag-lambda.result == 'success'
        run: |
          cat > syrf/environments/preview/pr-${{ env.PR_NUMBER }}/services/s3-notifier.values.yaml <<EOF
          serviceName: s3-notifier
          deploymentType: lambda
          lambda:
            version: "${{ needs.version-s3-notifier.outputs.version }}"
            commitSha: "${{ needs.version-s3-notifier.outputs.sha }}"
            s3Key: "lambda-packages/pr-${{ env.PR_NUMBER }}.zip"
          gitVersion:
            sha: "${{ needs.version-s3-notifier.outputs.sha }}"
            shortSha: "${{ needs.version-s3-notifier.outputs.shortSha }}"
          EOF
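
For reference, the same values file can be rendered outside a heredoc, which makes the output easy to test. This is a sketch under assumptions: `render_lambda_values` is a hypothetical helper, and the short SHA is taken to be the first seven characters of the full SHA, whereas the workflow above reads it from a job output.

```python
import json


def render_lambda_values(pr_number, version, sha):
    """Return the text of s3-notifier.values.yaml for one preview PR."""
    quoted = json.dumps  # JSON strings are valid YAML double-quoted scalars
    s3_key = f"lambda-packages/pr-{pr_number}.zip"
    short_sha = sha[:7]  # assumption: short SHA is the first 7 characters
    return "\n".join([
        "serviceName: s3-notifier",
        "deploymentType: lambda",
        "lambda:",
        f"  version: {quoted(version)}",
        f"  commitSha: {quoted(sha)}",
        f"  s3Key: {quoted(s3_key)}",
        "gitVersion:",
        f"  sha: {quoted(sha)}",
        f"  shortSha: {quoted(short_sha)}",
        "",  # trailing newline
    ])


if __name__ == "__main__":
    # Illustrative inputs only
    print(render_lambda_values(123, "0.1.5", "abcdef1234567890"), end="")
```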

Advanced Option: ArgoCD-Orchestrated Lambda Deployment¶

Concept: ArgoCD as Unified Orchestrator for K8s AND Lambda¶

Instead of GitHub Actions triggering Terraform directly, ArgoCD could coordinate Lambda deployment using one of these patterns:

Option A: ACK (AWS Controllers for Kubernetes)¶

┌─────────────────────────────────────────────────────────────────────────────┐
│                    OPTION A: ACK LAMBDA CONTROLLER                          │
│                                                                             │
│  cluster-gitops/syrf/environments/preview/pr-{n}/lambda/function.yaml       │
│  ────────────────────────────────────────────────────────────────────────── │
│  apiVersion: lambda.services.k8s.aws/v1alpha1                               │
│  kind: Function                                                             │
│  metadata:                                                                  │
│    name: s3-notifier-pr-123                                                 │
│    namespace: pr-123                                                        │
│  spec:                                                                      │
│    name: syrfAppUploadS3Notifier-pr-123                                     │
│    runtime: dotnet8                                                         │
│    handler: "SyRF.S3FileSavedNotifier..."                                   │
│    code:                                                                    │
│      s3Bucket: camarades-terraform-state-aws                                │
│      s3Key: lambda-packages/pr-123.zip                                      │
│                                                                             │
│  Flow:                                                                      │
│  pr-preview.yml → writes function.yaml → ArgoCD syncs → ACK creates Lambda  │
│                                                                             │
│  Pros:                                                                      │
│  ✓ True GitOps (ArgoCD is orchestrator)                                     │
│  ✓ Continuous reconciliation (drift correction)                             │
│  ✓ Lambda is just another K8s resource                                      │
│                                                                             │
│  Cons:                                                                      │
│  ⚠️ Requires ACK Lambda Controller installation                              │
│  ⚠️ Cross-cloud IAM (GKE Workload Identity → AWS IAM)                        │
│  ⚠️ S3 bucket notifications need separate controller                        │
│  ⚠️ Migration from Terraform                                                 │
└─────────────────────────────────────────────────────────────────────────────┘

Option B: ArgoCD Sync Hook + Terraform Job¶

┌─────────────────────────────────────────────────────────────────────────────┐
│                    OPTION B: ARGOCD SYNC HOOK + TERRAFORM JOB               │
│                                                                             │
│  cluster-gitops/syrf/environments/preview/pr-{n}/lambda/deploy-job.yaml     │
│  ────────────────────────────────────────────────────────────────────────── │
│  apiVersion: batch/v1                                                       │
│  kind: Job                                                                  │
│  metadata:                                                                  │
│    name: deploy-lambda-pr-123                                               │
│    annotations:                                                             │
│      argocd.argoproj.io/hook: Sync                                          │
│      argocd.argoproj.io/sync-wave: "5"  # After K8s services                │
│  spec:                                                                      │
│    template:                                                                │
│      spec:                                                                  │
│        serviceAccountName: terraform-runner                                 │
│        containers:                                                          │
│          - name: terraform                                                  │
│            image: hashicorp/terraform:1.6                                   │
│            env:                                                             │
│              - name: PR_NUMBER                                              │
│                value: "123"                                                 │
│              - name: AWS_REGION                                             │
│                value: eu-west-1                                             │
│            command: ["/bin/sh", "-c"]                                       │
│            args:                                                            │
│              - |                                                            │
│                git clone https://github.com/camaradesuk/camarades-infrastructure
│                cd camarades-infrastructure/terraform/lambda                 │
│                terraform init                                               │
│                terraform apply -var="preview_prs=[\"$PR_NUMBER\"]" -auto-approve
│                                                                             │
│  Flow:                                                                      │
│  pr-preview.yml → writes deploy-job.yaml → ArgoCD syncs → Job runs Terraform│
│                                                                             │
│  Sync Wave Ordering:                                                        │
│  Wave -10: ExternalSecret (Atlas API credentials)                           │
│  Wave -5:  AtlasDatabaseUser (creates MongoDB user)                         │
│  Wave -1:  db-reset Job (drops collections)                                 │
│  Wave 0:   K8s Deployments (API, PM, Web, etc.)                             │
│  Wave 5:   Lambda deploy Job (Terraform) ← NEW                              │
│                                                                             │
│  Pros:                                                                      │
│  ✓ ArgoCD orchestrates ALL resources                                        │
│  ✓ Uses existing Terraform                                                  │
│  ✓ Sync waves ensure correct ordering                                       │
│  ✓ No additional controllers                                                │
│                                                                             │
│  Cons:                                                                      │
│  ⚠️ Terraform runs as K8s Job (state management complexity)                  │
│  ⚠️ AWS credentials in K8s secrets                                           │
│  ⚠️ Job cleanup and retry handling                                           │
│  ⚠️ Single-PR preview_prs would destroy other previews in shared state       │
└─────────────────────────────────────────────────────────────────────────────┘
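
The sync wave ordering in Option B follows one rule: lower wave numbers are applied first. A small sketch of that ordering, using the wave values from the diagram (the `sync_order` helper is illustrative and ignores Argo CD's additional ordering by kind and name within a wave):

```python
def sync_order(resources):
    """Sort (name, wave) pairs the way waves are applied: lowest wave first."""
    return [name for name, wave in sorted(resources, key=lambda item: item[1])]


# Wave numbers copied from the diagram above, deliberately out of order.
RESOURCES = [
    ("Lambda deploy Job", 5),
    ("K8s Deployments", 0),
    ("ExternalSecret", -10),
    ("db-reset Job", -1),
    ("AtlasDatabaseUser", -5),
]

if __name__ == "__main__":
    for name in sync_order(RESOURCES):
        print(name)
```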

Option C: Flamingo (ArgoCD + Flux Terraform Controller)¶

┌─────────────────────────────────────────────────────────────────────────────┐
│                    OPTION C: FLAMINGO / FLUX TF CONTROLLER                  │
│                                                                             │
│  cluster-gitops/syrf/environments/preview/pr-{n}/lambda/terraform.yaml      │
│  ────────────────────────────────────────────────────────────────────────── │
│  apiVersion: infra.contrib.fluxcd.io/v1alpha2                               │
│  kind: Terraform                                                            │
│  metadata:                                                                  │
│    name: s3-notifier-pr-123                                                 │
│  spec:                                                                      │
│    approvePlan: auto                                                        │
│    interval: 5m                                                             │
│    path: ./terraform/lambda                                                 │
│    sourceRef:                                                               │
│      kind: GitRepository                                                    │
│      name: camarades-infrastructure                                         │
│    vars:                                                                    │
│      - name: preview_prs                                                    │
│        value: '["123"]'                                                     │
│                                                                             │
│  Flow:                                                                      │
│  pr-preview.yml → writes terraform.yaml → ArgoCD syncs → Flux TF applies    │
│                                                                             │
│  Pros:                                                                      │
│  ✓ ArgoCD orchestrates                                                      │
│  ✓ Proper Terraform state management                                        │
│  ✓ Continuous reconciliation                                                │
│                                                                             │
│  Cons:                                                                      │
│  ⚠️ Requires Flux Terraform Controller installation                          │
│  ⚠️ Mixing GitOps tools (ArgoCD + Flux)                                      │
│  ⚠️ More infrastructure complexity                                           │
└─────────────────────────────────────────────────────────────────────────────┘

Comparison: Lambda Deployment Approaches¶

Approach	Orchestrator	Lambda Deploy	State Management	Complexity	Recommendation
Current	GitHub Actions	Terraform (workflow)	Terraform state	Low	Status quo
GitOps Workflow Trigger	GitHub Actions	Terraform (workflow)	Terraform state	Medium	✅ Practical
ACK Controller	ArgoCD	ACK (K8s CRD)	ACK Controller	High	Future option
Sync Hook + TF Job	ArgoCD	Terraform (K8s Job)	Terraform state	Medium-High	✅ Best unification
Flamingo/Flux TF	ArgoCD	Flux TF Controller	Flux TF	High	Over-engineered

Recommended Evolution Path:¶

┌─────────────────────────────────────────────────────────────────────────────┐
│                         RECOMMENDED EVOLUTION PATH                          │
│                                                                             │
│  Phase 1 (Immediate): GitOps Workflow Trigger                               │
│  ──────────────────────────────────────────────                             │
│  - Integrate Lambda into pr-preview.yml                                     │
│  - Write s3-notifier.values.yaml to cluster-gitops                          │
│  - cluster-gitops workflow triggers Terraform                               │
│  - Minimal changes to existing infrastructure                               │
│                                                                             │
│  Phase 2 (Future): ArgoCD Sync Hook Integration                             │
│  ────────────────────────────────────────────────                           │
│  - Replace workflow trigger with ArgoCD sync hook                           │
│  - Terraform runs as K8s Job coordinated by ArgoCD                          │
│  - Unified sync wave ordering (K8s + Lambda)                                │
│  - Single orchestrator (ArgoCD) for all resources                           │
│                                                                             │
│  Phase 3 (Long-term): ACK Controller                                        │
│  ─────────────────────────────────────────                                  │
│  - Migrate from Terraform to ACK Lambda Controller                          │
│  - Lambda becomes native K8s resource                                       │
│  - True GitOps with continuous reconciliation                               │
│  - Only if AWS becomes more central to infrastructure                       │
└─────────────────────────────────────────────────────────────────────────────┘

Revised Recommendation: Full GitOps Alignment¶

After analyzing how K8s previews work, the hybrid approach is no longer recommended. Instead, Lambda previews should follow the same pattern as K8s previews:

┌─────────────────────────────────────────────────────────────────────────────┐
│              RECOMMENDED: FULL GITOPS ALIGNMENT FOR ALL PREVIEWS            │
│                                                                             │
│  K8s Previews (current):                                                    │
│  ───────────────────────                                                    │
│  pr-preview.yml → writes files to cluster-gitops → ArgoCD deploys           │
│                                                                             │
│  Lambda Previews (proposed):                                                │
│  ──────────────────────────                                                 │
│  pr-preview-lambda.yml → writes s3-notifier.values.yaml to cluster-gitops   │
│                       → cluster-gitops workflow triggers                    │
│                       → Terraform deploys                                   │
│                                                                             │
│  Benefits:                                                                  │
│  ✓ Consistent pattern across ALL preview resources                          │
│  ✓ Single source of truth (cluster-gitops/syrf/environments/preview/pr-N/) │
│  ✓ Complete preview visibility (K8s + Lambda in one place)                  │
│  ✓ Unified cleanup (delete pr-N/ folder = delete everything)               │
│  ✓ Acceptable latency (~10-12 min, faster than K8s ~15-17 min)              │
│                                                                             │
│  Trade-offs (acceptable):                                                   │
│  ⚠️ Cross-repo coordination (same as K8s previews)                          │
│  ⚠️ More git commits for preview changes (same as K8s previews)             │
│  ⚠️ Distributed state (same complexity as K8s previews)                     │
└─────────────────────────────────────────────────────────────────────────────┘

Key insight: The considerations documented above apply equally to K8s previews, yet we accept them for K8s. Lambda should be no different.

Historical Context: Why the Analysis Changed¶

The original "Hybrid Approach" was based on comparing Lambda GitOps against an idealized "direct deploy" baseline. After understanding how K8s previews actually work, the comparison baseline changed:

Metric	Lambda Direct	Lambda GitOps	K8s GitOps (current)
Latency	~5-8 min	~10-12 min	~15-17 min
Git commits	0	~2/PR	~2/PR
Complexity	Simple	Moderate	Moderate
Consistency	❌ Inconsistent	✅ Consistent	✅ Consistent
Visibility	❌ None	✅ Full	✅ Full

Lambda with GitOps is actually faster than K8s previews and achieves consistency.

Summary¶

Aspect	Current	Proposed (Option 1)
Version source of truth	CI/CD workflow	cluster-gitops config.yaml
Build trigger	Push to main	Push to main (unchanged)
Deploy trigger	Same workflow	cluster-gitops PR merge
Artifact storage	GitHub Releases	GitHub Releases (unchanged)
Deployment mechanism	GitHub Actions + Terraform	GitHub Actions + Terraform (unchanged)
Reconciliation	None (one-shot)	Event-driven (PR merge)
Drift correction	None	None (acceptable for Lambda)

The key change is decoupling build from deploy and making cluster-gitops the trigger point for Lambda deployments, consistent with the Kubernetes service pattern.

Detailed Implementation Plan¶

Step 1: Create Lambda Config Files in cluster-gitops¶

Files to create:

# cluster-gitops/syrf/services/s3-notifier/config.yaml
serviceName: s3-notifier
deploymentType: lambda  # Distinguishes from K8s services
chartPath: null
chartRepo: null

# cluster-gitops/syrf/environments/staging/s3-notifier/config.yaml
serviceName: s3-notifier
envName: staging
lambda:
  version: "0.1.5"
  functionName: "syrfAppUploadS3Notifier-staging"
  s3TriggerPrefix: "staging/"
gitVersion:
  sha: "..."
  shortSha: "..."

# cluster-gitops/syrf/environments/production/s3-notifier/config.yaml
serviceName: s3-notifier
envName: production
lambda:
  version: "0.1.4"
  functionName: "syrfAppUploadS3Notifier"
  s3TriggerPrefix: "Projects/"
gitVersion:
  sha: "..."
  shortSha: "..."

Step 2: Create Lambda Deploy Workflow in cluster-gitops¶

File: cluster-gitops/.github/workflows/lambda-deploy.yml

Triggers on changes to syrf/environments/*/s3-notifier/config.yaml.

Workflow steps:
1. Detect which environment(s) changed (staging, production)
2. Read Lambda version from config.yaml
3. Download .zip from GitHub Release (s3-notifier-v{version})
4. Upload to S3 (lambda-packages/staging.zip or production.zip)
5. Checkout camarades-infrastructure repo
6. Run Terraform with environment-specific variables
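
Steps 1 to 4 are mostly string handling, so they can be sketched without any AWS or GitHub calls. The function names below are hypothetical, and the version is read with a regular expression that assumes a single double-quoted `version:` key; a real workflow would more likely use a YAML parser.

```python
import re

CONFIG_PATH = re.compile(
    r"^syrf/environments/(staging|production)/s3-notifier/config\.yaml$"
)
VERSION_LINE = re.compile(r'^[ \t]+version:[ \t]*"([^"]+)"', re.MULTILINE)


def changed_environments(changed_files):
    """Step 1: environments whose s3-notifier config.yaml changed."""
    matches = (CONFIG_PATH.match(path) for path in changed_files)
    return sorted({m.group(1) for m in matches if m})


def read_version(config_text):
    """Step 2: read lambda.version (assumes one quoted version key)."""
    match = VERSION_LINE.search(config_text)
    if match is None:
        raise ValueError("no quoted version found in config")
    return match.group(1)


def deployment_plan(env, config_text):
    """Steps 3-4: release tag to download and S3 key to upload to."""
    version = read_version(config_text)
    return {
        "environment": env,
        "release_tag": f"s3-notifier-v{version}",
        "s3_key": f"lambda-packages/{env}.zip",
    }


if __name__ == "__main__":
    files = ["syrf/environments/staging/s3-notifier/config.yaml", "README.md"]
    config = 'lambda:\n  version: "0.1.5"\n'
    for env in changed_environments(files):
        print(deployment_plan(env, config))
```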

Step 3: Create Staging Lambda in Terraform (PREREQUISITE)¶

⚠️ Critical: Staging Lambda does NOT currently exist. This step must be completed FIRST.

Modify: camarades-infrastructure/terraform/lambda/main.tf

Add new staging Lambda function:

# Staging Lambda function
resource "aws_lambda_function" "staging" {
  function_name = "syrfAppUploadS3Notifier-staging"
  role          = aws_iam_role.lambda_role.arn
  runtime       = "dotnet8"
  handler       = "SyRF.S3FileSavedNotifier.Endpoint::SyRF.S3FileSavedNotifier.Endpoint.Function::FunctionHandler"

  s3_bucket         = var.lambda_bucket
  s3_key            = "lambda-packages/staging.zip"
  source_code_hash  = var.staging_source_code_hash

  memory_size = 512
  timeout     = 30

  environment {
    variables = {
      RabbitMqHost = var.staging_rabbitmq_host  # Staging-specific
      # ... other staging env vars
    }
  }
}

# S3 trigger permission for staging
resource "aws_lambda_permission" "staging_s3" {
  statement_id  = "AllowS3InvokeStaging"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.staging.function_name
  principal     = "s3.amazonaws.com"
  source_arn    = aws_s3_bucket.uploads.arn
}

Update S3 notifications:

resource "aws_s3_bucket_notification" "bucket_notification" {
  bucket = aws_s3_bucket.uploads.id

  # Production notification (existing)
  lambda_function {
    lambda_function_arn = aws_lambda_function.production.arn
    events              = ["s3:ObjectCreated:*"]
    filter_prefix       = "Projects/"
  }

  # Staging notification (NEW)
  lambda_function {
    lambda_function_arn = aws_lambda_function.staging.arn
    events              = ["s3:ObjectCreated:*"]
    filter_prefix       = "staging/"
  }

  # Preview notifications (existing, via dynamic block)
  dynamic "lambda_function" {
    for_each = var.preview_prs
    content {
      lambda_function_arn = aws_lambda_function.preview[lambda_function.value].arn
      events              = ["s3:ObjectCreated:*"]
      filter_prefix       = "preview/pr-${lambda_function.value}/"
    }
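  }

  # S3 checks the invoke permission when the notification is saved
  depends_on = [aws_lambda_permission.staging_s3]
}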
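
S3 rejects a notification configuration in which two filters for the same event type have overlapping prefixes, so the trailing slash in each `filter_prefix` matters: without it, `preview/pr-1` would overlap `preview/pr-12`. The check below is a sketch with a hypothetical helper name; it tests only the prefix rule and not suffix filters.

```python
from itertools import combinations


def overlapping_prefixes(prefixes):
    """Return pairs in which one prefix is a prefix of the other."""
    return [
        (a, b)
        for a, b in combinations(sorted(prefixes), 2)
        if a.startswith(b) or b.startswith(a)
    ]


if __name__ == "__main__":
    with_slash = ["Projects/", "staging/", "preview/pr-1/", "preview/pr-12/"]
    without_slash = ["preview/pr-1", "preview/pr-12"]
    print(overlapping_prefixes(with_slash))
    print(overlapping_prefixes(without_slash))
```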

Add staging variables to variables.tf:

variable "staging_version" {
  description = "Semantic version for staging Lambda"
  type        = string
  default     = ""
}

variable "staging_commit_sha" {
  description = "Git commit SHA for staging deployment"
  type        = string
  default     = ""
}

variable "staging_source_code_hash" {
  description = "Base64-encoded SHA256 hash of staging.zip"
  type        = string
  default     = ""
}

variable "staging_rabbitmq_host" {
  description = "RabbitMQ host for staging environment"
  type        = string
}
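
The value for `staging_source_code_hash` is the base64 encoding of the raw SHA-256 digest of the package, not of its hex string. A minimal sketch of computing it in a workflow helper follows; the function name mirrors the Terraform argument, and the demo file is a stand-in for `staging.zip`.

```python
import base64
import hashlib
import tempfile
from pathlib import Path


def source_code_hash(path):
    """Base64-encoded SHA-256 digest of the file at path."""
    digest = hashlib.sha256()
    with open(path, "rb") as package:
        # Read in chunks so large packages are not loaded into memory at once
        for chunk in iter(lambda: package.read(65536), b""):
            digest.update(chunk)
    return base64.b64encode(digest.digest()).decode("ascii")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        package_path = Path(workdir) / "staging.zip"
        package_path.write_bytes(b"placeholder package bytes")
        print(source_code_hash(package_path))
```

The result could then be passed as `-var="staging_source_code_hash=..."` so Terraform updates the function only when the package content changes.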

Production Lambda remains unchanged:
- Function name: syrfAppUploadS3Notifier
- S3 package: lambda-packages/production.zip
- S3 trigger prefix: Projects/

Step 4: Modify ci-cd.yml to Decouple Build from Deploy¶

Changes to syrf/.github/workflows/ci-cd.yml:

Remove from deploy-lambda job:
- Upload to S3 lambda-packages/production.zip
- Terraform init/plan/apply

Keep:
- Build Lambda package (.zip)
- Create GitHub artifact
- Create git tag (separate job)
- Create GitHub Release with .zip attachment

Add to promote-to-staging job:
- Update syrf/environments/staging/s3-notifier/config.yaml with new version
- Include in same PR as K8s service promotions
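
The config update in the promotion job is a one-line change to `lambda.version`. The sketch below shows one way to make that edit while leaving the rest of the file untouched. `set_lambda_version` is a hypothetical helper, and it assumes the file contains exactly one indented, double-quoted `version:` key, as in the Step 1 examples.

```python
import re

VERSION_LINE = re.compile(r'^([ \t]+version:[ \t]*)"[^"]*"', re.MULTILINE)


def set_lambda_version(config_text, new_version):
    """Return config_text with the quoted version value replaced."""
    updated, count = VERSION_LINE.subn(
        lambda match: f'{match.group(1)}"{new_version}"', config_text
    )
    if count != 1:
        # Refuse to guess when the file does not match the expected shape
        raise ValueError(f"expected exactly one version key, found {count}")
    return updated


if __name__ == "__main__":
    config = (
        "serviceName: s3-notifier\n"
        "envName: staging\n"
        "lambda:\n"
        '  version: "0.1.5"\n'
        '  functionName: "syrfAppUploadS3Notifier-staging"\n'
    )
    print(set_lambda_version(config, "0.1.6"), end="")
```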

Step 5: S3 Bucket Prefix Structure¶

Current:

syrfapp-uploads/
├── Projects/           ← Production (current)
└── preview/pr-{n}/     ← PR previews (current)

New:

syrfapp-uploads/
├── Projects/           ← Production (unchanged)
├── staging/            ← Staging (new)
└── preview/pr-{n}/     ← PR previews (unchanged)

Note: API/Web may need configuration so that each environment uploads to its own prefix.
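
The prefix choice per environment could be expressed as a single lookup, sketched below. The environment names and the `upload_prefix` helper are assumptions for illustration; how API/Web actually receive this setting is not specified here.

```python
def upload_prefix(env_name, pr_number=None):
    """Return the key prefix that routes uploads to the environment's Lambda."""
    if env_name == "production":
        return "Projects/"
    if env_name == "staging":
        return "staging/"
    if env_name == "preview":
        if pr_number is None:
            raise ValueError("preview uploads need a PR number")
        return f"preview/pr-{pr_number}/"
    raise ValueError(f"unknown environment: {env_name!r}")


if __name__ == "__main__":
    print(upload_prefix("staging"))
    print(upload_prefix("preview", 123))
```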

Implementation Order (Dependencies)¶

┌─────────────────────────────────────────────────────────────────────────────┐
│                           IMPLEMENTATION PHASES                              │
└─────────────────────────────────────────────────────────────────────────────┘

Phase 1: Infrastructure Preparation (camarades-infrastructure)
─────────────────────────────────────────────────────────────
1. Add staging Lambda function to Terraform
2. Add staging S3 notification trigger
3. Add staging variables
4. Apply Terraform to create staging Lambda
   └─► Staging Lambda now exists (needs an initial lambda-packages/staging.zip; AWS cannot create a function without code)

Phase 2: GitOps Structure (cluster-gitops)
───────────────────────────────────────────
5. Create s3-notifier service config
6. Create staging/production config.yaml files
7. Create lambda-deploy.yml workflow
   └─► Workflow ready to deploy on config changes

Phase 3: CI/CD Decoupling (syrf)
────────────────────────────────
8. Modify ci-cd.yml to remove direct deploy
9. Add Lambda version to staging promotion PR
   └─► Build creates artifact, GitOps triggers deploy

Phase 4: Verification
─────────────────────
10. Test end-to-end: code change → build → promotion → deploy
11. Verify rollback capability

Files to Modify¶

Phase 1: Infrastructure (camarades-infrastructure repo)¶

File	Changes Required
`terraform/lambda/main.tf`	Add `aws_lambda_function.staging` resource
	Add staging S3 trigger to `aws_s3_bucket_notification`
	Add `aws_lambda_permission.staging_s3`
`terraform/lambda/variables.tf`	Add `staging_version`, `staging_commit_sha`, `staging_source_code_hash`, `staging_rabbitmq_host`
`terraform/lambda/outputs.tf`	Add staging Lambda ARN output

Phase 2: GitOps Structure (cluster-gitops repo)¶

File	Changes Required
`syrf/services/s3-notifier/config.yaml`	Create - service metadata with `deploymentType: lambda`
`syrf/environments/staging/s3-notifier/config.yaml`	Create - staging version declaration
`syrf/environments/production/s3-notifier/config.yaml`	Create - production version declaration
`.github/workflows/lambda-deploy.yml`	Create - GitOps-triggered deployment workflow

Phase 3: CI/CD Changes (syrf repo)¶

File	Changes Required
`.github/workflows/ci-cd.yml`	Remove from `deploy-lambda`: S3 upload, Terraform apply
	Keep in `deploy-lambda`: Build package, create artifact
	Modify `promote-to-staging`: Include Lambda config update
	Modify `promote-to-production`: Include Lambda config update

Verification Steps¶

Pre-Implementation Checks¶

Verify current state:

# List existing Lambda functions
aws lambda list-functions --query "Functions[?starts_with(FunctionName, 'syrfApp')].[FunctionName,LastModified]" --output table

# Confirm staging Lambda does NOT exist
aws lambda get-function --function-name syrfAppUploadS3Notifier-staging 2>&1 | grep -q "ResourceNotFoundException" && echo "Confirmed: Staging Lambda does not exist"
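The grep-based check above prints nothing both when the function already exists and when the call fails for an unrelated reason (for example, missing credentials). A minimal sketch of a check that separates the three cases follows. `lambda_state` is a hypothetical helper name, and it assumes the AWS CLI reports a missing function with `ResourceNotFoundException` in its error text, as the command above already does.

```shell
# Report whether a Lambda function exists.
# Prints "exists" or "missing"; returns 2 on any other failure.
lambda_state() {
  function_name="$1"

  # Capture stdout and stderr so the error text can be inspected.
  if output="$(aws lambda get-function --function-name "$function_name" 2>&1)"; then
    echo "exists"
  elif printf '%s' "$output" | grep -q "ResourceNotFoundException"; then
    echo "missing"
  else
    # Credentials, permissions, network: do not treat these as "missing".
    echo "lookup failed: $output" >&2
    return 2
  fi
}
```

Before Phase 1, `lambda_state syrfAppUploadS3Notifier-staging` would be expected to print `missing`; after the Terraform apply it would be expected to print `exists`.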

Phase 1: Infrastructure Verification¶

After Terraform apply:

# Verify staging Lambda created
aws lambda get-function --function-name syrfAppUploadS3Notifier-staging

# Verify S3 notifications include staging
aws s3api get-bucket-notification-configuration --bucket syrfappuploads

Phase 3: End-to-End Tests¶

1. Test build-only flow:
   - Push s3-notifier change to main
   - Verify: GitHub Release created with .zip
   - Verify: No direct Lambda deployment (Terraform step skipped)
   - Verify: PR created to cluster-gitops with version update in staging config
2. Test GitOps deployment:
   - Merge promotion PR to cluster-gitops
   - Verify: lambda-deploy.yml triggers
   - Verify: Staging Lambda updated in AWS

     aws lambda get-function-configuration --function-name syrfAppUploadS3Notifier-staging \
       --query '{Version: Environment.Variables.VERSION, LastModified: LastModified}'

3. Test production promotion:
   - Create PR updating production config.yaml
   - Manual review and merge
   - Verify: Production Lambda updated
4. Test rollback:
   - Update staging config.yaml to previous version (e.g., 0.1.4 → 0.1.3)
   - Verify: Lambda reverts to that version
   - Verify: GitHub Release for target version still exists (artifact available)
5. Test file upload triggers:
   - Upload test file to staging/ prefix
   - Verify: Staging Lambda invoked
   - Upload test file to Projects/ prefix
   - Verify: Production Lambda invoked (unchanged behavior)
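The version checks in the GitOps deployment and rollback tests could be scripted roughly as follows. This is a sketch under stated assumptions: both helper names are hypothetical, the `VERSION` environment variable is taken from the query used in this section, and the release tag format follows the `s3-notifier-v0.1.5` example given earlier in this document.

```shell
# Build the git tag / release name for a given s3-notifier version.
release_tag_for_version() {
  printf 's3-notifier-v%s\n' "$1"
}

# Compare the version reported by a deployed Lambda with the expected one.
# Returns 0 on match, 1 on mismatch, 2 if the lookup itself fails.
deployed_version_matches() {
  function_name="$1"
  expected_version="$2"

  deployed_version="$(aws lambda get-function-configuration \
    --function-name "$function_name" \
    --query 'Environment.Variables.VERSION' \
    --output text)" || return 2

  if [ "$deployed_version" = "$expected_version" ]; then
    echo "OK: $function_name is at $deployed_version"
  else
    echo "MISMATCH: $function_name is at $deployed_version, expected $expected_version" >&2
    return 1
  fi
}
```

For the rollback test, `deployed_version_matches syrfAppUploadS3Notifier-staging 0.1.3` would confirm the revert. If the GitHub CLI is available, `gh release view "$(release_tag_for_version 0.1.3)"` run against the syrf repository would confirm that the release artifact still exists.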

Implementation Status¶

This document is DOCUMENTATION ONLY for a future PR.

No implementation will be done in the current session. This document serves as:

1. Architectural design and rationale
2. Analysis of deployment mechanism differences (ci-cd.yml vs pr-preview-lambda.yml)
3. Detailed reasoning for why GitOps is appropriate for staging/production but NOT for previews
4. Implementation roadmap for when this work is prioritized

Future PR Checklist¶

When implementing this feature, create PRs in this order:

1. PR 1 (camarades-infrastructure): Add staging Lambda to Terraform
   - Add aws_lambda_function.staging resource
   - Add staging S3 notification trigger
   - Add staging variables
   - Test with terraform plan
2. PR 2 (cluster-gitops): Add GitOps structure for Lambda
   - Create syrf/services/s3-notifier/config.yaml
   - Create syrf/environments/staging/s3-notifier/config.yaml
   - Create syrf/environments/production/s3-notifier/config.yaml
   - Create .github/workflows/lambda-deploy.yml
3. PR 3 (syrf): Decouple build from deploy in ci-cd.yml
   - Remove S3 upload and Terraform apply from deploy-lambda job
   - Add Lambda version to staging promotion PR
   - Update promote-to-production to include Lambda
4. PR 4 (all repos): End-to-end verification
   - Test build-only flow
   - Test GitOps deployment to staging
   - Test production promotion
   - Test rollback capability