Cluster GitOps Repository Structure Critique
Executive Summary
This document provides a comprehensive architectural critique of the cluster-gitops repository structure, identifying inconsistencies, areas of concern, and recommendations for improvement. The repository shows evidence of recent refactoring and evolution, with some transitional artifacts and documentation drift.
Overall Assessment: The GitOps architecture is fundamentally sound with ApplicationSets and multi-source Helm patterns, but suffers from:
- Inconsistent documentation
- Orphaned directories from refactoring
- A complex nested structure that may confuse new contributors
- Multiple competing patterns not fully consolidated
Priority: Address documentation drift and remove orphaned artifacts before onboarding additional team members.
1. Structural Issues
1.1. Duplicate ApplicationSet Directories ⚠️ HIGH PRIORITY
Issue: Two ApplicationSet directories exist with different implementations:
cluster-gitops/
├── argo/applicationsets/ # ← Orphaned? (4.6k syrf.yaml)
│ └── syrf.yaml # Uses merge generator pattern
└── argocd/applicationsets/ # ← ACTIVE (4.0k syrf.yaml)
├── syrf.yaml # Uses matrix generator pattern
├── custom-charts.yaml
└── plugins.yaml
Analysis:
- argocd/bootstrap/root.yaml (L32) references argocd/applicationsets/
- The argo/ directory appears to be an orphaned experiment or transitional artifact
- The two syrf.yaml files implement different generator strategies:
- argo/: Merge generator (combines base + environment configs)
- argocd/: Matrix generator (simpler, environment-first approach)
Git History Evidence:
a479d8b - refactor(argocd): implement merge generator pattern in ApplicationSet
12078a5 - refactor(argocd): reorganize ArgoCD configuration structure
d487860 - chore(gitops): remove obsolete configuration files
Recent commits show active refactoring but incomplete cleanup.
Impact:
- Confusion: Which ApplicationSet is the "real" one?
- Drift Risk: If both are maintained, they'll diverge
- Documentation Mismatch: README doesn't mention argo/ directory at all
Recommendation:
1. Determine which implementation is canonical (likely argocd/)
2. Delete argo/ directory entirely if obsolete
3. If argo/ contains experimental work, move to a feature branch
4. Update CLAUDE.md to document the decision
Priority: HIGH - This is confusing and error-prone
1.2. Documentation vs Reality Mismatch ⚠️ HIGH PRIORITY
Issue: README.md describes a repository structure that doesn't match the actual implementation.
README Claims (lines 38-106):
cluster-gitops/
├── applicationsets/ # ← Doesn't exist (should be argocd/applicationsets/)
│ ├── syrf.yaml
│ ├── infrastructure.yaml # ← Doesn't exist
│ └── custom-charts.yaml
├── syrf/
│ ├── api/ # ← WRONG: Actually syrf/services/api/
│ │ ├── config.yaml
│ │ ├── values.yaml
│ │ ├── values-staging.yaml
│ │ └── values-production.yaml
├── infrastructure/ # ← Doesn't exist
│ ├── cert-manager/
│ ├── ingress-nginx/
│ └── ...
├── environments/ # ← WRONG: Actually syrf/environments/
│ ├── staging/
│ │ ├── namespace.yaml
│ │ ├── shared-values.yaml
│ │ └── syrf.yaml # ← Doesn't exist
Actual Structure:
cluster-gitops/
├── argocd/applicationsets/ # ← Not documented in README
├── syrf/
│ ├── services/ # ← Extra nesting level
│ │ ├── api/
│ │ │ ├── config.yaml
│ │ │ └── values.yaml
│ │ └── ...
│ └── environments/ # ← Under syrf/, not top-level
│ ├── staging/
│ │ ├── namespace.yaml
│ │ ├── shared-values.yaml
│ │ ├── api/ # ← Per-service env configs
│ │ │ ├── config.yaml
│ │ │ └── values.yaml
│ │ └── ...
├── plugins/ # ← Infrastructure is here
│ ├── helm/ # ← Called "plugins" not "infrastructure"
│ └── local/ # ← Not mentioned in README
Impact:
- New team members will be confused
- Documentation can't be trusted
- CI/CD automation may reference wrong paths
- Tutorial examples won't work
Examples of Path Mismatches:
| README Path | Actual Path | Status |
|---|---|---|
| applicationsets/syrf.yaml | argocd/applicationsets/syrf.yaml | ❌ Wrong |
| syrf/api/config.yaml | syrf/services/api/config.yaml | ❌ Wrong |
| environments/staging/syrf.yaml | Does not exist | ❌ Missing |
| infrastructure/cert-manager/ | plugins/helm/cert-manager/ | ❌ Wrong |
Recommendation:
1. Option A: Update README to match reality
2. Option B: Restructure repository to match README
3. Recommended: Option A (less disruptive, README is newer)
4. Add path validation tests to prevent future drift
Priority: HIGH - Documented paths don't work
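The path validation recommended above can start as a tiny script. A sketch (the helper name and example paths are illustrative; the paths come from the mismatch table in this section):

```shell
# Hypothetical drift check for CI: report any documented path that does not
# exist on disk. Run from the repository root.
check_paths() {
  for p in "$@"; do
    [ -e "$p" ] || echo "MISSING: $p"
  done
}

# Example invocation with paths this critique discusses:
# check_paths argocd/applicationsets/syrf.yaml syrf/services/api/config.yaml
```

Feeding it the list of paths extracted from README.md would catch future drift automatically.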
1.3. Missing Infrastructure ApplicationSet 🔍 MEDIUM PRIORITY
Issue: README mentions infrastructure.yaml ApplicationSet but it doesn't exist.
README Reference (line 44):
applicationsets/
├── syrf.yaml # SyRF services (auto-generated Applications)
├── infrastructure.yaml # Infrastructure components ← MISSING
└── custom-charts.yaml # Custom Helm charts
What Exists Instead:
- argocd/applicationsets/plugins.yaml - Manages infrastructure via plugins pattern
- plugins/helm/* - Individual infrastructure components (cert-manager, ingress-nginx, etc.)
Analysis: The infrastructure components ARE being managed, but via the "plugins" pattern instead of a dedicated "infrastructure" ApplicationSet. This is semantically confusing:
- Plugins typically means optional extensions
- Infrastructure means required platform components
Infrastructure components like cert-manager, ingress-nginx, and external-dns are not optional - they're core platform requirements.
Recommendation:
1. Option A: Rename plugins → infrastructure throughout
2. Option B: Keep plugins but update documentation to clarify terminology
3. Option C: Create a separate infrastructure.yaml ApplicationSet
Recommended: Option A (clearer semantics)
Changes Required:
- Rename plugins/ → infrastructure/
- Rename applicationsets/plugins.yaml → applicationsets/infrastructure.yaml
- Rename AppProject from plugins → infrastructure
- Update all documentation
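After the rename, the git generator in the new ApplicationSet might look like the following sketch. The `infrastructure/helm` file pattern is an assumption mirroring today's plugins/helm layout; verify against the actual generator before adopting it:

```yaml
# Hypothetical generator for applicationsets/infrastructure.yaml after
# Option A; the file pattern is an assumption, not taken from the repo.
generators:
  - git:
      repoURL: https://github.com/camaradesuk/cluster-gitops
      revision: main
      files:
        - path: "infrastructure/helm/*/config.yaml"
```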
Priority: MEDIUM - Confusing naming but functionally works
1.4. Inconsistent Service Configuration Model 🔍 MEDIUM PRIORITY
Issue: Service configuration is split across multiple locations with unclear precedence.
Current Model:
syrf/
├── services/{service}/
│ ├── config.yaml # Base metadata (chartPath, chartRepo)
│ └── values.yaml # Shared Helm values (all environments)
└── environments/{env}/{service}/
├── config.yaml # Environment-specific (enabled, chartTag)
└── values.yaml # Environment-specific Helm values
Example - API Service:
- syrf/services/api/config.yaml
- syrf/environments/staging/api/config.yaml
Problems:
- Duplication: serviceName appears in both files
- Merge Complexity: Requires a merge generator to combine
- Unclear Ownership: Which file is the "source of truth"?
- CI/CD Confusion: Where should automation update fields?
Value File Hierarchy (chart defaults plus 6 value files!):
1. Chart defaults (in syrf repo)
2. global/values.yaml (all services, all envs)
3. syrf/services/{service}/config.yaml
4. syrf/environments/{env}/{service}/config.yaml
5. syrf/services/{service}/values.yaml
6. syrf/environments/{env}/shared-values.yaml (all services in env)
7. syrf/environments/{env}/{service}/values.yaml
From ApplicationSet (argo/applicationsets/syrf.yaml, lines 82-88):
valueFiles:
- $values/global/values.yaml
- $values/syrf/services/{{.serviceName}}/config.yaml
- $values/syrf/environments/{{.envName}}/{{.serviceName}}/config.yaml
- $values/syrf/services/{{.serviceName}}/values.yaml
- $values/syrf/environments/{{.envName}}/shared-values.yaml
- $values/syrf/environments/{{.envName}}/{{.serviceName}}/values.yaml
Impact:
- High cognitive load for new developers
- Debugging "where does this value come from?" is difficult
- Potential for conflicting values across files
- Merge conflicts more likely during concurrent updates
Alternative Model (Simpler):
syrf/environments/{env}/{service}.yaml
├── Service metadata (chartPath, chartRepo)
├── Deployment config (enabled, chartTag)
└── Helm values (merged into values)
Single file per environment per service - much simpler!
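A sketch of what such a consolidated file could contain, reusing field names from the existing split config files (the Helm values shown are invented placeholders, and the exact schema would need to be agreed):

```yaml
# Hypothetical syrf/environments/staging/api.yaml - one file per service
# per environment. Field names follow the existing config.yaml files;
# the values section is a placeholder.
serviceName: api
service:
  chartRepo: https://github.com/camaradesuk/syrf
  chartPath: src/services/api/.chart
  chartTag: api-v9.1.1
  enabled: true
values:
  replicaCount: 2   # example only
```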
Recommendation:
1. Evaluate if base service configs (syrf/services/*/) provide real value
2. If values are always environment-specific, consolidate into env files
3. If base configs are needed, document clear guidelines for what goes where
4. Consider reducing from 6 levels to 3-4 levels
Priority: MEDIUM - Works but complex
1.5. Plugins Directory Naming Inconsistency 🔍 LOW PRIORITY
Issue: ApplicationSet expects plugins/git/*/config.yaml but only plugins/local/* exists.
ApplicationSet (argocd/applicationsets/plugins.yaml, lines 21-26):
# Git-based chart plugins
- git:
repoURL: https://github.com/camaradesuk/cluster-gitops
revision: main
files:
- path: "plugins/git/*/config.yaml" # ← Expects git/
Actual Directory:
$ ls -la /home/chris/workspace/syrf/cluster-gitops/plugins/
drwxr-xr-x helm/
drwxr-xr-x local/ # ← Should be git/ ?
Analysis:
- The ApplicationSet is looking for Git-based charts in plugins/git/
- Only plugins/local/ exists (contains rabbitmq-secrets)
- This generator matches zero files, so it silently generates no Applications
Recommendation:
1. Rename plugins/local/ → plugins/git/
2. OR update ApplicationSet to match plugins/local/*/config.yaml
3. Document the naming convention and its purpose
Priority: LOW - Appears unused currently but will break if used
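A small guard in CI would surface this class of mismatch. A sketch (the helper name is illustrative; the example pattern is the one plugins.yaml currently expects):

```shell
# Warn when a glob pattern matches no files on disk.
check_glob() {
  n=$(ls $1 2>/dev/null | wc -l)   # $1 intentionally unquoted so the glob expands
  [ "$n" -gt 0 ] || echo "WARNING: no files match $1"
}

# Example (run from the repo root):
# check_glob 'plugins/git/*/config.yaml'
```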
2. Design Concerns
2.1. Multi-Source Helm Pattern Complexity 🔍 INFO
Observation: The ApplicationSet uses ArgoCD's multi-source pattern to separate Helm charts from values.
Pattern:
sources:
# Source 1: Helm chart from monorepo at specific tag
- repoURL: 'https://github.com/camaradesuk/syrf'
targetRevision: '{{.service.chartTag}}' # e.g., api-v9.1.1
path: 'src/services/api/.chart'
helm:
valueFiles:
- $values/global/values.yaml
- $values/syrf/services/api/values.yaml
# ... more value files ...
# Source 2: Values from cluster-gitops repo
- repoURL: 'https://github.com/camaradesuk/cluster-gitops'
targetRevision: main
ref: values
Benefits:
- ✅ Helm charts versioned with application code
- ✅ Values decoupled in GitOps repo
- ✅ Single source of truth for versions (chartTag)
- ✅ Enables independent chart versioning

Concerns:
- ⚠️ Complex to understand for ArgoCD beginners
- ⚠️ Requires two git repositories for every deployment
- ⚠️ Value file resolution across repos is opaque
- ⚠️ Difficult to test locally without ArgoCD

Recommendation:
- Document this pattern clearly with diagrams
- Provide a local testing workflow (e.g., a helm template command)
- Consider adding a troubleshooting guide for common issues
Priority: INFO - Document better, pattern is intentional
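A local testing workflow could pass the same value files, in the same order, that the ApplicationSet does. A sketch assuming helm is on PATH, both repos are checked out, and the chart path follows the snippet above (none of this is verified against the real charts):

```shell
# Render a service locally with the ApplicationSet's value-file order.
# $1 = path to a syrf checkout (at the right chartTag), $2 = service, $3 = env.
# Run from the cluster-gitops root.
render_service() {
  syrf_repo="$1" service="$2" env="$3"
  helm template "$service" "$syrf_repo/src/services/$service/.chart" \
    -f "global/values.yaml" \
    -f "syrf/services/$service/config.yaml" \
    -f "syrf/environments/$env/$service/config.yaml" \
    -f "syrf/services/$service/values.yaml" \
    -f "syrf/environments/$env/shared-values.yaml" \
    -f "syrf/environments/$env/$service/values.yaml"
}

# Example: render_service ../syrf api staging
```

Diffing the output against a previous render gives a quick preview of what a values change will actually do.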
2.2. Environment Service Config Enabling Mechanism 🔍 MEDIUM PRIORITY
Issue: Multiple ways to enable/disable services creates ambiguity.
Method 1: Selector in ApplicationSet (argo/applicationsets/syrf.yaml, lines 44-52):
selector:
matchExpressions:
- key: service.enabled
operator: In
values: ['true']
- key: service.chartTag
operator: NotIn
values: ['', null]
Method 2: Config file flag (syrf/environments/staging/api/config.yaml):
Questions:
1. What happens if enabled: false but chartTag exists?
2. What happens if enabled: true but chartTag is empty?
3. Which takes precedence?
Current Behavior (from selector):
- Both conditions must be true (AND logic)
- Service disabled if either enabled: false OR chartTag is empty/null
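The selector's AND semantics can be written out as a tiny predicate (illustrative shell only; this is not how ArgoCD evaluates matchExpressions, just a restatement of the logic):

```shell
# Mirrors the selector above: generate an Application only when
# enabled is "true" AND chartTag is non-empty.
should_deploy() {
  enabled="$1" chartTag="$2"
  [ "$enabled" = "true" ] && [ -n "$chartTag" ]
}

should_deploy true  "api-v9.1.1" && echo "generated"
should_deploy false "api-v9.1.1" || echo "skipped: disabled"
should_deploy true  ""           || echo "skipped: empty chartTag"
```

So the answers to the questions above: `enabled: false` with a chartTag is skipped, and `enabled: true` with an empty chartTag is also skipped; neither flag alone is sufficient.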
Recommendation:
1. Document the enabling logic clearly
2. Consider simplifying to a single mechanism
3. Add validation to prevent invalid states
Priority: MEDIUM - Can cause confusion
2.3. Missing Test/Validation Tooling 🔍 MEDIUM PRIORITY
Issue: No automated validation for ApplicationSet templates or config files.
Risks Without Validation:
- Invalid YAML syntax in config files
- Missing required fields (serviceName, chartTag, etc.)
- Incorrect value file paths
- Template rendering errors
Recommendation: Create validation tooling:
#!/bin/bash
# scripts/validate-gitops.sh
# 1. YAML syntax validation
echo "Validating YAML syntax..."
fd -e yaml -e yml -x yamllint {}
# 2. Required fields validation
echo "Checking required fields..."
for config in syrf/environments/*/*/config.yaml; do
  # yq -e exits non-zero when the path is missing or null;
  # without it these checks always pass
  yq eval -e '.serviceName' "$config" > /dev/null || echo "ERROR: Missing serviceName in $config"
  yq eval -e '.service.chartTag' "$config" > /dev/null || echo "ERROR: Missing chartTag in $config"
done
# 3. ApplicationSet template rendering (dry-run; requires an ArgoCD CLI
#    version that supports 'appset generate')
echo "Testing ApplicationSet template rendering..."
argocd appset generate argocd/applicationsets/plugins.yaml
# 4. Path existence validation
echo "Validating referenced paths..."
# Check that all chartPaths exist in syrf monorepo
# Check that all valueFiles paths exist
echo "✅ Validation complete"
Integration:
- Add to GitHub Actions CI
- Run as a pre-commit hook
- Include in PR preview environments
Priority: MEDIUM - Prevents deployment failures
3. Documentation Issues
3.1. Obsolete Deployment Guide 🔍 HIGH PRIORITY
Issue: docs/deploying-services.md references old structure and manual workflows.
Obsolete Content (lines 50-77):
# Example: Deploy API v1.2.3 to staging
nano environments/staging/api.values.yaml # ← File doesn't exist
image:
repository: ghcr.io/camaradesuk/syrf-api
tag: "1.2.3" # Update this line # ← Wrong approach
Reality:
- No environments/staging/api.values.yaml file
- CI/CD automatically updates syrf/environments/staging/api/config.yaml
- Field updated is service.chartTag, not image.tag
Recommendation:
1. Rewrite the deployment guide to match the current automated workflow
2. Remove the obsolete manual deployment instructions
3. Add a section on emergency manual rollbacks
4. Include troubleshooting for automation failures
Priority: HIGH - Misleading documentation causes errors
3.2. Missing Architecture Diagrams 🔍 MEDIUM PRIORITY
Issue: No visual diagrams explaining:
- ApplicationSet generator flow
- Value file precedence hierarchy
- Deployment workflow (CI/CD → GitOps → K8s)
- Service discovery mechanism
Recommendation: Create diagrams using Mermaid (GitHub renders it):
flowchart TD
A[CI/CD builds image] --> B[Creates git tag: api-v9.1.1]
B --> C[Updates chartTag in cluster-gitops]
C --> D[GitHub webhook → ArgoCD]
D --> E[ApplicationSet regenerates Application]
E --> F[ArgoCD syncs to K8s]
F --> G[K8s pulls new image from GHCR]
Priority: MEDIUM - Improves onboarding
4. Best Practices Assessment
4.1. What's Working Well ✅
Strong Points:
1. GitOps Principles
   - Single source of truth in git
   - Declarative configuration
   - Full audit trail via git history
2. ApplicationSet Automation
   - Auto-discovery of services
   - DRY principle (one ApplicationSet → many Applications)
   - Environment-specific configurations
3. AppProject RBAC
   - Clear environment boundaries
   - Separation of staging/production permissions
   - Sync windows for production safety
4. Multi-Source Helm
   - Charts versioned with code
   - Values decoupled for GitOps
   - Flexible value precedence
5. PR Preview Environments
   - Ephemeral testing environments
   - Auto-cleanup on PR close
   - Safe experimentation
4.2. Alignment with ArgoCD Best Practices
Checklist:
| Best Practice | Status | Notes |
|---|---|---|
| App-of-Apps pattern | ✅ Yes | argocd/bootstrap/root.yaml |
| AppProjects for RBAC | ✅ Yes | Staging, production, plugins |
| Automated sync for non-prod | ✅ Yes | Staging auto-syncs |
| Manual sync for production | ⚠️ Partial | Sync allowed but automated |
| Sync windows | ⚠️ Partial | Defined but not enforced |
| Resource hooks | ✅ Yes | PostSync for notifications |
| Multi-source apps | ✅ Yes | Charts + values separation |
| Secrets management | ❓ Unknown | External Secrets Operator? |
| Progressive delivery | ❌ No | No Argo Rollouts integration |
Recommendations:
1. Enforce production sync windows (currently defined but bypassed by automation)
2. Investigate the secrets management setup
3. Consider Argo Rollouts for canary deployments
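One way to enforce production sync windows, sketched against the AppProject syncWindows schema. The schedule, duration, and the `*-production` application selector are assumptions for illustration, not values taken from the repo:

```yaml
# Hypothetical AppProject fragment: deny automated syncs to production
# applications at all times, while manualSync still permits
# operator-initiated syncs during the deny window.
spec:
  syncWindows:
    - kind: deny
      schedule: '* * * * *'
      duration: 24h
      applications:
        - '*-production'
      manualSync: true
```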
5. Security Observations
5.1. Overly Permissive Plugins AppProject ⚠️ MEDIUM PRIORITY
Issue: The plugins AppProject uses wildcard permissions.
From README (lines 406-410):
The `plugins` AppProject uses wildcard permissions (`group: '*', kind: '*'`)
which may appear overly permissive. However, this is necessary and secured
through defense-in-depth...
Analysis:
- ✅ Mitigations documented (manual sync, RBAC, namespace restrictions)
- ⚠️ Still risky if ArgoCD is compromised
- ⚠️ No resource type restrictions (can create any K8s resource)
Recommendation:
1. Audit what resources plugins actually need
2. Tighten to specific resource types if possible
3. Add OPA policies to validate plugin manifests
4. Schedule regular security audits of plugin configs
Priority: MEDIUM - Defense-in-depth is good, but tightening is better
6. Recommendations Summary
6.1. Immediate Actions (High Priority)
1. Remove Orphaned argo/ Directory
   - Decision: Is it obsolete or experimental?
   - Action: Delete or move to a feature branch
   - Document the decision
2. Fix Documentation-Reality Mismatch
   - Update README.md to match the actual structure
   - Fix all path references
   - Add validation tests
3. Rewrite Deployment Guide
   - Remove obsolete manual instructions
   - Document the current automated workflow
   - Include a troubleshooting section
4. Consolidate ApplicationSet Implementations
   - Choose one generator pattern (merge or matrix)
   - Remove the other
   - Document the choice in an ADR
6.2. Short-Term Improvements (Medium Priority)
1. Simplify Configuration Model
   - Evaluate the value of base service configs
   - Consider consolidating to fewer files
   - Document clear guidelines
2. Add Validation Tooling
   - Create scripts/validate-gitops.sh
   - Integrate into CI/CD
   - Add pre-commit hooks
3. Improve Documentation
   - Add architecture diagrams
   - Create a troubleshooting guide
   - Document the multi-source pattern
4. Fix Plugins Naming
   - Rename plugins/ → infrastructure/
   - OR document why the "plugins" terminology is kept
   - Fix the local/ vs git/ inconsistency
6.3. Long-Term Enhancements (Low Priority)
1. Progressive Delivery
   - Evaluate Argo Rollouts integration
   - Implement canary deployments
   - Add automated rollback
2. Secrets Management Audit
   - Document the External Secrets Operator setup
   - Validate secret rotation
   - Security review
3. Monitoring Integration
   - Add deployment metrics
   - ArgoCD application health dashboards
   - Alerting on sync failures
7. What to Do Next
7.1. Decision Points Requiring User Input
Question 1: ApplicationSet Strategy
- Keep the merge generator (argo/) or the matrix generator (argocd/)?
- Current: Both exist with different implementations
- Recommendation: Choose one, document in an ADR
Question 2: Directory Structure
- Rename to match README (e.g., infrastructure/) or update README?
- Recommendation: Update README (less disruptive)
Question 3: Configuration Simplification
- Keep split base/environment configs or consolidate?
- Trade-off: DRY vs simplicity
- Recommendation: Evaluate actual reuse, then decide

Question 4: Production Sync Policy
- Enforce manual sync or keep the current automated approach?
- Current: Automated, though sync windows are defined
- Recommendation: Enforce manual sync for production safety
7.2. Suggested Next Steps
Phase 1: Clean Up (1-2 days)
1. Review and delete argo/ directory
2. Update README to match reality
3. Fix plugins naming inconsistency
4. Commit the cleanup as a single reviewed change
Phase 2: Document (2-3 days)
1. Create architecture diagrams
2. Rewrite deployment guide
3. Add troubleshooting guide
4. Create an ADR for key decisions

Phase 3: Validate (3-5 days)
1. Build validation script
2. Integrate into CI/CD
3. Test all ApplicationSets
4. Fix any discovered issues

Phase 4: Harden (Ongoing)
1. Review AppProject permissions
2. Add OPA policies
3. Security audit
4. Monitor and iterate
8. Conclusion
The cluster-gitops repository demonstrates a solid understanding of GitOps principles and ArgoCD best practices, with ApplicationSets, multi-source Helm, and automated deployments working well in staging.
Key Strengths:
- Automated staging deployments via CI/CD
- Clear environment separation (staging/production)
- DRY principle with ApplicationSets
- Git-based audit trail

Key Concerns:
- Documentation drift (README doesn't match reality)
- Orphaned directories from refactoring
- Complex nested configuration structure
- Missing validation tooling

Priority Focus:
1. Documentation accuracy (HIGH)
2. Remove orphaned artifacts (HIGH)
3. Simplify where possible (MEDIUM)
4. Add validation (MEDIUM)
Overall: The repository is functional but needs cleanup and documentation. With the recommended improvements, it will be much easier for new team members to understand and contribute.
Document Status: Draft - Ready for team review
Review Requested: Architecture decisions, prioritization, next steps
Last Updated: 2025-11-15