Kubernetes in Production: Helm Charts, Operators, and Monitoring Strategies

Deploy production-grade Kubernetes workloads with Helm for package management, custom operators for automation, and comprehensive monitoring with Prometheus and Grafana.

Kubernetes orchestrates containers at scale, but production deployments require more than kubectl apply. This guide covers Helm charts for reproducible deployments, operators for custom automation, and monitoring strategies that catch problems before they turn into 3 AM pages.

Helm: The Kubernetes Package Manager

# Chart.yaml - Helm chart metadata
apiVersion: v2
name: my-app
description: Production-ready web application
version: 1.0.0      # Chart version (SemVer); bump it whenever the chart changes
appVersion: "2.1.0" # Version of the application the chart deploys

# values.yaml - Default configuration values
replicaCount: 3

image:
  repository: myregistry.azurecr.io/my-app
  tag: "2.1.0"
  pullPolicy: IfNotPresent

service:
  type: ClusterIP
  port: 80

ingress:
  enabled: true
  className: nginx
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
  hosts:
    - host: app.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: app-tls
      hosts:
        - app.example.com

resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 250m
    memory: 256Mi

autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70

# Install, upgrade, and roll back:
# helm install my-app ./my-app-chart
# helm upgrade my-app ./my-app-chart --values values-production.yaml
# helm history my-app        # List release revisions
# helm rollback my-app       # Roll back to the previous revision
# helm rollback my-app 1     # Or roll back to a specific revision (here, 1)
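
To show how these values actually reach the cluster, here is a minimal sketch of a templates/deployment.yaml excerpt, loosely modeled on the scaffold that `helm create` generates. The label keys and the container port 8080 are assumptions for illustration, not part of the chart above; you can render the result locally with `helm template my-app ./my-app-chart` before installing anything.

```yaml
# templates/deployment.yaml (excerpt) - a sketch, not the complete chart
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}
  labels:
    app.kubernetes.io/name: {{ .Chart.Name }}
    app.kubernetes.io/instance: {{ .Release.Name }}
spec:
  # Only set replicas when no HPA is managing the count
  {{- if not .Values.autoscaling.enabled }}
  replicas: {{ .Values.replicaCount }}
  {{- end }}
  selector:
    matchLabels:
      app.kubernetes.io/name: {{ .Chart.Name }}
      app.kubernetes.io/instance: {{ .Release.Name }}
  template:
    metadata:
      labels:
        app.kubernetes.io/name: {{ .Chart.Name }}
        app.kubernetes.io/instance: {{ .Release.Name }}
    spec:
      containers:
        - name: {{ .Chart.Name }}
          # Falls back to appVersion from Chart.yaml if image.tag is empty
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          ports:
            - name: http
              containerPort: 8080   # assumed application port
          resources:
            {{- toYaml .Values.resources | nindent 12 }}
```

Omitting replicas when autoscaling is enabled is deliberate: if the manifest pinned a replica count, every helm upgrade would reset a scaled-out Deployment back to that number and fight the HorizontalPodAutoscaler.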

Deployment Strategies

# Rolling Update (the default strategy; zero downtime with maxUnavailable: 0 and a readiness probe)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  replicas: 5
  selector:            # Required in apps/v1; must match the pod template labels
    matchLabels:
      app: app
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # At most 1 extra pod above replicas during the update
      maxUnavailable: 0  # Never drop below the desired number of ready pods
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
      - name: app
        image: app:v2
        readinessProbe:    # Pod receives traffic only after /health succeeds
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 3
        livenessProbe:     # Container is restarted if /health keeps failing
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 15
          periodSeconds: 10

---
# Blue-Green Deployment (instant switch and rollback)
# Assumes two Deployments whose pods are labeled version: blue and version: green
apiVersion: v1
kind: Service
metadata:
  name: app-service
spec:
  selector:
    app: app
    version: green  # Change to "blue" to move all traffic back in one step
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080

---
# Canary Deployment (gradual rollout)
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: app
spec:
  hosts:
  - app.example.com
  http:
  - match:          # Rule 1 (evaluated first): all mobile clients go to v2
    - headers:
        user-agent:
          regex: ".*Mobile.*"
    route:
    - destination:
        host: app
        subset: v2
  - route:          # Rule 2: everyone else gets a weighted 90/10 split
    - destination:
        host: app
        subset: v1
      weight: 90
    - destination:
        host: app
        subset: v2
      weight: 10  # 10% of remaining traffic to the new version
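
The VirtualService routes to subsets named v1 and v2, but Istio only knows what those subsets are once a DestinationRule defines them. A minimal sketch, assuming the two Deployments label their pods version: v1 and version: v2 (those label values are an assumption; match whatever labels your pods carry):

```yaml
# DestinationRule: maps subset names to pod labels behind the "app" Service
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: app
spec:
  host: app            # same host the VirtualService destinations use
  subsets:
  - name: v1
    labels:
      version: v1      # assumed label on the stable pods
  - name: v2
    labels:
      version: v2      # assumed label on the canary pods
```

To progress the canary, raise the v2 weight in steps while watching error rates and latency; setting it back to 0 (and removing the mobile match) returns all traffic to v1.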

Resource Management and QoS

apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-pod
spec:
  containers:
  - name: app
    image: app:v2   # placeholder image
    resources:
      # Guaranteed QoS: every container sets CPU and memory limits equal to its requests
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "64Mi"
        cpu: "250m"

---
# ResourceQuota: caps total requests, limits, and pod count for the namespace
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: production
spec:
  hard:
    requests.cpu: "100"
    requests.memory: 200Gi
    limits.cpu: "200"
    limits.memory: 400Gi
    pods: "100"

---
# LimitRange: defaults applied to containers that don't declare their own
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
  - default:
      cpu: "500m"
      memory: "512Mi"
    defaultRequest:
      cpu: "250m"
      memory: "256Mi"
    type: Container
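
For contrast, here is a sketch of the other two QoS classes, reusing the placeholder image app:v2 (the pod names are illustrative). Under node memory pressure the kubelet generally evicts BestEffort pods first, then Burstable pods using more than their requests, and Guaranteed pods last. You can check a pod's class with kubectl get pod burstable-pod -o jsonpath='{.status.qosClass}'.

```yaml
# Burstable QoS: at least one request or limit is set,
# but the pod doesn't meet the Guaranteed rules
apiVersion: v1
kind: Pod
metadata:
  name: burstable-pod
spec:
  containers:
  - name: app
    image: app:v2          # placeholder image
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"    # limit above the request, and no CPU limit

---
# BestEffort QoS: no container sets any request or limit
apiVersion: v1
kind: Pod
metadata:
  name: besteffort-pod
spec:
  containers:
  - name: app
    image: app:v2          # placeholder image
```

In the production namespace above, the LimitRange would inject its defaults into the BestEffort pod (making it Burstable). Without a LimitRange, the ResourceQuota would instead reject pods that omit requests for quota-tracked resources, which is why quotas and LimitRanges are usually deployed together.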

Monitoring with Prometheus

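# ServiceMonitor for automatic scraping (requires the Prometheus Operator CRDs)
# The Prometheus resource's serviceMonitorSelector must match this object's labels
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: app-metrics
spec:
  selector:
    matchLabels:
      app: my-app     # Selects Services (not pods) carrying this label
  endpoints:
  - port: metrics     # Name of a port on the Service, not a port number
    interval: 30s
    path: /metrics

---
# PrometheusRule for alerts
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: app-alerts
spec:
  groups:
  - name: app
    interval: 30s
    rules:
    - alert: HighErrorRate
      # Fraction of requests returning 5xx, per job (0.05 = 5%)
      expr: |
        sum by (job) (rate(http_requests_total{status=~"5.."}[5m]))
          / sum by (job) (rate(http_requests_total[5m])) > 0.05
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "High error rate detected"
        description: "Error rate is {{ $value | humanizePercentage }}"

    - alert: PodCrashLooping
      # More than 3 restarts in 15 minutes (metric from kube-state-metrics)
      expr: increase(kube_pod_container_status_restarts_total[15m]) > 3
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Pod {{ $labels.pod }} is crash looping"

    - alert: HighMemoryUsage
      # Working set vs. limit; "> 0" skips containers that have no memory limit
      expr: |
        container_memory_working_set_bytes{container!=""}
          / (container_spec_memory_limit_bytes{container!=""} > 0) > 0.9
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Container {{ $labels.container }} in pod {{ $labels.pod }} is above 90% of its memory limit"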
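
The severity labels above only help if Alertmanager routes on them. A minimal alertmanager.yml sketch that pages on-call only for critical alerts and sends everything else to a chat channel; the receiver names, Slack webhook URL, and PagerDuty routing key are placeholders, not values from this setup:

```yaml
# alertmanager.yml - route by the severity label set in the PrometheusRule
route:
  receiver: team-chat              # default: non-critical alerts go to chat
  group_by: ['alertname', 'namespace']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    - receiver: on-call-pager      # only critical alerts page a human
      matchers:
        - severity="critical"

receivers:
  - name: team-chat
    slack_configs:
      - api_url: "https://hooks.slack.com/services/REPLACE_ME"   # placeholder
        channel: "#alerts"
  - name: on-call-pager
    pagerduty_configs:
      - routing_key: "REPLACE_WITH_PAGERDUTY_KEY"                # placeholder
```

With this split, a warning such as HighMemoryUsage shows up in chat during working hours, while only HighErrorRate wakes someone at night.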

Secrets Management

# 1. Create secret from literal values
#    (values end up in shell history; prefer --from-file or --from-env-file for real credentials)
kubectl create secret generic app-secrets \
  --from-literal=database-password=mysecretpass \
  --from-literal=api-key=abc123

# 2. Create secret from file (the key defaults to the file name, config.json)
kubectl create secret generic app-config \
  --from-file=config.json

# 3. Use in a pod
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: app
    image: app:v2               # placeholder image
    envFrom:                    # Every key in app-secrets becomes an env variable
    - secretRef:
        name: app-secrets
    volumeMounts:
    - name: config
      mountPath: /etc/config    # config.json appears as /etc/config/config.json
      readOnly: true
  volumes:
  - name: config
    secret:
      secretName: app-config

# 4. Encrypt secrets for Git with Sealed Secrets
# Install the controller:
kubectl apply -f https://github.com/bitnami-labs/sealed-secrets/releases/download/v0.18.0/controller.yaml

# Render a plain Secret locally without creating it, then seal it.
# Only sealed-secret.yaml is safe to commit; keep secret.yaml out of Git.
kubectl create secret generic app-secrets --from-literal=api-key=abc123 \
  --dry-run=client -o yaml > secret.yaml
kubeseal --format=yaml < secret.yaml > sealed-secret.yaml

# 5. Use AWS Secrets Manager / Azure Key Vault
# Install the External Secrets Operator:
helm repo add external-secrets https://charts.external-secrets.io
helm install external-secrets external-secrets/external-secrets \
  -n external-secrets-system --create-namespace

# SecretStore
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: aws-secrets
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-east-1
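
A SecretStore only tells the operator how to reach AWS Secrets Manager; an ExternalSecret declares which remote values to copy into a Kubernetes Secret. A minimal sketch, where the remote secret name prod/my-app/database and its JSON field password are hypothetical:

```yaml
# ExternalSecret: syncs a value from AWS Secrets Manager into a Kubernetes Secret
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: app-secrets
spec:
  refreshInterval: 1h            # re-read the remote value every hour
  secretStoreRef:
    name: aws-secrets            # the SecretStore defined above
    kind: SecretStore
  target:
    name: app-secrets            # Kubernetes Secret the operator creates and owns
    creationPolicy: Owner
  data:
    - secretKey: database-password       # key inside the Kubernetes Secret
      remoteRef:
        key: prod/my-app/database        # hypothetical Secrets Manager secret name
        property: password               # hypothetical JSON field in that secret
```

Because the synced Secret is named app-secrets, the pod spec from step 3 can consume it unchanged, in place of the secret created by hand with kubectl.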

Production Checklist

  • Always set resource requests and limits (prevents node exhaustion)
  • Use readiness probes (prevents routing to unhealthy pods)
  • Use liveness probes (restarts deadlocked or hung containers)
  • Set PodDisruptionBudgets (maintains availability during voluntary disruptions such as node drains and cluster upgrades)
  • Configure HorizontalPodAutoscaler (auto-scales based on metrics)
  • Use multiple replicas across zones (tolerates zone failures)
  • Implement pod anti-affinity (spreads replicas across nodes)
  • Enable network policies with a default-deny baseline (zero-trust networking; requires a CNI plugin that enforces them)
  • Use RBAC for access control (principle of least privilege)
  • Monitor cluster health (node conditions, disk pressure, PID pressure)
  • Set up log aggregation (ELK/EFK stack or cloud-native)
  • Regular backups with Velero (disaster recovery)
  • Image scanning in CI/CD (catch vulnerabilities early)
  • Use OPA Gatekeeper for policy enforcement
  • Implement circuit breakers and retries with a service mesh
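
Three of these checklist items translate directly into small manifests. A sketch, assuming the app: app labels and a Deployment named app in the production namespace (names reused from the examples above for illustration):

```yaml
# PodDisruptionBudget: node drains may evict pods only while 2 stay available
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: app-pdb
  namespace: production
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: app

---
# HorizontalPodAutoscaler: scale between 3 and 10 replicas at ~70% CPU
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # percent of the pods' CPU requests

---
# NetworkPolicy: deny all inbound traffic to every pod in the namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: production
spec:
  podSelector: {}                  # empty selector = all pods
  policyTypes:
    - Ingress
```

The HPA measures utilization against CPU requests and needs a metrics source such as metrics-server, which is one more reason to set requests. The default-deny policy blocks all inbound traffic until you add explicit allow policies, so roll it out together with those rules.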