Kubernetes in Production: Helm Charts, Operators, and Monitoring Strategies
Deploy production-grade Kubernetes workloads with Helm for package management, custom operators for automation, and comprehensive monitoring with Prometheus and Grafana.
Kubernetes orchestrates containers at scale, but production deployments require more than kubectl apply. This guide covers Helm charts for reproducible deployments, operators for custom automation, and monitoring strategies that prevent 3 AM pages.
Helm: The Kubernetes Package Manager
# Chart.yaml - Helm chart metadata
apiVersion: v2
name: my-app
description: Production-ready web application
version: 1.0.0        # Chart version
appVersion: "2.1.0"   # Version of the application the chart deploys

# values.yaml - Configuration values
replicaCount: 3

image:
  repository: myregistry.azurecr.io/my-app
  tag: "2.1.0"
  pullPolicy: IfNotPresent

service:
  type: ClusterIP
  port: 80

ingress:
  enabled: true
  className: nginx
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
  hosts:
    - host: app.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: app-tls
      hosts:
        - app.example.com

resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 250m
    memory: 256Mi

autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70

# Install:
#   helm install my-app ./my-app-chart
# Upgrade with production overrides:
#   helm upgrade my-app ./my-app-chart --values values-production.yaml
# Roll back to revision 1 (omit the revision number to return to the previous release):
#   helm rollback my-app 1
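The upgrade command above layers values-production.yaml on top of the chart's defaults. A minimal sketch of such an override file might look like the following; the file contents and every number in it are illustrative assumptions, not settings from this guide. Helm deep-merges maps, so only the keys that differ from values.yaml need to appear.

```yaml
# values-production.yaml (hypothetical overrides; all numbers are illustrative)
replicaCount: 5

image:
  tag: "2.1.0"        # Pin an explicit tag; repository and pullPolicy are inherited

resources:
  limits:
    cpu: "1"
    memory: 1Gi
  requests:
    cpu: 500m
    memory: 512Mi

autoscaling:
  minReplicas: 5      # enabled and targetCPUUtilizationPercentage are inherited
  maxReplicas: 20
```

Lists are replaced wholesale rather than merged, so an override that touches ingress.hosts or ingress.tls has to repeat the full list.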
Deployment Strategies
# Rolling Update (default strategy; zero downtime with maxUnavailable: 0)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  replicas: 5
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # At most one extra pod during the update
      maxUnavailable: 0  # Never drop below the desired replica count
  selector:              # Required in apps/v1; must match the template labels
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
        - name: app
          image: app:v2
          readinessProbe:        # Gates traffic until the pod reports healthy
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 3
          livenessProbe:         # Restarts the container if it stops responding
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 10

---
# Blue-Green Deployment (instant rollback)
apiVersion: v1
kind: Service
metadata:
  name: app-service
spec:
  selector:
    app: app
    version: green  # Switch between blue/green
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080

---
# Canary Deployment (gradual rollout)
# Requires Istio and a DestinationRule that defines the v1 and v2 subsets
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: app
spec:
  hosts:
    - app.example.com
  http:
    - match:               # Rule 1: mobile user agents always go to v2
        - headers:
            user-agent:
              regex: ".*Mobile.*"
      route:
        - destination:
            host: app
            subset: v2
    - route:               # Rule 2: all other traffic is split 90/10
        - destination:
            host: app
            subset: v1
          weight: 90
        - destination:
            host: app
            subset: v2
          weight: 10  # 10% traffic to new version
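The canary VirtualService above routes to subsets named v1 and v2, which Istio resolves through a DestinationRule for the same host. A minimal sketch follows, assuming the pods of each release carry a version label with the values v1 and v2; that labeling scheme is an assumption, not something the guide specifies.

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: app
spec:
  host: app              # Same host the VirtualService routes to
  subsets:
    - name: v1
      labels:
        version: v1      # Assumed pod label for the current release
    - name: v2
      labels:
        version: v2      # Assumed pod label for the canary release
```

Without matching subsets, requests routed to an undefined subset fail, so it is safer to apply the DestinationRule before the VirtualService.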
Resource Management and QoS
apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-pod
spec:
  containers:
    - name: app
      image: app:v2  # Placeholder image; a container spec requires one
      resources:
        # Guaranteed QoS: requests == limits for both CPU and memory
        requests:
          memory: "64Mi"
          cpu: "250m"
        limits:
          memory: "64Mi"
          cpu: "250m"

---
# Caps the total resources all pods in the namespace may request
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: production
spec:
  hard:
    requests.cpu: "100"
    requests.memory: 200Gi
    limits.cpu: "200"
    limits.memory: 400Gi
    pods: "100"

---
# Applies defaults to containers that do not declare their own requests or limits
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
    - default:
        cpu: "500m"
        memory: "512Mi"
      defaultRequest:
        cpu: "250m"
        memory: "256Mi"
      type: Container
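For contrast with the Guaranteed pod above, here is a sketch of a pod that lands in the Burstable QoS class because its limits are higher than its requests. The pod name and image are hypothetical placeholders. Burstable pods can use spare capacity up to their limits, but under node memory pressure they are generally evicted before Guaranteed pods.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: burstable-pod    # Hypothetical name
spec:
  containers:
    - name: app
      image: app:v2      # Placeholder image
      resources:
        # Burstable QoS: requests are set but lower than limits
        requests:
          memory: "64Mi"
          cpu: "250m"
        limits:
          memory: "128Mi"
          cpu: "500m"
```

A pod with no requests or limits at all is classed as BestEffort and is the first candidate for eviction.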
Monitoring with Prometheus
# ServiceMonitor for automatic scraping (requires the Prometheus Operator CRDs)
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: app-metrics
spec:
  selector:
    matchLabels:
      app: my-app
  endpoints:
    - port: metrics  # Name of a port on the Service, not a port number
      interval: 30s
      path: /metrics

---
# PrometheusRule for alerts
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: app-alerts
spec:
  groups:
    - name: app
      interval: 30s
      rules:
        - alert: HighErrorRate
          # Ratio of 5xx responses to all responses, so 0.05 means 5%
          expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "High error rate detected"
            description: "Error rate is {{ $value | humanizePercentage }}"

        - alert: PodCrashLooping
          # More than 2 restarts in 15 minutes; a bare rate(...) > 0 fires on any single restart
          expr: increase(kube_pod_container_status_restarts_total[15m]) > 2
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Pod {{ $labels.pod }} is crash looping"

        - alert: HighMemoryUsage
          # Skip containers without a memory limit (limit reported as 0) to avoid dividing by zero
          expr: container_memory_usage_bytes{container!=""} / (container_spec_memory_limit_bytes{container!=""} > 0) > 0.9
          for: 5m
          labels:
            severity: warning
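The ServiceMonitor above selects Services by label and scrapes the port named metrics, so the application's Service has to expose both. A minimal sketch of a matching Service follows; the Service name, the pod selector, and the port numbers are assumptions for illustration.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app
  labels:
    app: my-app          # Matched by the ServiceMonitor's selector.matchLabels
spec:
  selector:
    app: my-app          # Assumed pod label
  ports:
    - name: http
      port: 80
      targetPort: 8080
    - name: metrics      # Referenced by endpoints[].port in the ServiceMonitor
      port: 9090         # Assumed metrics port
      targetPort: 9090
```

If no targets appear in Prometheus, the usual culprits are a label mismatch here or a Prometheus resource whose serviceMonitorSelector does not match the ServiceMonitor's own labels.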
Secrets Management
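# 1. Create a secret from literal values
# Keys use env-var-safe names because envFrom (step 3) turns each key into a variable name.
# Literals end up in shell history; prefer --from-file or --from-env-file for real credentials.
kubectl create secret generic app-secrets \
  --from-literal=DATABASE_PASSWORD=mysecretpass \
  --from-literal=API_KEY=abc123

# 2. Create a secret from a file
kubectl create secret generic app-config \
  --from-file=config.json

# 3. Use in a pod
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: app:v2  # Placeholder image; a container spec requires one
      envFrom:
        - secretRef:
            name: app-secrets   # Every key becomes an environment variable
      volumeMounts:
        - name: config
          mountPath: /etc/config  # config.json appears as /etc/config/config.json
          readOnly: true
  volumes:
    - name: config
      secret:
        secretName: app-config

# 4. Encrypt secrets for Git with Sealed Secrets
# Install the controller (v0.18.0 is the release pinned here; check the project's releases for a newer one):
kubectl apply -f https://github.com/bitnami-labs/sealed-secrets/releases/download/v0.18.0/controller.yaml

# Seal the secret; the encrypted output is safe to commit to Git:
kubeseal --format=yaml < secret.yaml > sealed-secret.yaml

# 5. Sync from AWS Secrets Manager / Azure Key Vault with the External Secrets Operator
# Install the operator:
helm repo add external-secrets https://charts.external-secrets.io
helm install external-secrets external-secrets/external-secrets \
  -n external-secrets-system --create-namespace

# SecretStore
# With no auth block, the AWS provider uses the operator's own credentials (for example IRSA).
# The apiVersion must match the CRDs your operator release serves; newer releases may use external-secrets.io/v1.
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: aws-secrets
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-east-1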
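A SecretStore only describes how to reach the backend; an ExternalSecret tells the operator which remote value to copy into which Kubernetes Secret. The sketch below assumes a JSON secret stored in AWS Secrets Manager. The remote key, the property name, and the target Secret name are all hypothetical.

```yaml
apiVersion: external-secrets.io/v1beta1   # Match the version used by the SecretStore
kind: ExternalSecret
metadata:
  name: app-db-credentials         # Hypothetical name
spec:
  refreshInterval: 1h              # How often to re-read the remote value
  secretStoreRef:
    name: aws-secrets              # The SecretStore defined above
    kind: SecretStore
  target:
    name: app-db-credentials       # Kubernetes Secret the operator creates
  data:
    - secretKey: DATABASE_PASSWORD # Key inside the Kubernetes Secret
      remoteRef:
        key: prod/my-app/database  # Hypothetical AWS secret name
        property: password         # Hypothetical field in the JSON payload
```

The resulting Secret can be consumed with envFrom or a volume mount exactly like one created with kubectl.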
Production Checklist
- Always set resource requests and limits (prevents node exhaustion)
- Use readiness probes (prevents routing to unhealthy pods)
- Use liveness probes (restarts deadlocked pods)
- Set PodDisruptionBudgets (maintains availability during node drains and cluster upgrades)
- Configure HorizontalPodAutoscaler (auto-scales based on metrics)
- Use multiple replicas across zones (tolerates zone failures)
- Implement pod anti-affinity (spreads replicas across nodes)
- Enable network policies (zero-trust security model)
- Use RBAC for access control (principle of least privilege)
- Monitor cluster health (node conditions, disk pressure, PID pressure)
- Set up log aggregation (ELK/EFK stack or a cloud-native service)
- Take regular backups with Velero (disaster recovery)
- Scan images in CI/CD (catches vulnerabilities early)
- Use OPA Gatekeeper for policy enforcement
- Implement circuit breakers and retries with service mesh
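Two of the checklist items, PodDisruptionBudgets and the HorizontalPodAutoscaler, can be sketched as short manifests. This is a minimal example under stated assumptions: the resource names, the app: my-app label, and the my-app Deployment are hypothetical, and the replica bounds and CPU target are borrowed from the autoscaling values in the Helm section.

```yaml
# Keep at least 2 pods running during voluntary disruptions such as node drains
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: app-pdb            # Hypothetical name
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: my-app          # Assumed pod label

---
# Scale between 3 and 10 replicas to hold average CPU utilization near 70%
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa            # Hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app           # Assumed Deployment name
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

CPU utilization is measured against each container's CPU request, so the autoscaler only works for pods that set one.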