Manual Pod Scaling
# Scale to 5 replicas
kubectl scale deployment myapp --replicas=5
# Verify
kubectl get pods
kubectl get deployment myapp
Horizontal Pod Autoscaler (HPA)
HPA automatically scales the number of pod replicas based on CPU, memory, or custom metrics. When average CPU utilization across the pods rises above the target, HPA adds pods. When it falls back below the target, HPA removes pods, never going under the configured minimum.
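The scaling decision can be sketched with the replica formula from the Kubernetes HPA documentation: desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric). The shell function below is only an illustration. The name `hpa_desired_replicas` and its arguments are hypothetical, and the sketch leaves out the controller's tolerance band and stabilization window.

```shell
#!/usr/bin/env bash
# Illustrative sketch of the HPA replica formula (not the controller's actual code).
# hpa_desired_replicas is a hypothetical helper name.
# Args: current replicas, current avg CPU %, target CPU %, min replicas, max replicas
hpa_desired_replicas() {
  local current="$1" usage="$2" target="$3" min="$4" max="$5"
  # Integer ceiling of (current * usage / target)
  local desired=$(( (current * usage + target - 1) / target ))
  # Clamp to the configured replica bounds
  if [ "$desired" -lt "$min" ]; then desired="$min"; fi
  if [ "$desired" -gt "$max" ]; then desired="$max"; fi
  echo "$desired"
}

# 4 pods averaging 90% CPU against a 70% target, bounds 2..10
hpa_desired_replicas 4 90 70 2 10   # prints 6
```

With the same bounds, 4 pods averaging 20% CPU would settle at the minimum of 2 replicas.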
# Create HPA — scale between 2 and 10 replicas, target 70% CPU
kubectl autoscale deployment myapp \
  --min=2 \
  --max=10 \
  --cpu-percent=70
# Check HPA status
kubectl get hpa
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
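Utilization targets are measured as a percentage of each container's CPU request, so HPA needs resources.requests set in your Deployment manifest.
Cluster Autoscaler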
When pods cannot be scheduled because nodes are full, the Cluster Autoscaler adds new nodes to the pool. When nodes are underutilised, it removes them. Enable it when creating or updating the cluster:
# Enable autoscaler on cluster creation
az aks create \
  --resource-group myRG \
  --name myAKSCluster \
  --node-count 3 \
  --enable-cluster-autoscaler \
  --min-count 1 \
  --max-count 10
# Enable on existing cluster node pool
az aks nodepool update \
  --resource-group myRG \
  --cluster-name myAKSCluster \
  --name nodepool1 \
  --enable-cluster-autoscaler \
  --min-count 1 \
  --max-count 10
KEDA — Event-Driven Scaling
KEDA (Kubernetes Event-driven Autoscaling) scales pods based on external event sources — Azure Service Bus queue depth, Event Hub lag, HTTP request rate, Prometheus metrics, and more. KEDA can even scale to zero (no pods when no events) and back up.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: myapp-scaler
spec:
  scaleTargetRef:
    name: myapp
  minReplicaCount: 0  # Scale to zero when queue is empty
  maxReplicaCount: 20
  triggers:
    - type: azure-servicebus
      metadata:
        queueName: orders
        namespace: myservicebus
        messageCount: "5"  # Scale out when more than 5 messages per pod
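As a rough illustration of how the `messageCount` target translates into replicas: KEDA hands the queue length to an HPA as an average-value metric, so the desired count is approximately ceil(queueLength / messageCount), bounded by `minReplicaCount` and `maxReplicaCount`. The function name `keda_desired_replicas` is hypothetical, and the sketch ignores the polling interval, cooldown period, and activation thresholds.

```shell
#!/usr/bin/env bash
# Simplified model of queue-driven scaling (assumption: a single queue trigger).
# keda_desired_replicas is a hypothetical helper, not part of KEDA.
# Args: queue length, messages per pod, min replicas, max replicas
keda_desired_replicas() {
  local queue_length="$1" per_pod="$2" min="$3" max="$4"
  # Integer ceiling of (queue_length / per_pod)
  local desired=$(( (queue_length + per_pod - 1) / per_pod ))
  # Clamp to the ScaledObject's replica bounds
  if [ "$desired" -lt "$min" ]; then desired="$min"; fi
  if [ "$desired" -gt "$max" ]; then desired="$max"; fi
  echo "$desired"
}

# Values mirror the ScaledObject above: 5 messages per pod, 0..20 replicas
keda_desired_replicas 0 5 0 20     # empty queue: prints 0
keda_desired_replicas 42 5 0 20    # prints 9
keda_desired_replicas 500 5 0 20   # capped at maxReplicaCount: prints 20
```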
Manual Node Scaling
az aks scale \
  --resource-group myRG \
  --name myAKSCluster \
  --node-count 5 \
  --nodepool-name nodepool1
How Pod and Node Scaling Work Together
| Scenario | What Happens |
|---|---|
| Traffic spikes → CPU rises above HPA threshold | HPA adds pods → if nodes full, Cluster Autoscaler adds nodes |
| Traffic drops → CPU falls below HPA threshold | HPA removes pods → if nodes underutilised, Cluster Autoscaler removes nodes |
| Queue fills up (KEDA) | KEDA scales pods → if nodes full, Cluster Autoscaler adds nodes |
| Queue empties (KEDA) | KEDA scales to zero pods → Cluster Autoscaler removes empty nodes, down to the pool's minimum count |
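The interaction in the table can be approximated with a small capacity calculation: given a replica count and an assumed number of pods that fit on one node, the node count is ceil(pods / podsPerNode), kept within the autoscaler's `--min-count` and `--max-count`. This is a deliberately simplified model. The real Cluster Autoscaler reacts to unschedulable pods and their resource requests, not to a fixed pods-per-node figure, and `nodes_needed` and `pods_per_node` are hypothetical names.

```shell
#!/usr/bin/env bash
# Simplified capacity model (assumption: every node fits the same number of pods).
# nodes_needed is a hypothetical helper, not an AKS or Kubernetes command.
# Args: pod count, pods per node, min nodes, max nodes
nodes_needed() {
  local pods="$1" pods_per_node="$2" min_nodes="$3" max_nodes="$4"
  # Integer ceiling of (pods / pods_per_node)
  local nodes=$(( (pods + pods_per_node - 1) / pods_per_node ))
  # Clamp to the autoscaler's node bounds
  if [ "$nodes" -lt "$min_nodes" ]; then nodes="$min_nodes"; fi
  if [ "$nodes" -gt "$max_nodes" ]; then nodes="$max_nodes"; fi
  echo "$nodes"
}

# Assume 4 pods fit on each node; node bounds 1..10 as in the az commands
nodes_needed 2 4 1 10    # prints 1
nodes_needed 10 4 1 10   # prints 3
nodes_needed 0 4 1 10    # zero pods still leaves the minimum: prints 1
```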