Last updated: May 2026
Azure Containers · Intermediate · AZ-104 · 11 min read

AKS Scaling

AKS supports scaling at two levels — pod scaling (adding more replicas of your application) and node scaling (adding more VMs to the cluster). These work together: when pods need more resources than available nodes can provide, the Cluster Autoscaler adds new nodes. When traffic drops, it removes them. Understanding both layers is key to building cost-efficient, resilient AKS workloads.

What you'll learn: Manual pod scaling · Horizontal Pod Autoscaler (HPA), scaling pods on CPU/memory · Cluster Autoscaler, scaling nodes automatically · KEDA, scaling on external metrics (queues, events) · Manual node scaling · Scale to zero with KEDA

Manual Pod Scaling

kubectl: Manually scale a deployment
# Scale to 5 replicas
kubectl scale deployment myapp --replicas=5

# Verify
kubectl get pods
kubectl get deployment myapp

Horizontal Pod Autoscaler (HPA)

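HPA automatically adjusts the number of pod replicas based on CPU, memory, or custom metrics. When average CPU utilisation across the pods rises above the target, HPA adds replicas; when it falls below the target, HPA removes them, always staying within the configured minimum and maximum.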
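The replica count HPA aims for follows a simple ratio: desired = ceil(current replicas × current metric / target metric). The sketch below reproduces that arithmetic in plain shell; `hpa_desired_replicas` is a hypothetical helper written for illustration, not part of kubectl, and it ignores the controller's tolerance band (10% by default) and stabilisation behaviour.

```shell
# hpa_desired_replicas CURRENT_REPLICAS CURRENT_UTIL TARGET_UTIL MIN MAX
# Illustrative helper (not a kubectl feature): prints
# ceil(current * currentUtil / targetUtil), clamped to [MIN, MAX].
hpa_desired_replicas() {
  # Integer ceiling division: (a + b - 1) / b
  desired=$(( ($1 * $2 + $3 - 1) / $3 ))
  if [ "$desired" -lt "$4" ]; then desired=$4; fi
  if [ "$desired" -gt "$5" ]; then desired=$5; fi
  echo "$desired"
}

# 4 replicas averaging 90% CPU against a 70% target, bounds 2..10
hpa_desired_replicas 4 90 70 2 10   # prints 6
```

With the same bounds, a load of 20% on 4 replicas would bottom out at the minimum of 2, and 200% on 8 replicas would be capped at the maximum of 10.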

kubectl: Create HPA (scale on CPU)
# Create HPA — scale between 2 and 10 replicas, target 70% CPU
kubectl autoscale deployment myapp \
  --min=2 \
  --max=10 \
  --cpu-percent=70

# Check HPA status
kubectl get hpa

YAML: HPA manifest
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
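⚠️ Resource Requests Required: HPA with a Utilization target needs a request for the scaled resource (CPU in the manifest above) on every container in the pod. Without it, HPA cannot calculate a utilisation percentage and takes no action for that metric. Always set resources.requests in your Deployment manifest.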
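As a rough illustration of where the requests go, the snippet below writes a minimal Deployment manifest for `myapp` to a local file. The file name, image reference, labels, and request values are all placeholders chosen for this sketch, not values from a real workload.

```shell
# Write a minimal Deployment manifest with CPU and memory requests set.
# File name, image, labels and request sizes are illustrative assumptions.
cat > myapp-deployment.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: myregistry.azurecr.io/myapp:1.0   # hypothetical image
        resources:
          requests:
            cpu: 250m        # baseline HPA uses for CPU utilisation
            memory: 256Mi
EOF

# Apply it to the cluster (requires kubectl access):
# kubectl apply -f myapp-deployment.yaml
```

With a 250m CPU request, a pod using 175m of CPU is reported at 70% utilisation, which is the figure HPA compares against its target.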

Cluster Autoscaler

When pods cannot be scheduled because nodes are full, the Cluster Autoscaler adds new nodes to the pool. When nodes are underutilised, it removes them. Enable it when creating or updating the cluster:

Azure CLI: Enable Cluster Autoscaler
# Enable autoscaler on cluster creation
az aks create \
  --resource-group myRG \
  --name myAKSCluster \
  --node-count 3 \
  --enable-cluster-autoscaler \
  --min-count 1 \
  --max-count 10

# Enable on existing cluster node pool
az aks nodepool update \
  --resource-group myRG \
  --cluster-name myAKSCluster \
  --name nodepool1 \
  --enable-cluster-autoscaler \
  --min-count 1 \
  --max-count 10

KEDA — Event-Driven Scaling

KEDA (Kubernetes Event-driven Autoscaling) scales pods based on external event sources — Azure Service Bus queue depth, Event Hub lag, HTTP request rate, Prometheus metrics, and more. KEDA can even scale to zero (no pods when no events) and back up.

YAML: KEDA ScaledObject (scale on Service Bus queue)
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: myapp-scaler
spec:
  scaleTargetRef:
    name: myapp
  minReplicaCount: 0          # Scale to zero when queue is empty
  maxReplicaCount: 20
  triggers:
  - type: azure-servicebus
    metadata:
      queueName: orders
      namespace: myservicebus
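      messageCount: "5"       # Target of 5 queued messages per replica
    # Authentication is omitted here; a working trigger also needs credentials,
    # e.g. an authenticationRef to a TriggerAuthentication, or connectionFromEnv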
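To see how `messageCount`, `minReplicaCount`, and `maxReplicaCount` interact, the sketch below estimates the replica count for a given queue depth as ceil(queue length / messageCount), clamped to the configured bounds. `keda_desired_replicas` is a hypothetical helper for illustration only; the real decision is made by KEDA together with the HPA it manages, and includes polling and cooldown intervals not modelled here.

```shell
# keda_desired_replicas QUEUE_LENGTH MESSAGE_COUNT MIN MAX
# Illustrative helper (not part of KEDA): prints
# ceil(queueLength / messageCount), clamped to [MIN, MAX].
keda_desired_replicas() {
  # Integer ceiling division: (a + b - 1) / b
  desired=$(( ($1 + $2 - 1) / $2 ))
  if [ "$desired" -lt "$3" ]; then desired=$3; fi
  if [ "$desired" -gt "$4" ]; then desired=$4; fi
  echo "$desired"
}

# 42 queued messages, target of 5 per replica, bounds 0..20
keda_desired_replicas 42 5 0 20   # prints 9
```

An empty queue yields 0 replicas under these settings, and a backlog of 500 messages is capped at the maximum of 20.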

Manual Node Scaling

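Azure CLI: Manually scale node pool
# Manual scaling is rejected while the Cluster Autoscaler is enabled on the pool; disable it first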
az aks scale \
  --resource-group myRG \
  --name myAKSCluster \
  --node-count 5 \
  --nodepool-name nodepool1

How Pod and Node Scaling Work Together

Scenario | What Happens
Traffic spikes → CPU rises above HPA threshold | HPA adds pods → if nodes full, Cluster Autoscaler adds nodes
Traffic drops → CPU falls below HPA threshold | HPA removes pods → if nodes underutilised, Cluster Autoscaler removes nodes
Queue fills up (KEDA) | KEDA scales pods → if nodes full, Cluster Autoscaler adds nodes
Queue empties (KEDA) | KEDA scales to zero pods → Cluster Autoscaler removes empty nodes
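The hand-off from pod scaling to node scaling can be approximated with some back-of-the-envelope arithmetic. The sketch below assumes identical pods, considers CPU requests only, and takes the node's allocatable CPU as an input; `nodes_needed` is a hypothetical helper, and the real scheduler and Cluster Autoscaler also account for memory, system pods, and placement constraints.

```shell
# nodes_needed PODS CPU_REQUEST_M NODE_ALLOCATABLE_M
# Illustrative helper: estimates how many nodes are required to schedule
# PODS identical pods, based on CPU requests (in millicores) alone.
nodes_needed() {
  per_node=$(( $3 / $2 ))   # pods that fit on one node (rounded down)
  if [ "$per_node" -lt 1 ]; then
    echo "pod request exceeds node capacity" >&2
    return 1
  fi
  # Ceiling division: nodes required for all pods
  echo $(( ($1 + per_node - 1) / per_node ))
}

# 10 pods requesting 500m each on nodes with 1900m allocatable CPU:
# 3 pods fit per node, so 4 nodes are needed
nodes_needed 10 500 1900   # prints 4
```

If HPA or KEDA raises the replica count beyond what the current nodes can hold, the extra pods stay Pending, and that is the signal the Cluster Autoscaler acts on.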
💡 AZ-104 Exam Tip: Know that HPA scales pods based on CPU/memory metrics. Know that Cluster Autoscaler scales nodes based on pending pods. Know that KEDA scales on external event sources and can scale to zero. Know that HPA requires resource requests to be set on pods.
📝 Practice Questions
Q1. What does the Horizontal Pod Autoscaler (HPA) scale?
A — The number of pod replicas based on CPU/memory metrics
B — The number of nodes in the cluster
C — The CPU and memory limits of individual containers
D — The number of node pools in the cluster
Q2. When does the Cluster Autoscaler add new nodes?
A — When cluster CPU usage exceeds 70%
B — When pods cannot be scheduled due to insufficient node capacity
C — On a fixed schedule every hour
D — When network traffic exceeds a threshold
Q3. What is unique about KEDA compared to HPA?
A — KEDA can only scale based on CPU and memory like HPA
B — KEDA scales based on external event sources and can scale to zero replicas
C — KEDA scales nodes instead of pods
D — KEDA is built into all Kubernetes distributions by default
Q4. Why must resource requests be set on pods for HPA to work?
A — Without requests, pods cannot be scheduled on nodes
B — HPA calculates utilisation as actual usage divided by requested — without requests there is no baseline percentage
C — Resource requests are required for security compliance
D — Without requests, HPA cannot set resource limits on new pods
Q5. An application processes messages from a Service Bus queue. Messages accumulate at night and the queue is empty during the day. Which scaling tool is best?
A — HPA scaling on CPU utilisation
B — Cluster Autoscaler scaling on node utilisation
C — KEDA with a Service Bus trigger
D — Manual scaling via kubectl
Disclaimer: RedKite Cloud is an independent educational resource and is not affiliated with Microsoft Corporation.