What Are VM Scale Sets?
A VM Scale Set is a group of identical VMs managed together as a single unit. You define the VM configuration once (image, size, OS, software), and the Scale Set creates and manages multiple copies. The number of copies scales automatically based on rules you define.
What Scale Sets Give You
- Elasticity: Scale from 1 to 1,000 VMs automatically
- High Availability: Spreads VMs across fault domains, and across Availability Zones when you deploy the scale set into multiple zones
- Consistent Configuration: All VMs are identical, with the same image, software, and configuration
- Integrated Load Balancing: Works natively with Azure Load Balancer and Application Gateway
- Rolling Updates: Update VMs in batches so the application stays available during an upgrade
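A rolling update upgrades one batch of instances at a time while the rest keep serving traffic. The bash sketch below only computes the batch sizes. The `rolling_batches` helper is hypothetical (not part of the Azure CLI), and the percentage-based batch size with round-down and a minimum of 1 is an assumption for illustration.

```bash
# Hypothetical helper: split TOTAL instances into rolling-upgrade batches.
# ASSUMPTION: batch size is PCT percent of TOTAL, rounded down, minimum 1.
rolling_batches() {
  local total=$1 pct=$2
  local size=$(( total * pct / 100 ))
  if (( size < 1 )); then size=1; fi
  local -a batches=()
  local start=0 n
  while (( start < total )); do
    # The last batch may be smaller than the others.
    n=$(( total - start < size ? total - start : size ))
    batches+=("$n")
    start=$(( start + n ))
  done
  echo "${batches[*]}"
}

rolling_batches 10 20   # prints: 2 2 2 2 2
rolling_batches 11 20   # prints: 2 2 2 2 2 1
```

With a batch size of 1, the same loop upgrades VMs strictly one at a time.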
Orchestration Modes
Azure VMSS offers two orchestration modes:
| Feature | Uniform Mode | Flexible Mode |
|---|---|---|
| VM identity | All VMs are identical instances | Each VM is a full standalone VM |
| Management | Managed as a group | Can manage individual VMs independently |
| Max instances | 1,000 | 1,000 |
| Mixed VM sizes | No — all same size | Yes — different sizes allowed |
| Best for | Stateless web tiers, microservices | Stateful workloads, mixed workloads |
Scaling Policies
There are three ways to configure when and how your Scale Set scales:
1. Metric-Based Scaling (Most Common)
Scale based on metrics averaged over a time window, such as CPU usage, memory, queue length, or custom metrics.
| Rule Example | Action |
|---|---|
| CPU > 75% for 5 minutes | Add 2 VMs (scale out) |
| CPU < 25% for 10 minutes | Remove 1 VM (scale in) |
| Queue length > 100 messages | Add 3 VMs |
| Memory > 80% | Add 1 VM |
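The first two rules in the table can be modeled as a small decision function. This is a local sketch, not how Azure Monitor evaluates rules: `decide_count` is hypothetical, it takes pre-computed CPU averages as integer percentages, and it clamps the result to the minimum and maximum instance counts described under Scaling Limits.

```bash
# Hypothetical helper modeling two rules from the table:
#   average CPU over 5 minutes  > 75%  -> add 2 VMs (scale out)
#   average CPU over 10 minutes < 25%  -> remove 1 VM (scale in)
# Inputs are integers: cpu5 cpu10 current min max.
decide_count() {
  local cpu5=$1 cpu10=$2 current=$3 min=$4 max=$5
  local target=$current
  if (( cpu5 > 75 )); then
    target=$(( current + 2 ))
  elif (( cpu10 < 25 )); then
    target=$(( current - 1 ))
  fi
  # Never go above the maximum or below the minimum instance count.
  if (( target > max )); then target=$max; fi
  if (( target < min )); then target=$min; fi
  echo "$target"
}

decide_count 82 70 4 2 10   # prints 6
decide_count 90 85 9 2 10   # prints 10 (capped at the maximum)
decide_count 20 18 2 2 10   # prints 2 (held at the minimum)
```

The sketch checks scale-out first, in line with Azure autoscale scaling out when any scale-out rule is met and scaling in only when all scale-in rules are met.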
2. Schedule-Based Scaling
Scale based on a fixed schedule — useful for predictable patterns.
- Scale to 20 VMs every weekday at 8 AM (business hours peak)
- Scale down to 2 VMs every weekday at 8 PM (after hours)
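- Scale to 50 VMs on a specific date, such as a monthly batch processing day (recurring autoscale profiles repeat weekly, so each monthly date needs its own fixed-date profile)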
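The first two schedule entries come down to choosing a target count from the day and hour. The sketch below assumes a 1 (Monday) to 7 (Sunday) day numbering, as printed by `date +%u`, and treats weekends as off-hours because the weekday 8 PM scale-down stays in effect until Monday morning. `scheduled_count` is a hypothetical helper; in Azure the same intent is expressed as recurring autoscale profiles.

```bash
# Hypothetical helper: target instance count for a given day and hour.
#   dow:  1=Mon ... 7=Sun (same numbering as `date +%u`)
#   hour: 0-23
scheduled_count() {
  local dow=$1 hour=$2
  if (( dow <= 5 && hour >= 8 && hour < 20 )); then
    echo 20   # weekday business hours
  else
    echo 2    # weekday evenings, nights, and weekends
  fi
}

scheduled_count 3 10   # Wednesday 10:00 -> 20
scheduled_count 3 21   # Wednesday 21:00 -> 2
scheduled_count 6 10   # Saturday 10:00  -> 2
# Current local time; 10# stops hours like "08" being read as octal.
scheduled_count "$(date +%u)" "$((10#$(date +%H)))"
```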
3. Predictive Scaling
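Uses machine learning on historical load patterns to forecast demand and adds VMs before it arrives, rather than reacting after CPU has already spiked. Predictive autoscale is available only for virtual machine scale sets, works only with the Percentage CPU metric, and needs at least 7 days of history before it can forecast.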
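The "forecast, then pre-scale" idea can be illustrated with a much simpler model than Azure's. The sketch below uses a seasonal-naive forecast (the expected load for an hour is the mean of the same hour on previous days) and sizes the scale set before that hour begins. This is not Azure's predictive model: `forecast_and_size`, the request-rate inputs, and the per-VM capacity are all hypothetical.

```bash
# Hypothetical helper: seasonal-naive forecast plus pre-scaling.
# Usage: forecast_and_size PER_VM LOAD_DAY1 LOAD_DAY2 ...
#   PER_VM    assumed load one VM can handle (e.g. requests/sec)
#   LOAD_DAYn load observed at the same hour on previous days
forecast_and_size() {
  local per_vm=$1; shift
  (( $# > 0 )) || { echo "no history given" >&2; return 1; }
  local sum=0 n=0 v
  for v in "$@"; do
    sum=$(( sum + v ))
    n=$(( n + 1 ))
  done
  # Forecast = mean of the same hour on previous days (integer division).
  local forecast=$(( sum / n ))
  # Ceiling division: VMs needed to absorb the forecast load.
  echo $(( (forecast + per_vm - 1) / per_vm ))
}

# 09:00 load on each of the last 7 days; one VM handles ~200 req/s (assumed).
forecast_and_size 200 1800 1900 2100 2000 1950 2050 1900   # prints 10
```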
Scale-Out vs Scale-In
| Action | Meaning | When |
|---|---|---|
| Scale Out | Add more VMs (horizontal scaling) | Load increases |
| Scale In | Remove VMs (horizontal scaling) | Load decreases |
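| Scale Up | Increase VM size (vertical scaling) | Need more power per VM; done by changing the VM SKU, not by autoscale rules |
| Scale Down | Decrease VM size (vertical scaling) | Over-provisioned; also a VM SKU change, not an autoscale action |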
Scale-In Policy
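When scaling in, Azure needs to decide which VMs to remove. With the default policy, Azure first balances the remaining VMs across Availability Zones (for zone-spanning scale sets) and then across fault domains, and removes the VM with the highest instance ID among the candidates. You can configure:
- Default: Balance across zones and fault domains, then remove the highest instance ID
- OldestVM: Remove the oldest VM first, after balancing across zones (good for keeping fresh instances)
- NewestVM: Remove the newest VM first, after balancing across zones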
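For zone-spanning scale sets, Azure balances instances across zones before applying the scale-in policy. The bash sketch below models that order: only VMs in the most populated zone(s) are candidates, then the policy picks one. The inventory format `<instance-id> <zone> <created-epoch>` and the `pick_vm_to_remove` helper are hypothetical; the sketch needs bash 4+ for associative arrays, skips fault-domain balancing, and assumes numeric instance IDs as in Uniform mode.

```bash
# Hypothetical helper (bash 4+): choose which VM a scale-in removes.
# Reads "<instance-id> <zone> <created-epoch>" lines on stdin.
# Step 1 keeps zones balanced: only VMs in the most populated zone(s) qualify.
# Step 2 applies the policy. Fault-domain balancing is not modeled.
pick_vm_to_remove() {
  local policy=$1
  local -A zone_count=()
  local -a ids=() zones=() created=()
  local id zone ts z i key best="" best_key="" maxz=0
  while read -r id zone ts; do
    [[ -z $id ]] && continue
    ids+=("$id"); zones+=("$zone"); created+=("$ts")
    zone_count[$zone]=$(( ${zone_count[$zone]:-0} + 1 ))
  done
  for z in "${!zone_count[@]}"; do
    if (( ${zone_count[$z]} > maxz )); then maxz=${zone_count[$z]}; fi
  done
  for i in "${!ids[@]}"; do
    (( ${zone_count[${zones[i]}]} == maxz )) || continue
    case $policy in
      Default)  key=${ids[i]} ;;               # highest instance ID
      OldestVM) key=$(( -${created[i]} )) ;;   # earliest creation time
      NewestVM) key=${created[i]} ;;           # latest creation time
      *) echo "unknown policy: $policy" >&2; return 1 ;;
    esac
    if [[ -z $best ]] || (( key > best_key )); then
      best=${ids[i]}; best_key=$key
    fi
  done
  echo "$best"
}

# Zones 1 and 2 hold two VMs each; zone 3 holds one.
inventory=$'0 1 1700000500\n1 2 1700000100\n2 3 1700000000\n3 1 1700000300\n4 2 1700000250'
pick_vm_to_remove Default  <<< "$inventory"   # prints 4
pick_vm_to_remove OldestVM <<< "$inventory"   # prints 1
pick_vm_to_remove NewestVM <<< "$inventory"   # prints 0
```

VM 2 is the oldest VM overall, but OldestVM skips it because removing it would leave zone 3 empty while the other zones still hold two VMs each.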
Scaling Limits
| Setting | Description | Example |
|---|---|---|
| Minimum instances | Always keep at least this many VMs running | 2 (never scale below 2) |
| Maximum instances | Never exceed this many VMs | 50 (cap at 50 to control costs) |
| Default instances | Instance count autoscale falls back to when metric data is unavailable | 3 |
| Cooldown period | Wait this long between scaling actions | 5 minutes (avoid rapid flapping) |
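The cooldown period suppresses actions that would run too soon after the previous one. The hypothetical `apply_cooldown` helper below takes the minutes at which a rule's condition was met and prints the minutes at which a scaling action would actually run, assuming the cooldown is measured from the last action that ran.

```bash
# Hypothetical helper: given the minutes at which a scaling rule's condition
# was met, print the minutes at which an action actually runs, assuming the
# cooldown is measured from the previous action that ran.
apply_cooldown() {
  local cooldown=$1; shift
  local last="" t
  local -a fired=()
  for t in "$@"; do
    if [[ -z $last ]] || (( t - last >= cooldown )); then
      fired+=("$t")
      last=$t
    fi
  done
  echo "${fired[*]}"
}

# Condition met at minutes 0, 1, 2, 6, 7 and 12 with a 5-minute cooldown.
apply_cooldown 5 0 1 2 6 7 12   # prints: 0 6 12
```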
Zone-Spanning Scale Sets
VM Scale Sets can span multiple Availability Zones, giving you both elasticity and zone-level high availability in one service. Azure distributes VM instances across the zones you specify as it scales out and, by default, keeps the zones balanced on a best-effort basis.
```bash
# Zone-spanning Flexible scale set behind a load balancer
az vmss create \
  --resource-group myResourceGroup \
  --name myScaleSet \
  --image Ubuntu2204 \
  --vm-sku Standard_B2s \
  --instance-count 3 \
  --zones 1 2 3 \
  --admin-username azureuser \
  --generate-ssh-keys \
  --orchestration-mode Flexible \
  --load-balancer myLoadBalancer
```
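To picture how instances land in zones, here is an idealized even spread. `spread_across_zones` is hypothetical, and because Azure balances zones on a best-effort basis by default, real placement can differ temporarily from this picture.

```bash
# Hypothetical helper: spread TOTAL instances evenly across the given zones,
# giving the first (TOTAL mod ZONES) zones one extra instance.
spread_across_zones() {
  local total=$1; shift
  local -a zones=("$@") out=()
  local n=${#zones[@]} i
  for i in "${!zones[@]}"; do
    out+=("zone${zones[i]}=$(( total / n + (i < total % n ? 1 : 0) ))")
  done
  echo "${out[*]}"
}

spread_across_zones 3 1 2 3   # prints: zone1=1 zone2=1 zone3=1
spread_across_zones 7 1 2 3   # prints: zone1=3 zone2=2 zone3=2
```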
Creating a VM Scale Set
```bash
# Create the Scale Set
az vmss create \
  --resource-group myResourceGroup \
  --name myScaleSet \
  --image Ubuntu2204 \
  --vm-sku Standard_B2s \
  --instance-count 2 \
  --admin-username azureuser \
  --generate-ssh-keys

# Create the autoscale setting: keep between 2 and 10 instances, default 2
az monitor autoscale create \
  --resource-group myResourceGroup \
  --resource myScaleSet \
  --resource-type Microsoft.Compute/virtualMachineScaleSets \
  --name autoscale-vmss \
  --min-count 2 \
  --max-count 10 \
  --count 2

# Scale-out rule: add 2 VMs when average CPU over 5 minutes exceeds 70%
az monitor autoscale rule create \
  --resource-group myResourceGroup \
  --autoscale-name autoscale-vmss \
  --condition "Percentage CPU > 70 avg 5m" \
  --scale out 2

# Scale-in rule: remove 1 VM when average CPU over 10 minutes drops below 25%
az monitor autoscale rule create \
  --resource-group myResourceGroup \
  --autoscale-name autoscale-vmss \
  --condition "Percentage CPU < 25 avg 10m" \
  --scale in 1
```
Real-World Scaling Scenarios
E-Commerce Sale Event
An online store normally runs 5 VMs but expects 20x traffic during a sale. Configure:
- Schedule: Scale to 30 VMs 30 minutes before the sale starts
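- Metric: If CPU exceeds 80%, add 5 more VMs (up to max 100). Define this rule inside the sale's schedule profile, because only one autoscale profile is active at a time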
- Schedule: Scale back to 5 VMs 2 hours after the sale ends
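Some quick arithmetic shows why the pre-scale step matters. Assuming a 5-minute cooldown between scale-out actions (borrowed from the Scaling Limits example, not stated in this scenario) and ignoring VM provisioning time and metric evaluation windows, the metric rule alone needs many consecutive actions to reach the maximum:

```bash
# Scenario numbers: pre-scaled to 30 VMs, max 100, +5 VMs per action.
# ASSUMPTION: 5-minute cooldown; provisioning time and metric windows ignored.
prescaled=30 normal=5 max=100 step=5 cooldown_min=5

# Ceiling division: actions needed to climb from a start count to max.
actions=$(( (max - prescaled + step - 1) / step ))
from_normal=$(( (max - normal + step - 1) / step ))

# The first action runs immediately; each later one waits out the cooldown.
echo "from $prescaled VMs: $actions actions, at least $(( (actions - 1) * cooldown_min )) minutes"
echo "from $normal VMs: $from_normal actions, at least $(( (from_normal - 1) * cooldown_min )) minutes"
```

Starting from 30 VMs cuts the climb from 19 actions (at least 90 minutes) to 14 actions (at least 65 minutes), and the pre-scaled capacity is already running when the first wave of traffic arrives.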
Business Hours Application
An internal HR application used only during office hours:
- Schedule: Scale to 10 VMs at 8 AM weekdays
- Schedule: Scale to 2 VMs at 7 PM weekdays
- Schedule: Scale to 2 VMs all day Saturday and Sunday
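A rough weekly VM-hour count for this schedule, compared with running 10 VMs around the clock, assuming each scheduled change takes effect instantly:

```bash
# ASSUMPTION: each scheduled change takes effect instantly.
busy_hours=$(( 19 - 8 ))                                  # 08:00 to 19:00
weekday=$(( 10 * busy_hours + 2 * (24 - busy_hours) ))    # VM-hours per weekday
weekend_day=$(( 2 * 24 ))                                 # VM-hours per weekend day
scheduled=$(( 5 * weekday + 2 * weekend_day ))
always_on=$(( 10 * 24 * 7 ))

echo "scheduled: $scheduled VM-hours per week"      # 776
echo "always 10 VMs: $always_on VM-hours per week"  # 1680
```

Under these assumptions the schedule uses about 46% of the VM-hours of an always-on deployment of 10 VMs.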