Scale-Up vs Scale-Out
| | Scale Up (Vertical) | Scale Out (Horizontal) |
|---|---|---|
| What happens | Move to a larger VM (more CPU/RAM) | Add more instances of the same VM |
| Downtime? | Brief restart required | Zero downtime |
| Cost | Pay for larger VM | Pay per instance |
| Limits | Max VM size in tier | Up to 10 (Standard), 30 (Premium) |
| Best for | Memory-intensive apps, CPU-bound single-threaded workloads | Stateless apps, handling more concurrent users |
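The instance limits in the table can be expressed as a small lookup, which makes it easy to see when scale-out alone is not enough and a tier change is needed. This is a minimal sketch: the Standard and Premium ceilings come from the table, the Basic ceiling of 3 is an added assumption, and the function names are hypothetical.

```python
# Illustrative per-tier instance ceilings. Standard and Premium come from the
# table above; Basic = 3 is an assumption added for this sketch.
MAX_INSTANCES = {"Basic": 3, "Standard": 10, "Premium": 30}


def clamp_instance_count(tier: str, requested: int) -> int:
    """Return the requested count limited to what the tier allows (at least 1)."""
    if tier not in MAX_INSTANCES:
        raise ValueError(f"unknown or non-scalable tier: {tier}")
    return max(1, min(requested, MAX_INSTANCES[tier]))


def exceeds_tier_limit(tier: str, requested: int) -> bool:
    """True when the requested count cannot be reached by scaling out in this tier."""
    return requested > MAX_INSTANCES[tier]
```

For example, `clamp_instance_count("Standard", 12)` yields 10, and `exceeds_tier_limit("Standard", 12)` signals that reaching 12 instances would require moving to a higher tier first.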
Manual Scaling
Basic tier and above support manual instance count changes (Free and Shared tiers cannot scale out). Set a fixed number of instances in the Scale Out settings, or from the Azure CLI:
# Set to 3 instances
az appservice plan update \
  --name myAppServicePlan \
  --resource-group myRG \
  --number-of-workers 3
Auto-Scale Overview
Auto-scale (requires Standard tier or above) automatically adjusts instance count based on rules you define. It operates at the App Service Plan level — not per app.
Auto-Scale Components
- Profile — A set of scale conditions (e.g., "weekday business hours")
- Condition — When to scale (metric threshold or schedule)
- Rule — What to do when condition is met (scale out/in by how many instances)
- Min/Max/Default — Instance count limits and fallback
Scale Conditions
| Type | Triggers On | Example |
|---|---|---|
| Metric-based | CPU %, Memory %, request count, HTTP queue length | Scale out when CPU > 70% for 5 minutes |
| Schedule-based | Day/time schedule | Scale to 5 instances every weekday at 8am; scale to 1 at 6pm |
Scale Rules
Each condition contains scale-out and scale-in rules. Best practice: always have a paired scale-in rule for every scale-out rule.
Profile: Default
  Min instances: 2
  Max instances: 10
  Default instances: 2
  Scale-out rule:
    Metric: CPU Percentage
    Time aggregation: Average
    Operator: Greater than
    Threshold: 70%
    Duration: 5 minutes
    Action: Increase count by 2
  Scale-in rule:
    Metric: CPU Percentage
    Operator: Less than
    Threshold: 30%
    Duration: 10 minutes
    Action: Decrease count by 1
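To make the interaction between rules and the min/max/default limits concrete, the profile above can be modeled in a few lines of Python. This is a simplified sketch, not how Azure evaluates rules internally: the `Rule`, `Profile`, and `evaluate` names are hypothetical, it assumes one CPU sample per minute, and it applies the first rule that matches.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Rule:
    operator: str      # ">" (scale out) or "<" (scale in)
    threshold: float   # average CPU percentage
    duration_min: int  # look-back window in minutes
    change: int        # +N to scale out, -N to scale in


@dataclass
class Profile:
    min_instances: int
    max_instances: int
    default_instances: int
    rules: list


def evaluate(profile: Profile, current: int, cpu_samples: list) -> int:
    """Return the new instance count. cpu_samples: one reading per minute, oldest first."""
    if not cpu_samples:
        # No metric data at all: fall back to the default instance count.
        return profile.default_instances
    for rule in profile.rules:
        window = cpu_samples[-rule.duration_min:]
        if len(window) < rule.duration_min:
            continue  # not enough history to judge this rule yet
        avg = mean(window)
        hit = avg > rule.threshold if rule.operator == ">" else avg < rule.threshold
        if hit:
            target = current + rule.change
            # Never go below the minimum or above the maximum.
            return max(profile.min_instances, min(target, profile.max_instances))
    return current


default_profile = Profile(
    min_instances=2,
    max_instances=10,
    default_instances=2,
    rules=[Rule(">", 70, 5, +2), Rule("<", 30, 10, -1)],
)
```

With this profile, five minutes at 80% CPU moves 2 instances to 4, while ten minutes at 20% CPU moves 4 instances to 3 but never takes the count below the minimum of 2.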
Schedule-Based Scaling
If your traffic pattern is predictable (e.g., business hours only), schedule-based scaling is more reliable than metric-based scaling. Metric rules react only after load has already arrived, whereas a schedule adds capacity before users do:
- 8:00 AM weekdays: scale to 5 instances
- 6:00 PM weekdays: scale to 1 instance
- Weekends: maintain 1 instance
Combine schedule-based and metric-based rules — schedule sets the baseline, metrics handle unexpected spikes.
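One way to picture the combination is a function that computes the scheduled baseline and then adds whatever the metric rules request on top. This is a rough sketch under stated assumptions: the function names are hypothetical, `now` is taken to be in the plan's local time zone, and the ceiling of 10 instances is borrowed from the example profile rather than required by the schedule.

```python
from datetime import datetime


def scheduled_baseline(now: datetime) -> int:
    """Baseline from the schedule above: 5 on weekdays 08:00-18:00, otherwise 1."""
    is_weekday = now.weekday() < 5   # Monday=0 ... Friday=4
    in_hours = 8 <= now.hour < 18
    return 5 if is_weekday and in_hours else 1


def target_instances(now: datetime, metric_extra: int = 0, max_instances: int = 10) -> int:
    """Schedule sets the floor; metric rules add instances on top, up to the maximum."""
    return min(scheduled_baseline(now) + metric_extra, max_instances)
```

On a Monday at 09:00 with a metric rule asking for two extra instances, the target is 7; on a Saturday with no spike it stays at 1.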
Cooldown Period
After a scale action, the cooldown period blocks any further scale action until it has elapsed. This avoids rapid oscillation ("flapping"), where the plan scales out and then immediately scales back in.
- Default cooldown: 5 minutes
- Scale-out cooldown should be shorter than scale-in cooldown:
  - Scale out: 1–2 minutes (respond quickly to traffic)
  - Scale in: 10–15 minutes (wait to confirm traffic has dropped)
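The asymmetric cooldowns can be sketched as a simple gate that a scale action must pass before it runs. The constants below pick one value from each suggested range and are assumptions to tune, and `may_scale` is a hypothetical helper rather than part of any Azure SDK.

```python
from datetime import datetime, timedelta

# One value chosen from each range suggested above; both are tunable assumptions.
SCALE_OUT_COOLDOWN = timedelta(minutes=2)
SCALE_IN_COOLDOWN = timedelta(minutes=15)


def may_scale(direction: str, last_action_at, now: datetime) -> bool:
    """Allow a new action only once the cooldown for its direction has elapsed."""
    if last_action_at is None:
        return True  # no earlier action, so nothing to wait for
    cooldown = SCALE_OUT_COOLDOWN if direction == "out" else SCALE_IN_COOLDOWN
    return now - last_action_at >= cooldown
```

Three minutes after a scale action, a further scale-out is allowed but a scale-in is still blocked, which is exactly the behavior that dampens flapping.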
Per-App Scaling
By default, all apps on a shared App Service Plan scale together (the plan scales). Per-App Scaling allows individual apps on the same plan to have different instance counts — useful for plans hosting multiple apps with different traffic patterns.
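As a rough illustration of per-app scaling, each app requests its own instance count, but it can never run on more instances than the plan currently has. The function and the app names below are hypothetical and only model that ceiling, not how the platform places apps on instances.

```python
def effective_instances(plan_instances: int, per_app_counts: dict) -> dict:
    """Cap each app's requested instance count at the plan's current instance count."""
    return {app: min(count, plan_instances) for app, count in per_app_counts.items()}


# Hypothetical plan with 10 instances hosting three apps with different needs.
allocation = effective_instances(10, {"frontend": 5, "admin": 2, "reports": 12})
```

Here `frontend` and `admin` keep their requested counts of 5 and 2, while `reports` is capped at the plan's 10 instances.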
Always On
"Always On" prevents App Service from idling the app after 20 minutes of inactivity. Without Always On, the first request after idle has a cold start delay (seconds to minutes).
- Required for WebJobs running continuously
- Required for apps that must respond instantly (no cold starts)
- Available on Basic tier and above
- NOT available on Free or Shared tiers
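The effect of Always On can be summarized as a small predicate: a request pays the cold-start cost only when the app was allowed to idle out. This sketch assumes the 20-minute idle window stated above and uses a hypothetical function name.

```python
from datetime import timedelta

IDLE_TIMEOUT = timedelta(minutes=20)  # idle window stated above


def is_cold_start(idle_for: timedelta, always_on: bool) -> bool:
    """True when the next request will hit an app that has been idled out."""
    return (not always_on) and idle_for >= IDLE_TIMEOUT
```

An app that has been idle for 45 minutes cold-starts only when Always On is disabled; with Always On enabled the same request is served by an already-loaded app.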