Last updated: May 2026
Azure App Service · Intermediate · AZ-104 · ⏱ 11 min read

Scaling Azure App Service

App Service can handle traffic spikes gracefully — scaling from 1 to 30+ instances automatically based on metrics like CPU, memory, or request count. Understanding how scaling works, how to configure auto-scale rules, and the difference between scaling up and scaling out is essential for AZ-104 and for building resilient applications.

What you'll learn: Scale-up (vertical) vs scale-out (horizontal) · Manual scaling · Auto-scale (metric-based and schedule-based) · Scale conditions and rules · Min/max/default instance counts · Scale-in cooldown period · Flapping prevention · Per-app scaling · Always On setting

Scale-Up vs Scale-Out

Aspect | Scale Up (Vertical) | Scale Out (Horizontal)
What happens | Move to a larger VM (more CPU/RAM) | Add more instances of the same VM
Downtime? | Brief restart required | Zero downtime
Cost | Pay for larger VM | Pay per instance
Limits | Max VM size in tier | Up to 10 (Standard), 30 (Premium)
Best for | Memory-intensive apps, CPU-bound single-threaded | Stateless apps, handling more concurrent users
💡 Prefer Scale-Out for Web Apps: Scale-out is the cloud-native approach, because stateless web apps handle more users simply by adding instances. Scale-up is a blunt instrument: it is capped by the largest VM size in the tier and restarts the app. Build stateless apps (sessions in Redis, files in Blob Storage) that scale out cleanly.
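To illustrate why stateless design matters for scale-out, here is a minimal Python sketch. It assumes a simple round-robin load balancer, and the `Instance` class and `simulate` helper are hypothetical names invented for this example. A plain dictionary stands in for an external store such as Redis.

```python
import itertools


class Instance:
    """One app instance. Without a shared store it keeps sessions in its own memory."""

    def __init__(self, name, shared_store=None):
        self.name = name
        # A private dict models in-memory sessions; a shared dict models Redis.
        self.sessions = shared_store if shared_store is not None else {}

    def login(self, user):
        self.sessions[user] = {"logged_in": True}

    def is_logged_in(self, user):
        return user in self.sessions


def simulate(instances, requests=4):
    """Log in on the first instance, then send later requests round-robin."""
    instances[0].login("alice")
    rotation = itertools.cycle(instances)
    return [next(rotation).is_logged_in("alice") for _ in range(requests)]


# In-memory sessions: only the instance that handled the login knows the user.
local = simulate([Instance("a"), Instance("b")])

# Shared store: every instance sees the same session data.
store = {}
shared = simulate([Instance("a", store), Instance("b", store)])

print(local)   # [True, False, True, False]
print(shared)  # [True, True, True, True]
```

With private session stores, every second request lands on an instance that has never seen the user, which is the "randomly logged out" symptom. Moving the session data out of the instance removes that dependency.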

Manual Scaling

Basic tier and above support manual instance count changes; the Free and Shared tiers cannot scale out. Set a fixed number of instances in Scale Out settings:

Azure CLI: Manually set instance count
# Set to 3 instances
az appservice plan update \
  --name myAppServicePlan \
  --resource-group myRG \
  --number-of-workers 3

Auto-Scale Overview

Auto-scale (requires Standard tier or above) automatically adjusts instance count based on rules you define. It operates at the App Service Plan level — not per app.

Auto-Scale Components

  • Profile: a named set of scale conditions with its own instance limits (e.g., "weekday business hours")
  • Condition: when to scale (metric threshold or schedule)
  • Rule: what to do when the condition is met (scale out or in, and by how many instances)
  • Min/Max/Default: instance count limits, plus the default count used as a fallback when metrics are unavailable

Scale Conditions

Type | Triggers On | Example
Metric-based | CPU %, Memory %, request count, HTTP queue length | Scale out when CPU > 70% for 5 minutes
Schedule-based | Day/time schedule | Scale to 5 instances every weekday at 8am; scale to 1 at 6pm
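As a rough illustration of a metric-based condition such as "CPU > 70% for 5 minutes", the Python sketch below averages the most recent samples and compares the result with a threshold. The function name `condition_met`, the one-sample-per-minute assumption, and the sample values are all hypothetical; the real autoscale engine collects and aggregates metrics in its own way.

```python
from statistics import mean


def condition_met(samples, threshold, window):
    """Return True when the average of the last `window` samples exceeds `threshold`."""
    if len(samples) < window:
        return False  # not enough data yet to cover the full duration
    return mean(samples[-window:]) > threshold


# Hypothetical one-minute CPU % readings during a traffic ramp-up.
cpu = [55, 62, 71, 78, 83, 90]

early = condition_met(cpu[:5], threshold=70, window=5)  # average 69.8
late = condition_met(cpu, threshold=70, window=5)       # average 76.8

print(early, late)  # False True
```

Averaging over the whole duration means that a single high reading does not trigger scaling; the load has to stay elevated across the window.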

Scale Rules

Each condition contains scale-out and scale-in rules. Best practice: always have a paired scale-in rule for every scale-out rule.

Example: Typical auto-scale configuration
Profile: Default
  Min instances: 2
  Max instances: 10
  Default instances: 2

Scale-out rule:
  Metric: CPU Percentage
  Time aggregation: Average
  Operator: Greater than
  Threshold: 70%
  Duration: 5 minutes
  Action: Increase count by 2

Scale-in rule:
  Metric: CPU Percentage
  Time aggregation: Average
  Operator: Less than
  Threshold: 30%
  Duration: 10 minutes
  Action: Decrease count by 1
⚠️ Always Define Scale-In Rules: Without scale-in rules, your app will scale out but never back in, so instances accumulate and costs grow. Always pair scale-out rules with scale-in rules.
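The example configuration above can be traced with a small Python sketch. The `Profile` dataclass and `next_count` function are hypothetical names, and the flapping check is a simplified illustration (it assumes the same total load spreads evenly over the remaining instances), not the autoscale engine's actual algorithm.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    """Values copied from the example configuration."""
    min_instances: int = 2
    max_instances: int = 10
    scale_out_threshold: float = 70.0  # average CPU % over 5 minutes
    scale_in_threshold: float = 30.0   # average CPU % over 10 minutes
    scale_out_step: int = 2
    scale_in_step: int = 1


def next_count(profile, current, avg_cpu_5m, avg_cpu_10m):
    """Return the instance count after one evaluation of the paired rules."""
    if avg_cpu_5m > profile.scale_out_threshold:
        # Scale out, but never beyond the profile maximum.
        return min(current + profile.scale_out_step, profile.max_instances)
    if avg_cpu_10m < profile.scale_in_threshold:
        # Scale in, but never below the profile minimum.
        target = max(current - profile.scale_in_step, profile.min_instances)
        if target == current:
            return current
        # Simplified flapping check: estimate CPU after removing instances and
        # skip the scale-in if it would immediately trigger a scale-out.
        projected = avg_cpu_10m * current / target
        if projected > profile.scale_out_threshold:
            return current
        return target
    return current


p = Profile()
print(next_count(p, 2, 85, 60))  # 4  (scale out by 2)
print(next_count(p, 9, 85, 60))  # 10 (capped at the maximum)
print(next_count(p, 4, 20, 20))  # 3  (scale in by 1)
print(next_count(p, 2, 10, 10))  # 2  (already at the minimum)
```

Keeping a wide gap between the two thresholds (70% and 30% here) is what leaves room for the projected load after a scale-in to stay below the scale-out threshold.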

Schedule-Based Scaling

If your traffic pattern is predictable (e.g., business hours only), schedule-based scaling is more reliable than metric-based — it pre-scales before users arrive:

  • 8:00 AM weekdays: scale to 5 instances
  • 6:00 PM weekdays: scale to 1 instance
  • Weekends: maintain 1 instance

Combine schedule-based and metric-based rules — schedule sets the baseline, metrics handle unexpected spikes.
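One way to picture "schedule sets the baseline, metrics handle spikes" is the Python sketch below. It is an illustrative model only: `baseline_instances`, `target_instances`, and the `metric_extra` parameter are hypothetical names, the hours mirror the example schedule above, and the maximum of 10 is an assumed limit.

```python
from datetime import datetime


def baseline_instances(now):
    """Schedule-based baseline: 5 instances on weekdays from 8:00 to 18:00, else 1."""
    is_weekday = now.weekday() < 5  # Monday is 0, Sunday is 6
    in_business_hours = 8 <= now.hour < 18
    return 5 if is_weekday and in_business_hours else 1


def target_instances(now, metric_extra=0, max_instances=10):
    """Add instances requested by metric rules on top of the baseline, capped at the maximum."""
    return min(baseline_instances(now) + metric_extra, max_instances)


monday_morning = datetime(2026, 5, 4, 9, 0)   # a Monday
saturday_noon = datetime(2026, 5, 9, 12, 0)   # a Saturday

print(baseline_instances(monday_morning))                # 5
print(baseline_instances(saturday_noon))                 # 1
print(target_instances(monday_morning, metric_extra=2))  # 7
```

The schedule guarantees capacity before the first user arrives, while the metric-driven extra only appears when load exceeds what the baseline can absorb.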

Cooldown Period

After a scale action, the cooldown period prevents another scale action from triggering immediately. This prevents rapid oscillation ("flapping") — scaling out then immediately back in.

  • Default cooldown: 5 minutes
  • Scale-out cooldown should be shorter than scale-in cooldown
  • Scale out: 1–2 minutes (respond quickly to traffic)
  • Scale in: 10–15 minutes (wait to confirm traffic has dropped)
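The effect of asymmetric cooldowns can be sketched in Python as follows. The `CooldownGate` class is a hypothetical model written for this example: it measures the time since the last scale action of either direction and compares it with the cooldown of the requested direction, using 2 minutes for scale-out and 10 minutes for scale-in as suggested above.

```python
class CooldownGate:
    """Allow a scale action only after the cooldown for its direction has elapsed."""

    def __init__(self, scale_out_cooldown=120, scale_in_cooldown=600):
        # Cooldowns are in seconds.
        self.cooldowns = {"out": scale_out_cooldown, "in": scale_in_cooldown}
        self.last_action_at = None

    def try_scale(self, direction, now):
        """Return True and record the action if it is allowed at time `now` (seconds)."""
        if self.last_action_at is not None:
            elapsed = now - self.last_action_at
            if elapsed < self.cooldowns[direction]:
                return False  # still cooling down, ignore the trigger
        self.last_action_at = now
        return True


gate = CooldownGate()
results = [
    gate.try_scale("out", 0),    # first action is always allowed
    gate.try_scale("in", 60),    # blocked: only 60 s since the scale-out
    gate.try_scale("out", 130),  # allowed: 130 s exceeds the 120 s cooldown
    gate.try_scale("in", 500),   # blocked: 370 s since the last action
    gate.try_scale("in", 730),   # allowed: 600 s have passed
]
print(results)  # [True, False, True, False, True]
```

The short scale-out cooldown lets capacity grow in quick steps during a spike, while the longer scale-in cooldown stops a brief dip in traffic from removing instances that will be needed again moments later.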

Per-App Scaling

By default, every app on an App Service Plan runs on all of the plan's instances, so the apps scale together when the plan scales. Per-App Scaling lets individual apps on the same plan run on different numbers of instances, up to the plan's instance count, which is useful for plans hosting multiple apps with different traffic patterns.

Always On

"Always On" prevents App Service from idling the app after 20 minutes of inactivity. Without Always On, the first request after idle has a cold start delay (seconds to minutes).

  • Required for WebJobs running continuously
  • Required for apps that must respond instantly (no cold starts)
  • Available on Basic tier and above
  • NOT available on Free or Shared tiers
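The idle behaviour described above can be modelled with a short Python sketch. The function `cold_starts` and the request times are hypothetical, and the model assumes the app is warm at minute 0 and is unloaded once the gap between requests exceeds the 20-minute idle timeout.

```python
def cold_starts(request_minutes, always_on, idle_timeout=20):
    """Return the request times (in minutes) that would hit a cold start."""
    cold = []
    last = 0  # assume the app was warm at minute 0
    for t in request_minutes:
        # Without Always On, a gap longer than the idle timeout unloads the app.
        if not always_on and t - last > idle_timeout:
            cold.append(t)
        last = t
    return cold


requests = [5, 10, 45, 50, 120]  # hypothetical request times in minutes

print(cold_starts(requests, always_on=False))  # [45, 120]
print(cold_starts(requests, always_on=True))   # []
```

Low-traffic apps are the ones most affected, because long gaps between requests are common; with Always On enabled the app stays loaded regardless of the gaps.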
💡 AZ-104 Exam Tip: Know that auto-scale requires Standard tier or above. Know the difference between metric-based and schedule-based scaling. Know that cooldown prevents flapping. Know Always On prevents cold starts and is required for continuous WebJobs. Know that scale-out is preferred over scale-up for web apps.
📝 Practice Questions
Q1. Which App Service Plan tier is the minimum required for auto-scaling?
A — Basic
B — Standard
C — Premium
D — Isolated
Q2. What does the cooldown period in auto-scale prevent?
A — Unauthorized access during scaling operations
B — Flapping — rapid scaling in and out repeatedly (oscillation)
C — Scaling beyond the maximum instance count
D — Performance degradation during scale-out operations
Q3. A web app uses sessions stored in server memory. When scaled out to 5 instances, users sometimes get logged out. What is the cause?
A — Auto-scale rules are configured incorrectly
B — In-memory sessions are instance-specific — users hitting a different instance lose their session
C — SSL certificates are not configured for all instances
D — Deployment slots are interfering with user sessions
Q4. What is the purpose of the "Always On" setting in App Service?
A — Enables auto-scaling to run 24 hours a day
B — Prevents the app from idling after inactivity, avoiding cold start delays
C — Prevents the app from being restarted during deployments
D — Guarantees 99.99% uptime SLA
Q5. Why is schedule-based scaling preferred over metric-based scaling for predictable traffic patterns?
A — Schedule-based scaling is cheaper than metric-based
B — It pre-scales before traffic arrives; metric-based reacts after the spike has already started
C — Schedule-based scaling works on Basic tier; metric-based requires Premium
D — Schedule-based scaling is more accurate than metric-based
Disclaimer: RedKite Cloud is an independent educational resource and is not affiliated with Microsoft Corporation.