Last updated: May 2026

Azure App Service · Intermediate · AZ-104 · ⏱ 11 min read

Scaling Azure App Service

App Service can handle traffic spikes gracefully — scaling from 1 to 30+ instances automatically based on metrics like CPU, memory, or request count. Understanding how scaling works, how to configure auto-scale rules, and the difference between scaling up and scaling out is essential for AZ-104 and for building resilient applications.

What you'll learn: Scale-up (vertical) vs scale-out (horizontal) · Manual scaling · Auto-scale (metric-based and schedule-based) · Scale conditions and rules · Min/max/default instance counts · Scale-in cooldown period · Flapping prevention · Per-app scaling · Always On setting

Table of Contents

Scale-Up vs Scale-Out
Manual Scaling
Auto-Scale Overview
Scale Conditions
Scale Rules
Schedule-Based Scaling
Cooldown Period
Per-App Scaling
Always On
Practice Questions

Scale-Up vs Scale-Out

Aspect	Scale Up (Vertical)	Scale Out (Horizontal)
What happens	Move to a larger VM (more CPU/RAM)	Add more instances of the same VM
Downtime?	Brief restart required	Zero downtime
Cost	Pay for larger VM	Pay per instance
Limits	Max VM size in tier	Up to 10 instances (Standard), 30 (Premium)
Best for	Memory-intensive or CPU-bound single-threaded apps	Stateless apps, handling more concurrent users

💡 Prefer Scale-Out for Web Apps: Scale-out is the cloud-native approach, because stateless web apps handle more users simply by adding instances. Scale-up is capped by the largest VM size in the tier and requires a brief restart. Build stateless apps (sessions in Redis, files in Blob Storage) that scale out cleanly.

Manual Scaling

Basic tier and above support manual instance count changes; the Free and Shared tiers cannot scale out. Set a fixed number of instances in the Scale Out settings or with the CLI:

Azure CLI: Manually set instance count

# Set to 3 instances
az appservice plan update \
  --name myAppServicePlan \
  --resource-group myRG \
  --number-of-workers 3

Auto-Scale Overview

Auto-scale (requires Standard tier or above) automatically adjusts instance count based on rules you define. It operates at the App Service Plan level — not per app.

Auto-Scale Components

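Profile: A named scale condition (e.g., "weekday business hours") that holds its own instance limits and rules
Condition: When the profile applies and how it scales (metric thresholds or a fixed count on a schedule)
Rule: A metric trigger plus the action to take when it fires (scale out/in by how many instances)
Min/Max/Default: Instance count limits, and the fallback count used when metrics are unavailable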
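
As a rough sketch of how these pieces fit together, the Azure CLI can create an autoscale setting whose default profile carries the min/max/default instance counts. The plan and resource group names reuse the ones from the manual scaling example; `myAutoscaleSetting` is a hypothetical name chosen for illustration.

```shell
# Create an autoscale setting attached to the App Service Plan (not to an individual app).
# myAutoscaleSetting is a placeholder name, not something defined elsewhere in this article.
# --count is the default instance count; --min-count/--max-count are the limits.
az monitor autoscale create \
  --resource-group myRG \
  --resource myAppServicePlan \
  --resource-type Microsoft.Web/serverfarms \
  --name myAutoscaleSetting \
  --min-count 2 \
  --max-count 10 \
  --count 2
```

On its own this setting only pins the instance limits; rules have to be added before any metric-based scaling happens.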

Scale Conditions

Type	Triggers On	Example
Metric-based	CPU %, Memory %, request count, HTTP queue length	Scale out when CPU > 70% for 5 minutes
Schedule-based	Day/time schedule	Scale to 5 instances every weekday at 8am; scale to 1 at 6pm

Scale Rules

Each condition contains scale-out and scale-in rules. Best practice: always have a paired scale-in rule for every scale-out rule.

Example: Typical auto-scale configuration

Profile: Default
  Min instances: 2
  Max instances: 10
  Default instances: 2

Scale-out rule:
  Metric: CPU Percentage
  Time aggregation: Average
  Operator: Greater than
  Threshold: 70%
  Duration: 5 minutes
  Action: Increase count by 2

Scale-in rule:
  Metric: CPU Percentage
  Time aggregation: Average
  Operator: Less than
  Threshold: 30%
  Duration: 10 minutes
  Action: Decrease count by 1

⚠️ Always Define Scale-In Rules: Without scale-in rules, your app will scale out but never back in, so instances accumulate and costs grow. Always pair scale-out rules with scale-in rules.
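
The example configuration above could be expressed with the Azure CLI roughly as follows. This sketch assumes an autoscale setting named `myAutoscaleSetting` (a hypothetical name) already exists on the plan in `myRG`, and that `CpuPercentage` is the plan-level CPU metric being watched.

```shell
# Scale-out rule: average CPU above 70% over 5 minutes adds 2 instances.
# myAutoscaleSetting is an assumed, pre-existing autoscale setting name.
az monitor autoscale rule create \
  --resource-group myRG \
  --autoscale-name myAutoscaleSetting \
  --condition "CpuPercentage > 70 avg 5m" \
  --scale out 2

# Paired scale-in rule: average CPU below 30% over 10 minutes removes 1 instance.
az monitor autoscale rule create \
  --resource-group myRG \
  --autoscale-name myAutoscaleSetting \
  --condition "CpuPercentage < 30 avg 10m" \
  --scale in 1
```

Both commands also accept `--cooldown` (in minutes) if the default of 5 minutes is not what you want for a given rule.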

Schedule-Based Scaling

If your traffic pattern is predictable (e.g., business hours only), schedule-based scaling is more reliable than metric-based — it pre-scales before users arrive:

8:00 AM weekdays: scale to 5 instances
6:00 PM weekdays: scale to 1 instance
Weekends: maintain 1 instance

Combine schedule-based and metric-based rules — schedule sets the baseline, metrics handle unexpected spikes.
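
A recurring profile is one way to sketch the weekday schedule above in the Azure CLI. This assumes an existing autoscale setting named `myAutoscaleSetting` (hypothetical) whose default profile already keeps the plan at 1 instance; the profile name and the UTC timezone are illustrative choices as well.

```shell
# Hold the plan at 5 instances from 08:00 to 18:00 on weekdays.
# Outside this window the default profile of the autoscale setting applies.
# weekday-business-hours and myAutoscaleSetting are placeholder names.
az monitor autoscale profile create \
  --resource-group myRG \
  --autoscale-name myAutoscaleSetting \
  --name weekday-business-hours \
  --count 5 \
  --timezone "UTC" \
  --start 08:00 \
  --end 18:00 \
  --recurrence week mon tue wed thu fri
```

Start and end times are interpreted in the given timezone, so adjust `--timezone` to match where your users are.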

Cooldown Period

After a scale action, the cooldown period prevents another scale action from triggering immediately. This prevents rapid oscillation ("flapping") — scaling out then immediately back in.

Default cooldown: 5 minutes
Scale-out cooldown should be shorter than scale-in cooldown:
  Scale out: 1–2 minutes (respond quickly to traffic)
  Scale in: 10–15 minutes (wait to confirm traffic has dropped)
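
Cooldown slows oscillation down, but the gap between the scale-out and scale-in thresholds matters too. The following bash sketch is back-of-the-envelope arithmetic, not anything Azure runs: it assumes the total load stays constant and spreads evenly across instances, and `projected_cpu` is a hypothetical helper written for this example.

```shell
#!/usr/bin/env bash
# Projected average CPU (integer %) after changing the instance count,
# assuming the same total load is spread evenly across instances.
projected_cpu() {
  local current_instances=$1 current_cpu=$2 new_instances=$3
  echo $(( current_cpu * current_instances / new_instances ))
}

# Thresholds from the example configuration: scale out by 2 above 70%, scale in below 30%.
scale_in_threshold=30
after_scale_out=$(projected_cpu 2 75 4)   # 2 instances at 75% CPU, growing to 4

if (( after_scale_out < scale_in_threshold )); then
  echo "flapping risk: ${after_scale_out}% is below the scale-in threshold"
else
  echo "stable: ${after_scale_out}% stays above the scale-in threshold"
fi
```

Here the projected CPU after scaling out is 37%, which stays above the 30% scale-in threshold. With a scale-in threshold of 40% instead, the same scale-out would land below it and the two rules could keep undoing each other, with cooldown only delaying each round.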

Per-App Scaling

By default, all apps on a shared App Service Plan scale together (the plan scales). Per-App Scaling allows individual apps on the same plan to have different instance counts — useful for plans hosting multiple apps with different traffic patterns.

Always On

"Always On" prevents App Service from idling the app after 20 minutes of inactivity. Without Always On, the first request after idle has a cold start delay (seconds to minutes).

Required for WebJobs running continuously
Required for apps that must respond instantly (no cold starts)
Available on Basic tier and above
NOT available on Free or Shared tiers
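
For reference, Always On can be switched on from the Azure CLI as well as the portal. In this minimal example `myWebApp` is a hypothetical app name, and the app is assumed to run on a plan in the Basic tier or above.

```shell
# Enable Always On for a single web app (the setting is per app, not per plan).
# myWebApp is a placeholder app name.
az webapp config set \
  --resource-group myRG \
  --name myWebApp \
  --always-on true
```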

💡 AZ-104 Exam Tip: Know that auto-scale requires Standard tier or above. Know the difference between metric-based and schedule-based scaling. Know that cooldown prevents flapping. Know Always On prevents cold starts and is required for continuous WebJobs. Know that scale-out is preferred over scale-up for web apps.

📝 Practice Questions

Q1. Which App Service Plan tier is the minimum required for auto-scaling?

A — Basic

B — Standard

C — Premium

D — Isolated

Q2. What does the cooldown period in auto-scale prevent?

A — Unauthorized access during scaling operations

B — Flapping — rapid scaling in and out repeatedly (oscillation)

C — Scaling beyond the maximum instance count

D — Performance degradation during scale-out operations

Q3. A web app uses sessions stored in server memory. When scaled out to 5 instances, users sometimes get logged out. What is the cause?

A — Auto-scale rules are configured incorrectly

B — In-memory sessions are instance-specific — users hitting a different instance lose their session

C — SSL certificates are not configured for all instances

D — Deployment slots are interfering with user sessions

Q4. What is the purpose of the "Always On" setting in App Service?

A — Enables auto-scaling to run 24 hours a day

B — Prevents the app from idling after inactivity, avoiding cold start delays

C — Prevents the app from being restarted during deployments

D — Guarantees 99.99% uptime SLA

Q5. Why is schedule-based scaling preferred over metric-based scaling for predictable traffic patterns?

A — Schedule-based scaling is cheaper than metric-based

B — It pre-scales before traffic arrives; metric-based reacts after the spike has already started

C — Schedule-based scaling works on Basic tier; metric-based requires Premium

D — Schedule-based scaling is more accurate than metric-based

Disclaimer: RedKite Cloud is an independent educational resource and is not affiliated with Microsoft Corporation.