Last updated: May 2026
Azure DatabasesIntermediateAZ-104⏱ 11 min read

Azure Synapse Analytics

Azure Synapse Analytics is a limitless analytics service that brings together data integration, enterprise data warehousing, and big data analytics. It's the evolution of Azure SQL Data Warehouse — now a unified workspace for data engineers, data scientists, and business analysts to work with petabyte-scale data using both SQL and Spark engines.

What you'll learn Synapse architecture overview · Dedicated SQL Pool (data warehouse) · Serverless SQL Pool (query-on-demand) · Apache Spark Pool · Synapse Link (HTAP) · Synapse Pipelines (data integration) · When to use Synapse vs SQL Database · Pricing model

Synapse Architecture

Azure Synapse is a unified workspace with multiple compute engines, all connected to a common data lake (Azure Data Lake Storage Gen2):

ComponentPurpose
Dedicated SQL PoolTraditional data warehouse — provisioned MPP compute
Serverless SQL PoolQuery files in data lake on-demand with SQL
Apache Spark PoolBig data processing, ML, data transformation with Spark
Synapse PipelinesData integration and orchestration (like Azure Data Factory)
Synapse LinkHTAP — query operational data in Cosmos DB/SQL without ETL
Synapse StudioUnified web IDE for all Synapse capabilities

Dedicated SQL Pool

The traditional data warehouse component — formerly called Azure SQL Data Warehouse. A massively parallel processing (MPP) SQL engine optimised for petabyte-scale analytics.

Key Concepts

  • Data Warehouse Units (DWU) — The billing unit. More DWUs = more compute and memory. Scale up/down as needed.
  • Distributions — Data is distributed across 60 nodes. Distribution key choice is critical for performance.
  • Columnstore indexes — Data stored column-by-column — excellent for analytical queries scanning many rows
  • Pause/Resume — Pause when not needed, stop compute billing. Storage billing continues.
💡
Pause to Save Money Dedicated SQL Pools are expensive (charged per DWU per hour). Always pause them outside business hours or when not running queries. Storage continues to bill but compute (the expensive part) stops.

Serverless SQL Pool

Serverless SQL Pool allows you to query files directly in Azure Data Lake Storage (Parquet, CSV, JSON, Delta) using standard SQL — without provisioning any infrastructure. You pay per TB of data scanned.

  • No infrastructure to manage
  • Query data in place — no need to load into a database first
  • Create external tables as views over data lake files
  • Pay per TB scanned — ₹5/TB (approximate)
SQLQuery Parquet file in data lake with Serverless SQL
SELECT TOP 100 *
FROM OPENROWSET(
  BULK 'https://mystorageaccount.dfs.core.windows.net/data/sales/*.parquet',
  FORMAT = 'PARQUET'
) AS sales
WHERE sales.year = 2026;

Apache Spark Pool

Serverless Spark clusters for big data processing, machine learning, and data transformation using Python, Scala, R, or SQL. Auto-scales — only active (and billed) when running jobs.

  • Integrated with Azure ML for model training
  • Share data with Dedicated SQL Pool (via shared data lake)
  • Delta Lake support for ACID transactions on data lake files

Synapse Link enables Hybrid Transactional Analytical Processing (HTAP) — run analytics directly on operational data without copying it or impacting the transactional database:

  • Synapse Link for Cosmos DB — Analytical store mirrors your Cosmos DB data; query it with SQL or Spark
  • Synapse Link for SQL — Replicate SQL Database/MI data to Synapse for analytics

Synapse Pipelines

Built-in data integration capability (same visual experience as Azure Data Factory). Orchestrate data movement and transformation — copy data from 90+ sources, transform with data flows or Spark, load into the data warehouse.

Synapse vs SQL Database

FactorSynapse Dedicated SQL PoolAzure SQL Database
Workload typeAnalytics — complex queries on billions of rowsTransactional (OLTP) — many small reads/writes
Optimised forAggregations, scans, reportingPoint lookups, inserts, updates
Storage formatColumnarRow-based
ScalingDWU-based MPPvCores/DTUs
Max data sizePetabytes4 TB
💡
AZ-104 Exam Tip Know Synapse = data warehousing and analytics (not OLTP). Know the four main components — Dedicated SQL Pool, Serverless SQL Pool, Spark Pool, Synapse Pipelines. Know that Dedicated SQL Pool can be paused to stop compute billing. Know Synapse Link enables analytics on operational data without ETL.
📝 Practice Questions
Click an option to check your answer.
Q1. Which Synapse component allows you to query files directly in Azure Data Lake using SQL without provisioning any infrastructure?
A — Dedicated SQL Pool
B — Serverless SQL Pool
C — Apache Spark Pool
D — Synapse Pipelines
Q2. What is the billing unit for Dedicated SQL Pool (data warehouse)?
A — vCores
B — Data Warehouse Units (DWU)
C — Request Units (RU/s)
D — DTUs
Q3. A company runs analytics reports on their Cosmos DB operational data. They want to avoid impacting the production database performance. What should they use?
A — Run analytics directly on the primary Cosmos DB
B — Enable Synapse Link for Cosmos DB — analytical queries run against the analytical store
C — Export data to CSV daily and import into Synapse
D — Create a Cosmos DB read replica for analytics
Q4. How can you reduce costs for Dedicated SQL Pool when no queries are running overnight?
A — Delete and recreate the pool each morning
B — Pause the pool — compute billing stops while paused
C — Reduce DWUs to minimum during off-hours
D — Move data to cold storage tier
Q5. When should you use Azure Synapse instead of Azure SQL Database?
A — For a transactional e-commerce application processing thousands of orders per minute
B — For a data warehouse running complex aggregation queries across billions of rows
C — For a standard CRUD web application
D — For storing user session data
Comments
Disclaimer: RedKite Cloud is an independent educational resource and is not affiliated with Microsoft Corporation.