Synapse Architecture
Azure Synapse is a unified workspace with multiple compute engines, all connected to a common data lake (Azure Data Lake Storage Gen2):
| Component | Purpose |
|---|---|
| Dedicated SQL Pool | Traditional data warehouse — provisioned MPP compute |
| Serverless SQL Pool | Query files in data lake on-demand with SQL |
| Apache Spark Pool | Big data processing, ML, data transformation with Spark |
| Synapse Pipelines | Data integration and orchestration (like Azure Data Factory) |
| Synapse Link | HTAP — query operational data in Cosmos DB/SQL without ETL |
| Synapse Studio | Unified web IDE for all Synapse capabilities |
Dedicated SQL Pool
The traditional data warehouse component — formerly called Azure SQL Data Warehouse. A massively parallel processing (MPP) SQL engine optimised for petabyte-scale analytics.
Key Concepts
- Data Warehouse Units (DWU) — The billing unit. More DWUs = more compute and memory. Scale up/down as needed.
- Distributions — Data is distributed across 60 nodes. Distribution key choice is critical for performance.
- Columnstore indexes — Data stored column-by-column — excellent for analytical queries scanning many rows
- Pause/Resume — Pause when not needed, stop compute billing. Storage billing continues.
Serverless SQL Pool
Serverless SQL Pool allows you to query files directly in Azure Data Lake Storage (Parquet, CSV, JSON, Delta) using standard SQL — without provisioning any infrastructure. You pay per TB of data scanned.
- No infrastructure to manage
- Query data in place — no need to load into a database first
- Create external tables as views over data lake files
- Pay per TB scanned — ₹5/TB (approximate)
SELECT TOP 100 *
FROM OPENROWSET(
BULK 'https://mystorageaccount.dfs.core.windows.net/data/sales/*.parquet',
FORMAT = 'PARQUET'
) AS sales
WHERE sales.year = 2026;
Apache Spark Pool
Serverless Spark clusters for big data processing, machine learning, and data transformation using Python, Scala, R, or SQL. Auto-scales — only active (and billed) when running jobs.
- Integrated with Azure ML for model training
- Share data with Dedicated SQL Pool (via shared data lake)
- Delta Lake support for ACID transactions on data lake files
Synapse Link
Synapse Link enables Hybrid Transactional Analytical Processing (HTAP) — run analytics directly on operational data without copying it or impacting the transactional database:
- Synapse Link for Cosmos DB — Analytical store mirrors your Cosmos DB data; query it with SQL or Spark
- Synapse Link for SQL — Replicate SQL Database/MI data to Synapse for analytics
Synapse Pipelines
Built-in data integration capability (same visual experience as Azure Data Factory). Orchestrate data movement and transformation — copy data from 90+ sources, transform with data flows or Spark, load into the data warehouse.
Synapse vs SQL Database
| Factor | Synapse Dedicated SQL Pool | Azure SQL Database |
|---|---|---|
| Workload type | Analytics — complex queries on billions of rows | Transactional (OLTP) — many small reads/writes |
| Optimised for | Aggregations, scans, reporting | Point lookups, inserts, updates |
| Storage format | Columnar | Row-based |
| Scaling | DWU-based MPP | vCores/DTUs |
| Max data size | Petabytes | 4 TB |