Last updated: May 2026
Azure Databases · Intermediate · AZ-104 · ⏱ 10 min read
Azure Data Factory
Azure Data Factory (ADF) is Azure's cloud ETL and data integration service. It lets you create, schedule, and orchestrate data pipelines — copying data between 90+ sources and destinations, transforming it along the way. Whether you're moving data from on-premises SQL Server to Azure SQL, loading CSV files into a data warehouse, or orchestrating a complex multi-step analytics workflow, ADF is the tool.
What you'll learn: ADF core concepts (pipelines, activities, datasets, linked services) · Copy Activity, the workhorse of ADF · Data Flows for no-code transformation · Triggers (schedule, event, tumbling window) · Integration Runtimes · Monitoring pipelines · ADF vs Synapse Pipelines
Core Concepts
| Concept | Description | Analogy |
| Pipeline | Logical grouping of activities that perform a task | A recipe with multiple steps |
| Activity | A processing step within a pipeline (Copy, Transform, Execute) | A step in the recipe |
| Dataset | Named view of the data (schema, location) | The ingredients |
| Linked Service | Connection definition (connection string and credentials) for a data store or compute service | The key to the pantry |
| Trigger | When and how a pipeline runs (schedule, event) | The timer or doorbell |
| Integration Runtime | The compute infrastructure that runs activities | The kitchen (Azure or on-premises) |
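To make the relationships in the table concrete, here is a minimal Python sketch that models linked services, datasets, and a pipeline as plain dictionaries and checks that every reference resolves. The resource names (`BlobStorageLS`, `SalesCsv`, `LoadSales`, and so on) and the `unresolved_references` helper are hypothetical, and the property layout is a simplified approximation of ADF's JSON definitions rather than the full schema.

```python
# Linked services: how to connect (connection details omitted in this sketch).
linked_services = {
    "BlobStorageLS": {"type": "AzureBlobStorage"},
    "AzureSqlLS": {"type": "AzureSqlDatabase"},
}

# Datasets: which data, reached through which linked service.
datasets = {
    "SalesCsv": {"type": "DelimitedText", "linkedServiceName": "BlobStorageLS"},
    "SalesTable": {"type": "AzureSqlTable", "linkedServiceName": "AzureSqlLS"},
}

# Pipeline: a group of activities; the Copy activity reads one dataset
# and writes to another.
pipeline = {
    "name": "LoadSales",
    "activities": [
        {"name": "CopySales", "type": "Copy",
         "inputs": ["SalesCsv"], "outputs": ["SalesTable"]},
    ],
}


def unresolved_references(pipeline, datasets, linked_services):
    """Return the names that are referenced but not defined (hypothetical helper)."""
    missing = []
    for activity in pipeline["activities"]:
        for ds_name in activity.get("inputs", []) + activity.get("outputs", []):
            dataset = datasets.get(ds_name)
            if dataset is None:
                missing.append(ds_name)
            elif dataset["linkedServiceName"] not in linked_services:
                missing.append(dataset["linkedServiceName"])
    return missing


# An empty list means every dataset and linked service reference resolves.
print(unresolved_references(pipeline, datasets, linked_services))
```

The point of the sketch is the direction of the references: activities point at datasets, datasets point at linked services, and nothing points the other way.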
Copy Activity
The Copy Activity is the most commonly used activity — it copies data from a source to a sink (destination). ADF supports 90+ connectors including:
- Azure: Blob Storage, Data Lake, SQL Database, Cosmos DB, Synapse
- On-premises: SQL Server, Oracle, SAP, file systems
- SaaS: Salesforce, ServiceNow, Dynamics 365, REST APIs
- Other clouds: Amazon S3, Google Cloud Storage, Snowflake
💡
No Code Required: Copy Activity is configured visually in ADF Studio, so most data copy scenarios need no coding. You select the source, destination, and column mappings through a GUI. For complex transformations, use Data Flows or compute activities (Spark, Databricks, stored procedures).
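Behind the GUI, a Copy Activity is stored as a JSON definition. The following Python sketch builds a dictionary in roughly that shape, including an explicit column mapping, and then mimics locally what the mapping means for a single row. The property names are modelled on the JSON that ADF Studio shows for a Copy activity and should be treated as assumptions to verify against the current schema; the dataset names, column names, and both helper functions are hypothetical.

```python
import json


def build_copy_activity(name, source_dataset, sink_dataset, column_map):
    """Build a dict shaped like a Copy activity with explicit column mapping."""
    return {
        "name": name,
        "type": "Copy",
        "inputs": [{"referenceName": source_dataset, "type": "DatasetReference"}],
        "outputs": [{"referenceName": sink_dataset, "type": "DatasetReference"}],
        "typeProperties": {
            "source": {"type": "DelimitedTextSource"},
            "sink": {"type": "AzureSqlSink"},
            "translator": {
                "type": "TabularTranslator",
                "mappings": [
                    {"source": {"name": src}, "sink": {"name": dst}}
                    for src, dst in column_map.items()
                ],
            },
        },
    }


def apply_mapping(row, activity):
    """Local illustration only: rename source columns to sink columns."""
    mappings = activity["typeProperties"]["translator"]["mappings"]
    return {m["sink"]["name"]: row[m["source"]["name"]] for m in mappings}


activity = build_copy_activity(
    "CopyCustomers", "CustomersCsv", "CustomersTable",
    {"id": "CustomerId", "full_name": "Name"},
)
print(json.dumps(activity, indent=2))
print(apply_mapping({"id": "42", "full_name": "Ada"}, activity))
```

In the real service the rename happens inside the copy engine; `apply_mapping` exists here only to show the effect of the mapping on one record.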
Data Flows
Mapping Data Flows provide a visual, no-code way to transform data at scale. You design transformation logic in a GUI — ADF compiles it to Spark and runs it on a managed Spark cluster.
Common transformations:
- Filter, select, derive new columns
- Join multiple datasets
- Aggregate (group by, sum, count)
- Pivot/unpivot
- Lookup — enrich data from a reference dataset
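To show what these transformations do to the data, here is a small plain-Python sketch of a filter, a derived column, a lookup, and an aggregate over in-memory rows. It is only an illustration of the logic: a real Data Flow is designed visually and executed on Spark, and the sample rows, column names, and the 1.2 tax factor are all made up for the example.

```python
from collections import defaultdict

orders = [
    {"order_id": 1, "customer_id": "c1", "amount": 120.0, "status": "paid"},
    {"order_id": 2, "customer_id": "c2", "amount": 40.0, "status": "cancelled"},
    {"order_id": 3, "customer_id": "c1", "amount": 80.0, "status": "paid"},
]
customers = {"c1": {"region": "EU"}, "c2": {"region": "US"}}  # reference data

# Filter: keep only paid orders.
paid = [o for o in orders if o["status"] == "paid"]

# Derive: add a computed column (hypothetical tax factor).
derived = [{**o, "amount_with_tax": round(o["amount"] * 1.2, 2)} for o in paid]

# Lookup: enrich each order with the customer's region.
enriched = [{**o, "region": customers[o["customer_id"]]["region"]} for o in derived]

# Aggregate: group by region, then sum amounts and count orders.
totals = defaultdict(lambda: {"total": 0.0, "orders": 0})
for o in enriched:
    totals[o["region"]]["total"] += o["amount"]
    totals[o["region"]]["orders"] += 1

result = dict(totals)
print(result)
```

Each step consumes the output of the previous one, which mirrors how transformations are chained left to right on the Data Flow canvas.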
Triggers
| Trigger Type | When It Fires | Use Case |
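| Schedule | On a wall-clock recurrence (for example every 15 minutes, hourly, or daily at a set time) | Regular batch loads such as nightly ETL |
| Tumbling Window | At fixed-size, contiguous, non-overlapping time intervals counted from a start time | Time-partitioned loads, backfills, rerunning failed windows |
| Event-based (Storage) | When a blob is created or deleted in Azure Blob Storage or Data Lake Storage Gen2 | Process files as soon as they arrive |
| Custom Events | When a matching event is published to an Azure Event Grid custom topic | React to events raised by your own applications or services |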
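The tumbling window idea is easiest to see by generating the windows. This Python sketch splits a time range into fixed-size, back-to-back intervals; it only illustrates the windowing logic and is not how ADF is configured. In ADF the trigger passes each window's start and end to the pipeline, typically so the pipeline can load just that slice of data. The `tumbling_windows` function and the dates are hypothetical.

```python
from datetime import datetime, timedelta


def tumbling_windows(start, end, size):
    """Yield (window_start, window_end) pairs with no gaps and no overlap.

    Only complete windows that fit before `end` are produced.
    """
    window_start = start
    while window_start + size <= end:
        yield window_start, window_start + size
        window_start += size


windows = list(tumbling_windows(
    datetime(2026, 5, 1, 0, 0),
    datetime(2026, 5, 1, 6, 0),
    timedelta(hours=2),
))
for w_start, w_end in windows:
    print(w_start.isoformat(), "->", w_end.isoformat())
```

Because each window has a fixed identity (its start and end), a failed window can be rerun on its own without touching the others, which is what makes this trigger type suited to backfills.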
Integration Runtimes
Integration Runtime (IR) is the compute infrastructure ADF uses to run activities:
| IR Type | Where It Runs | Use For |
| Azure IR | Managed Azure cloud compute | Cloud-to-cloud data movement and transformation |
| Self-hosted IR | Your own on-premises server or VM | On-premises or private-network data sources (behind a firewall) |
| Azure-SSIS IR | Managed SSIS runtime in Azure | Lifting existing SSIS packages to Azure |
ℹ️
Self-Hosted IR for On-Premises: If your source data is on-premises behind a firewall, you install the Self-Hosted Integration Runtime on a server in your network. It establishes an outbound connection to ADF, so no inbound firewall rules are needed.
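A linked service is what ties a data store to a particular Integration Runtime. The Python sketch below builds a dictionary shaped like a SQL Server linked service and, when a runtime name is supplied, adds a `connectVia` reference to it. The property names follow the JSON that ADF Studio displays for linked services as far as I know, but treat them as assumptions to check against the current schema; the function, the runtime name `OnPremSelfHostedIR`, and the connection strings are hypothetical placeholders.

```python
def build_sql_server_linked_service(name, connection_string, integration_runtime=None):
    """Build a dict shaped like a SQL Server linked service definition.

    If `integration_runtime` is given, the linked service is routed through
    that runtime (for example a self-hosted IR). If it is omitted, no explicit
    runtime reference is written.
    """
    definition = {
        "name": name,
        "properties": {
            "type": "SqlServer",
            "typeProperties": {"connectionString": connection_string},
        },
    }
    if integration_runtime is not None:
        definition["properties"]["connectVia"] = {
            "referenceName": integration_runtime,
            "type": "IntegrationRuntimeReference",
        }
    return definition


# On-premises source: routed through a self-hosted IR (placeholder values).
on_prem = build_sql_server_linked_service(
    "OnPremSqlLS",
    "Server=onprem-sql01;Database=Sales;Integrated Security=True",
    integration_runtime="OnPremSelfHostedIR",
)

# No runtime named: the service falls back to its default Azure-hosted runtime.
cloud = build_sql_server_linked_service(
    "CloudSqlLS", "Server=example.invalid;Database=Sales"
)

print(on_prem["properties"]["connectVia"])
print("connectVia" in cloud["properties"])
```

In practice credentials would come from a secret store rather than an inline connection string; the inline value here only keeps the sketch short.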
Monitoring
ADF has built-in monitoring in the ADF Studio:
- View all pipeline runs — status, duration, errors
- Drill into individual activity runs
- Rerun failed pipelines from the point of failure
- Alert on pipeline failures via Azure Monitor
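The same information shown in the monitoring view can also be processed programmatically once run records have been exported or queried. As a rough sketch, the following Python code summarises a list of run records by status and collects the failures. The records are hand-written sample data, and the field names (`runId`, `pipelineName`, `status`, `durationInMs`, `message`) are assumptions about what a run record looks like, not a guaranteed schema.

```python
# Hand-written sample records; field names are assumed, not authoritative.
runs = [
    {"runId": "r1", "pipelineName": "LoadSales", "status": "Succeeded",
     "durationInMs": 42000},
    {"runId": "r2", "pipelineName": "LoadSales", "status": "Failed",
     "durationInMs": 9000, "message": "Sink table not found"},
    {"runId": "r3", "pipelineName": "LoadCustomers", "status": "Succeeded",
     "durationInMs": 15000},
]


def summarize_runs(runs):
    """Count runs per status and collect failed runs for follow-up."""
    counts = {}
    failed = []
    for run in runs:
        counts[run["status"]] = counts.get(run["status"], 0) + 1
        if run["status"] == "Failed":
            failed.append((run["pipelineName"], run["runId"], run.get("message", "")))
    return counts, failed


counts, failed = summarize_runs(runs)
print(counts)
for pipeline_name, run_id, message in failed:
    print(f"FAILED {pipeline_name} ({run_id}): {message}")
```

A summary like this is the kind of signal an alert rule acts on: the count of failed runs crossing a threshold within a time window.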
ADF vs Synapse Pipelines
| Factor | Azure Data Factory | Synapse Pipelines |
| Standalone service? | Yes — independent service | Part of Synapse workspace |
| Features | Same engine | Same engine — tightly integrated with Synapse SQL/Spark |
| Choose when | Standalone ETL, no Synapse workspace | Already using Synapse for warehousing/analytics |
💡
AZ-104 Exam Tip: Know that ADF is for ETL and data integration (it is not a database). Know that it has 90+ connectors. Know that the Self-Hosted IR provides on-premises connectivity. Know the trigger types (Schedule, Tumbling Window, Storage event, Custom event). Know that Data Flows compile to Spark for transformation.
Q1. A company needs to copy data from an on-premises SQL Server database to Azure Blob Storage. Which Integration Runtime type does ADF need?
A — Azure IR
B — Self-Hosted IR installed in the on-premises network
C — Azure-SSIS IR
D — VPN Gateway
Q2. What is a Linked Service in Azure Data Factory?
A — A pipeline that links multiple activities together
B — A connection definition containing connection string and authentication for a data store
C — A schema definition for the data structure
D — A scheduled job that triggers pipelines automatically
Q3. Which ADF trigger type fires automatically when a file arrives in Azure Blob Storage?
A — Schedule trigger
B — Tumbling Window trigger
C — Event-based (Storage Events) trigger
D — Manual trigger
Q4. What technology do ADF Mapping Data Flows compile to under the hood?
A — SQL Server stored procedures
B — Apache Spark
C — Azure Functions
D — T-SQL
Q5. How many data source/destination connectors does Azure Data Factory support?
A — 10
B — 30
C — 90+
D — 500+
Disclaimer: RedKite Cloud is an independent educational resource and is not affiliated with Microsoft Corporation.