Last updated: May 2026

Azure Databases · Intermediate · AZ-104 · 10 min read

Azure Data Factory

Azure Data Factory (ADF) is Azure's cloud ETL and data integration service. It lets you create, schedule, and orchestrate data pipelines that copy data between 90+ built-in connectors (sources and destinations) and transform it along the way. ADF is built for jobs like these:

- moving data from an on-premises SQL Server to Azure SQL Database
- loading CSV files into a data warehouse
- orchestrating a multi-step analytics workflow

What you'll learn: ADF core concepts (pipelines, activities, datasets, linked services) · Copy Activity, the workhorse of ADF · Data Flows for no-code transformation · Triggers (schedule, tumbling window, event) · Integration Runtimes · Monitoring pipelines · ADF vs Synapse Pipelines

Table of Contents

Core Concepts
Copy Activity
Data Flows
Triggers
Integration Runtimes
Monitoring
ADF vs Synapse Pipelines
Practice Questions

Core Concepts

Concept	Description	Analogy
Pipeline	Logical grouping of activities that perform a task	A recipe with multiple steps
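Activity	A single step within a pipeline: data movement (Copy), transformation (Data Flow, stored procedure), or control flow (Execute Pipeline, ForEach)	A step in the recipe
Dataset	Named reference to the data an activity reads or writes (location, format, optional schema)	The ingredients
Linked Service	Connection information (connection string, credentials) for a data store or compute service	The pantry connection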
Trigger	When and how a pipeline runs (schedule, event)	The timer or doorbell
Integration Runtime	The compute infrastructure that runs activities	The kitchen (Azure or on-premises)
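ADF stores each of these objects as a JSON definition, and the objects refer to each other by name. The sketch below uses Python only to build and print two of those definitions: a linked service, and a dataset that points at it.

Assumptions to note:
- The names (`BlobStorageLS`, `SalesCsv`), the container and file path, and the connection string placeholder are hypothetical.
- The `references_resolve` helper is an illustration, not part of any Azure SDK.

```python
import json

# Hypothetical object names, used for illustration only.
LINKED_SERVICE_NAME = "BlobStorageLS"
DATASET_NAME = "SalesCsv"

# Linked service: HOW to connect. The connection string is a placeholder;
# in practice you would reference a secret in Azure Key Vault instead.
linked_service = {
    "name": LINKED_SERVICE_NAME,
    "properties": {
        "type": "AzureBlobStorage",
        "typeProperties": {"connectionString": "<storage-connection-string>"},
    },
}

# Dataset: WHAT data to use (location and format), pointing at the linked service by name.
dataset = {
    "name": DATASET_NAME,
    "properties": {
        "type": "DelimitedText",
        "linkedServiceName": {
            "referenceName": LINKED_SERVICE_NAME,
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "raw",
                "folderPath": "sales",
                "fileName": "sales.csv",
            },
            "columnDelimiter": ",",
            "firstRowAsHeader": True,
        },
    },
}


def references_resolve(ds, services):
    """Hypothetical helper: check the dataset's linked service name is defined."""
    wanted = ds["properties"]["linkedServiceName"]["referenceName"]
    return any(ls["name"] == wanted for ls in services)


if __name__ == "__main__":
    print(json.dumps(dataset, indent=2))
    print("References resolve:", references_resolve(dataset, [linked_service]))
```

Pipelines and triggers follow the same pattern. A pipeline's activities name their datasets, and a trigger names the pipeline it starts.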

Copy Activity

The Copy Activity is the most commonly used activity — it copies data from a source to a sink (destination). ADF supports 90+ connectors including:

Azure: Blob Storage, Data Lake Storage, SQL Database, Cosmos DB, Synapse Analytics
On-premises: SQL Server, Oracle, SAP, file systems
SaaS: Salesforce, ServiceNow, Dynamics 365, plus generic REST and HTTP connectors
Other clouds and platforms: Amazon S3, Google Cloud Storage, Snowflake

💡

No Code Required: Copy Activity is configured visually in ADF Studio, and most data copy scenarios need no coding. You select the source, destination, and column mappings through a GUI. For complex transformations, use Data Flows or compute activities (Spark, Databricks, stored procedures).
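Behind the GUI, a Copy activity is part of a pipeline's JSON definition. That definition holds the input dataset (source), the output dataset (sink), the source and sink settings, and optional column mappings. The Python sketch below builds that shape as a dictionary, roughly what you would see in ADF Studio's code view.

Assumptions to note:
- The source is a delimited-text dataset on Blob Storage, and the sink is an Azure SQL Database dataset.
- The pipeline, activity, dataset, and column names are hypothetical.

```python
import json


def copy_pipeline(name, source_dataset, sink_dataset):
    """Build a pipeline definition containing a single Copy activity."""
    return {
        "name": name,
        "properties": {
            "activities": [
                {
                    "name": "CopyCsvToSql",  # hypothetical activity name
                    "type": "Copy",
                    # Datasets are referenced by name, not embedded.
                    "inputs": [{"referenceName": source_dataset, "type": "DatasetReference"}],
                    "outputs": [{"referenceName": sink_dataset, "type": "DatasetReference"}],
                    "typeProperties": {
                        "source": {
                            "type": "DelimitedTextSource",
                            "storeSettings": {"type": "AzureBlobStorageReadSettings", "recursive": True},
                        },
                        "sink": {"type": "AzureSqlSink"},
                        # Explicit column mapping (hypothetical column names).
                        "translator": {
                            "type": "TabularTranslator",
                            "mappings": [
                                {"source": {"name": "Id"}, "sink": {"name": "CustomerID"}},
                                {"source": {"name": "Amount"}, "sink": {"name": "OrderAmount"}},
                            ],
                        },
                    },
                }
            ]
        },
    }


if __name__ == "__main__":
    print(json.dumps(copy_pipeline("CopySalesPipeline", "SalesCsv", "SalesSqlTable"), indent=2))
```

If you omit the `translator` block, ADF maps columns by name by default.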

Data Flows

Mapping Data Flows provide a visual, no-code way to transform data at scale. You design transformation logic in a GUI — ADF compiles it to Spark and runs it on a managed Spark cluster.

Common transformations:

Filter, select, derive new columns
Join multiple datasets
Aggregate (group by, sum, count)
Pivot/unpivot
Lookup — enrich data from a reference dataset
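Data flows are designed visually, so there is no code to copy here. To make the transformations concrete, the sketch below applies filter, derived column, lookup, and aggregate steps to a few in-memory rows in plain Python.

It is an analogy only. It is not ADF data flow script, and Spark would execute a real flow in a distributed way. The sample rows, column names, and 20% tax rate are invented for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for a source dataset.
orders = [
    {"order_id": 1, "customer_id": "C1", "amount": 120.0, "status": "shipped"},
    {"order_id": 2, "customer_id": "C2", "amount": 80.0, "status": "cancelled"},
    {"order_id": 3, "customer_id": "C1", "amount": 40.0, "status": "shipped"},
    {"order_id": 4, "customer_id": "C3", "amount": 200.0, "status": "shipped"},
]
# Hypothetical reference data used by the lookup step.
customer_region = {"C1": "EU", "C2": "US", "C3": "US"}


def run_flow(rows, regions):
    # Filter: keep only shipped orders.
    shipped = [r for r in rows if r["status"] == "shipped"]
    # Derived column: add an amount including an example 20% tax.
    derived = [{**r, "amount_with_tax": round(r["amount"] * 1.2, 2)} for r in shipped]
    # Lookup: enrich each row with the customer's region (left-join semantics).
    enriched = [{**r, "region": regions.get(r["customer_id"], "unknown")} for r in derived]
    # Aggregate: group by region and sum the taxed amounts.
    totals = defaultdict(float)
    for r in enriched:
        totals[r["region"]] += r["amount_with_tax"]
    return dict(totals)


if __name__ == "__main__":
    print(run_flow(orders, customer_region))  # {'EU': 192.0, 'US': 240.0}
```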

Triggers

Trigger Type	When It Fires	Use Case
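Schedule	On a wall-clock recurrence (every N minutes, hours, days, weeks, or months)	Regular batch loads, such as nightly ETL
Tumbling Window	Once per fixed-size, non-overlapping, contiguous time window from a start time	Time-partitioned loads, backfilling past windows, retrying failed windows
Event-based (Storage events)	When a blob is created or deleted in Blob Storage or ADLS Gen2	Process files as soon as they arrive
Event-based (Custom events)	When an event is published to an Azure Event Grid custom topic	React to events raised by your own applications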
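A tumbling window trigger slices time into fixed-size, contiguous, non-overlapping windows that start at `startTime`. It runs the pipeline once per window and passes in the window boundaries through trigger outputs. The Python sketch below does two things:

- It computes the windows an hourly trigger would cover. This is a simplified model that ignores delay and dependencies.
- It builds a `TumblingWindowTrigger` definition whose pipeline parameters receive `@trigger().outputs.windowStartTime` and `@trigger().outputs.windowEndTime`.

The trigger and pipeline names, the `windowStart`/`windowEnd` parameter names, and the concurrency and retry values are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def tumbling_windows(start, end, size):
    """Return contiguous, non-overlapping [start, end) windows of a fixed size.

    Simplified model: only whole windows that finish by `end` are returned;
    trigger features such as delay, dependencies, and retries are ignored.
    """
    windows = []
    current = start
    while current + size <= end:
        windows.append((current, current + size))
        current += size  # the next window starts exactly where this one ended
    return windows


def tumbling_trigger(name, pipeline_name, start_time_iso):
    """Build an hourly TumblingWindowTrigger definition (hypothetical names)."""
    return {
        "name": name,
        "properties": {
            "type": "TumblingWindowTrigger",
            "typeProperties": {
                "frequency": "Hour",
                "interval": 1,
                "startTime": start_time_iso,
                "maxConcurrency": 4,  # how many windows may run in parallel
                "retryPolicy": {"count": 2, "intervalInSeconds": 60},
            },
            # A tumbling window trigger references exactly one pipeline.
            "pipeline": {
                "pipelineReference": {"referenceName": pipeline_name, "type": "PipelineReference"},
                "parameters": {
                    "windowStart": "@trigger().outputs.windowStartTime",
                    "windowEnd": "@trigger().outputs.windowEndTime",
                },
            },
        },
    }


if __name__ == "__main__":
    t0 = datetime(2026, 1, 1, tzinfo=timezone.utc)
    for ws, we in tumbling_windows(t0, t0 + timedelta(hours=3), timedelta(hours=1)):
        print(ws.isoformat(), "->", we.isoformat())
```

Because each window is a distinct, addressable run, a failed window can be rerun on its own without reprocessing its neighbours.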

Integration Runtimes

Integration Runtime (IR) is the compute infrastructure ADF uses to run activities:

IR Type	Where It Runs	Use For
Azure IR	Managed Azure cloud compute	Cloud-to-cloud data movement and transformation
Self-hosted IR	Software you install on your own on-premises machine or VM	Data sources in private networks, such as on-premises servers behind a firewall
Azure-SSIS IR	Managed cluster that runs SSIS packages in Azure	Lift-and-shift of existing SSIS packages to Azure

ℹ️

Self-Hosted IR for On-Premises: If your source data is on-premises behind a firewall, install the Self-Hosted Integration Runtime on a machine in your network. It makes only outbound connections to ADF over HTTPS (port 443), so no inbound firewall rules are needed.
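With a self-hosted IR, the linked service for the on-premises store names that runtime in its `connectVia` property. ADF then routes the connection through the machine in your network. The sketch below builds both definitions in Python.

The runtime and linked service names, server, database, and credential placeholders are hypothetical. In practice you would keep the password in Azure Key Vault rather than inline.

```python
import json

# Hypothetical runtime name.
IR_NAME = "OnPremSHIR"

# The self-hosted IR as registered in ADF. The runtime software is then installed
# on a machine inside your network and registered with an authentication key.
self_hosted_ir = {
    "name": IR_NAME,
    "properties": {
        "type": "SelfHosted",
        "description": "Runtime installed on a server in the corporate network",
    },
}


def on_prem_sql_linked_service(name, ir_name, server, database):
    """Linked service for an on-premises SQL Server, routed through a self-hosted IR."""
    return {
        "name": name,
        "properties": {
            "type": "SqlServer",
            "typeProperties": {
                # Placeholder credentials only; store real secrets in Azure Key Vault.
                "connectionString": (
                    f"Data Source={server};Initial Catalog={database};"
                    "Integrated Security=False;User ID=<user>;Password=<password>;"
                ),
            },
            # connectVia tells ADF which integration runtime makes the connection.
            "connectVia": {"referenceName": ir_name, "type": "IntegrationRuntimeReference"},
        },
    }


if __name__ == "__main__":
    ls = on_prem_sql_linked_service("OnPremSalesDb", IR_NAME, "sqlsrv01", "Sales")
    print(json.dumps(ls, indent=2))
```

Linked services without `connectVia` use the default Azure IR. That is why the cloud-to-cloud definitions earlier in this article need no runtime reference.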

Monitoring

ADF has built-in monitoring in the ADF Studio:

View all pipeline runs — status, duration, errors
Drill into individual activity runs
Rerun failed pipelines from the point of failure
Alert on pipeline failures via Azure Monitor
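The same run history is also available programmatically through the Azure Resource Manager REST API (`queryPipelineRuns`, API version `2018-06-01`). The Python sketch below only builds the request URL and a filter body that selects failed runs from the last 24 hours. It does not send the request; calling the API would also require a Microsoft Entra ID bearer token. The subscription, resource group, and factory names are placeholders, and `failed_runs_query` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

API_VERSION = "2018-06-01"


def failed_runs_query(subscription_id, resource_group, factory, hours=24, now=None):
    """Build the URL and JSON body for a 'failed pipeline runs' query (not sent)."""
    now = now or datetime.now(timezone.utc)
    url = (
        f"https://management.azure.com/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}/providers/Microsoft.DataFactory"
        f"/factories/{factory}/queryPipelineRuns?api-version={API_VERSION}"
    )
    body = {
        # Time range is required: runs last updated within the window.
        "lastUpdatedAfter": (now - timedelta(hours=hours)).isoformat(),
        "lastUpdatedBefore": now.isoformat(),
        # Keep only runs whose status is Failed.
        "filters": [{"operand": "Status", "operator": "Equals", "values": ["Failed"]}],
    }
    return url, body


if __name__ == "__main__":
    url, body = failed_runs_query("<subscription-id>", "<resource-group>", "<factory-name>")
    print("POST", url)
    print(body)
```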

ADF vs Synapse Pipelines

Factor	Azure Data Factory	Synapse Pipelines
Standalone service?	Yes — independent service	Part of Synapse workspace
Features	Same pipeline engine; includes Azure-SSIS IR	Same engine, integrated with Synapse SQL and Spark; a few ADF features (such as Azure-SSIS IR) are not available
Choose when	Standalone ETL, no Synapse workspace	Already using Synapse for warehousing/analytics

💡

AZ-104 Exam Tip: Know that:

- ADF is for ETL/data integration (it is not a database).
- It has 90+ connectors.
- A Self-Hosted IR provides on-premises connectivity.
- There are three trigger types (Schedule, Tumbling Window, Event-based).
- Data Flows compile to Spark for transformation.

📝 Practice Questions


Q1. A company needs to copy data from an on-premises SQL Server database to Azure Blob Storage. Which Integration Runtime type does ADF need?

A — Azure IR

B — Self-Hosted IR installed in the on-premises network

C — Azure-SSIS IR

D — VPN Gateway

Q2. What is a Linked Service in Azure Data Factory?

A — A pipeline that links multiple activities together

B — A connection definition containing connection string and authentication for a data store

C — A schema definition for the data structure

D — A scheduled job that triggers pipelines automatically

Q3. Which ADF trigger type fires automatically when a file arrives in Azure Blob Storage?

A — Schedule trigger

B — Tumbling Window trigger

C — Event-based (Storage Events) trigger

D — Manual trigger

Q4. What technology do ADF Mapping Data Flows compile to under the hood?

A — SQL Server stored procedures

B — Apache Spark

C — Azure Functions

D — T-SQL

Q5. How many data source/destination connectors does Azure Data Factory support?

A — 10

B — 30

C — 90+

D — 500+



Disclaimer: RedKite Cloud is an independent educational resource and is not affiliated with Microsoft Corporation.