Last updated: May 2026
Azure Databases · Intermediate · AZ-104 · ⏱ 10 min read

Azure Data Factory

Azure Data Factory (ADF) is Azure's cloud ETL and data integration service. It lets you create, schedule, and orchestrate data pipelines — copying data between 90+ sources and destinations, transforming it along the way. Whether you're moving data from on-premises SQL Server to Azure SQL, loading CSV files into a data warehouse, or orchestrating a complex multi-step analytics workflow, ADF is the tool.

What you'll learn: ADF core concepts (pipelines, activities, datasets, linked services) · Copy Activity, the workhorse of ADF · Data Flows for no-code transformation · Triggers (schedule, event, tumbling window) · Integration Runtimes · Monitoring pipelines · ADF vs Synapse Pipelines

Core Concepts

Concept | Description | Analogy
Pipeline | Logical grouping of activities that perform a task | A recipe with multiple steps
Activity | A processing step within a pipeline (Copy, Transform, Execute) | A step in the recipe
Dataset | Named view of the data (schema, location) | The ingredients
Linked Service | Connection string to a data store or compute | The pantry connection
Trigger | When and how a pipeline runs (schedule, event) | The timer or doorbell
Integration Runtime | The compute infrastructure that runs activities | The kitchen (Azure or on-premises)
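
To make the relationship between a linked service and a dataset concrete, here is a minimal Python sketch that builds both as plain dictionaries in the JSON shape ADF Studio shows in its code view. The names (`BlobStoreLS`, `SalesCsv`, the `raw` container, `sales.csv`) are hypothetical and the connection string is a placeholder; treat the property layout as an illustration to check against your own factory, not a complete definition.

```python
import json

# Hypothetical linked service: describes HOW to reach a data store.
linked_service = {
    "name": "BlobStoreLS",
    "properties": {
        "type": "AzureBlobStorage",
        "typeProperties": {"connectionString": "<placeholder>"},
    },
}

# Hypothetical dataset: describes WHICH data inside that store, and its format.
dataset = {
    "name": "SalesCsv",
    "properties": {
        "type": "DelimitedText",
        # The dataset points at the linked service by name.
        "linkedServiceName": {
            "referenceName": linked_service["name"],
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "raw",
                "fileName": "sales.csv",
            },
            "columnDelimiter": ",",
            "firstRowAsHeader": True,
        },
    },
}

print(json.dumps(dataset, indent=2))
```

The point of the split is reuse: many datasets can reference one linked service, so credentials live in a single place.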

Copy Activity

The Copy Activity is the most commonly used activity — it copies data from a source to a sink (destination). ADF supports 90+ connectors including:

  • Azure: Blob Storage, Data Lake, SQL Database, Cosmos DB, Synapse
  • On-premises: SQL Server, Oracle, SAP, file systems
  • SaaS: Salesforce, ServiceNow, Dynamics 365, REST APIs
  • Other clouds: Amazon S3, Google Cloud Storage, Snowflake
💡 No Code Required: Copy Activity is configured visually in the ADF Studio, so most data copy scenarios need no coding. You select source, destination, and column mappings through a GUI. For complex transformations, use Data Flows or compute activities (Spark, Databricks, stored procedures).
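
Behind the visual designer, a pipeline is stored as JSON. The sketch below builds a single-activity copy pipeline as a Python dictionary to show roughly what that JSON looks like. The helper `copy_pipeline` and the names `LoadSales`, `SalesCsv`, and `SalesTable` are hypothetical, and the source and sink types assume a delimited-text source and an Azure SQL sink; other connectors use different type names.

```python
import json


def copy_pipeline(name, source_dataset, sink_dataset):
    """Build a pipeline dict containing one Copy activity (illustrative shape only)."""
    return {
        "name": name,
        "properties": {
            "activities": [
                {
                    "name": "CopySourceToSink",
                    "type": "Copy",
                    # Inputs and outputs refer to datasets by name.
                    "inputs": [
                        {"referenceName": source_dataset, "type": "DatasetReference"}
                    ],
                    "outputs": [
                        {"referenceName": sink_dataset, "type": "DatasetReference"}
                    ],
                    # Assumed connector types for this example.
                    "typeProperties": {
                        "source": {"type": "DelimitedTextSource"},
                        "sink": {"type": "AzureSqlSink"},
                    },
                }
            ]
        },
    }


pipeline = copy_pipeline("LoadSales", "SalesCsv", "SalesTable")
print(json.dumps(pipeline, indent=2))
```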

Data Flows

Mapping Data Flows provide a visual, no-code way to transform data at scale. You design transformation logic in a GUI — ADF compiles it to Spark and runs it on a managed Spark cluster.

Common transformations:

  • Filter, select, derive new columns
  • Join multiple datasets
  • Aggregate (group by, sum, count)
  • Pivot/unpivot
  • Lookup — enrich data from a reference dataset
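
To show what these transformations compute, here is a small plain-Python sketch that applies a filter, a derived column, a lookup, and an aggregate to a few rows. This is not how ADF executes a Data Flow (that runs on Spark, designed visually); the sample rows, column names, and the `region_names` reference table are made up for illustration.

```python
from collections import defaultdict

# Made-up sample rows standing in for a source dataset.
orders = [
    {"order_id": 1, "region": "EU", "qty": 2, "unit_price": 10.0},
    {"order_id": 2, "region": "US", "qty": 1, "unit_price": 25.0},
    {"order_id": 3, "region": "EU", "qty": 0, "unit_price": 5.0},
]
# Made-up reference data used by the lookup step.
region_names = {"EU": "Europe", "US": "United States"}

# Filter: keep only rows with a positive quantity.
filtered = [row for row in orders if row["qty"] > 0]

# Derive a new column (total) and enrich each row via lookup (region_name).
derived = [
    {
        **row,
        "total": row["qty"] * row["unit_price"],
        "region_name": region_names.get(row["region"], "Unknown"),
    }
    for row in filtered
]

# Aggregate: sum of total grouped by region_name.
totals = defaultdict(float)
for row in derived:
    totals[row["region_name"]] += row["total"]

print(dict(totals))  # {'Europe': 20.0, 'United States': 25.0}
```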

Triggers

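Trigger Type | When It Fires | Use Case
Schedule | On a wall-clock schedule (every N minutes, hourly, daily, or on chosen weekdays, similar to cron) | Regular batch loads such as nightly ETL
Tumbling Window | Once per fixed-size, non-overlapping, contiguous time window, in sequence | Time-partitioned loads; rerunning individual failed windows
Event-based (Storage) | When a blob is created or deleted in Azure Blob Storage or Data Lake Storage Gen2 | Process files as soon as they arrive
Custom Events | When an event is published to an Azure Event Grid custom topic | React to events raised by your own applications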
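
The tumbling window idea is easiest to see by generating the windows directly. The sketch below is plain Python, not ADF code: the function `tumbling_windows` and the dates are hypothetical, and it only illustrates how a time range splits into fixed-size, non-overlapping windows, each of which would correspond to one pipeline run.

```python
from datetime import datetime, timedelta


def tumbling_windows(start, end, size):
    """Yield (window_start, window_end) pairs that tile [start, end) without overlap."""
    if size <= timedelta(0):
        raise ValueError("size must be positive")
    current = start
    # Only emit complete windows; a partial trailing window is not produced.
    while current + size <= end:
        yield current, current + size
        current += size


windows = list(
    tumbling_windows(
        datetime(2026, 5, 1, 0), datetime(2026, 5, 1, 3), timedelta(hours=1)
    )
)
for w_start, w_end in windows:
    print(w_start.isoformat(), "->", w_end.isoformat())
```

Because every window has its own start and end, a load that failed for one window can be rerun for exactly that slice of time without touching its neighbours.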

Integration Runtimes

Integration Runtime (IR) is the compute infrastructure ADF uses to run activities:

IR Type | Where It Runs | Use For
Azure IR | Managed Azure cloud compute | Cloud-to-cloud data movement and transformation
Self-hosted IR | Your own on-premises server or VM | On-premises data sources (behind a firewall)
Azure-SSIS IR | Managed SSIS runtime in Azure | Lifting existing SSIS packages to Azure
ℹ️ Self-Hosted IR for On-Premises: If your source data is on-premises behind a firewall, you install the Self-Hosted Integration Runtime on a server in your network. It establishes an outbound connection to ADF, so no inbound firewall rules are needed.
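
In a linked service definition, the choice of integration runtime shows up as a `connectVia` reference. The sketch below builds such a definition as a Python dictionary; the helper `sql_server_linked_service` and the names `OnPremSqlLS` and `OfficeSelfHostedIR` are hypothetical, and the connection string is a placeholder. When `connectVia` is omitted, the linked service uses the default Azure IR.

```python
import json


def sql_server_linked_service(name, ir_name=None):
    """Build a SQL Server linked service dict, optionally routed through a named IR."""
    properties = {
        "type": "SqlServer",
        "typeProperties": {"connectionString": "<placeholder>"},
    }
    if ir_name is not None:
        # Route connections through a specific (for example self-hosted) IR.
        properties["connectVia"] = {
            "referenceName": ir_name,
            "type": "IntegrationRuntimeReference",
        }
    return {"name": name, "properties": properties}


on_prem = sql_server_linked_service("OnPremSqlLS", ir_name="OfficeSelfHostedIR")
print(json.dumps(on_prem, indent=2))
```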

Monitoring

ADF has built-in monitoring in the ADF Studio:

  • View all pipeline runs — status, duration, errors
  • Drill into individual activity runs
  • Rerun failed pipelines from the point of failure
  • Alert on pipeline failures via Azure Monitor
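
If you export run history for your own reporting, the same questions the monitoring view answers can be asked in a few lines of code. The sketch below works on made-up run records; the field names (`pipelineName`, `status`, `durationInMs`) are assumed to resemble what ADF reports for pipeline runs, and the helpers `failed_runs` and `failure_rate` are hypothetical.

```python
# Made-up run records for illustration only.
runs = [
    {"pipelineName": "LoadSales", "status": "Succeeded", "durationInMs": 42000},
    {"pipelineName": "LoadSales", "status": "Failed", "durationInMs": 9000},
    {"pipelineName": "LoadCustomers", "status": "Succeeded", "durationInMs": 15000},
]


def failed_runs(records):
    """Return only the records whose status is 'Failed'."""
    return [r for r in records if r["status"] == "Failed"]


def failure_rate(records):
    """Fraction of runs that failed; 0.0 when there are no runs."""
    if not records:
        return 0.0
    return len(failed_runs(records)) / len(records)


for run in failed_runs(runs):
    print(f"{run['pipelineName']} failed after {run['durationInMs'] / 1000:.0f}s")
print(f"Failure rate: {failure_rate(runs):.0%}")
```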

ADF vs Synapse Pipelines

Factor | Azure Data Factory | Synapse Pipelines
Standalone service? | Yes, independent service | Part of Synapse workspace
Features | Same engine | Same engine, tightly integrated with Synapse SQL/Spark
Choose when | Standalone ETL, no Synapse workspace | Already using Synapse for warehousing/analytics
💡 AZ-104 Exam Tip: Know that ADF is for ETL/data integration (not a database). Know that it has 90+ connectors. Know that a Self-Hosted IR provides on-premises connectivity. Know the main trigger types (Schedule, Tumbling Window, and Event-based, which covers both storage events and custom events). Know that Data Flows compile to Spark for transformation.
📝 Practice Questions
Q1. A company needs to copy data from an on-premises SQL Server database to Azure Blob Storage. Which Integration Runtime type does ADF need?
A — Azure IR
B — Self-Hosted IR installed in the on-premises network
C — Azure-SSIS IR
D — VPN Gateway
Q2. What is a Linked Service in Azure Data Factory?
A — A pipeline that links multiple activities together
B — A connection definition containing connection string and authentication for a data store
C — A schema definition for the data structure
D — A scheduled job that triggers pipelines automatically
Q3. Which ADF trigger type fires automatically when a file arrives in Azure Blob Storage?
A — Schedule trigger
B — Tumbling Window trigger
C — Event-based (Storage Events) trigger
D — Manual trigger
Q4. What technology do ADF Mapping Data Flows compile to under the hood?
A — SQL Server stored procedures
B — Apache Spark
C — Azure Functions
D — T-SQL
Q5. How many data source/destination connectors does Azure Data Factory support?
A — 10
B — 30
C — 90+
D — 500+
Disclaimer: RedKite Cloud is an independent educational resource and is not affiliated with Microsoft Corporation.