🌐 Azure Data Factory Pipeline: Copy SQL Server Tables to Azure Data Lake Gen2

 


This blog provides a step-by-step guide to create an Azure Data Factory (ADF) pipeline that lists all tables from a SQL Server database and copies them one by one into Azure Data Lake Gen2.

Step 1: Create Pipeline with Lookup Activity

  1. Add Lookup Activity

    • In the pipeline canvas, drag and drop the Lookup activity.
  2. Source Dataset

    • Create a New Dataset:
      • Name: sqlserval
      • Linked service: lssql (SQL Server linked service)
  3. Query Settings

    • Use the following query:
      SELECT TABLE_NAME
      FROM information_schema.tables
      WHERE table_type = 'BASE TABLE';
      
    • Uncheck the "First row only" option.
  4. Debug

    • Click Debug and confirm that all SQL Server tables are listed in the output; a JSON sketch of the configured Lookup activity follows this list.
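
For orientation, the Lookup activity configured above corresponds to pipeline JSON roughly like the sketch below. This is a minimal sketch, not an exact export: the activity name lookup1 and the dataset name sqlserval follow the names used in this post, and the remaining properties are typical values.

  {
    "name": "lookup1",
    "type": "Lookup",
    "typeProperties": {
      "source": {
        "type": "SqlServerSource",
        "sqlReaderQuery": "SELECT TABLE_NAME FROM information_schema.tables WHERE table_type = 'BASE TABLE';"
      },
      "dataset": {
        "referenceName": "sqlserval",
        "type": "DatasetReference"
      },
      "firstRowOnly": false
    }
  }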

Step 2: ForEach Activity for Iteration

  1. Add ForEach Activity

    • Drag ForEach activity to the canvas.
    • Connect it to the Lookup activity.
  2. Settings Tab

    • Sequential: Enable this option so the tables are copied one at a time.
    • Items: Add the dynamic expression (a JSON sketch of the full ForEach activity follows this list):
      @activity('lookup1').output.value
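
Assuming the Lookup activity is named lookup1 as above, the ForEach activity corresponds to JSON roughly like this sketch. The activity name ForEach1 is a placeholder, and the Copy activity from Step 3 nests inside the (here empty) activities array.

  {
    "name": "ForEach1",
    "type": "ForEach",
    "dependsOn": [
      { "activity": "lookup1", "dependencyConditions": [ "Succeeded" ] }
    ],
    "typeProperties": {
      "isSequential": true,
      "items": {
        "value": "@activity('lookup1').output.value",
        "type": "Expression"
      },
      "activities": []
    }
  }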
      

Step 3: Inside ForEach - Copy Activity

  1. Add Copy Activity

    • Inside the ForEach activity, drag and drop the Copy activity.
  2. Source Settings

    • Dataset: Create a new SQL Server dataset.
      • Name: anyname_ds
      • Linked service: ls_ddd
    • Parameters: Create a parameter p on the dataset and reference it as the table name (@dataset().p) in the dataset's connection settings.
    • Table Value: In the Copy activity's Source tab, supply the parameter with the dynamic content:
      @item().TABLE_NAME

  3. Sink Settings

    • Dataset: Create a new Azure Data Lake Gen2 Delimited Text dataset.
      • Name: sink
      • Linked service: Link to Gen2 storage.
      • Parameters: Add a parameter p here as well, so each table is written to its own file.
      • File Path: out/folder/@dataset().p
    • Call the dataset parameter as the file name in the file path (a JSON sketch of the resulting Copy activity follows this list):
      @dataset().p
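
As a rough sketch, and assuming both datasets expose a parameter named p as described above, the Copy activity would look something like this. The activity name CopyTable and the sink write settings are illustrative, not an exact export.

  {
    "name": "CopyTable",
    "type": "Copy",
    "inputs": [
      {
        "referenceName": "anyname_ds",
        "type": "DatasetReference",
        "parameters": {
          "p": { "value": "@item().TABLE_NAME", "type": "Expression" }
        }
      }
    ],
    "outputs": [
      {
        "referenceName": "sink",
        "type": "DatasetReference",
        "parameters": {
          "p": { "value": "@item().TABLE_NAME", "type": "Expression" }
        }
      }
    ],
    "typeProperties": {
      "source": { "type": "SqlServerSource" },
      "sink": {
        "type": "DelimitedTextSink",
        "storeSettings": { "type": "AzureBlobFSWriteSettings" }
      }
    }
  }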
      

Step 4: Create and Assign Variables

  1. Create Variable

    • In the Variables tab, create a variable to store the table name.
  2. Set Variable

    • Use a Set Variable activity if needed to capture @item().TABLE_NAME during each iteration (see the sketch below).
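
If you add this step, the Set Variable activity inside the ForEach would look roughly like the sketch below. The activity name SetTableName and the variable name tablename are placeholders, since the post does not fix a name for the variable.

  {
    "name": "SetTableName",
    "type": "SetVariable",
    "typeProperties": {
      "variableName": "tablename",
      "value": { "value": "@item().TABLE_NAME", "type": "Expression" }
    }
  }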

Step 5: Execute the Pipeline

  1. Trigger the pipeline.
  2. The pipeline will iterate through each table in SQL Server and copy the data into the specified Azure Data Lake Gen2 folder.

Key Dynamic Content Used

  • To get table names from Lookup:
    @activity('lookup1').output.value
    
  • Passing table names dynamically to the dataset:
    @item().TABLE_NAME
    
  • Dynamic file path in Sink (see the sink dataset sketch below):
    @dataset().p
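
For completeness, a sketch of the sink dataset that consumes @dataset().p is shown below. It assumes the path out/folder/@dataset().p maps to file system out, folder folder, and the table name as the file name; the linked service name ls_gen2 and the delimited-text settings are placeholders.

  {
    "name": "sink",
    "properties": {
      "type": "DelimitedText",
      "linkedServiceName": {
        "referenceName": "ls_gen2",
        "type": "LinkedServiceReference"
      },
      "parameters": {
        "p": { "type": "string" }
      },
      "typeProperties": {
        "location": {
          "type": "AzureBlobFSLocation",
          "fileSystem": "out",
          "folderPath": "folder",
          "fileName": { "value": "@dataset().p", "type": "Expression" }
        },
        "columnDelimiter": ",",
        "firstRowAsHeader": true
      }
    }
  }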
    

Conclusion

Following these steps, you will have a fully functional ADF pipeline that dynamically reads all tables from a SQL Server database and copies them to Azure Data Lake Gen2 one by one. Adjust file paths and dataset parameters based on your project requirements. Happy data engineering!
