🌐 Creating an Azure Data Factory Pipeline to Copy Multiple Files to SQL Server

📋 Prerequisites

Before you begin, ensure you have the following resources in place:

  1. 🗂️ Azure Storage Account (Data Lake Storage Gen2) with:

    • One container (storage bucket)
    • One input folder containing the .csv files you want to copy
  2. 🏦 SQL Server Database configured and accessible.

  3. Azure Data Factory (ADF) instance set up in your Azure subscription.

Please refer to the previous blog posts for detailed steps on creating an Azure Data Factory and a SQL Server database.

💡 These components are essential to build and execute the pipeline successfully.


📌 Step-by-Step Process to Create the Required Components:

  1. Create Azure Storage Account: Navigate to Azure Portal > Storage Accounts > Create. Provide details and deploy.

  2. Create Input Folder: In the storage account, create a container and upload .csv files to a folder.

  3. Set Up SQL Server: Deploy a SQL Server in Azure, create a database, and ensure appropriate firewall settings.

  4. Create Azure Data Factory: Navigate to Azure Portal > Azure Data Factory > Create and configure essential properties.


📝 Pipeline Overview

The pipeline includes the following key steps (a skeleton of the pipeline JSON is sketched after this list):

  1. 🔎 Activity: Get Metadata
  2. 🔄 Activity: ForEach (to iterate over each file)
  3. 📤 Activity: Copy Data (copy each file to SQL Server)
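
At the JSON level, the pipeline roughly follows the structure below. This is a trimmed-down sketch, not the full definition ADF Studio generates; Get Metadata1 and ForEach1 are the default activity names (the same names used in the expressions later in this guide), and the Copy Data activity configured in Step 4 is nested inside the ForEach.

{
  "name": "pipeline_CopyMultipleFilesToSQL",
  "properties": {
    "activities": [
      {
        "name": "Get Metadata1",
        "type": "GetMetadata"
      },
      {
        "name": "ForEach1",
        "type": "ForEach",
        "dependsOn": [
          { "activity": "Get Metadata1", "dependencyConditions": [ "Succeeded" ] }
        ]
      }
    ]
  }
}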

⚙️ Step 1: Create the Pipeline

Step 1.1: Set Up Azure Data Factory

  1. Go to Azure Portal.
  2. Search for Azure Data Factory and click Create.
  3. Provide details like Resource Group, Region, and ADF Name.
  4. Click Review + Create and then Create after validation. 🎯
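
If you prefer to script this step instead of clicking through the portal, the factory can also be deployed from an ARM template. A minimal sketch is shown below; the factory name and region are placeholders to substitute with your own values.

{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "resources": [
    {
      "type": "Microsoft.DataFactory/factories",
      "apiVersion": "2018-06-01",
      "name": "<your-adf-name>",
      "location": "<your-region>",
      "identity": { "type": "SystemAssigned" },
      "properties": {}
    }
  ]
}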

Step 1.2: Configure Storage and Database

  1. 📂 Azure Storage container: Upload multiple .csv files to the input folder.
  2. 🏛️ SQL Server Database: Ensure SQL Server is running with appropriate permissions.

Step 1.3: Create a New Pipeline

  1. Open Azure Data Factory Studio.
  2. Go to the Author tab and click + (New pipeline).
  3. Pipeline Name: pipeline_CopyMultipleFilesToSQL

📂 Step 2: Add Get Metadata Activity

Activity: Get Metadata

Settings:

  1. Dataset: Create a new dataset for Azure Data Lake Storage Gen2.
    • Click + New > Azure Data Lake Storage Gen2 > DelimitedText.
    • Name: ds_list_all_files_folder 📁
    • Linked Service: Create/select the linked service pointing to the Gen2 account.
    • File Path: Specify only the folder path (do not specify a file name, because multiple files will be read). A JSON sketch of the linked service and dataset follows this list.
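
Behind the Studio UI, the linked service and dataset are stored as JSON definitions. A minimal sketch of what they might look like is shown below (the first object is the linked service, the second is the dataset). The linked service name ls_adls_gen2, the account URL, and the container/folder values are illustrative placeholders; authentication settings such as an account key or managed identity are omitted for brevity.

{
  "name": "ls_adls_gen2",
  "properties": {
    "type": "AzureBlobFS",
    "typeProperties": {
      "url": "https://<storage-account-name>.dfs.core.windows.net"
    }
  }
}

{
  "name": "ds_list_all_files_folder",
  "properties": {
    "linkedServiceName": {
      "referenceName": "ls_adls_gen2",
      "type": "LinkedServiceReference"
    },
    "type": "DelimitedText",
    "typeProperties": {
      "location": {
        "type": "AzureBlobFSLocation",
        "fileSystem": "<container-name>",
        "folderPath": "<input-folder>"
      },
      "columnDelimiter": ",",
      "firstRowAsHeader": true
    }
  }
}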

Field List:

  • Select child items (to get the list of file names like filename.csv).

Run Output Example:

[
  {"name": "file1.csv", "type": "File"},
  {"name": "file2.csv", "type": "File"}
]
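
Putting this together, the Get Metadata activity definition in the pipeline JSON might look roughly like the sketch below; Get Metadata1 is the default activity name, and the childItems entry in fieldList is what produces the array shown in the run output above.

{
  "name": "Get Metadata1",
  "type": "GetMetadata",
  "typeProperties": {
    "dataset": {
      "referenceName": "ds_list_all_files_folder",
      "type": "DatasetReference"
    },
    "fieldList": [
      "childItems"
    ]
  }
}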

🔄 Step 3: Add ForEach Activity

Activity: ForEach

  1. Drag and drop the ForEach activity into the pipeline.
  2. Connect the Get Metadata activity to ForEach.

Settings:

  • ✅ Enable Sequential execution.
  • Items: Click Add dynamic content and select:
@activity('Get Metadata1').output.childItems
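
With these settings in place, the ForEach activity definition might look roughly like the sketch below; the Copy Data activity built in Step 4 goes inside the (currently empty) activities array.

{
  "name": "ForEach1",
  "type": "ForEach",
  "dependsOn": [
    {
      "activity": "Get Metadata1",
      "dependencyConditions": [ "Succeeded" ]
    }
  ],
  "typeProperties": {
    "isSequential": true,
    "items": {
      "value": "@activity('Get Metadata1').output.childItems",
      "type": "Expression"
    },
    "activities": []
  }
}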

📤 Step 4: Add Copy Data Activity Inside ForEach

  1. Drag and drop the Copy Data activity inside the ForEach activity.

Copy Data Settings:

Source:

  1. Dataset: Create a new dataset for Gen2.
    • Name: ds_send_oneby_onefile 📦
    • Linked Service: Use the previously created linked service.
    • File Path: Specify only the folder path.

Parameters (source dataset):

  • Create a dataset parameter named t.
  • In Connection > File path, reference the parameter t in the file name portion of the path.
  • In the Copy Data activity's Source tab, set the parameter's value using dynamic content (the resulting dataset JSON is sketched below):
@item().name
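
A minimal sketch of the parameterized source dataset, reusing the same illustrative linked service name (ls_adls_gen2) and container/folder placeholders as in Step 2; the file name is driven by the parameter t via @dataset().t.

{
  "name": "ds_send_oneby_onefile",
  "properties": {
    "linkedServiceName": {
      "referenceName": "ls_adls_gen2",
      "type": "LinkedServiceReference"
    },
    "parameters": {
      "t": { "type": "string" }
    },
    "type": "DelimitedText",
    "typeProperties": {
      "location": {
        "type": "AzureBlobFSLocation",
        "fileSystem": "<container-name>",
        "folderPath": "<input-folder>",
        "fileName": {
          "value": "@dataset().t",
          "type": "Expression"
        }
      },
      "columnDelimiter": ",",
      "firstRowAsHeader": true
    }
  }
}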

Sink:

  1. Dataset: Create a new SQL Server dataset.
    • Name: ds_1by1file_tosqlserver 🏦
    • Linked Service: Point to the SQL Server instance.

Parameters (sink dataset):

  • Create a dataset parameter for the table name (for example, tablename).
  • In the dataset's Table property, reference that parameter; then, in the Copy Data activity's Sink tab, set its value using dynamic content (see the sketch below):
@replace(item().name, '.csv', '')
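
A minimal sketch of the parameterized sink dataset, assuming a SQL Server linked service named ls_sqlserver (a placeholder for whatever you named yours), the example parameter tablename from above, and the default dbo schema. If your target is an Azure SQL Database rather than SQL Server, the dataset type would be AzureSqlTable instead of SqlServerTable.

{
  "name": "ds_1by1file_tosqlserver",
  "properties": {
    "linkedServiceName": {
      "referenceName": "ls_sqlserver",
      "type": "LinkedServiceReference"
    },
    "parameters": {
      "tablename": { "type": "string" }
    },
    "type": "SqlServerTable",
    "typeProperties": {
      "schema": "dbo",
      "table": {
        "value": "@dataset().tablename",
        "type": "Expression"
      }
    }
  }
}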

With the sink's Table option set to Auto create table, the pipeline automatically creates a table in the SQL Server database for each file. The complete Copy Data activity is sketched below.
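
Inside the ForEach, the assembled Copy Data activity might then look roughly like this sketch. Copy data1 is the default activity name; the current file name from @item() is passed to both dataset parameters, and tableOption set to autoCreate corresponds to the Auto create table setting. If the sink is an Azure SQL Database, the sink type would be AzureSqlSink instead of SqlServerSink.

{
  "name": "Copy data1",
  "type": "Copy",
  "inputs": [
    {
      "referenceName": "ds_send_oneby_onefile",
      "type": "DatasetReference",
      "parameters": {
        "t": {
          "value": "@item().name",
          "type": "Expression"
        }
      }
    }
  ],
  "outputs": [
    {
      "referenceName": "ds_1by1file_tosqlserver",
      "type": "DatasetReference",
      "parameters": {
        "tablename": {
          "value": "@replace(item().name, '.csv', '')",
          "type": "Expression"
        }
      }
    }
  ],
  "typeProperties": {
    "source": {
      "type": "DelimitedTextSource"
    },
    "sink": {
      "type": "SqlServerSink",
      "tableOption": "autoCreate"
    }
  }
}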


✅ Step 5: Test and Debug the Pipeline

  1. Click Debug to run the pipeline.
  2. 🔍 Verify that each file from the folder is copied to the SQL Server database in its own table.

💡 Key Notes:

  • 🔒 Ensure proper permissions are provided for both Azure Data Lake Gen2 and SQL Server.
  • 🔁 The ForEach activity ensures that each file is processed sequentially.
  • ⚡ Dynamic content ensures flexibility when dealing with multiple files.

🎉 Conclusion

In this guide, you learned how to:

  • 🔍 Use Get Metadata to list files in a folder.
  • 🔄 Iterate over the list using ForEach.
  • 📤 Dynamically copy files into SQL Server using Copy Data.

This pipeline is highly scalable and can be reused for similar use cases involving multiple file ingestion into SQL Server. 
