Step-by-Step Guide: Creating an Azure Data Factory Pipeline to Copy Multiple Files to SQL Server ✨
Prerequisites
Before you begin, ensure you have the following resources in place:
- Azure Storage Account (Gen2) with:
  - One Container
  - One Input Folder containing at least one .csv file
- SQL Server Database configured and accessible.
- ⚡ Azure Data Factory (ADF) instance set up in your Azure subscription.
These components are essential to build and execute the pipeline successfully.
Step-by-Step Process to Create the Required Components:
- Create Azure Storage Account: Navigate to Azure Portal > Storage Accounts > Create. Provide details and deploy.
- Create Input Folder: In the storage account, create a container and upload .csv files to a folder.
- Set Up SQL Server: Deploy a SQL Server in Azure, create a database, and ensure appropriate firewall settings.
- Create Azure Data Factory: Navigate to Azure Portal > Azure Data Factory > Create and configure essential properties.
Pipeline Overview
The pipeline includes the following key steps:
- Activity: Get Metadata
- Activity: ForEach (to iterate over each file)
- Activity: Copy Data (to copy each file to SQL Server)
⚙️ Step 1: Create the Pipeline
Step 1.1: Set Up Azure Data Factory
- Go to Azure Portal.
- Search for Azure Data Factory and click Create.
- Provide details like Resource Group, Region, and ADF Name.
- Click Review + Create and then Create after validation.
Step 1.2: Configure Storage and Database
- Azure Storage Container: Upload multiple .csv files to the input folder.
- SQL Server Database: Ensure SQL Server is running with appropriate permissions.
Step 1.3: Create a New Pipeline
- Open Azure Data Factory Studio.
- Go to the Author tab and click + (New pipeline).
- ✨ Pipeline Name: pipeline_CopyMultipleFilesToSQL
Step 2: Add Get Metadata Activity
Activity: Get Metadata
Settings:
- Dataset: Create a new dataset for Azure Data Lake Gen2.
- Click + New > Azure Data Lake Storage Gen2 > DelimitedText.
- Name: ds_list_all_files_folder
- Linked Service: Create or select the linked service pointing to the Gen2 account.
- File Path: Specify only the folder path (do not specify a file, because multiple files will be read).
Field List:
- Select child items (to get the list of file names, such as filename.csv).
Run Output Example:
[
{"name": "file1.csv", "type": "File"},
{"name": "file2.csv", "type": "File"}
]
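For reference, the pipeline JSON behind this activity looks roughly like the sketch below. The dataset and activity names match the ones used in this guide; the store and format settings are assumptions based on a default DelimitedText dataset on Gen2:
{
  "name": "Get Metadata1",
  "type": "GetMetadata",
  "typeProperties": {
    "dataset": {
      "referenceName": "ds_list_all_files_folder",
      "type": "DatasetReference"
    },
    "fieldList": ["childItems"],
    "storeSettings": { "type": "AzureBlobFSReadSettings" },
    "formatSettings": { "type": "DelimitedTextReadSettings" }
  }
}
Requesting childItems in the fieldList is what produces the output array shown above.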
Step 3: Add ForEach Activity
Activity: ForEach
- Drag and drop the ForEach activity into the pipeline.
- Connect the Get Metadata activity to ForEach.
Settings:
- ✅ Enable Sequential execution.
- Items: Click Add dynamic content and select:
@activity('Get Metadata1').output.childItems
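In the underlying pipeline JSON, the configured ForEach activity looks roughly like this sketch (the Copy Data activity from Step 4 goes inside the activities array):
{
  "name": "ForEach1",
  "type": "ForEach",
  "dependsOn": [
    { "activity": "Get Metadata1", "dependencyConditions": ["Succeeded"] }
  ],
  "typeProperties": {
    "isSequential": true,
    "items": {
      "value": "@activity('Get Metadata1').output.childItems",
      "type": "Expression"
    },
    "activities": []
  }
}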
Step 4: Add Copy Data Activity Inside ForEach
- Drag and drop the Copy Data activity inside the ForEach activity.
Copy Data Settings:
Source:
- Dataset: Create a new dataset for Gen2.
- Name: ds_send_oneby_onefile
- Linked Service: Use the previously created linked service.
- File Path: Specify only the folder path.
Open Parameter:
- Create a temporary parameter t in the dataset.
- In Connection > File Path, reference the temporary parameter t as the file name.
- Set its value using dynamic content: @item().name (as sketched below).
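As a sketch, the parameterized source dataset JSON then looks roughly like this; the linked service name ls_adls_gen2, the container mycontainer, and the folder input are placeholders for your own values:
{
  "name": "ds_send_oneby_onefile",
  "properties": {
    "linkedServiceName": {
      "referenceName": "ls_adls_gen2",
      "type": "LinkedServiceReference"
    },
    "parameters": {
      "t": { "type": "string" }
    },
    "type": "DelimitedText",
    "typeProperties": {
      "location": {
        "type": "AzureBlobFSLocation",
        "fileName": { "value": "@dataset().t", "type": "Expression" },
        "folderPath": "input",
        "fileSystem": "mycontainer"
      },
      "columnDelimiter": ",",
      "firstRowAsHeader": true
    },
    "schema": []
  }
}
The @dataset().t expression is how the dataset picks up the file name that each ForEach iteration passes in.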
Sink:
- Dataset: Create a new SQL Server dataset.
- Name: ds_1by1file_tosqlserver
- Linked Service: Point to the SQL Server instance.
Open Parameter:
- Create a parameter tab for the table name.
- In the Table property, use dynamic content: @replace(item().name, '.csv', '')
✨ With the sink's Table option set to Auto create table, this will automatically create a table in the SQL Server database for each file.
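Putting it together, the Copy Data activity inside the ForEach loop looks roughly like the sketch below; tableOption "autoCreate" is the JSON equivalent of the Auto create table setting, and SqlServerSink assumes a SQL Server linked service (an Azure SQL Database sink would use AzureSqlSink instead):
{
  "name": "Copy data1",
  "type": "Copy",
  "inputs": [
    {
      "referenceName": "ds_send_oneby_onefile",
      "type": "DatasetReference",
      "parameters": {
        "t": { "value": "@item().name", "type": "Expression" }
      }
    }
  ],
  "outputs": [
    {
      "referenceName": "ds_1by1file_tosqlserver",
      "type": "DatasetReference",
      "parameters": {
        "tab": { "value": "@replace(item().name, '.csv', '')", "type": "Expression" }
      }
    }
  ],
  "typeProperties": {
    "source": { "type": "DelimitedTextSource" },
    "sink": {
      "type": "SqlServerSink",
      "tableOption": "autoCreate"
    }
  }
}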
✅ Step 5: Test and Debug the Pipeline
- Click Debug to run the pipeline.
- Verify that each file from the folder is copied to the SQL Server database in its own table.
Key Notes:
- Ensure proper permissions are provided for both Azure Data Lake Gen2 and SQL Server.
- The ForEach activity, with Sequential enabled, ensures that each file is processed one at a time.
- ⚡ Dynamic content keeps the pipeline flexible when dealing with multiple files.
Conclusion
In this guide, you learned how to:
- Use Get Metadata to list files in a folder.
- Iterate over the list using ForEach.
- Dynamically copy files into SQL Server using Copy Data.
✨ This pipeline scales to any number of files in the folder and can be reused for similar use cases involving ingestion of multiple files into SQL Server.