🌐Filtering and Copying Files Dynamically in Azure Data Factory (ADF)
Filtering and Copying Files Dynamically in Azure Data Factory (ADF)
Introduction
In Azure Data Factory (ADF), automating file processing is a common use case. This guide demonstrates how to:
- Retrieve file metadata (including filenames inside a folder).
- Filter files based on today’s date.
- Apply advanced filter conditions such as
contains()
,startswith()
, andendswith()
. - Copy only filtered files dynamically using ForEach and Copy Data activities.
Step 1: Set Variable for Today's Date
The first step is to create a Set Variable activity to store today’s date in ddMMyyyy
format.
Activity: Set Variable
- Variable Name:
dt
- Value:
@formatDateTime(utcNow(), 'ddMMyyyy')
Explanation
utcNow()
fetches the current UTC time.formatDateTime(utcNow(), 'ddMMyyyy')
converts it into a day-month-year format.
Example Output:
{
"name": "dt",
"value": "28022025"
}
Step 2: Get Metadata of Folder Contents
Use the Get Metadata activity to retrieve all file names from a Gen2 storage folder.
Activity: Get Metadata
- Dataset:
ds_getdata
(Pointing to the input folder) - Field List:
childItems
(To retrieve filenames)
Explanation
- The Get Metadata activity fetches metadata details, including file names inside a given folder.
- The
childItems
property stores all files as an array.
Example Output:
{
"childItems": [
{ "name": "28022025adhar", "type": "File" },
{ "name": "28022025asl", "type": "File" },
{ "name": "asl.csv", "type": "File" },
{ "name": "athlete_events.csv", "type": "File" },
{ "name": "bank-full.csv", "type": "File" }
]
}
Step 3: Apply Filtering Conditions
Use Case 1: Filter Files Containing Today’s Date
Filter Condition:
@contains(item().name, variables('dt'))
Explanation
contains()
checks if the filename includes today’s date.- Only files that match the condition are selected.
Filtered Output:
{
"ItemsCount": 5,
"FilteredItemsCount": 2,
"Value": [
{ "name": "28022025adhar", "type": "File" },
{ "name": "28022025asl", "type": "File" }
]
}
Use Case 2: Filter Files Starting with "asl"
Filter Condition:
@startswith(item().name, 'asl')
Explanation
startswith()
checks if a filename starts with a specific prefix (asl
).
Filtered Output:
{
"ItemsCount": 5,
"FilteredItemsCount": 3,
"Value": [
{ "name": "asl_test.csv", "type": "File" },
{ "name": "asl_test28022025.csv", "type": "File" },
{ "name": "asl_today28022025.csv", "type": "File" }
]
}
Use Case 3: Filter Files Ending with .csv
Filter Condition:
@endswith(item().name, 'csv')
Explanation
endswith()
checks if a filename ends with.csv
.
Filtered Output:
{
"ItemsCount": 5,
"FilteredItemsCount": 3,
"Value": [
{ "name": "28022025adhar.csv", "type": "File" },
{ "name": "aadharpancarddata.csv", "type": "File" },
{ "name": "asl_test.csv", "type": "File" }
]
}
Step 4: Copy Filtered Files Using ForEach
If files are found, use a ForEach activity to copy them one by one.
Activity: ForEach
- Sequential Execution: True
- Items:
@activity('Filter1').output.Value
Inside ForEach: Copy Data Activity
Explanation
- The ForEach activity iterates over each filtered file.
- The Copy Data activity copies each file to a new location.
Source Dataset (ds_copyone_byone)
- Dataset Type: Gen2 Delimited Text
- Parameter:
fl
- File Path:
@dataset().fl
- Dynamic Value:
@item().name
Sink Dataset (ds_copyonebyone_sink)
- Dataset Type: Gen2 Delimited Text
- Parameter:
ofl
- File Path:
@dataset().ofl
- Dynamic Value:
@item().name
Final Output: Successfully Copied Files
The pipeline successfully copies only the filtered files based on the applied conditions.
Conclusion
✅ You have successfully implemented dynamic file filtering and copying in ADF! 🎉
Key Takeaways:
- Use
@formatDateTime(utcNow(),'ddMMyyyy')
to store today’s date dynamically. - Use Get Metadata to fetch filenames.
- Use Filter activity with conditions like
contains()
,startswith()
, andendswith()
. - Use ForEach to iterate through the filtered files and copy them.
📢 Have questions or facing issues? Drop them in the comments! 😊
Comments
Post a Comment