🌐Filtering and Copying Files Dynamically in Azure Data Factory (ADF)

 

Filtering and Copying Files Dynamically in Azure Data Factory (ADF)

Introduction

In Azure Data Factory (ADF), automating file processing is a common use case. This guide demonstrates how to:

  • Retrieve file metadata (including filenames inside a folder).
  • Filter files based on today’s date.
  • Apply advanced filter conditions such as contains(), startswith(), and endswith().
  • Copy only filtered files dynamically using ForEach and Copy Data activities.

Step 1: Set Variable for Today's Date

The first step is to create a Set Variable activity to store today’s date in ddMMyyyy format.

Activity: Set Variable

  • Variable Name: dt
  • Value: @formatDateTime(utcNow(), 'ddMMyyyy')

Explanation

  • utcNow() fetches the current UTC time.
  • formatDateTime(utcNow(), 'ddMMyyyy') converts it into a day-month-year format.

Example Output:

{
    "name": "dt",
    "value": "28022025"
}

Step 2: Get Metadata of Folder Contents

Use the Get Metadata activity to retrieve all file names from a Gen2 storage folder.

Activity: Get Metadata

  • Dataset: ds_getdata (Pointing to the input folder)
  • Field List: childItems (To retrieve filenames)

Explanation

  • The Get Metadata activity fetches metadata details, including file names inside a given folder.
  • The childItems property stores all files as an array.

Example Output:

{
    "childItems": [
        { "name": "28022025adhar", "type": "File" },
        { "name": "28022025asl", "type": "File" },
        { "name": "asl.csv", "type": "File" },
        { "name": "athlete_events.csv", "type": "File" },
        { "name": "bank-full.csv", "type": "File" }
    ]
}

Step 3: Apply Filtering Conditions

Use Case 1: Filter Files Containing Today’s Date

Filter Condition:

@contains(item().name, variables('dt'))

Explanation

  • contains() checks if the filename includes today’s date.
  • Only files that match the condition are selected.

Filtered Output:

{
    "ItemsCount": 5,
    "FilteredItemsCount": 2,
    "Value": [
        { "name": "28022025adhar", "type": "File" },
        { "name": "28022025asl", "type": "File" }
    ]
}

Use Case 2: Filter Files Starting with "asl"

Filter Condition:

@startswith(item().name, 'asl')

Explanation

  • startswith() checks if a filename starts with a specific prefix (asl).

Filtered Output:

{
    "ItemsCount": 5,
    "FilteredItemsCount": 3,
    "Value": [
        { "name": "asl_test.csv", "type": "File" },
        { "name": "asl_test28022025.csv", "type": "File" },
        { "name": "asl_today28022025.csv", "type": "File" }
    ]
}

Use Case 3: Filter Files Ending with .csv

Filter Condition:

@endswith(item().name, 'csv')

Explanation

  • endswith() checks if a filename ends with .csv.

Filtered Output:

{
    "ItemsCount": 5,
    "FilteredItemsCount": 3,
    "Value": [
        { "name": "28022025adhar.csv", "type": "File" },
        { "name": "aadharpancarddata.csv", "type": "File" },
        { "name": "asl_test.csv", "type": "File" }
    ]
}

Step 4: Copy Filtered Files Using ForEach

If files are found, use a ForEach activity to copy them one by one.

Activity: ForEach

  • Sequential Execution: True
  • Items: @activity('Filter1').output.Value

Inside ForEach: Copy Data Activity

Explanation

  • The ForEach activity iterates over each filtered file.
  • The Copy Data activity copies each file to a new location.

Source Dataset (ds_copyone_byone)

  • Dataset Type: Gen2 Delimited Text
  • Parameter: fl
  • File Path: @dataset().fl
  • Dynamic Value: @item().name

Sink Dataset (ds_copyonebyone_sink)

  • Dataset Type: Gen2 Delimited Text
  • Parameter: ofl
  • File Path: @dataset().ofl
  • Dynamic Value: @item().name

Final Output: Successfully Copied Files

The pipeline successfully copies only the filtered files based on the applied conditions.




Conclusion

✅ You have successfully implemented dynamic file filtering and copying in ADF! 🎉

Key Takeaways:

  • Use @formatDateTime(utcNow(),'ddMMyyyy') to store today’s date dynamically.
  • Use Get Metadata to fetch filenames.
  • Use Filter activity with conditions like contains(), startswith(), and endswith().
  • Use ForEach to iterate through the filtered files and copy them.

📢 Have questions or facing issues? Drop them in the comments! 😊

Comments

Popular posts from this blog

🔥Apache Spark Architecture with RDD & DAG

🌐End-to-End ETL Pipeline: MS SQL to MS SQL Using Azure Databricks