🌐 Azure Data Factory: Copying Data from ADLS to MSSQL

🔑 Initial Setup

To copy data from Azure Data Lake Storage (ADLS) to Microsoft SQL Server (MSSQL) using Azure Data Factory (ADF), follow these structured steps.


🌐 Step 1: Creating Azure Data Factory (ADF) & Linking Services

🏗️ Creating an Azure Data Factory

  1. 🔍 Search for "Data Factory" in the Azure portal.
  2. ➕ Click "Create Data Factory".
  3. 📦 Select your subscription (Free Tier).
  4. 📂 Attach the resource group: rock2025.
  5. 🏷️ Provide a globally unique name, e.g., learnazure (only letters, numbers, and hyphens are allowed).
  6. ✅ Click "Review + Create" and proceed with deployment.
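
If you prefer to script this step instead of clicking through the portal, below is a minimal sketch using the azure-mgmt-datafactory Python SDK. It assumes you have installed azure-identity and azure-mgmt-datafactory, are signed in to Azure (e.g., via az login), and it uses a placeholder subscription ID and the Central India region.

    # Hedged sketch: create the Data Factory programmatically instead of via the portal.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import Factory

    subscription_id = "<your-subscription-id>"   # placeholder
    resource_group = "rock2025"
    factory_name = "learnazure"

    adf_client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

    # Region is assumed to be Central India to match the rest of this walkthrough.
    factory = adf_client.factories.create_or_update(
        resource_group, factory_name, Factory(location="centralindia")
    )
    print(factory.provisioning_state)   # "Succeeded" once the deployment completes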

🔗 Linking Azure Data Lake Storage to Data Factory

  1. Navigate to Azure Data Factory → Launch ADF Studio.
  2. In the Manage section:
    • Click Linked Services → + New.
    • Select Azure Data Lake Storage Gen2.
    • Provide a name, e.g., LS_FEB13.
    • Choose the subscription (Free Tier).
    • Select the Storage Account (e.g., divyabucket02).
  3. ✅ Click Create.
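
The same linked service can be created from Python. A hedged sketch, assuming AzureBlobFSLinkedService (the ADLS Gen2 type in the SDK), with the storage account URL for divyabucket02 and a placeholder account key:

    # Hedged sketch: register the ADLS Gen2 linked service via the Python SDK.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import LinkedServiceResource, AzureBlobFSLinkedService

    adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<your-subscription-id>")

    ls_adls = LinkedServiceResource(
        properties=AzureBlobFSLinkedService(
            url="https://divyabucket02.dfs.core.windows.net",   # ADLS Gen2 endpoint of the storage account
            account_key="<storage-account-key>",                # placeholder; key auth keeps the example simple
        )
    )
    adf_client.linked_services.create_or_update("rock2025", "learnazure", "LS_FEB13", ls_adls)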

🏛️ Step 2: Creating an SQL Database

🗄️ Creating a Database in Azure

  1. 🔍 Search for SQL Database in Azure.
  2. ➕ Click + Create Database.
  3. Configure the following settings:
    • Subscription: Free Tier
    • Resource Group: divyalearn_az
    • Server Name: divyamysqldb
    • Database Name: mysql13febdb
    • Region: Asia Pacific (Central India)
    • Authentication Method: SQL Authentication
    • Server Admin Login: divya
    • Password: (Set your own password)
  4. ✅ Click Review + Create.
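
If you'd rather create the server and database from code, here is a hedged sketch with the azure-mgmt-sql Python SDK (pip install azure-identity azure-mgmt-sql). The subscription ID and password are placeholders, and the server name divyamysqldb matches the hostname used later in the connection string:

    # Hedged sketch: create the logical SQL server and the database programmatically.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.sql import SqlManagementClient
    from azure.mgmt.sql.models import Server, Database

    sql_client = SqlManagementClient(DefaultAzureCredential(), "<your-subscription-id>")

    # Logical server: becomes divyamysqldb.database.windows.net
    sql_client.servers.begin_create_or_update(
        "divyalearn_az", "divyamysqldb",
        Server(location="centralindia",
               administrator_login="divya",
               administrator_login_password="<your-password>"),
    ).result()

    # Database hosted on that server
    sql_client.databases.begin_create_or_update(
        "divyalearn_az", "divyamysqldb", "mysql13febdb",
        Database(location="centralindia"),
    ).result()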

While creating the database, you’ll also be asked to pick a Backup Storage Redundancy type. Here’s a quick breakdown of the options to help you choose:

1. Locally-Redundant Storage (LRS)

  • What it is: Data is replicated 3 times within a single datacenter in the same region.
  • Use case:
    • Low-cost option.
    • Suitable for non-critical data where data loss in case of a datacenter failure is acceptable.
  • Limitation: Not protected against datacenter-level failures.

2. Zone-Redundant Storage (ZRS)

  • What it is: Data is replicated across multiple availability zones (data centers) in the same region.
  • Use case:
    • Higher availability within the same region.
    • Suitable for applications needing high availability but not requiring region-level protection.
  • Limitation: Doesn’t protect against entire region outages.

3. Geo-Redundant Storage (GRS)

  • What it is: Data is replicated across regions (primary + secondary region, hundreds of miles apart).
  • Use case:
    • Suitable for critical applications where data recovery after regional disasters is essential.
    • Provides disaster recovery capabilities.
  • Limitation: More expensive than LRS and ZRS.

4. Geo-Zone-Redundant Storage (GZRS) [Preview]

  • What it is: Combines ZRS and GRS by replicating data across availability zones and geographically to a secondary region.
  • Use case:
    • Highest durability and availability, covering datacenter, zone, and regional failures.
    • Ideal for mission-critical workloads.
  • Limitation: Likely the most expensive option.

Which should you choose as a student?

If you’re learning or working on non-production projects:

  • Choose: 🔹 Locally-Redundant Storage (LRS) — It’s cost-effective and sufficient for learning purposes.
  • If you want to explore high availability concepts:
    • Zone-Redundant Storage (ZRS) could be a next step.

🔗 Connecting SQL Workbench to the Database (from Your Local System)

  1. Copy Connection String from Azure SQL Database.
  2. Open SQL Workbench → File → New Connection.
  3. Paste the JDBC connection string:
    jdbc:sqlserver://divyamysqldb.database.windows.net:1433;
    database=mysql13febdb;
    user=divya@divyamysqldb;
    password={your_password_here};
    encrypt=true;
    trustServerCertificate=false;
    
  4. Click Connect.
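
If you want a quick sanity check from code instead of SQL Workbench, a small pyodbc sketch works too (pip install pyodbc; it assumes the Microsoft ODBC Driver 18 for SQL Server is installed locally):

    # Hedged sketch: verify connectivity to the Azure SQL database with pyodbc.
    import pyodbc

    conn_str = (
        "DRIVER={ODBC Driver 18 for SQL Server};"
        "SERVER=divyamysqldb.database.windows.net,1433;"
        "DATABASE=mysql13febdb;"
        "UID=divya;"
        "PWD=<your_password_here>;"          # placeholder
        "Encrypt=yes;TrustServerCertificate=no;"
    )

    with pyodbc.connect(conn_str, timeout=30) as conn:
        row = conn.cursor().execute("SELECT @@VERSION;").fetchone()
        print(row[0])   # prints the SQL Server version string if the connection works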

🔧 Troubleshooting SQL Connection Issues

  • If unable to connect:
    1. Navigate to SQL Database → Networking → Firewall Rules.
    2. Add a new rule:
      • Rule Name: AllAccess
      • Start IP: 0.0.0.0
      • End IP: 255.255.255.255 (allows access from any IP address; not recommended for production).
    3. ✅ Save and retry connection.
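
The firewall rule can also be added with the azure-mgmt-sql SDK; a hedged sketch (opening 0.0.0.0 to 255.255.255.255 admits every IP, so treat it as a learning-only shortcut and prefer adding just your own client IP):

    # Hedged sketch: add a server-level firewall rule programmatically.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.sql import SqlManagementClient
    from azure.mgmt.sql.models import FirewallRule

    sql_client = SqlManagementClient(DefaultAzureCredential(), "<your-subscription-id>")

    # WARNING: this range admits every IP address -- acceptable for a lab, never for production.
    sql_client.firewall_rules.create_or_update(
        "divyalearn_az", "divyamysqldb", "AllAccess",
        FirewallRule(start_ip_address="0.0.0.0", end_ip_address="255.255.255.255"),
    )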

📤 Step 3: Copying Data from ADLS to SQL Database

📑 Preparing the Table in SQL Workbench

  1. Run the following SQL command:
    CREATE TABLE asltab (
        name VARCHAR(32),
        age INT,
        city VARCHAR(32)
    );
    
  2. Verify table creation:
    SELECT * FROM asltab;
    

🔗 Creating & Configuring Linked Services in ADF

  1. Navigate to Azure Data Factory (ADF) → Manage.
  2. Click Linked Services → + New.
  3. Select SQL Server and configure:
    • Name: LS_SqlServer1
    • Server Name: divyamysqldb.database.windows.net
    • Database: mysql13febdb
    • Authentication Type: SQL Authentication
    • Username & Password: (Same as SQL Workbench connection)
  4. ✅ Click Create.
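
The equivalent linked service in Python might look like the sketch below. It uses AzureSqlDatabaseLinkedService (the portal's "SQL Server" connector also works against Azure SQL, so treat this as one possible mapping) with a placeholder password:

    # Hedged sketch: register the SQL linked service via the Python SDK.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import (
        LinkedServiceResource, AzureSqlDatabaseLinkedService, SecureString,
    )

    adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<your-subscription-id>")

    sql_conn = (
        "Server=tcp:divyamysqldb.database.windows.net,1433;"
        "Database=mysql13febdb;User ID=divya;Password=<your-password>;"
        "Encrypt=True;Connection Timeout=30;"
    )
    ls_sql = LinkedServiceResource(
        properties=AzureSqlDatabaseLinkedService(connection_string=SecureString(value=sql_conn))
    )
    adf_client.linked_services.create_or_update("rock2025", "learnazure", "LS_SqlServer1", ls_sql)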

📂 Creating a Dataset in ADF

  1. Navigate to ADF → Author → Datasets → + New.
  2. Select Azure Data Lake Storage Gen2.
  3. Set the properties:
    • Format: Delimited Text
    • Name: ds_asldata
    • Linked Service: LS_FEB13
    • File Path: input/asldata
  4. ✅ Click OK.
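
Here is a hedged sketch of the same dataset defined in Python. It assumes "input" is the ADLS Gen2 file system (container) and "asldata" the folder, mirroring the File Path above, and that the file is a comma-delimited CSV with a header row:

    # Hedged sketch: define the delimited-text dataset via the Python SDK.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import (
        DatasetResource, DelimitedTextDataset, AzureBlobFSLocation, LinkedServiceReference,
    )

    adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<your-subscription-id>")

    ds_asldata = DatasetResource(
        properties=DelimitedTextDataset(
            linked_service_name=LinkedServiceReference(
                type="LinkedServiceReference", reference_name="LS_FEB13"
            ),
            location=AzureBlobFSLocation(file_system="input", folder_path="asldata"),
            column_delimiter=",",
            first_row_as_header=True,   # assumption: the source CSV carries a header row
        )
    )
    adf_client.datasets.create_or_update("rock2025", "learnazure", "ds_asldata", ds_asldata)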

🏗️ Creating a Pipeline in ADF

  1. Navigate to ADF → Author → Pipelines → + New.
  2. Drag & drop Copy Activity.
  3. Configure Source:
    • Dataset: ds_asldata
  4. Configure Sink:
    • Dataset: (Create New) → SQL Server
    • Linked Service: LS_SqlServer1
    • Table Name: asltab
  5. ✅ Save and Publish the pipeline.
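
For reference, a hedged sketch of the same pipeline in Python. The pipeline name pl_copy_adls_to_sql and the sink dataset name ds_sqlsink are hypothetical (the sink dataset would point at the asltab table through LS_SqlServer1):

    # Hedged sketch: a pipeline with a single Copy activity from ADLS to Azure SQL.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import (
        PipelineResource, CopyActivity, DatasetReference,
        DelimitedTextSource, AzureSqlSink,
    )

    adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<your-subscription-id>")

    copy_activity = CopyActivity(
        name="CopyAdlsToSql",
        inputs=[DatasetReference(type="DatasetReference", reference_name="ds_asldata")],
        outputs=[DatasetReference(type="DatasetReference", reference_name="ds_sqlsink")],  # hypothetical sink dataset
        source=DelimitedTextSource(),
        sink=AzureSqlSink(),
    )
    adf_client.pipelines.create_or_update(
        "rock2025", "learnazure", "pl_copy_adls_to_sql",   # hypothetical pipeline name
        PipelineResource(activities=[copy_activity]),
    )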

⏩ Executing the Pipeline

  1. Click Debug to run the pipeline.
  2. In SQL Workbench on your local system, run SELECT * FROM asltab; again. Before the pipeline run, the table returned only column headers with no rows; now it returns all the data from the source file, which confirms the pipeline works. (You can also trigger and monitor the run from code; see the sketch below.)
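
Outside of Debug mode, the pipeline can also be triggered and monitored from Python; a hedged sketch reusing the hypothetical pipeline name from the earlier sketch:

    # Hedged sketch: trigger a published pipeline run and poll its status.
    import time

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient

    adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<your-subscription-id>")

    run = adf_client.pipelines.create_run("rock2025", "learnazure", "pl_copy_adls_to_sql")

    status = "InProgress"
    while status in ("Queued", "InProgress"):
        time.sleep(15)   # poll every 15 seconds
        status = adf_client.pipeline_runs.get("rock2025", "learnazure", run.run_id).status

    print("Pipeline run finished with status:", status)   # expect "Succeeded"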

🎯 Conclusion

Following these steps, you've successfully copied data from Azure Data Lake Storage (ADLS) to SQL Server using Azure Data Factory (ADF). 🚀

Stay connected! 😊
