🌐 Azure Data Factory: Copying Data from ADLS to MSSQL
📋 Overview
To copy data from Azure Data Lake Storage (ADLS) to Microsoft SQL Server (MSSQL) using Azure Data Factory (ADF), follow the structured steps below.
🌐 Step 1: Creating Azure Data Factory (ADF) & Linking Services
🏗️ Creating an Azure Data Factory
- 🔍 Search for "Data Factory" in the Azure portal.
- ➕ Click "Create Data Factory".
- 📦 Select the Free Tier.
- 📂 Attach the resource group: rock2025.
- 🏷️ Provide a name, e.g., learnazure (lowercase letters, numbers, and hyphens are allowed; no other special characters).
- ✅ Click "Review + Create" and proceed with deployment.
🔗 Linking Azure Data Lake Storage to Data Factory
- Navigate to Azure Data Factory → Launch ADF Studio.
- In the Manage section:
- Click Linked Services → + New Linked Service.
- Select Azure Data Lake Storage Gen2.
- Provide a name, e.g., LS_FEB13.
- Choose the subscription (Free Tier).
- Select the Storage Account (e.g., divyabucket02).
- ✅ Click Create.
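Behind the portal UI, ADF stores each linked service as a JSON definition. A rough sketch of what the portal generates for the steps above, modeled here as a Python dict (the `type` value `AzureBlobFS` is ADF's name for the ADLS Gen2 connector; the account URL is inferred from the storage account name `divyabucket02`):

```python
# Approximate JSON definition ADF generates for the ADLS Gen2 linked service.
import json

ls_feb13 = {
    "name": "LS_FEB13",
    "properties": {
        "type": "AzureBlobFS",  # ADF's type name for ADLS Gen2
        "typeProperties": {
            # URL inferred from the storage account name in this guide
            "url": "https://divyabucket02.dfs.core.windows.net",
        },
    },
}

print(json.dumps(ls_feb13, indent=2))
```

You can view the real definition in ADF Studio via the linked service's "Code" (braces) icon and compare.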
🏛️ Step 2: Creating an SQL Database
🗄️ Creating a Database in Azure
- 🔍 Search for SQL Database in Azure.
- ➕ Click + Create Database.
- Configure the following settings:
- Subscription: Free Tier
- Resource Group: divyalearn_az
- Server Name: divyamysqldb
- Database Name: mysql13febdb
- Region: Asia Pacific (Central India)
- Authentication Method: SQL Authentication
- Server Admin Login: divya
- Password: (Set your own password)
- ✅ Click Review + Create.
Here’s a quick breakdown of the Backup Storage Redundancy options you’ll see on this screen, to help you choose the right type:
1. Locally-Redundant Storage (LRS)
- What it is: Data is replicated 3 times within a single datacenter in the same region.
- Use case:
- Low-cost option.
- Suitable for non-critical data where data loss in case of a datacenter failure is acceptable.
- Limitation: Not protected against datacenter-level failures.
2. Zone-Redundant Storage (ZRS)
- What it is: Data is replicated across multiple availability zones (data centers) in the same region.
- Use case:
- Higher availability within the same region.
- Suitable for applications needing high availability but not requiring region-level protection.
- Limitation: Doesn’t protect against entire region outages.
3. Geo-Redundant Storage (GRS)
- What it is: Data is replicated across regions (primary + secondary region, hundreds of miles apart).
- Use case:
- Suitable for critical applications where data recovery after regional disasters is essential.
- Provides disaster recovery capabilities.
- Limitation: More expensive than LRS and ZRS.
4. Geo-Zone-Redundant Storage (GZRS) [Preview]
- What it is: Combines ZRS and GRS by replicating data across availability zones and geographically to a secondary region.
- Use case:
- Highest durability and availability, covering datacenter, zone, and regional failures.
- Ideal for mission-critical workloads.
- Limitation: Likely the most expensive option.
✅ Which should you choose as a student?
If you’re learning or working on non-production projects:
- Choose: 🔹 Locally-Redundant Storage (LRS) — It’s cost-effective and sufficient for learning purposes.
- If you want to explore high availability concepts:
- Zone-Redundant Storage (ZRS) could be a next step.
🔗 Connecting SQL Workbench to the Database (from your local system)
- Copy Connection String from Azure SQL Database.
- Open SQL Workbench → File → New Connection.
- Paste the JDBC connection string:
jdbc:sqlserver://divyamysqldb.database.windows.net:1433;database=mysql13febdb;user=divya@divyamysqldb;password={your_password_here};encrypt=true;trustServerCertificate=false;
- Click Connect.
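A malformed connection string is a common cause of failed connections. As a sanity check, the URL above can be assembled from its parts so each one is easy to verify (server, database, and user names are taken from this guide; the password is a placeholder):

```python
# Assemble the JDBC connection string from its components so each
# part can be checked independently before pasting it into SQL Workbench.
server = "divyamysqldb.database.windows.net"
database = "mysql13febdb"
user = "divya@divyamysqldb"
password = "{your_password_here}"  # placeholder; never hard-code real secrets

jdbc_url = (
    f"jdbc:sqlserver://{server}:1433;"
    f"database={database};"
    f"user={user};"
    f"password={password};"
    "encrypt=true;"
    "trustServerCertificate=false;"
)
print(jdbc_url)
```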
🔧 Troubleshooting SQL Connection Issues
- If unable to connect:
- Navigate to SQL Database → Networking → Firewall Rules.
- Add a new rule:
- Rule Name: AllAccess
- Start IP: 0.0.0.0
- End IP: 255.255.255.255 (allows access from any IP; not recommended for production. Prefer adding only your own client IP.)
- ✅ Save and retry connection.
📤 Step 3: Copying Data from ADLS to SQL Database
📑 Preparing the Table in SQL Workbench
- Run the following SQL command:
CREATE TABLE asltab ( name VARCHAR(32), age INT, city VARCHAR(32) );
- Verify table creation:
SELECT * FROM asltab;
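The delimited file in ADLS must line up with the asltab columns (name, age, city). A small sketch that writes a sample file in that shape (the rows are made-up illustration data; upload the resulting file to input/asldata in the storage account):

```python
# Write a sample comma-delimited file matching the asltab schema
# (name VARCHAR, age INT, city VARCHAR).
import csv

rows = [  # illustrative sample data, not from the original guide
    ("alice", 30, "pune"),
    ("bob", 25, "mumbai"),
]

with open("asldata", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["name", "age", "city"])  # header row
    writer.writerows(rows)
```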
🔗 Creating & Configuring Linked Services in ADF
- Navigate to Azure Data Factory (ADF) → Manage.
- Click Linked Services → + New.
- Select SQL Server and configure:
- Name: LS_SqlServer1
- Server Name: divyamysqldb.database.windows.net
- Database: mysql13febdb
- Authentication Type: SQL Authentication
- Username & Password: (Same as SQL Workbench connection)
- ✅ Click Create.
📂 Creating a Dataset in ADF
- Navigate to ADF → Author → Datasets → + New.
- Select Azure Data Lake Storage Gen2.
- Set properties:
- Format: Delimited Text
- Name: ds_asldata
- Linked Service: LS_FEB13
- File Path:
input/asldata
- ✅ Click OK.
🏗️ Creating a Pipeline in ADF
- Navigate to ADF → Author → Pipelines → + New.
- Drag & drop Copy Activity.
- Configure Source:
- Dataset: ds_asldata
- Configure Sink:
- Dataset: (Create New) → SQL Server
- Linked Service: LS_SqlServer1
- Table Name: asltab
- ✅ Save and Publish the pipeline.
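For reference, the pipeline the portal builds boils down to a single Copy activity whose source and sink point at the two datasets. A rough sketch of that JSON, modeled as a Python dict (property names follow ADF's pipeline schema; the pipeline name and the sink dataset name `ds_sqltab` are assumptions, since the guide creates the sink dataset inline without naming it):

```python
import json

# Approximate pipeline definition for the Copy activity configured above.
pipeline = {
    "name": "CopyAdlsToSql",  # assumed name
    "properties": {
        "activities": [
            {
                "name": "CopyAslData",  # assumed name
                "type": "Copy",
                "inputs": [{"referenceName": "ds_asldata", "type": "DatasetReference"}],
                "outputs": [{"referenceName": "ds_sqltab", "type": "DatasetReference"}],  # assumed sink dataset name
                "typeProperties": {
                    "source": {"type": "DelimitedTextSource"},
                    "sink": {"type": "SqlSink"},
                },
            }
        ]
    },
}

print(json.dumps(pipeline, indent=2))
```

As with linked services, the "Code" (braces) icon on the pipeline canvas shows the real generated JSON for comparison.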
⏩ Executing the Pipeline
- Click Debug to run the pipeline.
- In SQL Workbench on your local system, run SELECT * FROM asltab;. Before the pipeline ran, the table had only column definitions and no rows; if the query now returns all the data from the ADLS file, the pipeline worked.
🎯 Conclusion
Following these steps, you've successfully copied data from Azure Data Lake Storage (ADLS) to SQL Server using Azure Data Factory (ADF). 🚀
Stay connected! 😊