🌐Step-by-Step Guide to Creating and Accessing Azure Blob Storage in Databricks

Introduction

Azure Blob Storage is a scalable, cloud-based object storage solution for storing massive amounts of unstructured data. This blog walks you step by step through creating a container, generating access keys, mounting the storage in Databricks, and loading CSV files into an MSSQL database.


Step 1: Create an Azure Storage Account and Container

Before proceeding, you need an Azure Storage Account and a container.

1.1 Create a Storage Account

  1. Log in to the Azure Portal.
  2. Navigate to Storage Accounts.
  3. Click Create and provide the following details:
    • Subscription: Choose your subscription.
    • Resource Group: Create a new one or use an existing one.
    • Storage Account Name: venu28nov2023
    • Region: Select your preferred region.
    • Performance: Standard.
    • Replication: Locally Redundant Storage (LRS) for cost efficiency.
  4. Click Review + Create and wait for deployment.

1.2 Create a Container

  1. Go to your newly created storage account.
  2. Navigate to Containers under Data Storage.
  3. Click + Container, name it input, and select Private (no anonymous access).
  4. Click Create.
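
If you prefer to script these portal steps, the sketch below shows one possible approach using the azure-identity and azure-mgmt-storage Python packages (both assumed to be installed). The subscription ID, resource group, and region are placeholders, not values from this guide:

from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

# Placeholders: replace with your own subscription ID and resource group
credential = DefaultAzureCredential()
client = StorageManagementClient(credential, "<subscription-id>")

# Create the storage account (StorageV2, Standard performance, LRS replication)
poller = client.storage_accounts.begin_create(
    "my-resource-group",
    "venu28nov2023",
    {
        "location": "eastus",
        "kind": "StorageV2",
        "sku": {"name": "Standard_LRS"},
    },
)
account = poller.result()

# Create the "input" container; containers are private (no anonymous access) by default
client.blob_containers.create("my-resource-group", "venu28nov2023", "input", {})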

Step 2: Generate Access Keys

To access the storage account, retrieve the access key:

  1. Navigate to Storage Account > Access Keys.
  2. Copy one of the keys (Primary or Secondary).
    • Example:
      Accesskeys = "tnd12MEQycM6I4EWg9qjRv3k+pFZL5Op1XwQSE0z5i0gGzX8J0jvh8w7iqq4uO9yAc7RLHa3+cSo+ASt335JdQ=="
      
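The key can also be retrieved programmatically. A short sketch, reusing the management client from the Step 1 sketch (the resource group name is again a placeholder):

# List the account keys and take the primary one
keys = client.storage_accounts.list_keys("my-resource-group", "venu28nov2023")
Accesskeys = keys.keys[0].value
print(Accesskeys)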

Step 3: Mount Azure Blob Storage in Databricks

To mount the container in Databricks, run the following command:

storage_account_name = "venu28nov2023"
container_name = "input"
Accesskeys = "tnd12MEQycM6I4EWg9qjRv3k+pFZL5Op1XwQSE0z5i0gGzX8J0jvh8w7iqq4uO9yAc7RLHa3+cSo+ASt335JdQ=="

# Mount the blob container so it is accessible as a DBFS path
dbutils.fs.mount(
    source=f"wasbs://{container_name}@{storage_account_name}.blob.core.windows.net",
    mount_point="/mnt/iotdata1",
    extra_configs={f"fs.azure.account.key.{storage_account_name}.blob.core.windows.net": Accesskeys}
)

This mounts the input container to /mnt/iotdata1 in Databricks.
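
To confirm the mount worked, list its contents. Also, hardcoding the key in a notebook is convenient but risky; a common alternative (sketched below with hypothetical scope and key names) is to read it from a Databricks secret scope:

# Verify the mount by listing the container contents
display(dbutils.fs.ls("/mnt/iotdata1"))

# Alternative sketch: pull the key from a secret scope instead of hardcoding it.
# "blob-scope" and "storage-key" are hypothetical names, not part of this guide.
Accesskeys = dbutils.secrets.get(scope="blob-scope", key="storage-key")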


Step 4: Process CSV Data

4.1 Define the Data Location

data="/mnt/wallmart/csvdata"

4.2 List and Filter Files

import re
import os

# List the mounted folder and keep only the us-500 CSV file(s)
files = dbutils.fs.ls(data)
files_paths = [f.path for f in files if "us-500" in f.path and f.path.endswith("us-500.csv")]
print(files_paths)

This filters CSV files that contain "us-500" in their name.

4.3 Define MSSQL Connection

host="jdbc:sqlserver://sampleserver111.database.windows.net:1433;databaseName=mssqldb;"
conf = {
    "url": host,
    "user": "chandra",
    "password": "Mypassword.1"
}
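
Before looping over the files, it can help to verify that the cluster can actually reach the database. A minimal sketch using the same conf dictionary (it assumes the server allows connections from your Databricks workspace):

# Connectivity check: run a trivial query through the JDBC connection
test_df = (spark.read.format("jdbc")
           .options(**conf)
           .option("query", "SELECT 1 AS ok")
           .load())
test_df.show()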

4.4 Read CSV and Write to SQL Table

for p in files_paths:
    # Derive a SQL-safe table name from the file name (alphanumeric characters only)
    filename = os.path.splitext(os.path.basename(p))[0]
    tab = re.sub(r'[^a-zA-Z0-9]', '', filename)
    print("Importing data from", p)
    print("Storing in MSSQL table:", tab)

    # Read the CSV with a header row and inferred column types, then write it over JDBC
    df = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load(p)
    df.write.mode("overwrite").format("jdbc").options(**conf).option("dbtable", tab).save()
    df.show()

This reads each CSV file and imports it into a dynamically named MSSQL table.
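
To sanity-check a load, you can read the table back over JDBC and compare row counts. A minimal sketch, meant to run inside (or right after) the loop above, using the tab and df variables it defines:

# Read the table back and compare its row count with the source DataFrame
check = (spark.read.format("jdbc")
         .options(**conf)
         .option("dbtable", tab)
         .load())
print(tab, "-> rows in SQL:", check.count(), "| rows in CSV:", df.count())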


Step 5: Mount Another Storage Account

To mount another storage account (storageaccountqqqq), use:

storage_account_name = "storageaccountqqqq"
container_name = "input"
Accesskeys = "+cT3+AStwi6v7Q=="

# Mount the second account's input container at /mnt/wallmart
dbutils.fs.mount(
    source=f"wasbs://{container_name}@{storage_account_name}.blob.core.windows.net",
    mount_point="/mnt/wallmart",
    extra_configs={f"fs.azure.account.key.{storage_account_name}.blob.core.windows.net": Accesskeys}
)

This mounts the input container from the second storage account to /mnt/wallmart.
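
When juggling multiple mounts, it is handy to list what is already mounted and to unmount a path before re-mounting it, since dbutils.fs.mount fails if the mount point already exists:

# List all current mount points and their sources
for m in dbutils.fs.mounts():
    print(m.mountPoint, "->", m.source)

# Unmount a path if you need to re-mount it with different settings
dbutils.fs.unmount("/mnt/wallmart")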


Conclusion

In this blog, we covered:

  • Creating an Azure Storage Account and Blob Container.
  • Retrieving Access Keys.
  • Mounting Blob Storage in Databricks.
  • Listing and filtering CSV files.
  • Importing CSV data into an MSSQL database.

By following these steps, you can effectively store and manage large-scale datasets in Azure and Databricks.

Comments

  1. Should have also covered enabling the hierarchical namespace option while creating the storage account, so that directories can be nested inside the container.
