Below is the Lambda function with detailed line-by-line explanations in the comments:

import json  # Import the JSON module for handling event data (optional)
import boto3  # Import the AWS SDK for Python (Boto3) to interact with AWS services

# Create an AWS Glue client to start Glue jobs
glue_client = boto3.client('glue')

def lambda_handler(event, context):
    """
    AWS Lambda function triggered when a new file is uploaded to an S3 bucket.
    It extracts the bucket name and file name, then starts an AWS Glue job.
    """
    for record in event['Records']:  # Loop through each record in the event
        # Extract the S3 bucket name where the file was uploaded
        bucket_name = record['s3']['bucket']['name']

        # Extract the file name (key) of the uploaded file
        object_key = record['s3']['object']['key']

        # Print the file path for logging/debugging
        print(f"New file added: s3://{bucket_name}/{object_key}")

        # Start the AWS Glue job, passing the bucket name and file name as arguments
        response = glue_client.start_job_run(
            JobName='firstjob',  # Replace with your actual Glue job name
            Arguments={
                '--s3p': bucket_name,  # Pass the S3 bucket name as a parameter to Glue
                '--fn': object_key  # Pass the uploaded file name to Glue
            }
        )
        
        # Print the Glue Job Run ID to confirm that the job has started
        print(f"Glue job started successfully: {response['JobRunId']}")

🔍 Explanation of Each Section

  1. Importing Libraries

    • json: Can be used to handle JSON data (though not directly used here).
    • boto3: AWS SDK to interact with AWS services like Glue and S3.
  2. Creating an AWS Glue Client

    • boto3.client('glue') initializes a connection to the Glue service.
  3. Lambda Handler Function (lambda_handler)

    • This function is triggered when an S3 event occurs (i.e., a new file is uploaded).
  4. Processing S3 Event Data

    • event['Records'] contains a list of records (multiple files can be reported in a single event).
    • We loop through event['Records'] to handle each uploaded file; a sample event payload is shown after this list.
  5. Extracting Bucket and Object Information

    • bucket_name = record['s3']['bucket']['name'] → Gets the bucket name.
    • object_key = record['s3']['object']['key'] → Gets the file name (key).
  6. Triggering AWS Glue Job

    • Calls glue_client.start_job_run(), passing:
      • JobName='firstjob': The name of the AWS Glue job to run.
      • Arguments: Passes S3 details (--s3p and --fn) as parameters.
  7. Logging

    • print(f"New file added: s3://{bucket_name}/{object_key}") → Logs the file path.
    • print(f"Glue job started successfully: {response['JobRunId']}") → Confirms the Glue job has started.

🛠 How It Works

  1. A file is uploaded to an S3 bucket.
  2. S3 event notification triggers the Lambda function.
  3. The function extracts:
    • The bucket name.
    • The file name.
  4. It then starts an AWS Glue job, passing the S3 file details as arguments.
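
On the Glue side, the job script can read the --s3p and --fn arguments with getResolvedOptions. Here is a minimal sketch, assuming firstjob is a Glue Python script (the Lambda execution role also needs glue:StartJobRun permission for the handler above to succeed):

import sys
from awsglue.utils import getResolvedOptions

# Resolve the parameters passed by the Lambda function (names are given without the leading "--")
args = getResolvedOptions(sys.argv, ['s3p', 'fn'])

# Rebuild the full S3 path of the uploaded file
input_path = f"s3://{args['s3p']}/{args['fn']}"
print(f"Processing file: {input_path}")
# ... read and transform the file here (e.g. with a DynamicFrame or Spark DataFrame)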

To complete the setup, add an S3 event notification on the bucket so that new uploads trigger this Lambda function. This can be done from the S3 console or programmatically, as sketched below. 🚀
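
A rough sketch of the programmatic route with boto3: grant S3 permission to invoke the function, then attach an event notification to the bucket. The function name, bucket, account ID, and statement ID below are placeholders:

import boto3

lambda_client = boto3.client('lambda')
s3_client = boto3.client('s3')

# Allow S3 to invoke the Lambda function (placeholder names and ARNs)
lambda_client.add_permission(
    FunctionName='start-glue-job',
    StatementId='s3-invoke-start-glue-job',
    Action='lambda:InvokeFunction',
    Principal='s3.amazonaws.com',
    SourceArn='arn:aws:s3:::my-input-bucket'
)

# Fire the function whenever a new object is created in the bucket
s3_client.put_bucket_notification_configuration(
    Bucket='my-input-bucket',
    NotificationConfiguration={
        'LambdaFunctionConfigurations': [{
            'LambdaFunctionArn': 'arn:aws:lambda:us-east-1:123456789012:function:start-glue-job',
            'Events': ['s3:ObjectCreated:*']
        }]
    }
)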
