Here is your Lambda function with detailed line-by-line explanations using comments:
import json  # Import the JSON module for handling event data (optional)
import boto3  # Import the AWS SDK for Python (Boto3) to interact with AWS services

# Create an AWS Glue client to start Glue jobs
glue_client = boto3.client('glue')

def lambda_handler(event, context):
    """
    AWS Lambda function triggered when a new file is uploaded to an S3 bucket.
    It extracts the bucket name and file name, then starts an AWS Glue job.
    """
    for record in event['Records']:  # Loop through each record in the event
        # Extract the S3 bucket name where the file was uploaded
        bucket_name = record['s3']['bucket']['name']
        # Extract the file name (key) of the uploaded file
        object_key = record['s3']['object']['key']
        # Print the file path for logging/debugging
        print(f"New file added: s3://{bucket_name}/{object_key}")
        # Start the AWS Glue job, passing the bucket name and file name as arguments
        response = glue_client.start_job_run(
            JobName='firstjob',  # Replace with your actual Glue job name
            Arguments={
                '--s3p': bucket_name,  # Pass the S3 bucket name as a parameter to Glue
                '--fn': object_key  # Pass the uploaded file name to Glue
            }
        )
        # Print the Glue Job Run ID to confirm that the job has started
        print(f"Glue job started successfully: {response['JobRunId']}")
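To try this logic without deploying it or calling AWS, one option is to pass the Glue client in as a parameter and substitute a stub. The sketch below is not part of the function above: handle_records, FakeGlueClient, and the sample event values are hypothetical names introduced for illustration, and the stub only mimics the start_job_run call shape used here.

```python
class FakeGlueClient:
    """Hypothetical stand-in for boto3.client('glue'); records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def start_job_run(self, JobName, Arguments):
        self.calls.append({"JobName": JobName, "Arguments": Arguments})
        # The real call returns a dict containing 'JobRunId'; this ID is made up.
        return {"JobRunId": f"jr_fake_{len(self.calls)}"}


def handle_records(event, glue, job_name="firstjob"):
    """Same loop as lambda_handler, but with the Glue client injected."""
    run_ids = []
    for record in event["Records"]:
        bucket_name = record["s3"]["bucket"]["name"]
        object_key = record["s3"]["object"]["key"]
        response = glue.start_job_run(
            JobName=job_name,
            Arguments={"--s3p": bucket_name, "--fn": object_key},
        )
        run_ids.append(response["JobRunId"])
    return run_ids


# Minimal event holding only the fields the handler reads (real S3 events carry more).
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "incoming/data.csv"},
            }
        }
    ]
}

fake_glue = FakeGlueClient()
started = handle_records(sample_event, fake_glue)
print(started)
```

Injecting the client keeps the loop identical to the deployed handler while letting you inspect exactly which arguments would have been sent to Glue.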
🔍 Explanation of Each Section
- Importing Libraries
  - json: Can be used to handle JSON data (though not directly used here).
  - boto3: AWS SDK to interact with AWS services like Glue and S3.
- Creating an AWS Glue Client
  - boto3.client('glue') initializes a connection to the Glue service.
- Lambda Handler Function (lambda_handler)
  - This function is triggered when an S3 event occurs (i.e., a new file is uploaded).
- Processing S3 Event Data
  - event['Records'] contains a list of records (in case multiple files are uploaded at once).
  - We loop through event['Records'] to handle each uploaded file.
- Extracting Bucket and Object Information
  - bucket_name = record['s3']['bucket']['name'] → Gets the bucket name.
  - object_key = record['s3']['object']['key'] → Gets the file name (key).
- Triggering AWS Glue Job
  - Calls glue_client.start_job_run(), passing:
    - JobName='firstjob': The name of the AWS Glue job to run.
    - Arguments: Passes S3 details (--s3p and --fn) as parameters.
- Logging
  - print(f"New file added: s3://{bucket_name}/{object_key}") → Logs the file path.
  - print(f"Glue job started successfully: {response['JobRunId']}") → Confirms the Glue job has started.
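One detail the extraction step does not cover: S3 event notifications deliver the object key URL-encoded, so a key containing a space arrives with "+" in its place. Below is a minimal sketch of decoding it with the standard library before passing it on. extract_s3_location is a hypothetical helper name and the sample record is made up for illustration.

```python
from urllib.parse import unquote_plus


def extract_s3_location(record):
    """Return (bucket, decoded key) from a single S3 event record."""
    bucket_name = record["s3"]["bucket"]["name"]
    # Keys arrive URL-encoded in the event; decode '+' and %XX sequences.
    object_key = unquote_plus(record["s3"]["object"]["key"])
    return bucket_name, object_key


# Made-up record for a file uploaded as "reports/red flower(1).csv".
record = {
    "s3": {
        "bucket": {"name": "example-bucket"},
        "object": {"key": "reports/red+flower%281%29.csv"},
    }
}

bucket, key = extract_s3_location(record)
print(f"s3://{bucket}/{key}")
```

Without this step, a Glue job that receives the raw key may fail to find any object whose name contains spaces or special characters.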
🛠 How It Works
- A file is uploaded to an S3 bucket.
- An S3 event notification triggers the Lambda function.
- The function extracts:
  - The bucket name.
  - The file name.
- It then starts an AWS Glue job, passing the S3 file details as arguments.
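On the receiving side, the Glue job script has to read --s3p and --fn. Glue scripts typically do this with getResolvedOptions from awsglue.utils, which is available inside the Glue runtime. The sketch below assumes that, and falls back to argparse when awsglue cannot be imported so the parsing can be tried locally. read_job_args is a hypothetical helper, and the example argument values are made up.

```python
import argparse

try:
    # Available inside the AWS Glue job runtime.
    from awsglue.utils import getResolvedOptions
except ImportError:
    getResolvedOptions = None


def read_job_args(argv):
    """Return the bucket and file name passed in by the Lambda function."""
    if getResolvedOptions is not None:
        args = getResolvedOptions(argv, ["s3p", "fn"])
    else:
        # Local fallback (assumption: arguments arrive as '--name value' pairs).
        parser = argparse.ArgumentParser()
        parser.add_argument("--s3p", required=True)
        parser.add_argument("--fn", required=True)
        known, _unknown = parser.parse_known_args(argv[1:])
        args = vars(known)
    return args["s3p"], args["fn"]


# In a real job this would be sys.argv; the values here are illustrative.
example_argv = [
    "job.py",
    "--JOB_NAME", "firstjob",
    "--s3p", "example-bucket",
    "--fn", "incoming/data.csv",
]

bucket, file_name = read_job_args(example_argv)
input_path = f"s3://{bucket}/{file_name}"
print(input_path)
```

The option names are given without the leading dashes when resolving them, while the Lambda function passes them with dashes in the Arguments dictionary.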