How to maintain consistency while uploading to AWS S3

This article is about the Amazon Web Services storage service, popularly known as S3. We can upload files to it and retrieve them whenever required. Many large companies also use it to serve their static files, such as CSS and JS.

AWS S3 provides APIs to access files and upload them to the server. To use these APIs from Python you can use boto, a widely used client library for interacting with AWS.

I am not going to cover how to write the upload or download code itself. What this article discusses is how to maintain consistency of an upload to AWS, that is, how to confirm that the file stored in S3 matches the local file. For this purpose we are going to use a checksum.

What is a checksum:

A checksum is a hash generated by passing the whole file through a function. As long as the content of the file does not change, the generated checksum always stays the same.
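
To make the idea concrete, here is a minimal sketch using Python's standard hashlib module. The helper name md5_hex and the sample byte strings are only illustrative and are not part of the upload code in this article.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of `data` as a 32-character hex string."""
    return hashlib.md5(data).hexdigest()


original = b"hello world"
same_again = b"hello world"
modified = b"hello world!"

# Identical content gives an identical checksum.
print(md5_hex(original) == md5_hex(same_again))

# Changing even one byte gives a different checksum.
print(md5_hex(original) == md5_hex(modified))
```

The first comparison prints True and the second prints False, which is the property the rest of the article relies on.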

So what we are going to do here is calculate the checksum locally and, after uploading the file, compare it with the checksum of the uploaded file. If the two are equal, we can say that the file is consistent and was uploaded successfully to AWS S3.

 

import hashlib
from boto.s3.connection import S3Connection


# Function to calculate the MD5 checksum of a local file
def find_checksum(file_name):
    try:
        # Open in binary mode so the hash covers the exact bytes on disk
        with open(file_name, 'rb') as local_file:
            checksum = hashlib.md5(local_file.read()).hexdigest()
        return checksum
    except Exception:
        return False


# Function to compare the local checksum with the one stored by S3
def compare_checksum(file_name):
    aws_access_key = 'AWS_ACCESS_KEY'
    aws_secret_key = 'AWS_SECRET_KEY'
    s3_bucket = 'S3_ARCHIVE_BUCKET'
    conn = S3Connection(aws_access_key, aws_secret_key)
    try:
        bucket = conn.get_bucket(s3_bucket)
        # The ETag comes back wrapped in double quotes, so strip them
        s3_checksum = bucket.get_key(file_name).etag[1:-1]
        local_checksum = find_checksum(file_name)
        return s3_checksum == local_checksum
    except Exception:
        # A missing key, a missing bucket, or a connection error counts as a failed check
        return False
    finally:
        # Close the connection on every path
        conn.close()


There are two functions in the above code. The first, find_checksum, calculates the checksum of the local file. The second, compare_checksum, fetches the checksum that S3 stores for the uploaded file and compares the two.
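
find_checksum reads the whole file into memory in one go, which may be a problem for very large files. A possible variant, sketched below, feeds the file to the hash in chunks. The name find_checksum_chunked and the 1 MB default chunk size are assumptions made for this illustration; it also returns None instead of False when the file cannot be read.

```python
import hashlib


def find_checksum_chunked(file_name, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of a file, reading it chunk by chunk.

    Returns None if the file cannot be opened or read.
    """
    md5 = hashlib.md5()
    try:
        with open(file_name, "rb") as local_file:
            # iter() with a sentinel keeps calling read() until it returns b""
            for chunk in iter(lambda: local_file.read(chunk_size), b""):
                md5.update(chunk)
    except OSError:
        return None
    return md5.hexdigest()
```

The digest is the same as the one produced by hashing the whole file at once, so it can be compared with the S3 value in exactly the same way.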

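We are using the etag attribute of the boto S3 key object. The ETag contains the checksum as a string wrapped in double quotes, something like this:

"d41d8cd98f00b204e9800998ecf8427e"

We don't want the surrounding quotes, so we strip them before comparing, using the line below:

bucket.get_key(file_name).etag[1:-1]

Note that the ETag is the MD5 of the file only for objects uploaded in a single request, either unencrypted or encrypted with SSE-S3. For multipart uploads, and for objects encrypted with SSE-KMS or SSE-C, the ETag is not the MD5 digest, so this comparison will not match.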
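
One caveat with this approach is that S3 does not always return a plain MD5 in the ETag. For multipart uploads the value typically looks like a hex string followed by a dash and the number of parts. The sketch below shows one way to check for that before comparing. The helper names normalize_etag and etag_is_plain_md5 are hypothetical and are not part of boto.

```python
import re

# A plain MD5 digest is exactly 32 hexadecimal characters.
_PLAIN_MD5 = re.compile(r"^[0-9a-f]{32}$")


def normalize_etag(etag):
    """Strip the surrounding double quotes from an ETag string."""
    return etag.strip('"')


def etag_is_plain_md5(etag):
    """Return True if the ETag looks like a plain MD5 hex digest.

    Multipart ETags (for example "<hex>-3") return False, which means
    they cannot be compared directly with a local MD5 checksum.
    """
    return bool(_PLAIN_MD5.match(normalize_etag(etag).lower()))
```

If etag_is_plain_md5 returns False, a mismatch with the local checksum does not necessarily mean the upload is corrupted; it may only mean the object was uploaded in parts.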

After comparing, we can be sure the file is consistent, or, if the checksums differ, we can try to upload the file again. This is how to maintain consistency while uploading to AWS S3.
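
The "upload again" step could be sketched as a small retry loop. This is only an illustration under a few assumptions: upload and verify are hypothetical callables that you supply (verify could wrap compare_checksum from above), and the default of three attempts is arbitrary.

```python
import time


def upload_with_verification(upload, verify, max_attempts=3, delay_seconds=0.0):
    """Call upload() and then verify(), retrying until verify() returns True.

    `upload` and `verify` are caller-supplied callables taking no arguments.
    Returns the attempt number that succeeded, or None if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        upload()
        if verify():
            return attempt
        # Wait before retrying, but not after the final attempt
        if attempt < max_attempts:
            time.sleep(delay_seconds)
    return None
```

For example, verify could be written as lambda: compare_checksum('report.csv'), where the file name is a placeholder, with upload being whatever function pushes that file to the bucket.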


Gaurav Yadav

Gaurav is a cloud infrastructure engineer, full stack web developer, and blogger. A sportsperson at heart, he loves football. He loves working at scale and is always keen to learn new tech. He is experienced with CI/CD, distributed cloud infrastructure, build systems, and a lot of SRE work.
