Log parsing in Python: read how you can do it.

While working with Nginx or any other web server, you sometimes need to parse the logs and look at a consolidated view. Such a view can help you identify things that are going wrong, like too many 5xx responses. Let's have a look at log parsing in Python.

Log parsing in Python

In this article, we will take the default Nginx access log as an example and try to parse the useful information out of it.

First of all, let's look at a log line and identify the pattern it follows. Generally, all logs follow a pattern, and we will try to find it here. Below is one of the log lines.

27.59.104.166 - - [04/Oct/2019:21:15:54 +0000] "GET /users/login HTTP/1.1" 200 41716 "-" "okhttp/3.12.1"

If we look at the above log line, we can say that the first part is the IP address, then we get the date and time, and after that we have the HTTP method of the call and the status code, like below.

IP_ADDRESS - - [DATETIME] "METHOD PATH PROTOCOL" STATUS_CODE BYTES_SENT "REFERER" "USER_AGENT"

In this example of log parsing, we will try to get the number of log lines we receive each second, broken down by status code.

parsed_data = []

with open("example.logs", "r") as file:
    prev_time = ""
    data = {}
    for line in file:
        # The timestamp sits inside [...]; dropping the part after the space
        # removes the "+0000" timezone offset.
        time = line.split("[")[1].split("]")[0].split(" ")[0]
        # The status code is the first field after the quoted request string.
        status_code = line.split('"')[2].split(" ")[1]
        if prev_time != "":
            if time == prev_time:
                # Same second as the previous line: bump the totals.
                data[time]["count"] = data[time]["count"] + 1
                if status_code in data[time]:
                    data[time][status_code] = data[time][status_code] + 1
                else:
                    data[time][status_code] = 1
            else:
                # New second: store the finished bucket and start a fresh one.
                prev_time = time
                parsed_data.append(data)
                data = {}
                data[time] = {"count": 1, status_code: 1}
        else:
            # First line of the file.
            prev_time = time
            data[time] = {"count": 1, status_code: 1}

# The loop never appends the last bucket, so do it here.
if data:
    parsed_data.append(data)

for i in parsed_data:
    print(i)

Let's dive deeper into the code.

Starting from the top, we open the file and read it line by line; below is the part that does this.

with open("example.logs","r") as file:
    prev_time = ""
    data = {}
    for line in file:

After that, for each line we extract the time and the status_code. The rest is simple bookkeeping: keep track of the second we are currently looking at, increase the count of each status_code and the total number of lines, and append the finished bucket to parsed_data whenever the second changes (plus once more after the loop, so the last bucket is not lost).
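To see what those two split chains actually do, here is a quick illustration using the sample log line from above (the variable names are just for demonstration):

sample = '27.59.104.166 - - [04/Oct/2019:21:15:54 +0000] "GET /users/login HTTP/1.1" 200 41716 "-" "okhttp/3.12.1"'

# Everything between "[" and "]" is the timestamp; splitting on the space
# drops the "+0000" timezone offset.
time = sample.split("[")[1].split("]")[0].split(" ")[0]
print(time)         # 04/Oct/2019:21:15:54

# Splitting on double quotes gives [prefix, request, rest, ...]; the third
# piece starts with the status code followed by the response size.
status_code = sample.split('"')[2].split(" ")[1]
print(status_code)  # 200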

When you run this code, you will see output in the format below.

{'04/Oct/2019:21:15:20': {'count': 2, '200': 2}}
{'04/Oct/2019:21:15:24': {'count': 1, '200': 1}}
{'04/Oct/2019:21:15:28': {'count': 1, '200': 1}}
{'04/Oct/2019:21:15:31': {'count': 1, '200': 1}}
{'04/Oct/2019:21:15:33': {'count': 1, '200': 1}}
{'04/Oct/2019:21:15:37': {'count': 1, '301': 1}}
{'04/Oct/2019:21:15:39': {'count': 2, '200': 1, '301': 1}}
{'04/Oct/2019:21:15:40': {'count': 1, '200': 1}}
{'04/Oct/2019:21:15:50': {'count': 1, '301': 1}}
{'04/Oct/2019:21:15:51': {'count': 2, '200': 1, '404': 1}}
{'04/Oct/2019:21:15:52': {'count': 1, '200': 1}}
{'04/Oct/2019:21:15:54': {'count': 1, '200': 1}}
{'04/Oct/2019:21:16:00': {'count': 1, '200': 1}}
{'04/Oct/2019:21:16:05': {'count': 1, '200': 1}}
{'04/Oct/2019:21:16:12': {'count': 1, '200': 1}}
{'04/Oct/2019:21:16:13': {'count': 1, '200': 1}}
{'04/Oct/2019:21:16:14': {'count': 1, '200': 1}}
{'04/Oct/2019:21:16:15': {'count': 3, '200': 3}}
{'04/Oct/2019:21:16:22': {'count': 2, '301': 2}}
{'04/Oct/2019:21:16:23': {'count': 1, '200': 1}}
{'04/Oct/2019:21:16:30': {'count': 1, '200': 1}}
{'04/Oct/2019:21:16:41': {'count': 1, '200': 1}}
{'04/Oct/2019:21:16:57': {'count': 1, '200': 1}}

Here we can see, for each second, the count of log lines along with the count of each status code.
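Since the original motivation was spotting problems like too many 5xx responses, it can also be handy to total the status codes over the whole file. Below is a minimal sketch of that idea, assuming the same example.logs file; it is a separate snippet, not part of the script above:

from collections import Counter

status_totals = Counter()

with open("example.logs", "r") as file:
    for line in file:
        # Same extraction trick as before: status code follows the quoted request.
        status_code = line.split('"')[2].split(" ")[1]
        status_totals[status_code] += 1

print(status_totals)
# Sum everything that starts with "5" to spot server-side errors.
print(sum(count for code, count in status_totals.items() if code.startswith("5")))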

This was how you can approach log parsing using Python. You can also use regex to parse the logs, but regex is harder to understand when you are starting out. This is one of the most basic questions asked in DevOps interviews.
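For reference, a regex-based version of the same extraction could look roughly like the sketch below. The pattern is one possible way to match the default access log line shown earlier, not the only one:

import re

# One possible pattern for the default Nginx access log line shown above.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

line = '27.59.104.166 - - [04/Oct/2019:21:15:54 +0000] "GET /users/login HTTP/1.1" 200 41716 "-" "okhttp/3.12.1"'
match = LOG_PATTERN.match(line)
if match:
    print(match.group("time").split(" ")[0])  # 04/Oct/2019:21:15:54
    print(match.group("status"))              # 200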

If you like the article, please share and subscribe. You can also join our Facebook group: https://www.facebook.com/groups/327836414579633/ and LinkedIn group: https://www.linkedin.com/groups/10441297/



