Learn Steps

Log parsing in Python using regular expressions

Log parsing is a common task for DevOps engineers and SREs, and we already have a post on it. You can find the post here

In the earlier post, I did not use regex; I parsed the logs using only string manipulation.

What is a regular expression?

A regular expression is a sequence of letters, digits, and symbols that defines a search pattern. Almost every language supports regex through a regex engine, which uses algorithms such as backtracking to search for matches.
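As a small illustration of a search pattern, here is a minimal sketch using Python's built-in re module to find dates in a string. The pattern and the sample text are made up for this example.

```python
import re

# Hypothetical sample text, made up for this illustration.
text = "Deployed on 2019-10-04, rolled back on 2019-10-05."

# \d{4} means exactly four digits, \d{2} exactly two; the hyphens match literally.
date_pattern = re.compile(r"\d{4}-\d{2}-\d{2}")

dates = date_pattern.findall(text)
print(dates)  # ['2019-10-04', '2019-10-05']
```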

In this post, I am going to share code that you can use to parse logs with regular expressions. The code was written by Marco.

#!/usr/bin/env python3

import sys
import re

# Sample log line in the format the regex below expects:
# 27.59.104.166 - - [04/Oct/2019:21:15:54 +0000] "GET /users/login HTTP/1.1" 200 41716 "-" "okhttp/3.12.1"

LOG_LINE_REGEX = r'^(?P<IP>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}).*\[(?P<timestamp>.*)\]\s"(?P<verb>[A-Z]+)\s(?P<path>[\w\/]+)\s+(?P<protocol>[\w\/\.]+)"\s(?P<status_code>\d+)\s(?P<response_size>\d+).*'

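# Compile the pattern once and reuse it for every line.
pattern = re.compile(LOG_LINE_REGEX)

# Read log lines from standard input; lines that do not match are skipped.
for line in sys.stdin:
    m = pattern.match(line)
    if m:
        # Print the named groups as a dictionary.
        print(m.groupdict())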
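
To see what the script prints, here is a minimal sketch that applies the same pattern to the sample line from the comment instead of reading sys.stdin. The helper name parse_line is my own and is not part of Marco's code.

```python
import re

# Same pattern as in the script above.
LOG_LINE_REGEX = r'^(?P<IP>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}).*\[(?P<timestamp>.*)\]\s"(?P<verb>[A-Z]+)\s(?P<path>[\w\/]+)\s+(?P<protocol>[\w\/\.]+)"\s(?P<status_code>\d+)\s(?P<response_size>\d+).*'
pattern = re.compile(LOG_LINE_REGEX)

def parse_line(line):
    """Return the named groups as a dict, or None if the line does not match."""
    m = pattern.match(line)
    return m.groupdict() if m else None

sample = '27.59.104.166 - - [04/Oct/2019:21:15:54 +0000] "GET /users/login HTTP/1.1" 200 41716 "-" "okhttp/3.12.1"'
fields = parse_line(sample)
print(fields)
# {'IP': '27.59.104.166', 'timestamp': '04/Oct/2019:21:15:54 +0000', 'verb': 'GET',
#  'path': '/users/login', 'protocol': 'HTTP/1.1', 'status_code': '200', 'response_size': '41716'}
```

Every value comes back as a string, so convert status_code and response_size with int() if you need numbers.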

You can use this code to parse your logs; the expected log format is shown in the comment at the top of the code. To parse a different log format, adjust the regex, and use https://regexr.com/ to test it. You can also reach out to me through the comments anytime.
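As an example of adapting the pattern: the path group in the script only accepts word characters and slashes, so a request such as /search?q=test or /index.html would not match. The sketch below is one possible variant, not the original author's code. It swaps the path group for \S+ and counts the lines that fail to match so they are not dropped silently. The names RELAXED_REGEX and parse_lines are hypothetical, and the log lines are made up.

```python
import re

# Same as the original pattern except for the path group, which is now \S+
# (any run of non-whitespace characters).
RELAXED_REGEX = r'^(?P<IP>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}).*\[(?P<timestamp>.*)\]\s"(?P<verb>[A-Z]+)\s(?P<path>\S+)\s+(?P<protocol>[\w\/\.]+)"\s(?P<status_code>\d+)\s(?P<response_size>\d+).*'
relaxed = re.compile(RELAXED_REGEX)

def parse_lines(lines):
    """Return (parsed, skipped): a list of field dicts and a count of unmatched lines."""
    parsed, skipped = [], 0
    for line in lines:
        m = relaxed.match(line)
        if m:
            parsed.append(m.groupdict())
        else:
            skipped += 1
    return parsed, skipped

# Made-up lines: one in the expected format, one that is not a log line.
lines = [
    '27.59.104.166 - - [04/Oct/2019:21:15:54 +0000] "GET /search?q=test HTTP/1.1" 200 512 "-" "okhttp/3.12.1"',
    'not a log line',
]
parsed, skipped = parse_lines(lines)
print(parsed[0]["path"], skipped)  # /search?q=test 1
```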

Note: If you use regular expressions and are not aware of the downsides of backtracking, please read up on it. A badly written regex can take down your systems. This happened to Cloudflare; you can read their blog post here:
https://blog.cloudflare.com/cloudflare-outage/
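
To make the backtracking risk concrete, here is a small sketch. The pattern (a+)+$ is a textbook example of catastrophic backtracking; it is not the expression from the Cloudflare incident. On an input that almost matches, the number of ways the engine can split the run of "a" characters roughly doubles with each extra character. Timings depend on your machine, so none are shown.

```python
import re
import time

# Nested quantifiers: the inner a+ and the outer (...)+ can split a run of
# "a" characters in exponentially many ways.
risky = re.compile(r"(a+)+$")
# Matches the same strings without the nested quantifier.
safe = re.compile(r"a+$")

def time_match(pattern, text):
    """Return (match result, elapsed seconds) for pattern.match(text)."""
    start = time.perf_counter()
    result = pattern.match(text)
    return result, time.perf_counter() - start

for n in (10, 14, 18):
    text = "a" * n + "b"  # the trailing "b" forces the match to fail
    risky_result, risky_time = time_match(risky, text)
    safe_result, safe_time = time_match(safe, text)
    print(f"n={n}: risky {risky_time:.4f}s, safe {safe_time:.6f}s")
```

Keep n small if you experiment with this: the risky pattern slows down very quickly as the input grows.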