What is black box monitoring? Automating it with a bash script

Monitoring is essential if you want your systems to keep working properly. It helps you spot issues while they are happening, or even before they happen, so it is worth implementing carefully. Let's see what black box monitoring is.

When you do hit an issue, how should you approach debugging it? A good first step is to look at black box monitoring. So, what is black box monitoring?

Black box monitoring covers the values and metrics you can check without looking at application-specific metrics, such as CPU, memory, disk space and IOPS. Linux provides basic utilities to keep a watch on these metrics. Below are a few commands for checking them.

CPU: top or htop

The htop command shows CPU usage, load average and memory consumption. You can also check the number of threads per process; the script later in this article does that with the nlwp column of ps.
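To make the CPU figure less magical, here is a minimal sketch of how a busy percentage can be derived from two samples of the first line of /proc/stat. The function name cpu_busy_percent is hypothetical, and counting idle plus iowait as "not busy" is an assumption of this sketch, not necessarily what top or htop do internally.

```shell
# Hypothetical helper: reads two "cpu ..." lines (the first line of
# /proc/stat, sampled at two different moments) on stdin and prints
# the busy percentage for the interval between them.
cpu_busy_percent() {
    awk '
        /^cpu / {
            total = 0
            # Fields 2..9: user nice system idle iowait irq softirq steal
            for (i = 2; i <= 9 && i <= NF; i++) total += $i
            n++
            t[n] = total        # all CPU time so far
            d[n] = $5 + $6      # idle + iowait, treated as not busy
        }
        END {
            if (n < 2 || t[2] == t[1]) exit 1
            busy = (t[2] - t[1]) - (d[2] - d[1])
            printf "%d\n", 100 * busy / (t[2] - t[1])
        }
    '
}

# Usage on a live Linux system (one-second window):
#   { head -n 1 /proc/stat; sleep 1; head -n 1 /proc/stat; } | cpu_busy_percent
```

Reading from stdin keeps the function easy to try with saved samples as well as on a live machine.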

Memory: free -m

This shows the free, buffered/cached and used memory, in megabytes.
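If you want a single number instead of the full table, something like the sketch below may help. It computes the percentage of memory in use from the MemTotal and MemAvailable lines of /proc/meminfo. The function name mem_used_percent is hypothetical, and using MemAvailable (rather than MemFree) as the "not used" figure is an assumption of this example.

```shell
# Hypothetical helper: reads /proc/meminfo-style text on stdin and
# prints the percentage of memory that is not available, as an integer.
mem_used_percent() {
    awk '
        /^MemTotal:/     { total = $2 }   # value in kB
        /^MemAvailable:/ { avail = $2 }   # value in kB
        END {
            if (total <= 0) exit 1
            printf "%d\n", (total - avail) * 100 / total
        }
    '
}

# Usage on a live Linux system:
#   mem_used_percent < /proc/meminfo
```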

Disk: df -Th

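This shows the filesystem type and the disk space left on each mounted filesystem, in human-readable units.

IOPS: iostat

This shows I/O operations per second, time spent waiting on reads and writes, and so on. On most distributions iostat is part of the sysstat package.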
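A common follow-up is to flag only the filesystems that are nearly full. The sketch below filters df -P output (the POSIX format, where column 5 is the capacity and column 6 the mount point) against a threshold. The function name disks_over and the default of 80 percent are assumptions for illustration, and mount points containing spaces are not handled.

```shell
# Hypothetical helper: reads `df -P` output on stdin and prints
# "mountpoint capacity" for every filesystem at or above the threshold.
disks_over() {
    threshold=${1:-80}   # percent; 80 is an arbitrary default
    awk -v t="$threshold" '
        NR > 1 {                 # skip the header line
            use = $5
            sub(/%/, "", use)    # "91%" -> "91"
            if (use + 0 >= t) print $6, $5
        }
    '
}

# Usage on a live system:
#   df -P | disks_over 90
```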

w command

Type w and press Enter. This gives an overview of the system: uptime, logged-in users and load average.
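The load average is easier to judge when you compare it with the number of cores. As a rough sketch, the function below divides the 1-minute load average (the first field of /proc/loadavg) by a core count you pass in. The name load_per_core is hypothetical, and treating a value above 1.00 as "more runnable work than cores" is a rule of thumb, not a hard limit.

```shell
# Hypothetical helper: reads /proc/loadavg-style text on stdin and
# prints the 1-minute load average divided by the given core count.
load_per_core() {
    cores=${1:-1}
    awk -v c="$cores" '
        NR == 1 {
            if (c <= 0) exit 1
            printf "%.2f\n", $1 / c
        }
    '
}

# Usage on a live Linux system:
#   load_per_core "$(nproc)" < /proc/loadavg
```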

Compile in one place

Let's put all of them together in a bash script. Look at the script below.

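#!/usr/bin/env bash

# NOTE: print_message and get_iostat are called below but were not part of
# the original listing. These are minimal assumed stand-ins.
print_message(){
        # Only the first argument (the title) is used; extra arguments are ignored.
        printf "\n===== %s =====\n" "$1"
}

get_iostat(){
        print_message "IOSTAT"
        iostat
}

get_vmstat(){
        print_message "VMSTATS"
        # vmstat columns: r b swpd free buff cache si so bi bo in cs us sy id wa st
        vmstat | awk ' END {
                print "Processes_waiting_for_CPU:", $1
                print "Free_memory_KB:", $4
                print "Swapped_in_KB_per_sec:", $7
                print "Swapped_out_KB_per_sec:", $8
                print "Interrupts_per_sec:", $11
                print "Context_switches_per_sec:", $12
                print "CPU_percent_user_code:", $13
                print "CPU_percent_system_code:", $14
        } ' | column -t
}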

get_simple_info(){
        print_message "Simple INFO"
        echo "No of cores: $(nproc)"
        echo "Uptime"
        w
}

get_max_thread_running_proc(){
        print_message "Max thread Running process[3rd column is number of threads]"
        # Sort numerically on the 3rd column (nlwp), highest first
        ps axo pid,ppid,nlwp,cmd | sort -k3,3nr | head -n 5
}

get_highest_memory_proc(){
        print_message "PROCESS WITH HIGHEST MEMORY"  30 70
        # Sort numerically on the 4th column (%MEM), highest first
        ps aux | sort -k4,4nr | head -n 5 | awk '{print $2, $4, $11}' | column -t
}

get_highest_cpu_proc(){
        print_message "PROCESS WITH HIGHEST CPU"
        # Sort numerically on the 3rd column (%CPU), highest first
        ps aux | sort -k3,3nr | head -n 5 | awk '{print $2, $3, $11}' | column -t
}

get_disk_states(){
        print_message "DISK USAGE" 1 10
        # Sort numerically on the 6th column (Use%), fullest filesystems first
        df -Th | sort -k6,6nr | head -n 3 | column -t
}

get_help(){
        printf "\n i: IOSTAT"
        printf "\n d: DISK USAGE"
        printf "\n s: Simple INFO"
        printf "\n c: PROCESS WITH HIGHEST CPU"
        printf "\n m: PROCESS WITH HIGHEST MEMORY"
        printf "\n t: Max thread Running process"
        printf "\n v: VMSTAT"

}


case "$1" in
        i)
                get_iostat
        ;;
        d)
                get_disk_states
        ;;
        s)
                get_simple_info
        ;;
        c)
                get_highest_cpu_proc
        ;;
        m)
                get_highest_memory_proc
        ;;
        t)
                get_max_thread_running_proc
        ;;
        v)
                get_vmstat
        ;;
        h|help)
                get_help
        ;;
        "")
                # No argument: print every section
                get_iostat
                get_vmstat
                get_disk_states
                get_simple_info
                get_highest_memory_proc
                get_highest_cpu_proc
                get_max_thread_running_proc
        ;;
        *)
                # Unknown argument: show the available options
                get_help
        ;;
esac



printf "\n\n"

How to use this?

Copy this script to /usr/local/bin/ov and make it executable:

chmod a+x /usr/local/bin/ov

Now, when you run the ov command, you get a black box monitoring overview of the whole system. Pass one of the letters from the help output (for example, ov c) to see a single section.



Gaurav Yadav

Gaurav is a cloud infrastructure engineer, full stack web developer and blogger. A sportsperson at heart, he loves football. He loves working on scale and is always keen to learn new tech. He is experienced with CI/CD, distributed cloud infrastructure, build systems and a lot of SRE work.
