Points to keep in mind while debugging production issues.

Debugging production issues is always hectic, so it is wise to have a predefined set of steps to follow. There will be scenarios where those steps do not work, and even then you should have a flow in mind for what to check. In this article, we are going to see some points to keep in mind while debugging.

People generally develop hypotheses as to what must be wrong. Don’t let everyone start checking their own hypotheses.

When everyone starts checking their own hypotheses, they make a lot of changes, and those become hard to keep track of. There should be an incident manager who decides what is to be checked and who notes down properly what was checked and changed.

First of all, check the resources of the system, such as CPU, memory, and disk: how much of each is used and whether there is any pressure on them. The article below covers this and how it can help.

How to debug issues and performance in production: The USE method
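As a rough illustration of that first resource check, here is a minimal Python sketch that uses only the standard library. It is not taken from the linked article. The function names (parse_meminfo, memory_used_fraction, resource_snapshot) are made up for this example, the memory part assumes a Linux host with /proc/meminfo, and os.getloadavg is only available on Unix-like systems.

```python
import os
import shutil


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (mostly kB)."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def memory_used_fraction(meminfo):
    """Fraction of memory that is not available, from MemTotal and MemAvailable."""
    total = meminfo["MemTotal"]
    available = meminfo["MemAvailable"]
    return (total - available) / total


def resource_snapshot(path="/"):
    """Collect a quick usage and pressure snapshot for CPU, memory and disk."""
    load1, _, _ = os.getloadavg()  # Unix only: 1-minute load average
    cpus = os.cpu_count() or 1
    disk = shutil.disk_usage(path)
    snapshot = {
        # Sustained values above 1.0 hint that work is queueing for CPU.
        # On Linux the load average also counts tasks in uninterruptible wait.
        "cpu_load_per_core": load1 / cpus,
        "disk_used_fraction": disk.used / disk.total,
    }
    try:
        with open("/proc/meminfo") as fh:  # Linux only (assumption)
            snapshot["memory_used_fraction"] = memory_used_fraction(
                parse_meminfo(fh.read())
            )
    except (OSError, KeyError):
        snapshot["memory_used_fraction"] = None  # not available on this host
    return snapshot


if __name__ == "__main__":
    for name, value in resource_snapshot().items():
        print(name, value)
```

A snapshot like this only tells you where to look first. Dedicated tools and your monitoring system will give a far more complete picture.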

You will find people doing a restart as a solution to many problems. Mind it, a restart is not a solution. It only shows that you are lazy and would rather patch the system than debug it, and then you live with the probability that the problem will happen again in the future.

Restarting is never a solution; you are just pushing D-day a bit further into the future. If you don’t fix the problem, it can harm you big time later.

Try to remove, one by one, the possibilities that you think can be the reason. Mind it: one change at a time. If you make multiple changes at once, you may get stuck, because it will be unclear which change fixed the problem.

Making one change at a time and keeping a record will help you track down the problem and pinpoint the issue.
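One way to enforce "one change at a time, with a record" is a small incident log that refuses a new change until the outcome of the previous one has been written down. The sketch below is only an illustration. The Change and IncidentLog names are hypothetical, and in practice a shared document or ticket may serve the same purpose.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Change:
    description: str
    made_at: datetime
    outcome: Optional[str] = None  # None while the effect is still being observed


class IncidentLog:
    """Keeps an ordered record of changes and allows only one open change."""

    def __init__(self):
        self.changes = []

    def start_change(self, description):
        # Refuse a second change until the previous one has an outcome.
        if self.changes and self.changes[-1].outcome is None:
            raise RuntimeError("record the outcome of the previous change first")
        change = Change(description, datetime.now(timezone.utc))
        self.changes.append(change)
        return change

    def record_outcome(self, outcome):
        # An outcome can only be attached to the currently open change.
        if not self.changes or self.changes[-1].outcome is not None:
            raise RuntimeError("no open change to record an outcome for")
        self.changes[-1].outcome = outcome


if __name__ == "__main__":
    log = IncidentLog()
    log.start_change("increase connection pool size")
    log.record_outcome("no effect, reverted")
    log.start_change("disable the new cache layer")
    log.record_outcome("error rate back to normal")
    for change in log.changes:
        print(change.made_at.isoformat(), change.description, "->", change.outcome)
```

Because every entry carries a timestamp and an outcome, the log also shows afterwards which single change made the difference.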

Understanding the flow of the system you are managing will help you reach the problem faster.

It is always better to have more knowledge of the system. The more you know about the system, the better the decisions you can make to mitigate any issue.

Provision your system in such a way that you can see metrics and logs very easily and search through them.

Having visibility into what’s happening is best. If you can figure out from your metrics where the issue is, or predict when an issue can happen, you have done 50% of your DevOps right.
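Logs are much easier to search when every line is structured. As one possible approach, the sketch below uses Python's standard logging module to emit one JSON object per log line. The JsonFormatter and build_logger names, the "context" convention for extra fields, and the example field names are assumptions made for this illustration, not part of any particular logging stack.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record):
        entry = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Extra searchable fields passed as extra={"context": {...}}.
        entry.update(getattr(record, "context", {}))
        return json.dumps(entry)


def build_logger(name, stream=None):
    """Return a logger that writes JSON lines to the stream (stderr by default)."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines from the root logger
    return logger


if __name__ == "__main__":
    logger = build_logger("checkout")
    logger.warning(
        "slow response",
        extra={"context": {"endpoint": "/checkout", "latency_ms": 950}},
    )
```

With fields such as latency_ms stored as real values instead of free text, a log search tool can filter and aggregate on them directly.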
