Failed to determine cgroup for the 'cpu' subsystem: Failed to read /proc/1232/cgroup: Failed to open file: No such file or directory. Container b8d8339ac8e6cbfb17111456dcd919b51231232131231233931cee8cfc3f87 failed to exit within 40 seconds of signal 15 - using the force
Recently I came across this error, which led to a lot of debugging. In the end, the only thing that resolved it was restarting the machine.
Docker version: 17.09-ce
In one of the deployments, I started getting this error and was unable to deploy the container.
I started looking for a way to resolve it. These are the options I tried:
Stopping, killing, and removing the Docker container. None of these worked; each command hung and returned nothing.
We also tried to access the container, which did not work either. Killing the PID didn't work, and we were stuck.
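Since the stuck commands never returned, it can help to wrap them in a timeout when scripting this kind of recovery. Below is a minimal Python sketch, assuming the docker CLI is on the PATH. run_with_timeout is a hypothetical helper name, not part of Docker, and the 30 second default is an arbitrary choice.

```python
import subprocess


def run_with_timeout(cmd, timeout_s=30.0):
    """Run cmd and return (returncode, combined output).

    Returns None if the command does not finish within timeout_s seconds,
    which is how a hung `docker stop` / `docker kill` / `docker rm` shows up.
    """
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this.
        return None
    return proc.returncode, proc.stdout + proc.stderr


# Example usage (the container ID is the short form of the one in the error):
#   for action in (["docker", "stop"], ["docker", "kill"], ["docker", "rm", "-f"]):
#       result = run_with_timeout(action + ["b8d8339ac8e6"])
#       print(action, "hung" if result is None else result)
```

A result of None for all three commands would match the behaviour described above, where nothing came back from the daemon.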
The next option was to restart the Docker daemon. This was a bad option because the other containers on the machine would die too, but we had no other way to resolve the problem.
So we stopped the Docker daemon and tried to start it again. To our surprise, the daemon did not start. Now it was a serious issue. We started searching the internet and read in a few places that this cannot be fixed and that you have to abandon the machine.
After digging for some time, we decided to reboot the slave. That fixed the Docker daemon.
What could have happened here?
Containers are built on cgroups. When a container is deleted, its cgroups are deleted as well. What may have happened here is that when we triggered the kill command, the container's cgroups were deleted but the acknowledgement never reached Docker. Since Docker thinks the container has not been deleted, every later command to stop or remove it gets no response, because the cgroup file it tries to clean up can no longer be found.
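One way to check this kind of state by hand is to look at the file named in the error message. The sketch below is only an illustration and not how Docker does it: it reads /proc/<pid>/cgroup and returns the cgroup path for one subsystem, assuming the cgroup v1 line format "hierarchy-ID:controller-list:path". cgroup_path and its proc_root parameter are hypothetical names added for illustration and testing.

```python
from pathlib import Path


def cgroup_path(pid, subsystem="cpu", proc_root="/proc"):
    """Return the cgroup path of `pid` for `subsystem`.

    Returns None if /proc/<pid>/cgroup is missing (the process is gone)
    or if the subsystem is not listed in the file.
    """
    try:
        text = (Path(proc_root) / str(pid) / "cgroup").read_text()
    except (FileNotFoundError, ProcessLookupError):
        # Same situation as "Failed to open file: No such file or directory".
        return None
    for line in text.splitlines():
        # Each line looks like: 4:cpu,cpuacct:/docker/<container-id>
        parts = line.split(":", 2)
        if len(parts) == 3 and subsystem in parts[1].split(","):
            return parts[2]
    return None


# Example usage with the PID from the error message:
#   print(cgroup_path(1232, "cpu"))
```

If this returns None for the PID in the error while Docker still lists the container, that would be consistent with the explanation above, although it does not prove it.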
This is only a guess at what could have happened; I did not dig deep into the issue. If you know more, please share it in the comments and I will add it to the article.