Network issues are among the most common problems you have to debug if you are involved in infrastructure setup and day-to-day operations. In this article, we will see how to approach such a problem.
Let’s say we have a few boxes that need to connect to a third-party API outside our network, and all our outbound traffic goes through a NAT box. We will see how to debug this. The connection flows from the internal boxes, through the NAT box, to the external API.
We should generally debug in three simple steps: check that DNS is working, check that the traffic takes the correct route (and if not, where it breaks), and check that the service is up.
First of all, we need to check whether the external service's DNS name resolves. To check this, run a DNS lookup for the service's hostname, for example with a tool such as dig or nslookup.
Simply running telnet will also tell you if the name is not resolvable, but since we are trying to establish a step-by-step way of checking, we are not relying on that for now.
If the name resolves, the issue is not here; otherwise, you have to check your resolv.conf files and debug name resolution from there.
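The DNS check can also be scripted. The following is a minimal sketch, assuming Python is available on the box; `resolve_host` is a hypothetical helper name, not part of any tool mentioned in this article. It asks the system resolver (the one configured through resolv.conf on most Linux boxes) for the addresses behind a hostname.

```python
import socket


def resolve_host(hostname):
    """Return the sorted, de-duplicated IP addresses a hostname resolves to.

    An empty list means the system resolver could not resolve the name.
    """
    try:
        infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # Resolution failed: unknown name, unreachable DNS server, etc.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] holds the IP address as a string.
    return sorted({info[4][0] for info in infos})
```

Calling `resolve_host` with the third-party API's hostname and getting an empty list back points at a DNS problem rather than at the NAT box or the vendor.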
Next, we will check whether we can reach the NAT box, and then whether the external service is reachable. To do so, we can use traceroute to see the route the packets will take.
The output will look something like the following.
The second line of the output (the first hop) should show your NAT IP.
traceroute to google.com (126.96.36.199), 30 hops max, 60 byte packets
1 _gateway (192.168.1.1) 5.145 ms 5.269 ms 5.334 ms
2 abts-mh-dynamic-001.33.169.122.airtelbroadband.in (188.8.131.52) 8.484 ms 8.751 ms 8.721 ms
3 184.108.40.206 (220.127.116.11) 9.018 ms aes-static-249.51.22.125.airtel.in (18.104.22.168) 8.991 ms 9.277 ms
4 22.214.171.124 (126.96.36.199) 22.388 ms 188.8.131.52 (184.108.40.206) 18.453 ms 220.127.116.11 (18.104.22.168) 20.789 ms
5 22.214.171.124 (126.96.36.199) 20.392 ms 18.865 ms 18.834 ms
6 10.23.221.190 (10.23.221.190) 23.658 ms 10.23.221.222 (10.23.221.222) 14.676 ms 10.23.206.126 (10.23.206.126) 24.185 ms
7 188.8.131.52 (184.108.40.206) 14.236 ms 220.127.116.11 (18.104.22.168) 14.244 ms 22.214.171.124 (126.96.36.199) 14.171 ms
8 188.8.131.52 (184.108.40.206) 20.370 ms 220.127.116.11 (18.104.22.168) 18.547 ms 22.214.171.124 (126.96.36.199) 14.245 ms
9 188.8.131.52 (184.108.40.206) 21.057 ms 220.127.116.11 (18.104.22.168) 33.528 ms 22.214.171.124 (126.96.36.199) 12.968 ms
10 188.8.131.52 (184.108.40.206) 34.578 ms 220.127.116.11 (18.104.22.168) 30.827 ms 22.214.171.124 (126.96.36.199) 35.468 ms
11 188.8.131.52 (184.108.40.206) 39.609 ms 220.127.116.11 (18.104.22.168) 37.134 ms 22.214.171.124 (126.96.36.199) 33.761 ms
12 bom07s20-in-f14.1e100.net (188.8.131.52) 52.450 ms 184.108.40.206 (220.127.116.11) 52.390 ms 18.104.22.168 (22.214.171.124) 48.849 ms
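If you want to automate the "is the NAT box on the path" check, the text output of traceroute can be parsed. This is only a sketch, assuming the usual Linux traceroute layout of one hop per line with IPs in parentheses; the function names and the `max_hop` parameter are illustrative and not part of traceroute itself.

```python
import re

# Matches an IPv4 address printed inside parentheses, e.g. "(192.168.1.1)".
_IP_IN_PARENS = re.compile(r"\((\d{1,3}(?:\.\d{1,3}){3})\)")
# Matches a hop line: optional spaces, the hop number, then the rest.
_HOP_LINE = re.compile(r"\s*(\d+)\s+(.*)")


def hops_from_traceroute(output):
    """Map hop number -> list of IPs seen at that hop."""
    hops = {}
    for line in output.splitlines():
        match = _HOP_LINE.match(line)
        if not match:
            continue  # header line ("traceroute to ...") or blank line
        hops[int(match.group(1))] = _IP_IN_PARENS.findall(match.group(2))
    return hops


def goes_through(output, nat_ip, max_hop=1):
    """Return True if nat_ip appears at any hop numbered <= max_hop."""
    hops = hops_from_traceroute(output)
    return any(nat_ip in ips for hop, ips in hops.items() if hop <= max_hop)
```

A hop that prints only `* * *` yields an empty list, so it never matches the NAT IP. Raise `max_hop` if other routers legitimately sit between your boxes and the NAT box.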
The traceroute output shows the route taken to reach the external service, and this route should go through your NAT box. If it does not, that is a problem in itself, so check the routes of the subnet in which your internal apps are deployed. The external service may be rejecting the requests because they are not coming from the NAT box. To fix it, add a route in your app’s subnet so that all outbound traffic goes through the NAT box.
For anything destined to 0.0.0.0/0 (the default route), the next hop should be the NAT box.
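To see why a 0.0.0.0/0 entry catches all outbound traffic, here is a small sketch of longest-prefix route selection. The route table, the `local` and `nat-box` labels, and the `next_hop` function are hypothetical and only illustrate the idea; your real routes live in the OS or in your cloud provider's route table.

```python
import ipaddress


def next_hop(routes, destination):
    """Pick the next hop for a destination using longest-prefix match.

    routes is a list of (cidr, hop) pairs. Returns None if nothing matches.
    """
    dest = ipaddress.ip_address(destination)
    best = None
    for cidr, hop in routes:
        network = ipaddress.ip_network(cidr)
        # Keep the matching route with the most specific (longest) prefix.
        if dest in network and (best is None or network.prefixlen > best[0].prefixlen):
            best = (network, hop)
    return best[1] if best else None


# Hypothetical route table for the app subnet.
ROUTES = [
    ("10.0.0.0/16", "local"),   # traffic that stays inside the network
    ("0.0.0.0/0", "nat-box"),   # everything else goes out through the NAT box
]
```

With this table, an internal address such as 10.0.3.7 matches the more specific /16 route, while any external address falls through to the default route and is sent to the NAT box. Without the 0.0.0.0/0 entry, external destinations have no route at all.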
So far, we have established that the traffic goes through the NAT box (after fixing the route, if that was needed). Next, if you are still not able to reach the external service on a port, try to telnet to it and see whether you can connect. If you cannot connect, you will get an error like the one below.
telnet external_dns port
Trying 127.0.0.1...
telnet: Unable to connect to remote host: Connection refused
This means that the service is not active on this port and you have to talk to the vendor about this.
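The same port check can be done without telnet. Below is a minimal sketch using Python's standard socket module; `check_port` and its return strings are illustrative names chosen for this example. It separates "connection refused" (the host answered, but nothing accepts connections on that port) from a timeout (more likely packets being dropped somewhere on the path).

```python
import socket


def check_port(host, port, timeout=3.0):
    """Try a TCP connection and classify the result.

    Returns "open", "refused", "timeout", or "error: <reason>".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The remote side actively rejected the connection.
        return "refused"
    except socket.timeout:
        # No answer within the timeout: often a firewall or routing issue.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. DNS failure or network unreachable.
        return f"error: {exc}"
```

A "refused" result matches the telnet error above and is the case to take to the vendor, while a "timeout" usually means going back to the routing and NAT checks first.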
This was a very basic overview of how you can approach such problems. There can be many other issues on the NAT box and the external service that need more in-depth debugging. For example, IP forwarding may not be active on the NAT box, or some headers may be getting dropped along the way. Many other things can go wrong.
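The IP forwarding case is quick to rule out. This sketch assumes the NAT box runs Linux, where the IPv4 forwarding flag is exposed at /proc/sys/net/ipv4/ip_forward; the function name is illustrative, and the `path` parameter exists only so the check can be pointed at another file.

```python
from pathlib import Path


def ip_forwarding_enabled(path="/proc/sys/net/ipv4/ip_forward"):
    """Return True if the Linux IPv4 forwarding flag file contains 1."""
    return Path(path).read_text().strip() == "1"
```

Run it on the NAT box itself. A result of False means the box will not forward packets between interfaces, so traffic from the app subnet stops there.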
Remember one very basic principle: check the path step by step and debug each step. Don't try to skip steps. It is the same as a clogged pipe, which you need to clear from end to end to allow water to flow.