On Thu, Jul 31, 2008 at 9:39 AM, Pekka Savola <pekkas@xxxxxxxxxx> wrote:
> I got a little bit interested in this, so below are a few pointers
> to how to continue with investigation.

Apologies for my delay in replying. I'm still desperate to find out
what's going on, however, so here's the next installment...!

> On Wed, 30 Jul 2008, j t wrote:
>>
>> Here are the results of the 2 boundary-case pings:
>>
>> $ ping -s 1472 -M do ocp.com.com
>> PING c18-ad-xw-lb.cnet.com (216.239.122.193) 1472(1500) bytes of data.
>> 64 bytes from c18-ad-xw-lb.cnet.com (216.239.122.193): icmp_seq=1
>> ttl=241 (truncated)
>
> Truncated means that you got less data than you expected. Here you're
> requesting 1472B but you're only getting 56+8B.
>
> You should try to figure out where this is disappearing. Can you do a ping
> -s 1472 without "truncated" with other sites internet? This will give you
> clues whether the issue is at your end or the destination network end.

Hmmm. Even this bit is weird:

In 1 group, there are hosts such as ocp.com.com (my original problem
host), www.google.com and www.cnet.com.

Whenever I ping these hosts and specify a packetsize greater than 56
bytes, the results get truncated:

$ ping -s 56 -n ocp.com.com
PING c18-ad-xw-lb.cnet.com (216.239.122.193) 56(84) bytes of data.
64 bytes from 216.239.122.193: icmp_seq=1 ttl=241 time=132 ms

$ ping -s 56 -n www.google.com
PING www.l.google.com (66.249.91.99) 56(84) bytes of data.
64 bytes from 66.249.91.99: icmp_seq=1 ttl=247 time=26.0 ms

$ ping -s 56 -n www.cnet.com
PING c18-rb-tron-ssa-xw-split-lb.cnet.com (216.239.122.142) 56(84) bytes of data.
64 bytes from 216.239.122.142: icmp_seq=1 ttl=241 time=124 ms

but

$ ping -s 57 -n ocp.com.com
PING c18-ad-xw-lb.cnet.com (216.239.122.193) 57(85) bytes of data.
64 bytes from 216.239.122.193: icmp_seq=1 ttl=241 (truncated)

$ ping -s 57 -n www.google.com
PING www.l.google.com (66.249.91.103) 57(85) bytes of data.
64 bytes from 66.249.91.103: icmp_seq=1 ttl=247 (truncated)

$ ping -s 57 -n www.cnet.com
PING c18-rb-tron-ssa-xw-split-lb.cnet.com (216.239.122.142) 57(85) bytes of data.
64 bytes from 216.239.122.142: icmp_seq=1 ttl=241 (truncated)

For each of these hosts (ocp.com.com, www.google.com & www.cnet.com),
I can ping them with sizes up to 1472 (and I get truncated results)
but if I increase the packetsize to 1473, I receive no replies at all:

$ ping -s 1472 -n www.google.com
PING www.l.google.com (66.249.91.147) 1472(1500) bytes of data.
64 bytes from 66.249.91.147: icmp_seq=1 ttl=247 (truncated)
64 bytes from 66.249.91.147: icmp_seq=2 ttl=247 (truncated)

--- www.l.google.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 39.231/39.264/39.298/0.200 ms

$ ping -s 1473 -n www.google.com
PING www.l.google.com (66.249.91.104) 1473(1501) bytes of data.

--- www.l.google.com ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2008ms

In the other group are hosts such as slashdot.org, www.debian.org,
www.redhat.com. I can ping these guys with packetsizes of 2000 bytes
with no truncation:

$ ping -s 2000 -n slashdot.org
PING slashdot.org (216.34.181.45) 2000(2028) bytes of data.
2008 bytes from 216.34.181.45: icmp_seq=1 ttl=242 time=135 ms

$ ping -s 2000 -n www.debian.org
PING www.debian.org (194.109.137.218) 2000(2028) bytes of data.
2008 bytes from 194.109.137.218: icmp_seq=1 ttl=57 time=51.4 ms

$ ping -s 2000 -n www.redhat.com
PING e86.b.akamaiedge.net (88.221.176.112) 2000(2028) bytes of data.
2008 bytes from 88.221.176.112: icmp_seq=1 ttl=60 time=36.5 ms

I'll run lft against the 1st group to try to find a list of routers
between me and them and reply back with more info. Surely once I've
tested and mapped out enough hosts, I should be able to figure out
where the problem is...
:-(

> My suspicion is that the load-balancer at ocp.com.com has interesting ICMP
> implementation that even if you ping it with big packets, it replies with
> small packets, and you can't figure out MTU issues like this.
>
>> $ ping -s 1473 -M do ocp.com.com
>> PING c18-ad-xw-lb.cnet.com (216.239.122.193) 1473(1501) bytes of data.
>>>
>>> From t60jt (192.168.0.3) icmp_seq=1 Frag needed and DF set (mtu = 1500)
>>> From t60jt (192.168.0.3) icmp_seq=1 Frag needed and DF set (mtu = 1500)
>>> From t60jt (192.168.0.3) icmp_seq=1 Frag needed and DF set (mtu = 1500)
>>> From t60jt (192.168.0.3) icmp_seq=1 Frag needed and DF set (mtu = 1500)
>>
>> If I am correct, success with "-s 1472" means that an mtu of 1500
>> should work (i.e. lowering the mtu down to 1499 should not be
>> necessary). Consequently, I don't want to drop the mtu down to 1499 if
>> that will simply mask/cover a bigger problem.
>
> Note that you're getting this ICMP message apparently from a local network
> and it doesn't prove much in and of itself.

Another quick question: do you say that I'm "getting this ICMP message
apparently from a local network" because it says "From t60jt
(192.168.0.3)" in the lines above? If so, what's the relevance - I
ask, since t60jt is _my_ machine (the box I'm sitting in front of)!

>
> As for your questions:
>>
>> a) Dropping the mtu down to 1499 doesn't tell me why wget works under
>> windows (without the need to drop the mtu).
>>
>> b) Dropping the mtu down to 1499 doesn't tell me why wget (under
>> linux) works if I force my router to grab a public-facing ip address
>> in the range 93.96.x.x.
>>
>> c) Dropping the mtu down to 1499 doesn't agree with/explain the
>> results of the ping testing, which follows...
>
> If you want to figure this out, I think you'll need to run tcpdump on the
> host (both windows and linux) and compare the TCP streams as they seem to
> you.
> Specifically I'd look for the MSS negotiated size, whether one uses
> fragments and one doesn't, and used TCP options.
> (Even better would be doing a few tests to another host in internet, which
> is also running tcpdump. This would show if your ISP is modifying any
> packets.)

I'll try running tcpdump on my (openwrt) internet gateway / router /
ipmasq box this week and report back with the results.

Thanks again for all comments / replies / input....

Jaime :-)
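
For reference, the size arithmetic behind the ping tests in this message,
and the MSS value worth looking for in the tcpdump captures, can be
sketched as below. This is only a minimal sketch, assuming IPv4 headers
without options (20 bytes), the 8-byte ICMP echo header and a 20-byte
TCP header; the function names are hypothetical and chosen purely for
illustration.

```python
# Header sizes in bytes. The IPv4 and TCP figures assume no options.
ICMP_HEADER = 8
IPV4_HEADER = 20
TCP_HEADER = 20


def ip_packet_size(payload: int) -> int:
    """Total IPv4 packet size for 'ping -s payload', e.g. 1472 -> 1500."""
    return payload + ICMP_HEADER + IPV4_HEADER


def max_unfragmented_payload(mtu: int = 1500) -> int:
    """Largest 'ping -s' value that fits in one packet at the given MTU."""
    return mtu - ICMP_HEADER - IPV4_HEADER


def needs_fragmentation(payload: int, mtu: int = 1500) -> bool:
    """True if the echo request is larger than the MTU."""
    return ip_packet_size(payload) > mtu


def reply_is_truncated(payload_sent: int, reply_bytes: int) -> bool:
    """ping reports reply size as ICMP header + data ('64 bytes from ...').

    A reply smaller than payload_sent + 8 is what ping flags as truncated.
    """
    return reply_bytes < payload_sent + ICMP_HEADER


def expected_mss(mtu: int = 1500) -> int:
    """MSS a host would normally advertise in its SYN for the given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


if __name__ == "__main__":
    # Payload sizes taken from the ping runs quoted in this message.
    for payload in (56, 57, 1472, 1473, 2000):
        print(payload, ip_packet_size(payload), needs_fragmentation(payload))
    print("expected MSS at MTU 1500:", expected_mss())
```

With these assumptions, "-s 1472" is exactly the largest payload that
fits a 1500-byte MTU, "-s 1473" has to be fragmented (or is rejected
locally when "-M do" sets DF), and a 64-byte reply to anything larger
than "-s 56" counts as truncated. An MSS noticeably below
expected_mss() in one capture but not the other would be a difference
worth following up.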