On Wed, Nov 29, 2006 at 06:08:51PM -0500, John Heffner wrote:
> A couple things to keep in mind.  One, if you are sending something
> larger than the interface MTU (or the current route cache MTU), you will
> not get an ICMP Can't Fragment from anything, and hence no EMSGSIZE.
> The kernel will automatically fragment these and everything is happy.
> Also, you will only get EMSGSIZE on a connected socket.  It is
> nonsensical on an unconnected socket with a sendto() call.

That's useful, thank you. Does it therefore still make sense to set DF=1 on
outbound packets from an unconnected socket?

> >Well, the actual behaviour I saw with this particular Linux-based ATA was
> >that SIP packets >1500 bytes sent to it were being blackholed. At least,
> >tcpdump on the wire showed them being delivered as two fragments; it's not
> >clear whether they were being received, but the response was being
> >blackholed because it also was too large; or they weren't received in the
> >first place. The product had a firewall option to block fragments, but it
> >was turned off. Repeated resends were also blackholed.
...
>
> Hm, I'm not quite sure what you mean by blackholed.  If you observed the
> fragmented packet sent out on the wire, then this must be a completely
> separate issue, right?

Sorry, I wasn't being clear. Here's exactly what was going on:

ATA1 ---------------> siproxd -------------> SIP provider
ATA2 <--------------- siproxd <-------------

* ATA1 places a call. It sends a SIP INVITE to siproxd, which forwards it
  to the SIP provider.
* The SIP provider sends back an INVITE
* siproxd forwards this to ATA2

By this stage the packet is >1500 bytes, since siproxd and the SIP provider
have each added their own Via: header. So what I see going from siproxd to
ATA2 is two fragments. (This is seen using tcpdump on the siproxd server,
which as it happens was running FreeBSD.)

So I know those two fragments were sent to ATA2, but nothing was sent back
in response.
ATA1 ended up retransmitting the original requests to no avail.

Unfortunately, ATA2 is a black box, so I can't tell whether:
- it didn't receive the fragments, or firewalled them
- it received them, but its response was not going out

Now I think about it some more, the response should have been a 100 Trying
or 180 Ringing, which probably wouldn't have included an SDP body, so it
should have been a small packet.

So, as long as ip_pmtu_disc doesn't affect the handling of *inbound*
fragments, this implies that their firewall rules for handling fragments
were at fault. So IP PMTU DISC is probably a red herring here.

Now, as to l2tpd. Unfortunately, this may also be a red herring... I've dug
through my records, and what I've found is:

Linux setting the DF bit on UDP packets was definitely stopping
l2tpd+openswan from interoperating with Cisco IOS. However, this appears to
be weirdness at the Cisco end. So setting IP_PMTUDISC_DONT to get DF=0 was
really a workaround for that specific problem.

So it looks like I've probably been wasting everyone's time - sorry.

There is one bit of strangeness I can see with large packets though:

$ ping -s4096 news.bbc.co.uk
PING newswww.bbc.net.uk (212.58.226.29) 4096(4124) bytes of data.
2624 bytes from newslb12.thdo.bbc.co.uk (212.58.226.29): icmp_seq=0 ttl=53 (truncated)
4104 bytes from newslb12.thdo.bbc.co.uk (212.58.226.29): icmp_seq=1 ttl=53 time=36.9 ms
4104 bytes from newslb12.thdo.bbc.co.uk (212.58.226.29): icmp_seq=2 ttl=53 time=33.9 ms
4104 bytes from newslb12.thdo.bbc.co.uk (212.58.226.29): icmp_seq=3 ttl=53 time=34.1 ms

$ ping -c4 -s4096 psg.com
PING psg.com (147.28.0.62) 4096(4124) bytes of data.
2624 bytes from psg.com (147.28.0.62): icmp_seq=0 ttl=52 (truncated)
4104 bytes from psg.com (147.28.0.62): icmp_seq=1 ttl=52 time=187 ms
4104 bytes from psg.com (147.28.0.62): icmp_seq=2 ttl=52 time=185 ms
4104 bytes from psg.com (147.28.0.62): icmp_seq=3 ttl=52 time=185 ms

--- psg.com ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3003ms
rtt min/avg/max/mdev = 182.994/185.495/187.251/1.638 ms, pipe 2

Truncated packets? A 'tcpdump -i eth1 -nv -s0' during the second test showed:

23:47:38.650147 IP (tos 0x0, ttl 64, id 10475, offset 0, flags [+], proto 1, length: 1500) 10.43.1.14 > 147.28.0.62: icmp 1480: echo request seq 0
23:47:38.650160 IP (tos 0x0, ttl 64, id 10475, offset 1480, flags [+], proto 1, length: 1500) 10.43.1.14 > 147.28.0.62: icmp
23:47:38.650169 IP (tos 0x0, ttl 64, id 10475, offset 2960, flags [none], proto 1, length: 1164) 10.43.1.14 > 147.28.0.62: icmp
23:47:38.831365 IP (tos 0x0, ttl 52, id 29819, offset 0, flags [+], proto 1, length: 1492) 147.28.0.62 > 10.43.1.14: icmp 1472: echo reply seq 0
23:47:38.833107 IP (tos 0x0, ttl 52, id 29819, offset 1472, flags [none], proto 1, length: 1172) 147.28.0.62 > 10.43.1.14: icmp
23:47:38.834613 IP (tos 0x0, ttl 64, id 31747, offset 0, flags [DF], proto 17, length: 70) 10.43.1.14.32772 > 212.130.104.10.domain: 2271+ PTR? 62.0.28.147.in-addr.arpa. (42)
23:47:38.876362 IP (tos 0x0, ttl 114, id 34944, offset 0, flags [none], proto 17, length: 91) 212.130.104.10.domain > 10.43.1.14.32772: 2271 1/0/0 62.0.28.147.in-addr.arpa. PTR psg.com. (63)
23:47:39.651144 IP (tos 0x0, ttl 64, id 10476, offset 0, flags [+], proto 1, length: 1500) 10.43.1.14 > 147.28.0.62: icmp 1480: echo request seq 1
23:47:39.651159 IP (tos 0x0, ttl 64, id 10476, offset 1480, flags [+], proto 1, length: 1500) 10.43.1.14 > 147.28.0.62: icmp
23:47:39.651166 IP (tos 0x0, ttl 64, id 10476, offset 2960, flags [none], proto 1, length: 1164) 10.43.1.14 > 147.28.0.62: icmp
23:47:39.835195 IP (tos 0x0, ttl 52, id 30230, offset 0, flags [+], proto 1, length: 1492) 147.28.0.62 > 10.43.1.14: icmp 1472: echo reply seq 1
23:47:39.836914 IP (tos 0x0, ttl 52, id 30230, offset 1472, flags [+], proto 1, length: 1492) 147.28.0.62 > 10.43.1.14: icmp
23:47:39.838367 IP (tos 0x0, ttl 52, id 30230, offset 2944, flags [none], proto 1, length: 1180) 147.28.0.62 > 10.43.1.14: icmp
23:47:40.652977 IP (tos 0x0, ttl 64, id 10477, offset 0, flags [+], proto 1, length: 1500) 10.43.1.14 > 147.28.0.62: icmp 1480: echo request seq 2
23:47:40.652994 IP (tos 0x0, ttl 64, id 10477, offset 1480, flags [+], proto 1, length: 1500) 10.43.1.14 > 147.28.0.62: icmp
23:47:40.653002 IP (tos 0x0, ttl 64, id 10477, offset 2960, flags [none], proto 1, length: 1164) 10.43.1.14 > 147.28.0.62: icmp
23:47:40.835151 IP (tos 0x0, ttl 52, id 31235, offset 0, flags [+], proto 1, length: 1492) 147.28.0.62 > 10.43.1.14: icmp 1472: echo reply seq 2
23:47:40.836964 IP (tos 0x0, ttl 52, id 31235, offset 1472, flags [+], proto 1, length: 1492) 147.28.0.62 > 10.43.1.14: icmp
23:47:40.838881 IP (tos 0x0, ttl 52, id 31235, offset 2944, flags [none], proto 1, length: 1180) 147.28.0.62 > 10.43.1.14: icmp
23:47:41.653813 IP (tos 0x0, ttl 64, id 10478, offset 0, flags [+], proto 1, length: 1500) 10.43.1.14 > 147.28.0.62: icmp 1480: echo request seq 3
23:47:41.653828 IP (tos 0x0, ttl 64, id 10478, offset 1480, flags [+], proto 1, length: 1500) 10.43.1.14 > 147.28.0.62: icmp
23:47:41.653836 IP (tos 0x0, ttl 64, id 10478, offset 2960, flags [none], proto 1, length: 1164) 10.43.1.14 > 147.28.0.62: icmp
23:47:41.835751 IP (tos 0x0, ttl 52, id 32322, offset 0, flags [+], proto 1, length: 1492) 147.28.0.62 > 10.43.1.14: icmp 1472: echo reply seq 3
23:47:41.836732 IP (tos 0x0, ttl 52, id 32322, offset 1472, flags [+], proto 1, length: 1492) 147.28.0.62 > 10.43.1.14: icmp
23:47:41.839586 IP (tos 0x0, ttl 52, id 32322, offset 2944, flags [none], proto 1, length: 1180) 147.28.0.62 > 10.43.1.14: icmp

This is weird. Maybe it's just those remote hosts which are broken, or it
might be the NAT firewall I'm behind right now (in a hotel).

I did find a case of someone else reporting a similar problem:
http://lists.openswan.org/pipermail/users/2005-March/004037.html
but then the people who responded said that it worked OK for them.

Regards,

Brian.
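
P.S. As a sanity check on the trace above, here is a rough Python sketch
(not anything ping or the kernel actually runs) that adds up the fragment
sizes tcpdump reported. The helper name reassembled_size and the 20-byte IP
header (i.e. no IP options) are my assumptions; the offsets and lengths are
copied from the capture.

```python
IP_HEADER_LEN = 20  # assumption: IPv4 header without options


def reassembled_size(fragments):
    """Return the reassembled IP payload size in bytes.

    fragments is a list of (offset, total_length, more_fragments) tuples,
    as printed by tcpdump -v. Raises ValueError if the fragments leave a
    gap or the final one still has the more-fragments flag set.
    """
    expected_offset = 0
    more = True
    for offset, total_length, more in sorted(fragments):
        if offset != expected_offset:
            raise ValueError(
                f"gap: expected offset {expected_offset}, got {offset}"
            )
        # Each fragment carries total_length minus its own IP header.
        expected_offset = offset + total_length - IP_HEADER_LEN
    if more:
        raise ValueError("last fragment still has the more-fragments flag set")
    return expected_offset


# Values copied from the psg.com capture.
request_seq0 = [(0, 1500, True), (1480, 1500, True), (2960, 1164, False)]
reply_seq0 = [(0, 1492, True), (1472, 1172, False)]
reply_seq1 = [(0, 1492, True), (1472, 1492, True), (2944, 1180, False)]

for name, frags in [
    ("request seq 0", request_seq0),
    ("reply seq 0", reply_seq0),
    ("reply seq 1", reply_seq1),
]:
    print(name, reassembled_size(frags))
```

The seq 0 reply adds up to 2624 bytes, which is exactly what ping reported
as truncated, while the request and the later replies add up to 4104 (4096
bytes of data plus the 8-byte ICMP header). So the first reply really did
arrive short on the wire, with its second fragment marked as the last one.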