Re: "IP PMTU discovery"

On Wed, Nov 29, 2006 at 06:08:51PM -0500, John Heffner wrote:
> A couple things to keep in mind.  One, if you are sending something 
> larger than the interface MTU (or the current route cache MTU), you will 
> not get an ICMP Can't Fragment from anything, and hence no EMSGSIZE. 
> The kernel will automatically fragment these and everything is happy. 
> Also, you will only get EMSGSIZE on a connected socket.  It is 
> nonsensical on an unconnected socket with a sendto() call.

That's useful, thank you.

Does it therefore still make sense to set DF=1 on outbound packets from an
unconnected socket?

> >Well, the actual behaviour I saw with this particular Linux-based ATA was
> >that SIP packets >1500 bytes sent to it were being blackholed. At least,
> >tcpdump on the wire showed them being delivered as two fragments; it's not
> >clear whether they were being received, but the response was being
> >blackholed because it also was too large; or they weren't received in the
> >first place. The product had a firewall option to block fragments, but it
> >was turned off. Repeated resends were also blackholed.
...
> 
> Hm, I'm not quite sure what you mean by blackholed.  If you observed the 
> fragmented packet sent out on the wire, then this must be a completely 
> separate issue, right?

Sorry, I wasn't being clear. Here's exactly what was going on:

  ATA1 ---------------> siproxd ------------->
                                               SIP provider
  ATA2 <--------------- siproxd <-------------


* ATA1 places a call. It sends a SIP INVITE to siproxd, which forwards it
  to the SIP provider.

* The SIP provider sends back an INVITE.

* siproxd forwards this to ATA2.

By this stage the packet is >1500 bytes, since siproxd and the SIP provider
have each added their own Via: header. So what I see going from siproxd to
ATA2 is two fragments. (This is seen using tcpdump on the siproxd server,
which as it happens was running FreeBSD.)

So I know those two fragments were sent to ATA2, but nothing was sent back
in response. ATA1 ended up retransmitting the original requests to no avail.

Unfortunately, ATA2 is a black box, so I can't tell whether:

- it didn't receive the fragments, or firewalled them
- it received them, but its response was not going out

Now that I think about it some more, the response should have been a
100 Trying or 180 Ringing, which probably wouldn't have included an SDP
body, so it should have been a small packet.

So, as long as ip_pmtu_disc doesn't affect the handling of *inbound*
fragments, this implies that their firewall rules for handling fragments
were at fault. So IP PMTU discovery is probably a red herring here.

Now, as to l2tpd. Unfortunately, this may also be a red herring... I've dug
through my records, and what I've found is: Linux setting the DF bit on UDP
packets was definitely stopping l2tpd+openswan from interoperating with
Cisco IOS. However, this appears to be weirdness at the Cisco end. So
setting IP_PMTUDISC_DONT to get DF=0 was really a workaround for that
specific problem.
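
For reference, that workaround amounts to roughly the following on Linux.
This is only a sketch, not what l2tpd actually does: the helper name
set_dont_fragment is made up for illustration, and the numeric fallbacks
are the values from the Linux headers, used in case the Python socket
module does not export the constants.

```python
import socket

# Linux-specific socket option and values (from <linux/in.h>).
# Assumption: fall back to the header values if this Python build
# does not export the names.
IP_MTU_DISCOVER = getattr(socket, "IP_MTU_DISCOVER", 10)
IP_PMTUDISC_DONT = getattr(socket, "IP_PMTUDISC_DONT", 0)  # never set DF
IP_PMTUDISC_DO = getattr(socket, "IP_PMTUDISC_DO", 2)      # always set DF


def set_dont_fragment(sock, enabled):
    """Hypothetical helper: choose DF=1 (enabled) or DF=0 for a socket."""
    mode = IP_PMTUDISC_DO if enabled else IP_PMTUDISC_DONT
    sock.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, mode)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    set_dont_fragment(s, False)  # DF=0: the kernel may fragment locally
    print(s.getsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER))
    s.close()
```

With DF=0 the kernel fragments oversized datagrams itself, so the sender
never needs to react to an ICMP Can't Fragment.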

So it looks like I've probably been wasting everyone's time - sorry.

There is one bit of strangeness I can see with large packets though:

$ ping -s4096 news.bbc.co.uk
PING newswww.bbc.net.uk (212.58.226.29) 4096(4124) bytes of data.
2624 bytes from newslb12.thdo.bbc.co.uk (212.58.226.29): icmp_seq=0 ttl=53 (truncated)
4104 bytes from newslb12.thdo.bbc.co.uk (212.58.226.29): icmp_seq=1 ttl=53 time=36.9 ms
4104 bytes from newslb12.thdo.bbc.co.uk (212.58.226.29): icmp_seq=2 ttl=53 time=33.9 ms
4104 bytes from newslb12.thdo.bbc.co.uk (212.58.226.29): icmp_seq=3 ttl=53 time=34.1 ms

$ ping -c4 -s4096 psg.com
PING psg.com (147.28.0.62) 4096(4124) bytes of data.
2624 bytes from psg.com (147.28.0.62): icmp_seq=0 ttl=52 (truncated)
4104 bytes from psg.com (147.28.0.62): icmp_seq=1 ttl=52 time=187 ms
4104 bytes from psg.com (147.28.0.62): icmp_seq=2 ttl=52 time=185 ms
4104 bytes from psg.com (147.28.0.62): icmp_seq=3 ttl=52 time=185 ms

--- psg.com ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3003ms
rtt min/avg/max/mdev = 182.994/185.495/187.251/1.638 ms, pipe 2

Truncated packets? A 'tcpdump -i eth1 -nv -s0' during the second test showed:

23:47:38.650147 IP (tos 0x0, ttl  64, id 10475, offset 0, flags [+], proto 1, length: 1500) 10.43.1.14 > 147.28.0.62: icmp 1480: echo request seq 0
23:47:38.650160 IP (tos 0x0, ttl  64, id 10475, offset 1480, flags [+], proto 1, length: 1500) 10.43.1.14 > 147.28.0.62: icmp
23:47:38.650169 IP (tos 0x0, ttl  64, id 10475, offset 2960, flags [none], proto 1, length: 1164) 10.43.1.14 > 147.28.0.62: icmp
23:47:38.831365 IP (tos 0x0, ttl  52, id 29819, offset 0, flags [+], proto 1, length: 1492) 147.28.0.62 > 10.43.1.14: icmp 1472: echo reply seq 0
23:47:38.833107 IP (tos 0x0, ttl  52, id 29819, offset 1472, flags [none], proto 1, length: 1172) 147.28.0.62 > 10.43.1.14: icmp
23:47:38.834613 IP (tos 0x0, ttl  64, id 31747, offset 0, flags [DF], proto 17, length: 70) 10.43.1.14.32772 > 212.130.104.10.domain:  2271+ PTR? 62.0.28.147.in-addr.arpa. (42)
23:47:38.876362 IP (tos 0x0, ttl 114, id 34944, offset 0, flags [none], proto 17, length: 91) 212.130.104.10.domain > 10.43.1.14.32772:  2271 1/0/0 62.0.28.147.in-addr.arpa. PTR psg.com. (63)
23:47:39.651144 IP (tos 0x0, ttl  64, id 10476, offset 0, flags [+], proto 1, length: 1500) 10.43.1.14 > 147.28.0.62: icmp 1480: echo request seq 1
23:47:39.651159 IP (tos 0x0, ttl  64, id 10476, offset 1480, flags [+], proto 1, length: 1500) 10.43.1.14 > 147.28.0.62: icmp
23:47:39.651166 IP (tos 0x0, ttl  64, id 10476, offset 2960, flags [none], proto 1, length: 1164) 10.43.1.14 > 147.28.0.62: icmp
23:47:39.835195 IP (tos 0x0, ttl  52, id 30230, offset 0, flags [+], proto 1, length: 1492) 147.28.0.62 > 10.43.1.14: icmp 1472: echo reply seq 1
23:47:39.836914 IP (tos 0x0, ttl  52, id 30230, offset 1472, flags [+], proto 1, length: 1492) 147.28.0.62 > 10.43.1.14: icmp
23:47:39.838367 IP (tos 0x0, ttl  52, id 30230, offset 2944, flags [none], proto 1, length: 1180) 147.28.0.62 > 10.43.1.14: icmp
23:47:40.652977 IP (tos 0x0, ttl  64, id 10477, offset 0, flags [+], proto 1, length: 1500) 10.43.1.14 > 147.28.0.62: icmp 1480: echo request seq 2
23:47:40.652994 IP (tos 0x0, ttl  64, id 10477, offset 1480, flags [+], proto 1, length: 1500) 10.43.1.14 > 147.28.0.62: icmp
23:47:40.653002 IP (tos 0x0, ttl  64, id 10477, offset 2960, flags [none], proto 1, length: 1164) 10.43.1.14 > 147.28.0.62: icmp
23:47:40.835151 IP (tos 0x0, ttl  52, id 31235, offset 0, flags [+], proto 1, length: 1492) 147.28.0.62 > 10.43.1.14: icmp 1472: echo reply seq 2
23:47:40.836964 IP (tos 0x0, ttl  52, id 31235, offset 1472, flags [+], proto 1, length: 1492) 147.28.0.62 > 10.43.1.14: icmp
23:47:40.838881 IP (tos 0x0, ttl  52, id 31235, offset 2944, flags [none], proto 1, length: 1180) 147.28.0.62 > 10.43.1.14: icmp
23:47:41.653813 IP (tos 0x0, ttl  64, id 10478, offset 0, flags [+], proto 1, length: 1500) 10.43.1.14 > 147.28.0.62: icmp 1480: echo request seq 3
23:47:41.653828 IP (tos 0x0, ttl  64, id 10478, offset 1480, flags [+], proto 1, length: 1500) 10.43.1.14 > 147.28.0.62: icmp
23:47:41.653836 IP (tos 0x0, ttl  64, id 10478, offset 2960, flags [none], proto 1, length: 1164) 10.43.1.14 > 147.28.0.62: icmp
23:47:41.835751 IP (tos 0x0, ttl  52, id 32322, offset 0, flags [+], proto 1, length: 1492) 147.28.0.62 > 10.43.1.14: icmp 1472: echo reply seq 3
23:47:41.836732 IP (tos 0x0, ttl  52, id 32322, offset 1472, flags [+], proto 1, length: 1492) 147.28.0.62 > 10.43.1.14: icmp
23:47:41.839586 IP (tos 0x0, ttl  52, id 32322, offset 2944, flags [none], proto 1, length: 1180) 147.28.0.62 > 10.43.1.14: icmp

This is weird. Maybe it's just those remote hosts that are broken, or it
might be the NAT firewall I'm behind right now (in a hotel).
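
As a sanity check on the trace, the fragment sizes can be worked out by
hand. The sketch below assumes a 20-byte IP header with no options, an
8-byte ICMP header, an outbound MTU of 1500 and a reply-path MTU of 1492
(the latter is only inferred from the 1492-byte fragments above). The
function name fragment_layout is made up for illustration.

```python
def fragment_layout(icmp_data, mtu, ip_header=20, icmp_header=8):
    """Return (offset, ip_length, more_fragments) for each fragment.

    Assumes no IP options and that every fragment except the last is
    filled up to the MTU.
    """
    total = icmp_data + icmp_header        # bytes carried as IP payload
    per_frag = (mtu - ip_header) // 8 * 8  # offsets are in 8-byte units
    frags = []
    offset = 0
    while offset < total:
        size = min(per_frag, total - offset)
        more = offset + size < total
        frags.append((offset, size + ip_header, more))
        offset += size
    return frags


if __name__ == "__main__":
    for mtu in (1500, 1492):
        print(mtu, fragment_layout(4096, mtu))
```

For 4096 bytes of data this gives offsets 0/1480/2960 with lengths
1500/1500/1164 at MTU 1500, and offsets 0/1472/2944 with lengths
1492/1492/1180 at MTU 1492, matching the requests and the replies to
seq 1-3. The reply to seq 0 has only two fragments, carrying
1472 + 1152 = 2624 bytes, which is exactly the size ping reported as
truncated, so the short reply was already short on the wire.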

I did find a case of someone else reporting a similar problem:
http://lists.openswan.org/pipermail/users/2005-March/004037.html
but then the people who responded said that it worked OK for them.

Regards,

Brian.