Re: "IP PMTU discovery"

Brian Candler wrote:
On Wed, Nov 29, 2006 at 12:18:55PM -0500, John Heffner wrote:
- can someone explain the rationale for Linux's behaviour?
The EMSGSIZE error is not fatal -- you can ignore it and try again, and Linux will do the fragmentation for you.

Ah, that explains the behaviour I've noticed with ping -sXXXX where XXXX is
large.

However the send(2) manpage doesn't make this clear. At least, the version I
have installed in CentOS 4.4 says:

       If the message is too long to pass atomically through  the underlying
       protocol,  the  error  EMSGSIZE  is  returned,  and the message is not
       transmitted.
...
       EMSGSIZE
              The socket type requires that message be sent  atomically, and
              the size of the message to be sent made this impossible.

It doesn't hint that a retry may succeed.

Also, presumably send() returns -1 in this case - how would the application
learn what the actual MTU limit encountered was?

With a getsockopt() of IP_MTU on the connected socket (see man 7 ip). Unfortunately, EMSGSIZE is also returned for datagrams longer than 64k, and that case is fatal: no retry will succeed. Yuck. It would be nice if this were all a little cleaner, but the API wasn't originally designed with MTU discovery in mind, so there are no clear standards on the proper behavior.
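
A minimal sketch of that getsockopt() call, assuming Linux and an IPv4 UDP socket. The helper name query_path_mtu is hypothetical and is not taken from the attached test code. On an unconnected socket the call is expected to fail with ENOTCONN.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: return the kernel's current path MTU estimate for a
 * connected IPv4 socket, or -1 with errno set by getsockopt().
 * Linux-specific: IP_MTU is described in ip(7). */
int query_path_mtu(int fd)
{
    int mtu = 0;
    socklen_t len = sizeof(mtu);

    if (getsockopt(fd, IPPROTO_IP, IP_MTU, &mtu, &len) == -1)
        return -1;
    return mtu;
}
```

An application would typically call this right after send() fails with EMSGSIZE, then size its next datagram to fit.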


But now I'm really confused: I've just retested this, sending a large packet
rather than a small one, and it sends two fragments immediately with no
EMSGSIZE error - see attached code.

$ ./testsock2
result: 2048
result: 2048
result: 2048

Now, my original l2tp testing was with a 2.4 kernel (OpenWrt), and perhaps
that's the source of confusion. The laptop I'm working on right now is 2.6.

I don't *think* the behavior has changed, but I haven't looked carefully.

A couple of things to keep in mind. First, if you send something larger than the interface MTU (or the current route cache MTU), the kernel fragments it automatically before it goes out, so you will not get an ICMP Can't Fragment from anything, and hence no EMSGSIZE. Second, you will only get EMSGSIZE on a connected socket; it is nonsensical on an unconnected socket with a sendto() call.


EMSGSIZE is generated when an ICMP Can't Fragment is received, indicating an MTU change. It's important that this event get propagated back to the application somehow, because some applications really want to do MTU discovery, and this triggers them to change their size, and possibly retransmit some older packets that are now known to have been lost.
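
The two EMSGSIZE cases described in this thread (retryable after an MTU change, fatal above the 64k limit) could be told apart roughly as follows. This is a sketch under assumptions, not the attached test code: send_retry_once is a hypothetical name, and the 65507-byte limit assumes IPv4 with no IP options.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Largest UDP payload over IPv4 without IP options:
 * 65535 - 20 (IPv4 header) - 8 (UDP header). Assumed limit for this sketch. */
#define MAX_UDP_PAYLOAD 65507

/* Hypothetical helper: send on a connected UDP socket, retrying once if the
 * kernel reports EMSGSIZE for a datagram that is within the protocol limit. */
ssize_t send_retry_once(int fd, const void *buf, size_t len)
{
    ssize_t n;

    if (len > MAX_UDP_PAYLOAD) {
        errno = EMSGSIZE;   /* fatal case: no retry can succeed */
        return -1;
    }

    n = send(fd, buf, len, 0);
    if (n == -1 && errno == EMSGSIZE) {
        /* Reported after an MTU change. An MTU-aware application would
         * re-read IP_MTU and resize here; this sketch just tries again. */
        n = send(fd, buf, len, 0);
    }
    return n;
}
```

An application doing its own MTU discovery would replace the blind retry with a resize and possibly a retransmission of earlier datagrams.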

Well, the actual behaviour I saw with this particular Linux-based ATA was
that SIP packets >1500 bytes sent to it were being blackholed. At least,
tcpdump on the wire showed them being delivered as two fragments; it's not
clear whether they were being received, but the response was being
blackholed because it also was too large; or they weren't received in the
first place. The product had a firewall option to block fragments, but it
was turned off. Repeated resends were also blackholed.

I sent the setsockopt code to them, plus info on how to replicate the issue.
They say the problem has been replicated and fixed in new firmware to be
released later, but weren't specific as to exactly what they changed.

The work I did on l2tp a while ago was to fix a problem using OpenWrt as an
l2tp client. I'll need to dig around to find the exact details, but I'm
pretty sure I had to make the setsockopt patch for things to work properly.
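
The setsockopt patch itself is not shown in this message. Assuming it turned off path MTU discovery for the socket, it may have looked something like the sketch below; the helper name disable_pmtu_discovery is hypothetical, and IP_MTU_DISCOVER with IP_PMTUDISC_DONT is the Linux option described in ip(7).

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: ask Linux not to do path MTU discovery on this
 * socket, so outgoing datagrams are sent without the Don't Fragment bit.
 * Returns 0 on success, -1 with errno set on failure. */
int disable_pmtu_discovery(int fd)
{
    int val = IP_PMTUDISC_DONT;

    return setsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER, &val, sizeof(val));
}
```

This is a per-socket setting, so it would be applied once after socket() and before the first send.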

Hm, I'm not quite sure what you mean by blackholed. If you observed the fragmented packet sent out on the wire, then this must be a completely separate issue, right?

  -John

