Re: "IP PMTU discovery"

Brian Candler <B.Candler@xxxxxxxxx> · Wed, 29 Nov 2006 22:47:13 +0000

On Wed, Nov 29, 2006 at 12:18:55PM -0500, John Heffner wrote:
> >- can someone explain the rationale for Linux's behaviour?
> 
> The EMSGSIZE error is not fatal -- you can ignore it and try again, and 
> Linux will do the fragmentation for you.

Ah, that explains the behaviour I've noticed with ping -sXXXX where XXXX is
large.

However, the send(2) manpage doesn't make this clear. At least, the version
installed on my CentOS 4.4 box says:

       If the message is too long to pass atomically through  the underlying
       protocol,  the  error  EMSGSIZE  is  returned,  and the message is not
       transmitted.
...
       EMSGSIZE
              The socket type requires that message be sent  atomically, and
              the size of the message to be sent made this impossible.

It doesn't hint that a retry may succeed.
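
For what it's worth, the retry John describes could be sketched roughly as
below. This is only an illustration, assuming the behaviour is as he says
(the first sendto() fails with EMSGSIZE, and a second attempt is fragmented
by the kernel). The helper names sendto_retry_emsgsize() and make_dest() are
made up for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical helper: build an IPv4 destination from a host-order
   address and port, with the whole struct zeroed first. */
static struct sockaddr_in make_dest(uint32_t host_order_addr, uint16_t port)
{
    struct sockaddr_in to;

    memset(&to, 0, sizeof (to));
    to.sin_family = AF_INET;
    to.sin_addr.s_addr = htonl(host_order_addr);
    to.sin_port = htons(port);
    return to;
}

/* Hypothetical helper: send one datagram; if the first attempt fails
   with EMSGSIZE, try exactly once more.  Returns whatever the last
   sendto() returned (bytes sent, or -1 with errno set). */
static ssize_t sendto_retry_emsgsize(int s, const void *buf, size_t len,
                                     const struct sockaddr *to,
                                     socklen_t tolen)
{
    ssize_t n;

    assert(buf != NULL);
    n = sendto(s, buf, len, 0, to, tolen);
    if (n < 0 && errno == EMSGSIZE) {
        /* Assumption from the quoted explanation: the error only
           reports the MTU change, so a plain retry may succeed. */
        n = sendto(s, buf, len, 0, to, tolen);
    }
    return n;
}
```

An application that does its own MTU discovery would presumably shrink the
datagram instead of retrying it unchanged.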

Also, presumably send() returns -1 in this case - how would the application
learn what the actual MTU limit encountered was?

But now I'm really confused: I've just retested this, sending a large packet
rather than a small one, and it sends two fragments immediately with no
EMSGSIZE error - see attached code.

$ ./testsock2
result: 2048
result: 2048
result: 2048

Now, my original l2tp testing was with a 2.4 kernel (OpenWrt), and perhaps
that's the source of confusion. The laptop I'm working on right now is 2.6.

> EMSGSIZE is generated when an 
> ICMP Can't Fragment is received, indicating an MTU change.  It's 
> important that this event get propagated back to the application 
> somehow, because some applications really want to do MTU discovery, and 
> this triggers them to change their size, and possibly retransmit some 
> older packets that are now known to have been lost.

Well, the actual behaviour I saw with this particular Linux-based ATA was
that SIP packets >1500 bytes sent to it were being blackholed. tcpdump on
the wire showed them being delivered as two fragments, but I couldn't tell
which of two things was happening: either the fragments were received and
the response was blackholed because it was also too large, or they were
never received in the first place. The product has a firewall option to
block fragments, but it was turned off. Repeated resends were also
blackholed.

I sent them the setsockopt code, plus info on how to replicate the issue.
They say the problem has been replicated and fixed in new firmware to be
released later, but they weren't specific about exactly what they changed.

The work I did on l2tp a while ago was to fix a problem using OpenWrt as an
l2tp client. I'll need to dig around to find the exact details, but I'm
pretty sure I had to make the setsockopt patch for things to work properly.
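
The patch itself isn't reproduced in this message. Assuming it was the usual
way of turning off path MTU discovery on a socket (IP_MTU_DISCOVER set to
IP_PMTUDISC_DONT, so that the DF bit is not set and oversized datagrams are
fragmented locally), it would look something like the sketch below. Both
function names are invented for the example.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Assumed form of the setsockopt change (Linux-specific): disable path
   MTU discovery on this socket.  Returns 0 on success, -1 on error. */
static int disable_pmtu_discovery(int s)
{
    int val = IP_PMTUDISC_DONT;

    assert(s >= 0);
    return setsockopt(s, IPPROTO_IP, IP_MTU_DISCOVER, &val, sizeof (val));
}

/* Read back the socket's current IP_MTU_DISCOVER mode, e.g. to compare
   the default on different kernels.  Returns the mode, or -1 on error. */
static int get_pmtu_discovery(int s)
{
    int val = -1;
    socklen_t len = sizeof (val);

    assert(s >= 0);
    if (getsockopt(s, IPPROTO_IP, IP_MTU_DISCOVER, &val, &len) < 0)
        return -1;
    return val;
}
```

Calling get_pmtu_discovery() on a fresh socket would also be one way to
check whether the 2.4 and 2.6 boxes start from different defaults.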

Regards,

Brian.

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
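    int s;
    char buf[2048] = "abc";     /* too big for a typical 1500-byte MTU */
    int buflen = 2048;
    struct in_addr t;
    struct sockaddr_in to;

    if ((s = socket (PF_INET, SOCK_DGRAM, 0)) < 0) {
        perror("socket");
        exit(1);
    }

    /* Destination 1.2.3.4:9999.  Zero the struct first so that sin_zero
       does not contain stack garbage. */
    memset(&to, 0, sizeof (to));
    to.sin_family = AF_INET;
    t.s_addr = htonl(0x01020304);
    memcpy(&to.sin_addr, &t.s_addr, 4);
    to.sin_port = htons(9999);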

    /* Send twice immediately and once more after a 1 s pause; each call
       prints the number of bytes sent, or -1 on error. */
    printf("result: %ld\n", (long) sendto (s, buf, buflen, 0,
            (struct sockaddr *) &to, sizeof (to)));
    printf("result: %ld\n", (long) sendto (s, buf, buflen, 0,
            (struct sockaddr *) &to, sizeof (to)));
    sleep(1);
    printf("result: %ld\n", (long) sendto (s, buf, buflen, 0,
            (struct sockaddr *) &to, sizeof (to)));

    return 0;
}