Willem de Bruijn <willemdebruijn.kernel@xxxxxxxxx> wrote:

> The proposed fix is non-trivial, and changes not just the new path
> that observes the issue (MSG_SPLICE_PAGES), but also the other more
> common paths that exercise __ip6_append_data.

I realise that.  I broke ping/ping6 briefly, but I corrected that (I
subtracted the ICMP header len from length after copying it out, but
forgot that it needed adding back on for the return value of sendmsg()).

But I don't think there are that many callers - however, you might be
right that this is too big for a fix.

> There is significant risk to introduce an unintended side effect
> requiring a follow-up fix. Because this function is notoriously
> complex, multiplexing a lot of behavior: with and without transport
> headers, edge cases like fragmentation, MSG_MORE, absence of
> scatter-gather, ....

The problem is that the bug isn't in __ip{,6}_append_data(), I think,
it's actually higher up in ip{,6}_append_data().

I think I see *why* length has transhdrlen handed into it: because ping
and raw sockets come with that pre-added-in by userspace.

I would actually like to eliminate the length argument entirely and use
the length in the iterator - but that doesn't work in all cases as
sometimes there isn't a msghdr struct.  (And, besides, that's too big a
change for a fix).

I think the simplest fix, then, is just to make ip{,6}_append_data()
subtract transhdrlen from length before clearing transhdrlen when
there's already a packet in the queue from MSG_MORE/cork that will be
appended to.

> Does the issue discovered only affect MSG_SPLICE_PAGES or can it
> affect other paths too? If the first, it possible to create a more
> targeted fix that can trivially be seen to not affect code prior to
> introduction of splice pages?

It may also affect MSG_ZEROCOPY in interesting ways.
msg_zerocopy_realloc() looks suspicious as it does things with 'size'
bytes from the buffer that doesn't have 'size' bytes of data in it
(because size (aka length) includes transhdrlen).

I would guess that we don't notice issues with ping sockets because
people don't often use MSG_MORE/corking with them.

Raw sockets shouldn't exhibit this bug as they set transhdrlen to 0 up
front, but I can't help but wonder what the consequences are as some
bits of __ip*_append_data() change behaviour if they see
transhdrlen==0 :-/

David
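
For illustration only, the "simplest fix" described above can be modelled
in a few lines of standalone C.  This is a sketch, not kernel code: the
struct append_args type, the adjust_for_queued_packet() helper and the
queue_empty flag are hypothetical stand-ins for the arguments and the
write-queue check in ip{,6}_append_data(), and it assumes the caller passed
a length that already includes transhdrlen, as described above for ping
sockets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the values ip{,6}_append_data() hands down. */
struct append_args {
	size_t length;      /* bytes requested; assumed to include transhdrlen */
	size_t transhdrlen; /* transport header length, e.g. 8 for ICMP echo */
};

/*
 * Model of the proposed adjustment: when the queue already holds a packet
 * from MSG_MORE/cork, subtract transhdrlen from length before clearing
 * transhdrlen, so that length only counts data that is actually present.
 */
static struct append_args adjust_for_queued_packet(struct append_args in,
						   bool queue_empty)
{
	if (!queue_empty) {
		/* Assumption of this sketch: length covers the header. */
		assert(in.length >= in.transhdrlen);
		in.length -= in.transhdrlen;
		in.transhdrlen = 0;
	}
	return in;
}
```

With an empty queue the values pass through unchanged; with a packet
already queued, a request of 108 bytes with an 8-byte transport header
becomes 100 bytes with transhdrlen 0.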