David Howells wrote: > Willem de Bruijn <willemdebruijn.kernel@xxxxxxxxx> wrote: > > > > + } else if ((flags & MSG_SPLICE_PAGES) && length) { > > > + if (inet->hdrincl) > > > + return -EPERM; > > > + if (rt->dst.dev->features & NETIF_F_SG) > > > + /* We need an empty buffer to attach stuff to */ > > > + initial_length = transhdrlen; > > > > I still don't entirely understand what initial_length means. > > > > More importantly, transhdrlen can be zero. If not called for UDP > > but for RAW. Or if this is a subsequent call to a packet that is > > being held with MSG_MORE. > > > > This works fine for existing use-cases, which go to alloc_new_skb. > > Not sure how this case would be different. But the comment alludes > > that it does. > > The problem is that in the non-MSG_ZEROCOPY case, __ip_append_data() assumes > that it's going to copy the data it is given and will allocate sufficient > space in the skb in advance to hold it - but I don't want to do that because I > want to splice in the pages holding the data instead. However, I do need to > allocate space to hold the transport header. > > Maybe I should change 'initial_length' to 'initial_alloc'? It represents the > amount I think we should allocate. Or maybe I should have a separate > allocation clause for MSG_SPLICE_PAGES? The code already has to avoid allocation in the MSG_ZEROCOPY case. I added alloc_len and paged_len for that purpose. Only the transhdrlen will be copied with getfrag due to copy = datalen - transhdrlen - fraggap - pagedlen On next iteration in the loop, when remaining data fits in the skb, there are three cases. The first is skipped due to !NETIF_F_SG. The other two are either copy to page frags or zerocopy page frags. I think your code should be able to fit in. Maybe easier if it could reuse the existing alloc_new_skb code to copy the transport header, as MSG_ZEROCOPY does, rather than adding a new __ip_splice_alloc branch that short-circuits that. Then __ip_splice_pages also does not need code to copy the initial header. But this is trickier. It's fine to leave as is. Since your code currently does call continue before executing the rest of that branch, no need to modify any code there? Notably replacing length with initial_length, which itself is initialized to length in all cases expect for MSG_SPLICE_PAGES. Just hardcode transhdrlen as the copy argument to __ip_splice_pages. > I also wonder if __ip_append_data() really needs two places that call > getfrag(). > > David >