David Howells wrote: > Willem de Bruijn <willemdebruijn.kernel@xxxxxxxxx> wrote: > > > The code already has to avoid allocation in the MSG_ZEROCOPY case. I > > added alloc_len and paged_len for that purpose. > > > > Only the transhdrlen will be copied with getfrag due to > > > > copy = datalen - transhdrlen - fraggap - pagedlen > > > > On next iteration in the loop, when remaining data fits in the skb, > > there are three cases. The first is skipped due to !NETIF_F_SG. The > > other two are either copy to page frags or zerocopy page frags. > > > > I think your code should be able to fit in. Maybe easier if it could > > reuse the existing alloc_new_skb code to copy the transport header, as > > MSG_ZEROCOPY does, rather than adding a new __ip_splice_alloc branch > > that short-circuits that. Then __ip_splice_pages also does not need > > code to copy the initial header. But this is trickier. It's fine to > > leave as is. > > > > Since your code currently does call continue before executing the rest > > of that branch, no need to modify any code there? Notably replacing > > length with initial_length, which itself is initialized to length in > > all cases expect for MSG_SPLICE_PAGES. > > Okay. How about the attached? This seems to work. Just setting "paged" to > true seems to do the right thing in __ip_append_data() when allocating / > setting up the skbuff, and then __ip_splice_pages() is called to add the > pages. If this works, much preferred. Looks great to me. As said, then __ip_splice_pages() probably no longer needs the preamble to copy initial header bytes. > David > --- > commit 9ac72c83407c8aef4be0c84513ec27bac9cfbcaa > Author: David Howells <dhowells@xxxxxxxxxx> > Date: Thu Mar 9 14:27:29 2023 +0000 > > ip, udp: Support MSG_SPLICE_PAGES > > Make IP/UDP sendmsg() support MSG_SPLICE_PAGES. This causes pages to be > spliced from the source iterator. > > This allows ->sendpage() to be replaced by something that can handle > multiple multipage folios in a single transaction. > > Signed-off-by: David Howells <dhowells@xxxxxxxxxx> > cc: Willem de Bruijn <willemdebruijn.kernel@xxxxxxxxx> > cc: "David S. Miller" <davem@xxxxxxxxxxxxx> > cc: Eric Dumazet <edumazet@xxxxxxxxxx> > cc: Jakub Kicinski <kuba@xxxxxxxxxx> > cc: Paolo Abeni <pabeni@xxxxxxxxxx> > cc: Jens Axboe <axboe@xxxxxxxxx> > cc: Matthew Wilcox <willy@xxxxxxxxxxxxx> > cc: netdev@xxxxxxxxxxxxxxx > > diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c > index 6109a86a8a4b..fe2e48874191 100644 > --- a/net/ipv4/ip_output.c > +++ b/net/ipv4/ip_output.c > @@ -956,6 +956,41 @@ csum_page(struct page *page, int offset, int copy) > return csum; > } > > +/* > + * Add (or copy) data pages for MSG_SPLICE_PAGES. > + */ > +static int __ip_splice_pages(struct sock *sk, struct sk_buff *skb, > + void *from, int *pcopy) > +{ > + struct msghdr *msg = from; > + struct page *page = NULL, **pages = &page; > + ssize_t copy = *pcopy; > + size_t off; > + int err; > + > + copy = iov_iter_extract_pages(&msg->msg_iter, &pages, copy, 1, 0, &off); > + if (copy <= 0) > + return copy ?: -EIO; > + > + err = skb_append_pagefrags(skb, page, off, copy); > + if (err < 0) { > + iov_iter_revert(&msg->msg_iter, copy); > + return err; > + } > + > + if (skb->ip_summed == CHECKSUM_NONE) { > + __wsum csum; > + > + csum = csum_page(page, off, copy); > + skb->csum = csum_block_add(skb->csum, csum, skb->len); > + } > + > + skb_len_add(skb, copy); > + refcount_add(copy, &sk->sk_wmem_alloc); > + *pcopy = copy; > + return 0; > +} > + > static int __ip_append_data(struct sock *sk, > struct flowi4 *fl4, > struct sk_buff_head *queue, > @@ -1047,6 +1082,15 @@ static int __ip_append_data(struct sock *sk, > skb_zcopy_set(skb, uarg, &extra_uref); > } > } > + } else if ((flags & MSG_SPLICE_PAGES) && length) { > + if (inet->hdrincl) > + return -EPERM; > + if (rt->dst.dev->features & NETIF_F_SG) { > + /* We need an empty buffer to attach stuff to */ > + paged = true; > + } else { > + flags &= ~MSG_SPLICE_PAGES; > + } > } > > cork->length += length; > @@ -1206,6 +1250,10 @@ static int __ip_append_data(struct sock *sk, > err = -EFAULT; > goto error; > } > + } else if (flags & MSG_SPLICE_PAGES) { > + err = __ip_splice_pages(sk, skb, from, ©); > + if (err < 0) > + goto error; > } else if (!zc) { > int i = skb_shinfo(skb)->nr_frags; > >