David Howells wrote:
> Make TCP's sendmsg() support MSG_SPLICE_PAGES.  This causes pages to be
> spliced from the source iterator if possible (the iterator must be
> ITER_BVEC and the pages must be spliceable).
>
> This allows ->sendpage() to be replaced by something that can handle
> multiple multipage folios in a single transaction.
>
> Signed-off-by: David Howells <dhowells@xxxxxxxxxx>
> cc: Eric Dumazet <edumazet@xxxxxxxxxx>
> cc: "David S. Miller" <davem@xxxxxxxxxxxxx>
> cc: Jakub Kicinski <kuba@xxxxxxxxxx>
> cc: Paolo Abeni <pabeni@xxxxxxxxxx>
> cc: Jens Axboe <axboe@xxxxxxxxx>
> cc: Matthew Wilcox <willy@xxxxxxxxxxxxx>
> cc: netdev@xxxxxxxxxxxxxxx
> ---
>  net/ipv4/tcp.c | 59 +++++++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 53 insertions(+), 6 deletions(-)
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 288693981b00..77c0c69208a5 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -1220,7 +1220,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
>  	int flags, err, copied = 0;
>  	int mss_now = 0, size_goal, copied_syn = 0;
>  	int process_backlog = 0;
> -	bool zc = false;
> +	int zc = 0;
>  	long timeo;
>
>  	flags = msg->msg_flags;
> @@ -1231,17 +1231,24 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
>  		if (msg->msg_ubuf) {
>  			uarg = msg->msg_ubuf;
>  			net_zcopy_get(uarg);
> -			zc = sk->sk_route_caps & NETIF_F_SG;
> +			if (sk->sk_route_caps & NETIF_F_SG)
> +				zc = 1;
>  		} else if (sock_flag(sk, SOCK_ZEROCOPY)) {
>  			uarg = msg_zerocopy_realloc(sk, size, skb_zcopy(skb));
>  			if (!uarg) {
>  				err = -ENOBUFS;
>  				goto out_err;
>  			}
> -			zc = sk->sk_route_caps & NETIF_F_SG;
> -			if (!zc)
> +			if (sk->sk_route_caps & NETIF_F_SG)
> +				zc = 1;
> +			else
>  				uarg_to_msgzc(uarg)->zerocopy = 0;
>  		}
> +	} else if (unlikely(flags & MSG_SPLICE_PAGES) && size) {
> +		if (!iov_iter_is_bvec(&msg->msg_iter))
> +			return -EINVAL;
> +		if (sk->sk_route_caps & NETIF_F_SG)
> +			zc = 2;
>  	}

The commit message mentions MSG_SPLICE_PAGES as an internal flag.

It can be passed from userspace. The code anticipates that and checks
preconditions.

A side effect is that legacy applications that may already be setting
this bit in the flags now start failing. Most socket types are
historically permissive and simply ignore undefined flags.

With MSG_ZEROCOPY we chose to be extra cautious and added
SOCK_ZEROCOPY, only testing the MSG_ZEROCOPY bit if this socket option
is explicitly enabled. Perhaps more cautious than necessary, but FYI.
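
To make the contrast concrete, here is a minimal userspace model of the two policies, not kernel code. All names (DEMO_MSG_FLAG, struct demo_sock, demo_check_strict, demo_check_gated) are hypothetical and exist only for illustration. It ignores the size and NETIF_F_SG checks from the hunk, and it assumes that a socket which has opted in gets the same precondition check as in the strict case.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names exist in the kernel. */
#define DEMO_MSG_FLAG 0x1u

struct demo_sock {
	bool opted_in;	/* models a per-socket opt-in such as SOCK_ZEROCOPY */
};

/*
 * Strict policy, modelled on the quoted hunk: the flag bit is always
 * tested, and a failed precondition is reported as an error.
 * Returns 1 for the special path, 0 for the normal path, -EINVAL on error.
 */
static int demo_check_strict(unsigned int flags, bool iter_is_bvec)
{
	if (!(flags & DEMO_MSG_FLAG))
		return 0;
	if (!iter_is_bvec)
		return -EINVAL;
	return 1;
}

/*
 * Gated policy, modelled on the MSG_ZEROCOPY approach: the flag bit is
 * only looked at once the socket has explicitly opted in, so a legacy
 * caller that happens to set the bit keeps getting the normal path.
 */
static int demo_check_gated(const struct demo_sock *sk, unsigned int flags,
			    bool iter_is_bvec)
{
	if (!sk->opted_in)
		return 0;
	return demo_check_strict(flags, iter_is_bvec);
}
```

In this model, a legacy caller that sets the bit on a plain buffer gets -EINVAL from demo_check_strict(), while demo_check_gated() returns 0 for the same call until opted_in is set.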