On Tue, Jul 19, 2022 at 3:54 AM Jakub Kicinski <kuba@xxxxxxxxxx> wrote: > > On Tue, 12 Jul 2022 21:52:25 +0100 Pavel Begunkov wrote: > > Even when zerocopy transmission is requested and possible, > > __ip_append_data() will still copy a small chunk of data just because it > > allocated some extra linear space (e.g. 148 bytes). It wastes CPU cycles > > on copy and iter manipulations and also misalignes potentially aligned > > data. Avoid such coies. And as a bonus we can allocate smaller skb. > > s/coies/copies/ can fix when applying > > > > > Signed-off-by: Pavel Begunkov <asml.silence@xxxxxxxxx> > > --- > > net/ipv4/ip_output.c | 8 ++++++-- > > 1 file changed, 6 insertions(+), 2 deletions(-) > > > > diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c > > index 00b4bf26fd93..581d1e233260 100644 > > --- a/net/ipv4/ip_output.c > > +++ b/net/ipv4/ip_output.c > > @@ -969,7 +969,6 @@ static int __ip_append_data(struct sock *sk, > > struct inet_sock *inet = inet_sk(sk); > > struct ubuf_info *uarg = NULL; > > struct sk_buff *skb; > > - > > struct ip_options *opt = cork->opt; > > int hh_len; > > int exthdrlen; > > @@ -977,6 +976,7 @@ static int __ip_append_data(struct sock *sk, > > int copy; > > int err; > > int offset = 0; > > + bool zc = false; > > unsigned int maxfraglen, fragheaderlen, maxnonfragsize; > > int csummode = CHECKSUM_NONE; > > struct rtable *rt = (struct rtable *)cork->dst; > > @@ -1025,6 +1025,7 @@ static int __ip_append_data(struct sock *sk, > > if (rt->dst.dev->features & NETIF_F_SG && > > csummode == CHECKSUM_PARTIAL) { > > paged = true; > > + zc = true; > > } else { > > uarg->zerocopy = 0; > > skb_zcopy_set(skb, uarg, &extra_uref); > > @@ -1091,9 +1092,12 @@ static int __ip_append_data(struct sock *sk, > > (fraglen + alloc_extra < SKB_MAX_ALLOC || > > !(rt->dst.dev->features & NETIF_F_SG))) > > alloclen = fraglen; > > - else { > > + else if (!zc) { > > alloclen = min_t(int, fraglen, MAX_HEADER); > > Willem, I think this came in with your GSO work, is there a reason we > use MAX_HEADER here? I thought MAX_HEADER is for headers (i.e. more or > less to be reserved) not for the min amount of data to be included. > > I wanna make sure we're not missing something about GSO here. > > Otherwise I don't think we need the extra branch but that can > be a follow up. The change was introduced for UDP GSO, to avoid copying most payload on software segmentation: " commit 15e36f5b8e982debe43e425d2e12d34e022d51e9 Author: Willem de Bruijn <willemb@xxxxxxxxxx> Date: Thu Apr 26 13:42:19 2018 -0400 udp: paged allocation with gso When sending large datagrams that are later segmented, store data in page frags to avoid copying from linear in skb_segment. " and in code - else - alloclen = datalen + fragheaderlen; + else if (!paged) + alloclen = fraglen; + else { + alloclen = min_t(int, fraglen, MAX_HEADER); + pagedlen = fraglen - alloclen; + } MAX_HEADER was a short-hand for the exact header length. "alloclen = fragheaderlen + transhdrlen;" is probably a better choice indeed. Whether with branch or without, the same change needs to be made to __ip6_append_data, just as in the referenced commit. Let's keep the stacks in sync. This is tricky code. If in doubt, run the msg_zerocopy and udp_gso tests from tools/testing/selftests/net, ideally with KASAN.