On Wed, Sep 7, 2022 at 5:26 AM Alexandra Winter <wintera@xxxxxxxxxxxxx> wrote:
>
> Since linear payload was removed even for single small messages,
> an additional page is required and we are measuring the performance impact.
>
> 3613b3dbd1ad ("tcp: prepare skbs for better sack shifting")
> explicitly allowed "payload in skb->head for first skb put in the queue,
> to not impact RPC workloads."
> 472c2e07eef0 ("tcp: add one skb cache for tx")
> made that obsolete and removed it.
> When d8b81175e412 ("tcp: remove sk_{tr}x_skb_cache")
> reverted that change, this piece was not added back in.
>
> When running uperf with a request-response pattern with 1k payload
> and 250 parallel connections, we measure a 13% difference in throughput
> for our PCI-based network interfaces since 472c2e07eef0.
> (Our IOMMU is sensitive to the number of mapped pages.)
>
> Could you please consider allowing linear payload for the first
> skb in the queue again? A patch proposal is appended below.

No. Please add a workaround in your driver.

You can increase throughput by 20% by premapping a coherent piece of
memory into which you can copy small skbs (skb->head included),
something like 256 bytes per slot in the TX ring. A rough sketch of
that idea is appended at the end of this mail.

>
> Kind regards
> Alexandra
>
> ---------------------------------------------------------------
>
> tcp: allow linear skb payload for first in queue
>
> Allow payload in skb->head for the first skb in the queue;
> RPC workloads will benefit.
>
> Fixes: 472c2e07eef0 ("tcp: add one skb cache for tx")
> Signed-off-by: Alexandra Winter <wintera@xxxxxxxxxxxxx>
> ---
>  net/ipv4/tcp.c | 39 +++++++++++++++++++++++++++++++++++++--
>  1 file changed, 37 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index e5011c136fdb..f7cbccd41d85 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -1154,6 +1154,30 @@ int tcp_sendpage(struct sock *sk, struct page *page, int offset,
>  }
>  EXPORT_SYMBOL(tcp_sendpage);
>
> +/* Do not bother using a page frag for very small frames.
> + * But use this heuristic only for the first skb in write queue.
> + *
> + * Having no payload in skb->head allows better SACK shifting
> + * in tcp_shift_skb_data(), reducing sack/rack overhead, because
> + * write queue has less skbs.
> + * Each skb can hold up to MAX_SKB_FRAGS * 32Kbytes, or ~0.5 MB.
> + * This also speeds up tso_fragment(), since it won't fallback
> + * to tcp_fragment().
> + */
> +static int linear_payload_sz(bool first_skb)
> +{
> +        if (first_skb)
> +                return SKB_WITH_OVERHEAD(2048 - MAX_TCP_HEADER);
> +        return 0;
> +}
> +
> +static int select_size(bool first_skb, bool zc)
> +{
> +        if (zc)
> +                return 0;
> +        return linear_payload_sz(first_skb);
> +}
> +
>  void tcp_free_fastopen_req(struct tcp_sock *tp)
>  {
>          if (tp->fastopen_req) {
> @@ -1311,6 +1335,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
>
>                  if (copy <= 0 || !tcp_skb_can_collapse_to(skb)) {
>                          bool first_skb;
> +                        int linear;
>
>  new_segment:
>                          if (!sk_stream_memory_free(sk))
> @@ -1322,7 +1347,9 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
>                                  goto restart;
>                          }
>                          first_skb = tcp_rtx_and_write_queues_empty(sk);
> -                        skb = tcp_stream_alloc_skb(sk, 0, sk->sk_allocation,
> +                        linear = select_size(first_skb, zc);
> +                        skb = tcp_stream_alloc_skb(sk, linear,
> +                                                   sk->sk_allocation,
>                                                     first_skb);
>                          if (!skb)
>                                  goto wait_for_space;
> @@ -1344,7 +1371,15 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
>                          if (copy > msg_data_left(msg))
>                                  copy = msg_data_left(msg);
>
> -                        if (!zc) {
> +                        /* Where to copy to? */
> +                        if (skb_availroom(skb) > 0 && !zc) {
> +                                /* We have some space in skb head. Superb! */
> +                                copy = min_t(int, copy, skb_availroom(skb));
> +                                err = skb_add_data_nocache(sk, skb, &msg->msg_iter,
> +                                                           copy);
> +                                if (err)
> +                                        goto do_error;
> +                        } else if (!zc) {
>                                  bool merge = true;
>                                  int i = skb_shinfo(skb)->nr_frags;
>                                  struct page_frag *pfrag = sk_page_frag(sk);
> --
> 2.24.3 (Apple Git-128)
>
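Rough, untested sketch of the TX copybreak idea mentioned above. It is not
tied to any real driver: foo_tx_ring, foo_alloc_tx_bounce(),
foo_tx_map_or_copy() and FOO_TX_COPYBREAK are made-up names, and a real
implementation has to fit the hardware's descriptor format and completion
path. The point is to allocate one coherent region at ring-setup time, with
a small slot per TX descriptor, and to copy small linear skbs into their
slot instead of DMA-mapping them, so they never create a per-packet IOMMU
mapping.

#include <linux/dma-mapping.h>
#include <linux/netdevice.h>
#include <linux/skbuff.h>

#define FOO_TX_COPYBREAK        256     /* bytes copied instead of mapped */

struct foo_tx_ring {
        struct device *dev;
        unsigned int size;              /* number of TX descriptors */
        void *bounce;                   /* coherent area, size * FOO_TX_COPYBREAK */
        dma_addr_t bounce_dma;
};

/* Called once at ring setup: one coherent slot per descriptor. */
static int foo_alloc_tx_bounce(struct foo_tx_ring *ring)
{
        ring->bounce = dma_alloc_coherent(ring->dev,
                                          ring->size * FOO_TX_COPYBREAK,
                                          &ring->bounce_dma, GFP_KERNEL);
        return ring->bounce ? 0 : -ENOMEM;
}

/* Called at transmit time: return the DMA address to put in the descriptor
 * for this slot (dma_mapping_error() handling omitted for brevity).
 */
static dma_addr_t foo_tx_map_or_copy(struct foo_tx_ring *ring,
                                     unsigned int slot, struct sk_buff *skb)
{
        if (!skb_is_nonlinear(skb) && skb->len <= FOO_TX_COPYBREAK) {
                void *dst = ring->bounce + slot * FOO_TX_COPYBREAK;

                /* Small frame: copy headers and payload out of skb->head.
                 * No mapping, so no per-packet IOMMU work.
                 */
                skb_copy_from_linear_data(skb, dst, skb->len);
                return ring->bounce_dma + slot * FOO_TX_COPYBREAK;
        }

        /* Larger or fragmented skbs still go through the usual mapping. */
        return dma_map_single(ring->dev, skb->data, skb_headlen(skb),
                              DMA_TO_DEVICE);
}

The completion path then has to remember, per slot, whether the frame was
copied or mapped, so it only calls dma_unmap_single() in the mapped case.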