Re: [PATCH net-next v2 5/8] net: skbuff: introduce napi_skb_cache_get_bulk()

Paolo Abeni <pabeni@xxxxxxxxxx> · Thu, 9 Jan 2025 14:16:22 +0100

On 1/7/25 4:29 PM, Alexander Lobakin wrote:
> Add a function to get an array of skbs from the NAPI percpu cache.
> It's supposed to be a drop-in replacement for
> kmem_cache_alloc_bulk(skbuff_head_cache, GFP_ATOMIC) and
> xdp_alloc_skb_bulk(GFP_ATOMIC). The difference (apart from the
> requirement to call it only from the BH) is that it tries to use
> as many NAPI cache entries for skbs as possible, and allocate new
> ones only if needed.
> 
> The logic is as follows:
> 
> * there is enough skbs in the cache: decache them and return to the
>   caller;
> * not enough: try refilling the cache first. If there is now enough
>   skbs, return;
> * still not enough: try allocating skbs directly to the output array
>   with %GFP_ZERO, maybe we'll be able to get some. If there's now
>   enough, return;
> * still not enough: return as many as we were able to obtain.
> 
> Most of times, if called from the NAPI polling loop, the first one will
> be true, sometimes (rarely) the second one. The third and the fourth --
> only under heavy memory pressure.
> It can save significant amounts of CPU cycles if there are GRO cycles
> and/or Tx completion cycles (anything that descends to
> napi_skb_cache_put()) happening on this CPU.
> 
> Signed-off-by: Alexander Lobakin <aleksander.lobakin@xxxxxxxxx>
> Tested-by: Daniel Xu <dxu@xxxxxxxxx>
> ---
>  include/linux/skbuff.h |  1 +
>  net/core/skbuff.c      | 62 ++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 63 insertions(+)
> 
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index bb2b751d274a..1c089c7c14e1 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -1315,6 +1315,7 @@ struct sk_buff *build_skb_around(struct sk_buff *skb,
>  				 void *data, unsigned int frag_size);
>  void skb_attempt_defer_free(struct sk_buff *skb);
>  
> +u32 napi_skb_cache_get_bulk(void **skbs, u32 n);
>  struct sk_buff *napi_build_skb(void *data, unsigned int frag_size);
>  struct sk_buff *slab_build_skb(void *data);
>  
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index a441613a1e6c..42eb31dcc9ce 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -367,6 +367,68 @@ static struct sk_buff *napi_skb_cache_get(void)
>  	return skb;
>  }
>  
> +/**
> + * napi_skb_cache_get_bulk - obtain a number of zeroed skb heads from the cache
> + * @skbs: pointer to an at least @n-sized array to fill with skb pointers
> + * @n: number of entries to provide
> + *
> + * Tries to obtain @n &sk_buff entries from the NAPI percpu cache and writes
> + * the pointers into the provided array @skbs. If there are less entries
> + * available, tries to replenish the cache and bulk-allocates the diff from
> + * the MM layer if needed.
> + * The heads are being zeroed with either memset() or %__GFP_ZERO, so they are
> + * ready for {,__}build_skb_around() and don't have any data buffers attached.
> + * Must be called *only* from the BH context.
> + *
> + * Return: number of successfully allocated skbs (@n if no actual allocation
> + *	   needed or kmem_cache_alloc_bulk() didn't fail).
> + */
> +u32 napi_skb_cache_get_bulk(void **skbs, u32 n)
> +{
> +	struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
> +	u32 bulk, total = n;
> +
> +	local_lock_nested_bh(&napi_alloc_cache.bh_lock);
> +
> +	if (nc->skb_count >= n)
> +		goto get;

I (mis?)understood from the commit message this condition should be
likely, too?!?

> +	/* No enough cached skbs. Try refilling the cache first */
> +	bulk = min(NAPI_SKB_CACHE_SIZE - nc->skb_count, NAPI_SKB_CACHE_BULK);
> +	nc->skb_count += kmem_cache_alloc_bulk(net_hotdata.skbuff_cache,
> +					       GFP_ATOMIC | __GFP_NOWARN, bulk,
> +					       &nc->skb_cache[nc->skb_count]);
> +	if (likely(nc->skb_count >= n))
> +		goto get;
> +
> +	/* Still not enough. Bulk-allocate the missing part directly, zeroed */
> +	n -= kmem_cache_alloc_bulk(net_hotdata.skbuff_cache,
> +				   GFP_ATOMIC | __GFP_ZERO | __GFP_NOWARN,
> +				   n - nc->skb_count, &skbs[nc->skb_count]);

You should probably cap 'n' to NAPI_SKB_CACHE_SIZE. Also what about
latency spikes when n == 48 (should be the maximum possible with such
limit) here?

/P