[PATCH bpf-next 7/9] bpf: cpumap: switch to napi_skb_cache_get_bulk()

Alexander Lobakin <aleksander.lobakin@xxxxxxxxx> · Fri, 30 Aug 2024 18:25:06 +0200

Now that cpumap uses GRO, which drops unused skb heads to the NAPI
cache, use napi_skb_cache_get_bulk() to try to reuse cached entries
and lower the pressure on the MM layer. The polling loop already runs
in BH context, so the switch is safe from that perspective.
The better GRO aggregates packets, the fewer new skbs need to be
allocated. If an aggregated skb contains 16 frags, 15 skbs were
returned to the cache, so the next 15 skbs will be built without
allocating anything.
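
The 16-frags arithmetic above can be illustrated with a userspace toy
model. This is only a sketch under stated assumptions, not the kernel
implementation: the toy_* names, the cache capacity and the 256-byte
head size are all hypothetical, and malloc() stands in for the slab
allocator.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_CACHE_SIZE	64	/* hypothetical capacity, not the kernel's */
#define TOY_HEAD_SIZE	256	/* hypothetical head size */

struct toy_cache {
	void *heads[TOY_CACHE_SIZE];
	unsigned int count;	/* heads currently cached */
	unsigned int allocs;	/* fallbacks to malloc() */
};

/* Return an unused head to the cache, or free it if the cache is full. */
static void toy_cache_put(struct toy_cache *c, void *head)
{
	if (c->count < TOY_CACHE_SIZE)
		c->heads[c->count++] = head;
	else
		free(head);
}

/* Get up to @n heads: cached ones first, then freshly allocated ones. */
static unsigned int toy_cache_get_bulk(struct toy_cache *c, void **out,
				       unsigned int n)
{
	unsigned int i;

	for (i = 0; i < n; i++) {
		if (c->count) {
			out[i] = c->heads[--c->count];
			continue;
		}

		out[i] = malloc(TOY_HEAD_SIZE);
		if (!out[i])
			break;
		c->allocs++;
	}

	return i;
}

/* Returns how many allocations the second batch of 16 heads needed. */
unsigned int toy_scenario(void)
{
	struct toy_cache c = { .count = 0, .allocs = 0 };
	void *batch[16], *merged;
	unsigned int i, got, allocs;

	/* First batch: the cache is empty, so all 16 heads are allocated. */
	got = toy_cache_get_bulk(&c, batch, 16);
	assert(got == 16);

	/* "GRO" merges 16 packets into one: 15 heads go back to the cache. */
	merged = batch[0];
	for (i = 1; i < got; i++)
		toy_cache_put(&c, batch[i]);

	/* Second batch: 15 heads come from the cache, 1 is allocated. */
	c.allocs = 0;
	got = toy_cache_get_bulk(&c, batch, 16);
	assert(got == 16);
	allocs = c.allocs;

	for (i = 0; i < got; i++)
		free(batch[i]);
	free(merged);

	return allocs;
}
```

In this model, toy_scenario() returns 1: of the 16 heads in the second
batch, 15 are reused and only one hits the allocator.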

The same trafficgen UDP GRO test now shows:

                GRO off   GRO on
threaded GRO    2.3       4         Mpps
thr bulk GRO    2.4       4.7       Mpps
diff            +4        +17       %

Compared to the baseline cpumap:

baseline        2.7       N/A       Mpps
thr bulk GRO    2.4       4.7       Mpps
diff            -11       +74       %

Signed-off-by: Alexander Lobakin <aleksander.lobakin@xxxxxxxxx>
---
 kernel/bpf/cpumap.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index d7206f3f6e80..992f4e30a589 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -286,7 +286,6 @@ static int cpu_map_napi_poll(struct napi_struct *napi, int budget)
 	rcpu = container_of(napi, typeof(*rcpu), napi);
 
 	while (likely(done < budget)) {
-		gfp_t gfp = __GFP_ZERO | GFP_ATOMIC;
 		int i, n, m, nframes, xdp_n;
 		void *frames[CPUMAP_BATCH];
 		void *skbs[CPUMAP_BATCH];
@@ -331,8 +330,7 @@ static int cpu_map_napi_poll(struct napi_struct *napi, int budget)
 		if (!nframes)
 			continue;
 
-		m = kmem_cache_alloc_bulk(net_hotdata.skbuff_cache, gfp,
-					  nframes, skbs);
+		m = napi_skb_cache_get_bulk(skbs, nframes);
 		if (unlikely(!m)) {
 			for (i = 0; i < nframes; i++)
 				skbs[i] = NULL; /* effect: xdp_return_frame */
-- 
2.46.0