Now that cpumap uses GRO, which drops unused skb heads to the NAPI
cache, use napi_skb_cache_get_bulk() to try to reuse cached entries
and lower the MM layer pressure. The polling loop already happens in
the BH context, so the switch is safe from that perspective.
The better GRO aggregates packets, the fewer new skbs will be
allocated. If an aggregated skb contains 16 frags, this means 15 skbs
were returned to the cache, so the next 15 skbs will be built without
allocating anything.

The same trafficgen UDP GRO test now shows:

                GRO off   GRO on
threaded GRO    2.3       4         Mpps
thr bulk GRO    2.4       4.7       Mpps
diff            +4        +17       %

Comparing to the baseline cpumap:

baseline        2.7       N/A       Mpps
thr bulk GRO    2.4       4.7       Mpps
diff            -11       +74       %

Signed-off-by: Alexander Lobakin <aleksander.lobakin@xxxxxxxxx>
---
 kernel/bpf/cpumap.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index d7206f3f6e80..992f4e30a589 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -286,7 +286,6 @@ static int cpu_map_napi_poll(struct napi_struct *napi, int budget)
 	rcpu = container_of(napi, typeof(*rcpu), napi);
 
 	while (likely(done < budget)) {
-		gfp_t gfp = __GFP_ZERO | GFP_ATOMIC;
 		int i, n, m, nframes, xdp_n;
 		void *frames[CPUMAP_BATCH];
 		void *skbs[CPUMAP_BATCH];
@@ -331,8 +330,7 @@ static int cpu_map_napi_poll(struct napi_struct *napi, int budget)
 		if (!nframes)
 			continue;
 
-		m = kmem_cache_alloc_bulk(net_hotdata.skbuff_cache, gfp,
-					  nframes, skbs);
+		m = napi_skb_cache_get_bulk(skbs, nframes);
 		if (unlikely(!m)) {
 			for (i = 0; i < nframes; i++)
 				skbs[i] = NULL; /* effect: xdp_return_frame */
-- 
2.46.0
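
The reuse arithmetic from the commit message (16 merged packets leave 15
heads in the cache, so the next 15 skbs need no allocation) can be
illustrated with a small userspace toy model. This is only a sketch under
stated assumptions: it is not the kernel's implementation of
napi_skb_cache_get_bulk(), and every name in it (skb_cache_model,
model_cache_put(), model_cache_get_bulk(), model_gro_merge(), CACHE_CAP,
HEAD_SIZE) is hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CACHE_CAP 64	/* hypothetical capacity, not the kernel's value */
#define HEAD_SIZE 256	/* hypothetical size of one modelled skb head */

/* Toy stand-in for a per-CPU cache of reusable skb heads. */
struct skb_cache_model {
	void *slots[CACHE_CAP];
	size_t count;		/* heads currently cached and ready for reuse */
	size_t fresh_allocs;	/* heads that needed a real allocation */
};

/* Return a head to the cache; free it if the cache is already full. */
void model_cache_put(struct skb_cache_model *c, void *head)
{
	if (c->count < CACHE_CAP)
		c->slots[c->count++] = head;
	else
		free(head);
}

/*
 * Fill out[0..n): take cached heads first, then fall back to the
 * allocator for the remainder. Returns how many heads were obtained,
 * which may be fewer than n if the allocator fails.
 */
size_t model_cache_get_bulk(struct skb_cache_model *c, void **out, size_t n)
{
	size_t got = 0;

	while (got < n && c->count > 0)
		out[got++] = c->slots[--c->count];

	while (got < n) {
		void *head = calloc(1, HEAD_SIZE);	/* zeroed, like __GFP_ZERO */

		if (!head)
			break;
		c->fresh_allocs++;
		out[got++] = head;
	}

	return got;
}

/*
 * Model GRO merging nsegs single-packet skbs into one aggregate: the
 * first head stays in use, the other nsegs - 1 go back to the cache.
 */
void *model_gro_merge(struct skb_cache_model *c, void **heads, size_t nsegs)
{
	size_t i;

	assert(nsegs > 0);
	for (i = 1; i < nsegs; i++)
		model_cache_put(c, heads[i]);

	return heads[0];
}
```

In this model, requesting 16 heads from an empty cache costs 16 fresh
allocations; merging those 16 into one aggregate puts 15 heads back, and a
following request for 15 heads is then served entirely from the cache,
leaving fresh_allocs unchanged.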