On 15/01/2025 16.18, Alexander Lobakin wrote:
cpumap has its own BH context based on kthread. It has a sane batch size of 8 frames per one cycle.

GRO can be used here on its own. Adjust cpumap calls to the upper stack to use GRO API instead of netif_receive_skb_list() which processes skbs by batches, but doesn't involve GRO layer at all.

In plenty of tests, GRO performs better than listed receiving even given that it has to calculate full frame checksums on the CPU.

As GRO passes the skbs to the upper stack in the batches of @gro_normal_batch, i.e. 8 by default, and skb->dev points to the device where the frame comes from, it is enough to disable GRO netdev feature on it to completely restore the original behaviour: untouched frames will be being bulked and passed to the upper stack by 8, as it was with netif_receive_skb_list().

Signed-off-by: Alexander Lobakin <aleksander.lobakin@xxxxxxxxx>
Tested-by: Daniel Xu <dxu@xxxxxxxxx>
---
 kernel/bpf/cpumap.c | 45 ++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 42 insertions(+), 3 deletions(-)

Nice and clean code, I like it! :-)

Acked-by: Jesper Dangaard Brouer <hawk@xxxxxxxxxx>