XDP bulk APIs introduce a defer/flush mechanism to return pages
belonging to the same xdp_mem_allocator object (identified via the
mem.id field) in bulk to optimize I-cache and D-cache since
xdp_return_frame is usually run inside the driver NAPI tx completion
loop.
Convert mvneta, mvpp2 and mlx5 drivers to xdp_return_frame_bulk APIs.

More details on benchmarks run on mlx5 can be found here:
https://github.com/xdp-project/xdp-project/blob/master/areas/mem/xdp_bulk_return01.org

Changes since v5:
- do not keep looping over ptr_ring if the cache is full but release
  leftover pages running page_pool_return_page

Changes since v4:
- fix comments
- introduce xdp_frame_bulk_init utility routine
- compiler annotations for I-cache code layout
- move rcu_read_lock outside fast-path
- mlx5 xdp bulking code optimization

Changes since v3:
- align DEV_MAP_BULK_SIZE to XDP_BULK_QUEUE_SIZE
- refactor page_pool_put_page_bulk to avoid code duplication

Changes since v2:
- move mvneta changes in a dedicated patch

Changes since v1:
- improve comments
- rework xdp_return_frame_bulk routine logic
- move count and xa fields at the beginning of xdp_frame_bulk struct
- invert logic in page_pool_put_page_bulk for loop

Acked-by: Jesper Dangaard Brouer <brouer@xxxxxxxxxx>

Lorenzo Bianconi (5):
  net: xdp: introduce bulking for xdp tx return path
  net: page_pool: add bulk support for ptr_ring
  net: mvneta: add xdp tx return bulking support
  net: mvpp2: add xdp tx return bulking support
  net: mlx5: add xdp tx return bulking support

 drivers/net/ethernet/marvell/mvneta.c         | 10 ++-
 .../net/ethernet/marvell/mvpp2/mvpp2_main.c   | 10 ++-
 .../net/ethernet/mellanox/mlx5/core/en/xdp.c  | 22 ++++--
 include/net/page_pool.h                       | 26 +++++++
 include/net/xdp.h                             | 17 ++++-
 net/core/page_pool.c                          | 70 ++++++++++++++++---
 net/core/xdp.c                                | 54 ++++++++++++++
 7 files changed, 192 insertions(+), 17 deletions(-)

-- 
2.26.2
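
For readers unfamiliar with the defer/flush idea described at the top of
this letter, the following is a minimal userspace sketch, not the kernel
code from this series. Every name in it (toy_allocator, toy_frame,
toy_bulk, TOY_BULK_SIZE, toy_demo and the helpers) is hypothetical and
exists only for illustration, and the batch size of 16 is an assumption.
Frames are queued while they belong to the same allocator and are handed
back in one call when the queue fills up, when the allocator changes, or
when the completion loop ends.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_BULK_SIZE 16 /* assumed batch size for this sketch */

/* Hypothetical stand-in for a memory allocator; it only keeps counters. */
struct toy_allocator {
	int id;
	int flush_calls;    /* how many bulk returns it received */
	int pages_returned; /* how many pages came back in total */
};

/* Hypothetical frame: mem_id selects the owning allocator. */
struct toy_frame {
	int mem_id;
};

/* Deferred-return queue: count and current allocator first, then slots. */
struct toy_bulk {
	int count;
	struct toy_allocator *xa;
	struct toy_frame *q[TOY_BULK_SIZE];
};

static void toy_bulk_init(struct toy_bulk *bq)
{
	bq->count = 0;
	bq->xa = NULL;
}

/* Hand every queued frame back to its allocator in a single call. */
static void toy_bulk_flush(struct toy_bulk *bq)
{
	if (bq->count == 0)
		return;
	bq->xa->flush_calls++;
	bq->xa->pages_returned += bq->count;
	bq->count = 0;
}

/* Defer the return of one frame; allocs[] is indexed by mem_id. */
static void toy_return_frame_bulk(struct toy_frame *f, struct toy_bulk *bq,
				  struct toy_allocator *allocs)
{
	struct toy_allocator *xa = &allocs[f->mem_id];

	/* Flush when the queue is full or the owning allocator changes. */
	if (bq->count == TOY_BULK_SIZE || (bq->count > 0 && bq->xa != xa))
		toy_bulk_flush(bq);

	assert(bq->count < TOY_BULK_SIZE);
	bq->xa = xa;
	bq->q[bq->count++] = f;
}

/* Simulated tx completion loop: 20 frames from allocator 0, then 3 from 1. */
static void toy_demo(struct toy_allocator *allocs)
{
	struct toy_frame frames[23];
	struct toy_bulk bq;
	int i;

	toy_bulk_init(&bq);
	for (i = 0; i < 23; i++) {
		frames[i].mem_id = (i < 20) ? 0 : 1;
		toy_return_frame_bulk(&frames[i], &bq, allocs);
	}
	toy_bulk_flush(&bq); /* release whatever is still queued */
}
```

In this toy run the 23 frames are returned with three bulk calls (two
for allocator 0, one for allocator 1) instead of 23 individual ones,
which is the kind of saving the defer/flush approach is aiming for.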