arm64 server:
Before this patchset:
         fast_path     ptr_ring      slow
     1.  31.171 ns     60.980 ns     164.917 ns
     2.  28.824 ns     60.891 ns     170.241 ns
     3.  14.236 ns     60.583 ns     164.355 ns

With patchset:
     6.  26.163 ns     53.781 ns     189.450 ns
     7.  26.189 ns     53.798 ns     189.466 ns

X86 server:
| Test name  |Cycles |   1-5 |    | Nanosec |    1-5 |        |      % |
| (tasklet_*)|Before | After |diff|  Before |  After |   diff | change |
|------------+-------+-------+----+---------+--------+--------+--------|
| fast_path  |    19 |    19 |   0|   5.399 |  5.492 |  0.093 |    1.7 |
| ptr_ring   |    54 |    57 |   3|  15.090 | 15.849 |  0.759 |    5.0 |
| slow       |   238 |   284 |  46|  66.134 | 78.909 | 12.775 |   19.3 |

About 16 bytes of additional memory are also needed for each page_pool
owned page to fix the DMA API misuse problem.

1. https://lore.kernel.org/lkml/8067f204-1380-4d37-8ffd-007fc6f26738@xxxxxxxxxx/T/
2. https://lore.kernel.org/all/f558df7a-d983-4fc5-8358-faf251994d23@xxxxxxxxxx/

CC: Alexander Lobakin <aleksander.lobakin@xxxxxxxxx>
CC: Robin Murphy <robin.murphy@xxxxxxx>
CC: Alexander Duyck <alexander.duyck@xxxxxxxxx>
CC: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
CC: IOMMU <iommu@xxxxxxxxxxxxxxx>
CC: MM <linux-mm@xxxxxxxxx>

Change log:
V8:
  1. Drop the last 3 patches as they cause observable performance
     degradation on x86 systems.
  2. Remove the rcu read lock in page_pool_napi_local().
  3. Rename item functions more consistently.

V7:
  1. Fix a use-after-free bug reported by KASAN, as mentioned by Jakub.
  2. Fix the 'netmem' variable not being set up correctly, as mentioned
     by Simon.

V6:
  1. Repost based on latest net-next.
  2. Rename page_pool_to_pp() to page_pool_get_pp().

V5:
  1. Support unlimited inflight pages.
  2. Add some optimizations to avoid the overhead of fixing the bug.

V4:
  1. Use scanning to do the unmapping.
  2. Split dma sync skipping into a separate patch.

V3:
  1. Target the net-next tree instead of the net tree.
  2. Narrow the rcu lock as discussed in v2.
  3. Check the unmapping cnt against the inflight cnt.

V2:
  1. Add an item_full stat.
  2. Use container_of() for page_pool_to_pp().

Yunsheng Lin (5):
  page_pool: introduce page_pool_get_pp() API
  page_pool: fix timing for checking and disabling napi_local
  page_pool: fix IOMMU crash when driver has already unbound
  page_pool: support unlimited number of inflight pages
  page_pool: skip dma sync operation for inflight pages

 drivers/net/ethernet/freescale/fec_main.c     |   8 +-
 .../ethernet/google/gve/gve_buffer_mgmt_dqo.c |   2 +-
 drivers/net/ethernet/intel/iavf/iavf_txrx.c   |   6 +-
 drivers/net/ethernet/intel/idpf/idpf_txrx.c   |  14 +-
 drivers/net/ethernet/intel/libeth/rx.c        |   2 +-
 .../net/ethernet/mellanox/mlx5/core/en/xdp.c  |   3 +-
 drivers/net/netdevsim/netdev.c                |   6 +-
 drivers/net/wireless/mediatek/mt76/mt76.h     |   2 +-
 include/linux/mm_types.h                      |   2 +-
 include/linux/skbuff.h                        |   1 +
 include/net/libeth/rx.h                       |   3 +-
 include/net/netmem.h                          |  22 +-
 include/net/page_pool/helpers.h               |  15 +
 include/net/page_pool/types.h                 |  46 +-
 net/core/devmem.c                             |   4 +-
 net/core/netmem_priv.h                        |   5 +-
 net/core/page_pool.c                          | 425 ++++++++++++++++--
 net/core/page_pool_priv.h                     |  10 +-
 net/core/xdp.c                                |   3 +-
 19 files changed, 500 insertions(+), 79 deletions(-)

--
2.33.0