On 2024/9/9 13:43, Mina Almasry wrote:
>
> Perf - page-pool benchmark:
> ---------------------------
>
> bench_page_pool_simple.ko tests with and without these changes:
> https://pastebin.com/raw/ncHDwAbn
>
> AFAIK the number that really matters in the perf tests is the
> 'tasklet_page_pool01_fast_path Per elem'. This one measures at about 8
> cycles without the changes but there is some 1 cycle noise in some
> results.
>
> With the patches this regresses to 9 cycles with the changes but there
> is 1 cycle noise occasionally running this test repeatedly.
>
> Lastly I tried disable the static_branch_unlikely() in
> netmem_is_net_iov() check. To my surprise disabling the
> static_branch_unlikely() check reduces the fast path back to 8 cycles,
> but the 1 cycle noise remains.

Sorry for the late report. I was adding a page_pool test ko based on [1]
to avoid introducing a performance regression when fixing the bug in [2].

I also used it to test the performance impact of the devmem patchset on
page_pool. There seems to be a noticeable and quite stable performance
impact for the testcases below: about 5%~16% degradation (see 'seconds
time elapsed') on an arm64 system:

Before the devmem patchset:

 Performance counter stats for 'insmod ./page_pool_test.ko test_push_cpu=16 test_pop_cpu=16 nr_test=100000000 test_napi=1' (100 runs):

         17.167561      task-clock (msec)         #    0.003 CPUs utilized            ( +-  0.40% )
                 8      context-switches          #    0.474 K/sec                    ( +-  0.65% )
                 0      cpu-migrations            #    0.001 K/sec                    ( +-100.00% )
                84      page-faults               #    0.005 M/sec                    ( +-  0.13% )
          44576552      cycles                    #    2.597 GHz                      ( +-  0.40% )
          59627412      instructions              #    1.34  insn per cycle           ( +-  0.03% )
          14370325      branches                  #  837.063 M/sec                    ( +-  0.02% )
             21902      branch-misses             #    0.15% of all branches          ( +-  0.27% )

       6.818873600 seconds time elapsed                                          ( +-  0.02% )

 Performance counter stats for 'insmod ./page_pool_test.ko test_push_cpu=16 test_pop_cpu=16 nr_test=100000000 test_napi=1 test_direct=1' (100 runs):

         17.595423      task-clock (msec)         #    0.004 CPUs utilized            ( +-  0.01% )
                 8      context-switches          #    0.460 K/sec                    ( +-  0.50% )
                 0      cpu-migrations            #    0.000 K/sec
                84      page-faults               #    0.005 M/sec                    ( +-  0.15% )
          45693020      cycles                    #    2.597 GHz                      ( +-  0.01% )
          59676212      instructions              #    1.31  insn per cycle           ( +-  0.00% )
          14385384      branches                  #  817.564 M/sec                    ( +-  0.00% )
             21786      branch-misses             #    0.15% of all branches          ( +-  0.14% )

       4.098627802 seconds time elapsed                                          ( +-  0.11% )

After the devmem patchset:

 Performance counter stats for 'insmod ./page_pool_test.ko test_push_cpu=16 test_pop_cpu=16 nr_test=100000000 test_napi=1' (100 runs):

         17.047973      task-clock (msec)         #    0.002 CPUs utilized            ( +-  0.39% )
                 8      context-switches          #    0.488 K/sec                    ( +-  0.82% )
                 0      cpu-migrations            #    0.001 K/sec                    ( +- 70.35% )
                84      page-faults               #    0.005 M/sec                    ( +-  0.12% )
          44269558      cycles                    #    2.597 GHz                      ( +-  0.39% )
          59594383      instructions              #    1.35  insn per cycle           ( +-  0.02% )
          14362599      branches                  #  842.481 M/sec                    ( +-  0.02% )
             21949      branch-misses             #    0.15% of all branches          ( +-  0.25% )

       7.964890303 seconds time elapsed                                          ( +-  0.16% )

 Performance counter stats for 'insmod ./page_pool_test.ko test_push_cpu=16 test_pop_cpu=16 nr_test=100000000 test_napi=1 test_direct=1' (100 runs):

         17.660975      task-clock (msec)         #    0.004 CPUs utilized            ( +-  0.02% )
                 8      context-switches          #    0.458 K/sec                    ( +-  0.57% )
                 0      cpu-migrations            #    0.003 K/sec                    ( +- 43.81% )
                84      page-faults               #    0.005 M/sec                    ( +-  0.17% )
          45862652      cycles                    #    2.597 GHz                      ( +-  0.02% )
          59764866      instructions              #    1.30  insn per cycle           ( +-  0.01% )
          14404323      branches                  #  815.602 M/sec                    ( +-  0.01% )
             21826      branch-misses             #    0.15% of all branches          ( +-  0.19% )

       4.304644609 seconds time elapsed                                          ( +-  0.75% )

1. https://lore.kernel.org/all/20240906073646.2930809-2-linyunsheng@xxxxxxxxxx/
2. https://lore.kernel.org/lkml/8067f204-1380-4d37-8ffd-007fc6f26738@xxxxxxxxxx/T/
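
As a rough cross-check of the 5%~16% figure, the relative change can be recomputed from the 'seconds time elapsed' and 'cycles' values in the perf output above. This is only a sketch: the helper name pct_change and the dictionary layout are made up for illustration, and only the numbers come from the report.

```python
# Recompute the before/after deltas from the perf stat numbers quoted above.
# pct_change() and the dict layout are illustrative, not part of any tool.

def pct_change(before: float, after: float) -> float:
    """Relative change of `after` versus `before`, in percent."""
    return (after - before) / before * 100.0

# (before, after) 'seconds time elapsed' for each insmod parameter set
elapsed = {
    "test_napi=1": (6.818873600, 7.964890303),
    "test_napi=1 test_direct=1": (4.098627802, 4.304644609),
}

# (before, after) 'cycles' counted for the insmod task itself
cycles = {
    "test_napi=1": (44576552, 44269558),
    "test_napi=1 test_direct=1": (45693020, 45862652),
}

elapsed_delta = {name: pct_change(b, a) for name, (b, a) in elapsed.items()}
cycles_delta = {name: pct_change(b, a) for name, (b, a) in cycles.items()}

for name in elapsed:
    print(f"{name}: elapsed {elapsed_delta[name]:+.2f}%, "
          f"cycles {cycles_delta[name]:+.2f}%")
```

The elapsed-time deltas land at roughly the two ends of the quoted range, while the cycles attributed to the insmod task move by less than 1% either way, so the degradation is visible in wall-clock time rather than in the counters of the insmod task.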