Greetings:

Welcome to v9.

This revision adds a commit which updates the page_pool documentation to
describe the stats API, structures, and fields.

Additionally, this revision contains a minor cosmetic change suggested by
Saeed in page_pool_recycle_in_ring in commit 2: "page_pool: Add recycle
stats", which removes an unnecessary #ifdef.

There are no functional changes in this revision.

Benchmark output from the v7 cover [1] is pasted below, as it is still
relevant since no functional changes have been made in this revision:

Benchmarks have been re-run. As always, results between runs are highly
variable; you'll find results showing that stats disabled are both faster
and slower than stats enabled in back to back benchmark runs.

Raw benchmark output with stats off [2] and stats on [3] is available for
examination.

Test system:
	- 2x Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
	- 2 NUMA zones, with 18 cores per zone and 2 threads per core

bench_page_pool_simple results, loops=200000000

test name                       stats enabled         stats disabled
                                cycles  nanosec       cycles  nanosec

for_loop                        0       0.335         0       0.336
atomic_inc                      14      6.106         13      6.022
lock                            30      13.365        32      13.968

no-softirq-page_pool01          75      32.884        74      32.308
no-softirq-page_pool02          79      34.696        74      32.302
no-softirq-page_pool03          110     48.005        105     46.073

tasklet_page_pool01_fast_path   14      6.156         14      6.211
tasklet_page_pool02_ptr_ring    41      18.028        39      17.391
tasklet_page_pool03_slow        107     46.646        105     46.123

bench_page_pool_cross_cpu results, loops=20000000 returning_cpus=4:

test name                       stats enabled         stats disabled
                                cycles  nanosec       cycles  nanosec

page_pool_cross_cpu CPU(0)      3973    1731.596      4015    1750.015
page_pool_cross_cpu CPU(1)      3976    1733.217      4022    1752.864
page_pool_cross_cpu CPU(2)      3973    1731.615      4016    1750.433
page_pool_cross_cpu CPU(3)      3976    1733.218      4021    1752.806
page_pool_cross_cpu CPU(4)      994     433.305       1005    438.217

page_pool_cross_cpu average     3378    -             3415    -

bench_page_pool_cross_cpu results, loops=20000000 returning_cpus=8:

test name                       stats enabled         stats disabled
                                cycles  nanosec       cycles  nanosec

page_pool_cross_cpu CPU(0)      6969    3037.488      6909    3011.463
page_pool_cross_cpu CPU(1)      6974    3039.469      6913    3012.961
page_pool_cross_cpu CPU(2)      6969    3037.575      6910    3011.585
page_pool_cross_cpu CPU(3)      6974    3039.415      6913    3012.961
page_pool_cross_cpu CPU(4)      6969    3037.288      6909    3011.368
page_pool_cross_cpu CPU(5)      6972    3038.732      6913    3012.920
page_pool_cross_cpu CPU(6)      6969    3037.350      6909    3011.386
page_pool_cross_cpu CPU(7)      6973    3039.356      6913    3012.921
page_pool_cross_cpu CPU(8)      871     379.934       864     376.620

page_pool_cross_cpu average     6293    -             6239    -

Thanks.

[1]: https://lore.kernel.org/all/1645810914-35485-1-git-send-email-jdamato@xxxxxxxxxx/
[2]: https://gist.githubusercontent.com/jdamato-fsly/d7c34b9fa7be1ce132a266b0f2b92aea/raw/327dcd71d11ece10238fbf19e0472afbcbf22fd4/v7_stats_disabled
[3]: https://gist.githubusercontent.com/jdamato-fsly/d7c34b9fa7be1ce132a266b0f2b92aea/raw/327dcd71d11ece10238fbf19e0472afbcbf22fd4/v7_stats_enabled

v8 -> v9:
	- Add documentation about the page_pool_get_stats API, stats
	  structures, and fields to Documentation/networking/page_pool.rst.
	- Remove unnecessary #ifdef in page_pool_recycle_in_ring.

v7 -> v8:
	- Rename mlx5 ethtool stats so that users have a better idea of
	  their meaning.

v6 -> v7:
	- Stats split out into two structs: one single per-page pool struct
	  for allocation path stats and one per-cpu pointer for recycle
	  path stats.
	- page_pool_get_stats updated to use a wrapper struct to gather
	  allocation and recycle stats with a single argument.
	- Placement of structs adjusted.
	- mlx5 driver modified to use the page_pool_get_stats API.

v5 -> v6:
	- Per cpu page_pool_stats struct pointer is now marked as
	  ____cacheline_aligned_in_smp. Placement of the field in the
	  struct is unchanged; it is the last field.

v4 -> v5:
	- Fixed the description of the kernel option in Kconfig.
	- Squashed commits 1-10 from v4 into a single commit for easier
	  review.
	- Changed the comment style of the comment for the
	  this_cpu_inc_alloc_stat macro.
	- Changed the return type of page_pool_get_stats from
	  struct page_pool_stat * to bool.

v3 -> v4:
	- Restructured stats to be per-cpu per-pool.
	- Global stats and proc file were removed.
	- Exposed an API (page_pool_get_stats) for batching the pool stats.

v2 -> v3:
	- patch 8/10 ("Add stat tracking cache refill") fixed placement of
	  counter increment.
	- patch 10/10 ("net-procfs: Show page pool stats in proc") updated:
		- fixed unused label warning from kernel test robot,
		- fixed page_pool_seq_show to only display the refill stat
		  once,
		- added a remove_proc_entry for page_pool_stat to
		  dev_proc_net_exit.

v1 -> v2:
	- A new kernel config option has been added, which defaults to N,
	  preventing this code from being compiled in by default.
	- The stats structure has been converted to a per-cpu structure.
	- The stats are now exported via proc (/proc/net/page_pool_stat).

Joe Damato (5):
  page_pool: Add allocation stats
  page_pool: Add recycle stats
  page_pool: Add function to batch and return stats
  Documentation: update networking/page_pool.rst
  mlx5: add support for page_pool_get_stats

 Documentation/networking/page_pool.rst             | 56 +++++++++++++++
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.c | 75 ++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.h | 27 +++++++-
 include/net/page_pool.h                            | 51 ++++++++++++++
 net/Kconfig                                        | 13 ++++
 net/core/page_pool.c                               | 79 ++++++++++++++++++++--
 6 files changed, 294 insertions(+), 7 deletions(-)

-- 
2.7.4
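
For readers skimming the changelog, here is a rough userspace model of the
layout described in v6 -> v7: one allocation stats struct per pool, one
recycle stats struct per CPU, and a wrapper struct filled in by a getter
that returns bool. This is only a sketch and not the code in the patches.
Every name prefixed with sketch_, the counter names, and the fixed CPU count
are made up for illustration, and the choice to accumulate into the caller's
struct (so the caller zeroes it first) is an assumption of this model.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 4	/* hypothetical fixed CPU count for this model */

/* Hypothetical allocation path counters: a single instance per pool. */
struct sketch_alloc_stats {
	uint64_t fast;
	uint64_t slow;
};

/* Hypothetical recycle path counters: one instance per CPU. */
struct sketch_recycle_stats {
	uint64_t cached;
	uint64_t ring;
};

/* Model of a pool: a plain array stands in for the per-cpu pointer. */
struct sketch_pool {
	struct sketch_alloc_stats alloc_stats;
	struct sketch_recycle_stats recycle_stats[SKETCH_NR_CPUS];
};

/* Wrapper struct so the caller passes a single argument for both paths. */
struct sketch_stats {
	struct sketch_alloc_stats alloc_stats;
	struct sketch_recycle_stats recycle_stats;
};

/*
 * Add the pool's counters into *stats. Returns false if either pointer
 * is NULL, true otherwise. The caller is expected to zero *stats before
 * the first call; calling it for several pools batches their counters.
 */
static bool sketch_get_stats(const struct sketch_pool *pool,
			     struct sketch_stats *stats)
{
	size_t cpu;

	if (!pool || !stats)
		return false;

	stats->alloc_stats.fast += pool->alloc_stats.fast;
	stats->alloc_stats.slow += pool->alloc_stats.slow;

	/* Sum the per-CPU recycle counters into the single wrapper copy. */
	for (cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
		stats->recycle_stats.cached += pool->recycle_stats[cpu].cached;
		stats->recycle_stats.ring += pool->recycle_stats[cpu].ring;
	}

	return true;
}
```

In this model a driver-like caller would zero a struct sketch_stats, call
sketch_get_stats() once per pool it owns, and report the summed counters.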