[PATCH net-next v2 0/5] page_pool: Convert stats to u64_stats_t.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



This is a follow-up on
        https://lore.kernel.org/all/20250213093925.x_ggH1aj@xxxxxxxxxxxxx/

to convert the page_pool statistics to u64_stats_t to avoid u64 related
problems on 32bit architectures.
While looking over it, the comment for recycle_stat_inc() says that it
is safe to use in preemptible context. The 32bit update is split into
two 32bit writes. This is "okay" if the value observed from the current
CPU but cross CPU reads may observe inconsistencies if the lower part
overflows and the upper part is not yet written.
I explained this and added x86-32 assembly in
	https://lore.kernel.org/all/20250226102703.3F7Aa2oK@xxxxxxxxxxxxx/

I don't know if it is ensured that only *one* update can happen because
the stats are per-CPU and per NAPI device. But there will be now a
warning on 32bit if this is really attempted in preemptible context.

The placement of the counters is not affected by this change except on
32bit where an additional sync member is added. For 64bit pahole output
changes from
| struct page_pool_recycle_stats {
|         u64                        cached;               /*     0     8 */
|         u64                        cache_full;           /*     8     8 */
|         u64                        ring;                 /*    16     8 */
|         u64                        ring_full;            /*    24     8 */
|         u64                        released_refcnt;      /*    32     8 */
| 
|         /* size: 40, cachelines: 1, members: 5 */
|         /* last cacheline: 40 bytes */
| };

to
| struct page_pool_recycle_stats {
|         struct u64_stats_sync      syncp;                /*     0     0 */
|         u64_stats_t                cached;               /*     0     8 */
|         u64_stats_t                cache_full;           /*     8     8 */
|         u64_stats_t                ring;                 /*    16     8 */
|         u64_stats_t                ring_full;            /*    24     8 */
|         u64_stats_t                released_refcnt;      /*    32     8 */
| 
|         /* size: 40, cachelines: 1, members: 6 */
|         /* last cacheline: 40 bytes */
| };

On 32bit struct u64_stats_sync grows by 4 bytes (plus addiional 20 with
lockdep).

For bench_page_pool_simple.ko loops=600000000 I neded up with, before:

| time_bench: Type:for_loop Per elem: 1 cycles(tsc) 0.501 ns (step:0)
| time_bench: Type:atomic_inc Per elem: 6 cycles(tsc) 3.303 ns (step:0)
| time_bench: Type:lock Per elem: 28 cycles(tsc) 14.038 ns (step:0)
| time_bench: Type:u64_stats_inc Per elem: 1 cycles(tsc) 0.565 ns (step:0)
| time_bench: Type:this_cpu_inc Per elem: 1 cycles(tsc) 0.503 ns (step:0)
| 
| bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
| time_bench: Type:no-softirq-page_pool01 Per elem: 19 cycles(tsc) 9.526 ns (step:0)
| bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
| time_bench: Type:no-softirq-page_pool02 Per elem: 46 cycles(tsc) 23.501 ns (step:0)
| bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
| time_bench: Type:no-softirq-page_pool03 Per elem: 121 cycles(tsc) 60.697 ns (step:0)
| bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
| bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
| time_bench: Type:tasklet_page_pool01_fast_path Per elem: 19 cycles(tsc) 9.531 ns (step:0)
| bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
| time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 45 cycles(tsc) 22.594 ns (step:0)
| bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
| time_bench: Type:tasklet_page_pool03_slow Per elem: 123 cycles(tsc) 61.969 ns (step:0)

after:
| time_bench: Type:for_loop Per elem: 1 cycles(tsc) 0.501 ns (step:0)
| time_bench: Type:atomic_inc Per elem: 6 cycles(tsc) 3.324 ns (step:0)
| time_bench: Type:lock Per elem: 28 cycles(tsc) 14.038 ns (step:0)
| time_bench: Type:u64_stats_inc Per elem: 1 cycles(tsc) 0.565 ns (step:0)
| time_bench: Type:this_cpu_inc Per elem: 1 cycles(tsc) 0.506 ns (step:0)
| 
| bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
| time_bench: Type:no-softirq-page_pool01 Per elem: 18 cycles(tsc) 9.028 ns (step:0)
| bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
| time_bench: Type:no-softirq-page_pool02 Per elem: 45 cycles(tsc) 22.714 ns (step:0)
| bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
| time_bench: Type:no-softirq-page_pool03 Per elem: 120 cycles(tsc) 60.428 ns (step:0)
| bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
| bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
| time_bench: Type:tasklet_page_pool01_fast_path Per elem: 18 cycles(tsc) 9.024 ns (step:0)
| bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
| time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 43 cycles(tsc) 22.028 ns (step:0)
| bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
| time_bench: Type:tasklet_page_pool03_slow Per elem: 121 cycles(tsc) 60.736 ns (step:0)

v1…v2: https://lore.kernel.org/all/20250221115221.291006-1-bigeasy@xxxxxxxxxxxxx
  - Clarified the cover mail, added stat for pahole and from bench_page_pool_simple.ko
  - Corrected page_pool_alloc_stats vs page_pool_recycle_stats type in
    the last patch.
  - Copy the counter values outside of the do {} while loop and add them
    later.
  - Redid the mlnx5 patch to make it use generic infrastructure which is
    now extended as part of this series.

Sebastian Andrzej Siewior (5):
  page_pool: Provide an empty page_pool_stats for disabled stats.
  page_pool: Add per-queue statistics.
  mlx5: Use generic code for page_pool statistics.
  page_pool: Convert page_pool_recycle_stats to u64_stats_t.
  page_pool: Convert page_pool_alloc_stats to u64_stats_t.

 Documentation/networking/page_pool.rst        |   6 +-
 .../ethernet/mellanox/mlx5/core/en_stats.c    |  87 +++----------
 .../ethernet/mellanox/mlx5/core/en_stats.h    |  30 +----
 include/linux/u64_stats_sync.h                |   5 +
 include/net/page_pool/helpers.h               |  11 ++
 include/net/page_pool/types.h                 |  31 +++--
 net/core/page_pool.c                          | 122 ++++++++++++++----
 net/core/page_pool_user.c                     |  22 ++--
 8 files changed, 163 insertions(+), 151 deletions(-)

-- 
2.47.2






[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux