On 2024/12/9 5:34, Alexander Duyck wrote: ... >> >> Performance validation for part2: >> 1. Using micro-benchmark ko added in patch 1 to test aligned and >> non-aligned API performance impact for the existing users, there >> seems to be about 20% performance degradation for refactoring >> page_frag to support the new API, which seems to nullify most of >> the performance gain in [3] of part1. > > So if I am understanding correctly then this is showing a 20% > performance degradation with this patchset. I would argue that it is > significant enough that it would be a blocking factor for this patch > set. I would suggest bisecting the patch set to identify where the > performance degradation has been added and see what we can do to > resolve it, and if nothing else document it in that patch so we can > identify the root cause for the slowdown. The only patch in this patchset affecting the performance of existing API seems to be patch 1, only including patch 1 does show ~20% performance degradation as including the whole patchset does: mm: page_frag: some minor refactoring before adding new API And the cause seems to be about the binary increasing as below, as the performance degradation didn't seems to change much when I tried inlining the __page_frag_cache_commit_noref() by moving it to the header file: ./scripts/bloat-o-meter vmlinux_orig vmlinux add/remove: 3/2 grow/shrink: 5/0 up/down: 920/-500 (420) Function old new delta __page_frag_cache_prepare - 500 +500 __napi_alloc_frag_align 68 180 +112 __netdev_alloc_skb 488 596 +108 napi_alloc_skb 556 624 +68 __netdev_alloc_frag_align 196 252 +56 svc_tcp_sendmsg 340 376 +36 __page_frag_cache_commit_noref - 32 +32 e843419@09a6_0000bd47_30 - 8 +8 e843419@0369_000044ee_684 8 - -8 __page_frag_alloc_align 492 - -492 Total: Before=34719207, After=34719627, chg +0.00% ./scripts/bloat-o-meter page_frag_test_orig.ko page_frag_test.ko add/remove: 0/0 grow/shrink: 2/0 up/down: 78/0 (78) Function old new delta page_frag_push_thread 508 580 +72 __UNIQUE_ID_vermagic367 67 73 +6 Total: Before=4582, After=4660, chg +1.70% Patch 1 is about refactoring common codes from __page_frag_alloc_va_align() to __page_frag_cache_prepare() and __page_frag_cache_commit(), so that the new API can make use of them as much as possible. Any better idea to reuse common codes as much as possible while avoiding the performance degradation as much as possible? > >> 2. Use the below netcat test case, there seems to be some minor >> performance gain for replacing 'page_frag' with 'page_frag_cache' >> using the new page_frag API after this patchset. >> server: taskset -c 32 nc -l -k 1234 > /dev/null >> client: perf stat -r 200 -- taskset -c 0 head -c 20G /dev/zero | taskset -c 1 nc 127.0.0.1 1234 > > This test would barely touch the page pool. The fact is most of the I am guessing you meant page_frag here? > overhead for this would likely be things like TCP latency and data > copy much more than the page allocation. As such fluctuations here are > likely not related to your changes. But it does tell us something that the replacing does not seems to cause obvious regression, right? I tried using a smaller MTU to amplify the impact of page allocation, it seemed to have a similar result.