On Mon, Dec 9, 2024 at 3:42 AM Yunsheng Lin <linyunsheng@xxxxxxxxxx> wrote:
>
> On 2024/12/9 5:34, Alexander Duyck wrote:
>
> ...
>
> >>
> >> Performance validation for part2:
> >> 1. Using the micro-benchmark ko added in patch 1 to test the aligned
> >> and non-aligned API performance impact for the existing users, there
> >> seems to be about a 20% performance degradation from refactoring
> >> page_frag to support the new API, which seems to nullify most of the
> >> performance gain in [3] of part1.
> >
> > So if I am understanding correctly then this is showing a 20%
> > performance degradation with this patchset. I would argue that it is
> > significant enough that it would be a blocking factor for this patch
> > set. I would suggest bisecting the patch set to identify where the
> > performance degradation has been added and see what we can do to
> > resolve it, and if nothing else document it in that patch so we can
> > identify the root cause for the slowdown.
>
> The only patch in this patchset affecting the performance of the
> existing API seems to be patch 1; including only patch 1 shows the same
> ~20% performance degradation as including the whole patchset does:
>   mm: page_frag: some minor refactoring before adding new API
>
> And the cause seems to be the binary size increase shown below, as the
> performance degradation didn't seem to change much when I tried inlining
> __page_frag_cache_commit_noref() by moving it to the header file:
>
> ./scripts/bloat-o-meter vmlinux_orig vmlinux
> add/remove: 3/2 grow/shrink: 5/0 up/down: 920/-500 (420)
> Function                           old     new   delta
> __page_frag_cache_prepare            -     500    +500
> __napi_alloc_frag_align             68     180    +112
> __netdev_alloc_skb                 488     596    +108
> napi_alloc_skb                     556     624     +68
> __netdev_alloc_frag_align          196     252     +56
> svc_tcp_sendmsg                    340     376     +36
> __page_frag_cache_commit_noref       -      32     +32
> e843419@09a6_0000bd47_30             -       8      +8
> e843419@0369_000044ee_684            8       -      -8
> __page_frag_alloc_align            492       -    -492
> Total: Before=34719207, After=34719627, chg +0.00%
>
> ./scripts/bloat-o-meter page_frag_test_orig.ko page_frag_test.ko
> add/remove: 0/0 grow/shrink: 2/0 up/down: 78/0 (78)
> Function                           old     new   delta
> page_frag_push_thread              508     580     +72
> __UNIQUE_ID_vermagic367             67      73      +6
> Total: Before=4582, After=4660, chg +1.70%

Other than code size, have you tried using perf to profile the benchmark
before and after? I suspect that would be telling about which code
changes are most likely causing the issues.

Overall I don't think the size has increased all that much. I suspect
most of this is the fact that you are inlining more of the
functionality.

> Patch 1 is about refactoring the common code from
> __page_frag_alloc_va_align() into __page_frag_cache_prepare() and
> __page_frag_cache_commit(), so that the new API can reuse it as much as
> possible.
>
> Any better idea for reusing as much common code as possible while
> avoiding the performance degradation?
>
> >
> >> 2. Using the netcat test case below, there seems to be some minor
> >> performance gain from replacing 'page_frag' with 'page_frag_cache'
> >> using the new page_frag API after this patchset.
> >> server: taskset -c 32 nc -l -k 1234 > /dev/null
> >> client: perf stat -r 200 -- taskset -c 0 head -c 20G /dev/zero | taskset -c 1 nc 127.0.0.1 1234
> >
> > This test would barely touch the page pool. The fact is most of the
>
> I am guessing you meant page_frag here?
>
> > overhead for this would likely be things like TCP latency and data
> > copy much more than the page allocation. As such, fluctuations here
> > are likely not related to your changes.
>
> But it does tell us something, in that the replacement does not seem to
> cause an obvious regression, right?

Not really. The fragment allocator is such a small portion of this test
that we could probably double its cost and it would still be negligible.
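If you want to confirm that, a quick profile taken while the netcat test
above is running should show how little of the time ends up in the
fragment allocator. Something along these lines (rough sketch; adjust
the sleep to cover the run, and the grep pattern is just what I would
expect the relevant symbols to match):

  # capture a system-wide profile while the nc server/client are running
  perf record -a -g -o nc.data -- sleep 30
  # the page_frag symbols should be well down in the noise
  perf report -i nc.data --stdio | grep -i page_frag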
> I tried using a smaller MTU to amplify the impact of page allocation;
> it seemed to give a similar result.

Not surprising. However, the network is likely only a small part of
this. I suspect if you ran a profile it would likely show the same.
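For the micro-benchmark regression itself, a before/after comparison is
probably the quickest way to see where the extra cycles are going.
Roughly something like this (just a sketch; the file names and durations
are placeholders, and I am assuming the page_frag_test module is run the
same way on both kernels):

  # on the baseline kernel, while the page_frag_test benchmark is running
  perf record -a -g -o perf.data.old -- sleep 10
  # repeat on the patched kernel
  perf record -a -g -o perf.data -- sleep 10
  # compare the two profiles to see which symbols gained overhead
  perf diff perf.data.old perf.data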