On 20/12/2023 15:35, David Hildenbrand wrote:
> On 20.12.23 16:05, Ryan Roberts wrote:
>> On 20/12/2023 14:00, David Hildenbrand wrote:
>>> [...]
>>>
>>>>>>
>>>>>
>>>>> gcc version 13.2.1 20231011 (Red Hat 13.2.1-4) (GCC)
>>>>>
>>>>> From Fedora 38. So "a bit" newer :P
>>>>>
>>>>
>>>> I'll retry with newer toolchain.
>>>>
>>>> FWIW, with the code fix and the original compiler:
>>>>
>>>> Fork, order-0, Apple M2:
>>>> | kernel                |   mean_rel |   std_rel |
>>>> |:----------------------|-----------:|----------:|
>>>> | mm-unstable           |       0.0% |      0.8% |
>>>> | hugetlb-rmap-cleanups |       1.3% |      2.0% |
>>>> | fork-batching         |       4.3% |      1.0% |
>>>>
>>>> Fork, order-9, Apple M2:
>>>> | kernel                |   mean_rel |   std_rel |
>>>> |:----------------------|-----------:|----------:|
>>>> | mm-unstable           |       0.0% |      0.8% |
>>>> | hugetlb-rmap-cleanups |       0.9% |      0.9% |
>>>> | fork-batching         |     -37.3% |      1.0% |
>>>>
>>>> Fork, order-0, Ampere Altra:
>>>> | kernel                |   mean_rel |   std_rel |
>>>> |:----------------------|-----------:|----------:|
>>>> | mm-unstable           |       0.0% |      0.7% |
>>>> | hugetlb-rmap-cleanups |       3.2% |      0.7% |
>>>> | fork-batching         |       5.5% |      1.1% |
>>>>
>>>> Fork, order-9, Ampere Altra:
>>>> | kernel                |   mean_rel |   std_rel |
>>>> |:----------------------|-----------:|----------:|
>>>> | mm-unstable           |       0.0% |      0.1% |
>>>> | hugetlb-rmap-cleanups |       0.5% |      0.1% |
>>>> | fork-batching         |     -10.4% |      0.1% |
>>>>
>>>
>>> I just gave it another quick benchmark run on that Intel system.
>>>
>>> hugetlb-rmap-cleanups -> fork-batching
>>>
>>> order-0: 0.014114 -> 0.013848
>>>
>>> -1.9%
>>>
>>> order-9: 0.014262 -> 0.009410
>>>
>>> -34%
>>>
>>> Note that I disable SMT and turbo, and pin the test to one CPU, to make the
>>> results as stable as possible. My kernel config has anything related to
>>> debugging disabled.
>>>
>>
>> And with gcc 13.2 on arm64:
>>
>> Fork, order-0, Apple M2 VM:
>> | kernel                |   mean_rel |   std_rel |
>> |:----------------------|-----------:|----------:|
>> | mm-unstable           |       0.0% |      1.5% |
>> | hugetlb-rmap-cleanups |      -3.3% |      1.1% |
>> | fork-batching         |      -3.6% |      1.4% |
>>
>> Fork, order-9, Apple M2 VM:
>> | kernel                |   mean_rel |   std_rel |
>> |:----------------------|-----------:|----------:|
>> | mm-unstable           |       0.0% |      1.8% |
>> | hugetlb-rmap-cleanups |      -5.8% |      1.3% |
>> | fork-batching         |     -38.1% |      2.3% |
>>
>> Fork, order-0, Ampere Altra:
>> | kernel                |   mean_rel |   std_rel |
>> |:----------------------|-----------:|----------:|
>> | mm-unstable           |       0.0% |      1.3% |
>> | hugetlb-rmap-cleanups |      -0.1% |      0.4% |
>> | fork-batching         |      -0.4% |      0.5% |
>>
>> Fork, order-9, Ampere Altra:
>> | kernel                |   mean_rel |   std_rel |
>> |:----------------------|-----------:|----------:|
>> | mm-unstable           |       0.0% |      0.1% |
>> | hugetlb-rmap-cleanups |      -0.1% |      0.1% |
>> | fork-batching         |     -13.9% |      0.1% |
>>
>>
>> So all looking good. Compiler was the issue. Sorry for the noise.
>
> No need to be sorry, good that we figured out what's going wrong here.
>
> Weird that the compiler makes such a difference here.
>
>>
>> So please go ahead with your rmap v2 stuff, and I'll wait for you to post the
>> fork and zap batching patches properly, then rebase my arm64 contpte stuff on
>> top and remeasure everything.
>
> Yes, will get rmap v2 out soon, then start working on fork, and then try
> tackling zap. I have some holiday coming up, so it might take some time -- but
> there is plenty of time left.

Me too, I'll be out from end of Friday, returning on 2nd Jan. Happy Christmas!
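
For reference, the percentages quoted for the Intel run (hugetlb-rmap-cleanups -> fork-batching) follow directly from the raw figures. The sketch below is only an illustration of that arithmetic. It assumes the raw numbers are mean run times where lower is better, and that each percentage is the change relative to the first kernel. `rel_change` is a hypothetical helper name, not part of any benchmark tooling mentioned in this thread.

```python
def rel_change(baseline: float, candidate: float) -> float:
    """Return the relative change of candidate versus baseline.

    Negative values mean the candidate is smaller than the baseline
    (assumed here to mean faster).
    """
    return (candidate - baseline) / baseline


# Raw figures quoted in the thread for the Intel system
# (hugetlb-rmap-cleanups -> fork-batching).
intel_runs = {
    "order-0": (0.014114, 0.013848),
    "order-9": (0.014262, 0.009410),
}

for name, (before, after) in intel_runs.items():
    print(f"{name}: {rel_change(before, after):+.1%}")
# order-0: -1.9%
# order-9: -34.0%
```

The same formula applied to per-kernel means against the mm-unstable baseline would presumably give the mean_rel column in the tables, though the thread does not spell out how that column was computed.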