On Wed, 4 Sept 2024 at 15:47, David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> On 04.09.24 12:05, Anders Roxell wrote:
> > On Tue, 3 Sept 2024 at 14:37, David Hildenbrand <david@xxxxxxxxxx> wrote:
> >>
> >> On 03.09.24 14:21, Anders Roxell wrote:
> >>> Hi,
> >>>
> >>> I've noticed that the futex01-thread-* tests in will-it-scale-sys-threads
> >>> are running about 2% slower on v6.10-rc1 compared to v6.9, and this
> >>> slowdown continues with v6.11-rc4. I am focused on identifying any
> >>> performance regressions greater than 2% that occur in automated
> >>> testing on arm64 HW.
> >>>
> >>> Using git bisect, I traced the issue to commit
> >>> f002882ca369 ("mm: merge folio_is_secretmem() and
> >>> folio_fast_pin_allowed() into gup_fast_folio_allowed()").
> >>
> >> Thanks for analyzing the (slight) regression!
> >>
> >>>
> >>> My tests were performed on m7g.large and m7g.metal instances:
> >>>
> >>> * The slowdown is consistent regardless of the number of threads;
> >>>   futex1-threads-128 performs similarly to futex1-threads-2, indicating
> >>>   there is no scalability issue, just a minor performance overhead.
> >>> * The test doesn’t involve actual futex operations, just dummy wake/wait
> >>>   on a variable that isn’t accessed by other threads, so the results might
> >>>   not be very significant.
> >>>
> >>> Given that this seems to be a minor increase in code path length rather
> >>> than a scalability issue, would this be considered a genuine regression?
> >>
> >> Likely not, I've seen these kinds of regressions (for example in my fork
> >> micro-benchmarks) simply because the compiler slightly changes the code
> >> layout, or suddenly decides to not inline a function.
> >>
> >> Still it is rather unexpected, so let's find out what's happening.
> >>
> >> My first intuition would have been that the compiler now decides to not
> >> inline gup_fast_folio_allowed() anymore, adding a function call.
> >>
> >> LLVM seems to inline it for me. GCC not.
> >>
> >> Would this return the original behavior for you?
> >
> > David thank you for the quick patch for me to try.
> >
> > This patch helped the original regression on v6.10-rc1, but on current mainline
> > v6.11-rc6 the patch does nothing and the performance is as expected.
>
> Just so I understand this correctly:
>
> It fixed itself after v6.11-rc4, but v6.11-rc4 was fixed with my patch?

I had to double check and no, on v6.11-rc4 with or without your patch I see the
2% regression.

Cheers,
Anders

>
> If that's the case, then it's really the compiler deciding whether to
> inline or not, and on v6.11-rc6 it decides to inline again.
>
> --
> Cheers,
>
> David / dhildenb
>
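
For readers following the inlining discussion, here is a minimal userspace
sketch of the mechanism being debated. It is not the patch from this thread
(which is not reproduced here) and not kernel code: struct page_like,
FLAG_RESERVED, allowed_inlined(), allowed_outlined() and count_allowed() are
all hypothetical names made up for illustration. It only shows how the GCC/Clang
attributes always_inline and noinline (which the kernel wraps as
__always_inline and noinline) take the inlining decision away from the
compiler's heuristics for a small predicate called on a hot path.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an object checked on a hot path. */
struct page_like {
	unsigned long flags;
	const void *mapping;
};

#define FLAG_RESERVED 0x1UL

/* Always inlined: the check is expanded into each caller. */
static inline __attribute__((always_inline))
bool allowed_inlined(const struct page_like *p)
{
	return p->mapping != NULL && !(p->flags & FLAG_RESERVED);
}

/* Never inlined: every check costs an out-of-line function call. */
static __attribute__((noinline))
bool allowed_outlined(const struct page_like *p)
{
	return p->mapping != NULL && !(p->flags & FLAG_RESERVED);
}

/* Counts entries that pass the check, using either variant. */
static size_t count_allowed(const struct page_like *pages, size_t n,
			    bool use_inlined)
{
	size_t hits = 0;

	for (size_t i = 0; i < n; i++)
		hits += use_inlined ? allowed_inlined(&pages[i])
				    : allowed_outlined(&pages[i]);
	return hits;
}
```

Both variants return identical results; only the generated code differs.
Comparing the disassembly of count_allowed() (for example with objdump -d)
should show a call instruction only for the noinline variant, which is the
kind of small, constant per-call overhead suspected above.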