On Tue, 3 Sept 2024 at 14:37, David Hildenbrand <david@xxxxxxxxxx> wrote: > > On 03.09.24 14:21, Anders Roxell wrote: > > Hi, > > > > I've noticed that the futex01-thread-* tests in will-it-scale-sys-threads > > are running about 2% slower on v6.10-rc1 compared to v6.9, and this > > slowdown continues with v6.11-rc4. I am focused on identifying any > > performance regressions greater than 2% that occur in automated > > testing on arm64 HW. > > > > Using git bisect, I traced the issue to commit > > f002882ca369 ("mm: merge folio_is_secretmem() and > > folio_fast_pin_allowed() into gup_fast_folio_allowed()"). > > Thanks for analyzing the (slight) regression! > > > > > My tests were performed on m7g.large and m7g.metal instances: > > > > * The slowdown is consistent regardless of the number of threads; > > futex1-threads-128 performs similarly to futex1-threads-2, indicating > > there is no scalability issue, just a minor performance overhead. > > * The test doesn’t involve actual futex operations, just dummy wake/wait > > on a variable that isn’t accessed by other threads, so the results might > > not be very significant. > > > > Given that this seems to be a minor increase in code path length rather > > than a scalability issue, would this be considered a genuine regression? > > Likely not, I've seen these kinds of regressions (for example in my fork > micro-benchmarks) simply because the compiler slightly changes the code > layout, or suddenly decides to not inline a functions. > > Still it is rather unexpected, so let's find out what's happening. > > My first intuition would have been that the compiler now decides to not > inline gup_fast_folio_allowed() anymore, adding a function call. > > LLVM seems to inline it for me. GCC not. > > Would this return the original behavior for you? David thank you for quick patch for me to try. This patch helped the original regression on v6.10-rc1, but on current mainline v6.11-rc6 the patch does nothing and the performance is as expeced. Cheers, Anders > > diff --git a/mm/gup.c b/mm/gup.c > index 69c483e2cc32d..6642f09c95881 100644 > --- a/mm/gup.c > +++ b/mm/gup.c > @@ -2726,7 +2726,8 @@ EXPORT_SYMBOL(get_user_pages_unlocked); > * in the fast path, so instead we whitelist known good cases and if in doubt, > * fall back to the slow path. > */ > -static bool gup_fast_folio_allowed(struct folio *folio, unsigned int flags) > +static __always_inline bool gup_fast_folio_allowed(struct folio *folio, > + unsigned int flags) > { > bool reject_file_backed = false; > struct address_space *mapping; > > > -- > Cheers, > > David / dhildenb >