On 03.09.24 14:21, Anders Roxell wrote:
Hi, I've noticed that the futex01-thread-* tests in will-it-scale-sys-threads are running about 2% slower on v6.10-rc1 compared to v6.9, and this slowdown continues with v6.11-rc4. I am focused on identifying any performance regressions greater than 2% that occur in automated testing on arm64 HW. Using git bisect, I traced the issue to commit f002882ca369 ("mm: merge folio_is_secretmem() and folio_fast_pin_allowed() into gup_fast_folio_allowed()").
Thanks for analyzing the (slight) regression!
My tests were performed on m7g.large and m7g.metal instances: * The slowdown is consistent regardless of the number of threads; futex1-threads-128 performs similarly to futex1-threads-2, indicating there is no scalability issue, just a minor performance overhead. * The test doesn’t involve actual futex operations, just dummy wake/wait on a variable that isn’t accessed by other threads, so the results might not be very significant. Given that this seems to be a minor increase in code path length rather than a scalability issue, would this be considered a genuine regression?
Likely not, I've seen these kinds of regressions (for example in my fork micro-benchmarks) simply because the compiler slightly changes the code layout, or suddenly decides to not inline a functions. Still it is rather unexpected, so let's find out what's happening. My first intuition would have been that the compiler now decides to not inline gup_fast_folio_allowed() anymore, adding a function call. LLVM seems to inline it for me. GCC not. Would this return the original behavior for you? diff --git a/mm/gup.c b/mm/gup.c index 69c483e2cc32d..6642f09c95881 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -2726,7 +2726,8 @@ EXPORT_SYMBOL(get_user_pages_unlocked); * in the fast path, so instead we whitelist known good cases and if in doubt, * fall back to the slow path. */ -static bool gup_fast_folio_allowed(struct folio *folio, unsigned int flags) +static __always_inline bool gup_fast_folio_allowed(struct folio *folio, + unsigned int flags) { bool reject_file_backed = false; struct address_space *mapping; -- Cheers, David / dhildenb