Hi, this is your Linux kernel regression tracker.

On 29.05.22 02:48, Michael Larabel wrote:
> On 5/28/22 17:54, Michael Larabel wrote:
>> On 5/28/22 16:18, Andrew Morton wrote:
>>> On Thu, 28 Apr 2022 15:00:11 -0300 Marcelo Tosatti
>>> <mtosatti@xxxxxxxxxx> wrote:
>>>> On Thu, Mar 31, 2022 at 03:52:45PM +0200, Borislav Petkov wrote:
>>>>> On Thu, Mar 10, 2022 at 10:22:12AM -0300, Marcelo Tosatti wrote:
>>>>> Someone pointed me at this:
>>>>> https://www.phoronix.com/scan.php?page=news_item&px=Linux-518-Stress-NUMA-Goes-Boom
>>>>>
>>>>> which says this one causes a performance regression with stress-ng's
>>>>> NUMA test...
>>>>
>>>> This is probably do_migrate_pages that is taking too long due to
>>>> synchronize_rcu().
>>>>
>>>> Switching to synchronize_rcu_expedited() should probably fix it...
>>>> Can you give it a try, please?
>>> I guess not.
>>>
>>> Is anyone else able to demonstrate a stress-ng performance regression
>>> due to ff042f4a9b0508? And if so, are they able to try Marcelo's
>>> one-liner?
>>
>> Apologies I don't believe I got the email previously (or if it ended
>> up in spam or otherwise overlooked) so just noticed this thread now...
>>
>> I have the system around and will work on verifying it can reproduce
>> still and can then test the patch, should be able to get it tomorrow.
>>
>> Thanks and sorry about the delay.
>
> Had a chance to look at it today still. I was able to reproduce the
> regression still on that 5950X system going from v5.17 to v5.18 (using
> newer stress-ng benchmark and other system changes since the prior
> tests). Confirmed it also still showed slower as of today's Git.
>
> I can confirm with Marcelo's patch below that the stress-ng NUMA
> performance is back to the v5.17 level of performance (actually, faster)
> and certainly not like what I was seeing on v5.18 or Git to this point.
>
> So all seems to be good with that one-liner for the stress-ng NUMA test
> case. All the system details and results for those interested is
> documented @ https://openbenchmarking.org/result/2205284-PTS-NUMAREGR17
> but basically amounts to:
>
> Stress-NG 0.14
> Test: NUMA
> Bogo Ops/s > Higher Is Better
> v5.17: 412.88
> v5.18: 49.33
> 20220528 Git: 49.66
> 20220528 Git + sched-rcu-exped patch: 468.81
>
> Apologies again about the delay / not seeing the email thread earlier.
>
> Thanks,
>
> Michael
>
> Tested-by: Michael Larabel <Michael@xxxxxxxxxxxxxxxxxx>

Andrew, is there a reason why this patch afaics isn't mainlined yet and
has been lingering in linux-next for so long? Michael confirmed three
weeks ago that this patch fixes a regression, and a few days later Stefan
confirmed that his problem was solved as well:

https://lore.kernel.org/regressions/79bb603e-37cb-d1dd-1e12-7ce28d7cfdae@xxxxxxxx/

Reminder: unless there are good reasons it shouldn't take this long, for
reasons explained in
https://docs.kernel.org/process/handling-regressions.html

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.

>>>> diff --git a/mm/swap.c b/mm/swap.c
>>>> index bceff0cb559c..04a8bbf9817a 100644
>>>> --- a/mm/swap.c
>>>> +++ b/mm/swap.c
>>>> @@ -879,7 +879,7 @@ void lru_cache_disable(void)
>>>>  	 * lru_disable_count = 0 will have exited the critical
>>>>  	 * section when synchronize_rcu() returns.
>>>>  	 */
>>>> -	synchronize_rcu();
>>>> +	synchronize_rcu_expedited();
>>>>  #ifdef CONFIG_SMP
>>>>  	__lru_add_drain_all(true);
>>>>  #else
>>>>
>>>>
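
For anyone who wants the size of the regression spelled out, here is a minimal user-space C sketch that puts Michael's four Bogo Ops/s figures in relation to the v5.17 baseline. It is only arithmetic on the numbers quoted above. The helper names (percent_change, slowdown_factor, print_summary) are made up for this illustration and come from neither stress-ng nor the kernel.

```c
#include <assert.h>
#include <stdio.h>

/* Bogo Ops/s figures quoted from Michael's stress-ng NUMA results. */
static const double BOGO_V517 = 412.88;        /* v5.17 */
static const double BOGO_V518 = 49.33;         /* v5.18 */
static const double BOGO_GIT = 49.66;          /* 20220528 Git */
static const double BOGO_GIT_PATCHED = 468.81; /* 20220528 Git + patch */

/* Percent change of `value` relative to `baseline`; negative means slower. */
double percent_change(double baseline, double value)
{
    assert(baseline != 0.0);
    return (value - baseline) / baseline * 100.0;
}

/* How many times lower `value` is than `baseline`. */
double slowdown_factor(double baseline, double value)
{
    assert(value != 0.0);
    return baseline / value;
}

/* Print every measured kernel relative to the v5.17 baseline. */
void print_summary(void)
{
    printf("v5.18 vs v5.17:         %+.1f%% (%.1fx slower)\n",
           percent_change(BOGO_V517, BOGO_V518),
           slowdown_factor(BOGO_V517, BOGO_V518));
    printf("Git vs v5.17:           %+.1f%% (%.1fx slower)\n",
           percent_change(BOGO_V517, BOGO_GIT),
           slowdown_factor(BOGO_V517, BOGO_GIT));
    printf("Git + patch vs v5.17:   %+.1f%%\n",
           percent_change(BOGO_V517, BOGO_GIT_PATCHED));
}
```

Calling print_summary() from a main() shows the unpatched kernels at roughly one eighth of the v5.17 throughput, and the patched Git tree somewhat above v5.17, which matches Michael's "actually, faster" remark.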