Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:

> On Mon, Dec 5, 2022 at 1:02 AM kernel test robot <yujie.liu@xxxxxxxxx> wrote:
>>
>> FYI, we noticed a -53.3% regression of will-it-scale.per_thread_ops due to commit:
>> 5df397dec7c4 ("mm: delay page_remove_rmap() until after the TLB has been flushed")
>
> Sadly, I think this may be at least partially expected.
>
> The code fundamentally moves one "loop over pages" and splits it up
> (with the TLB flush in between).
>
> Which can't be great for locality, but it's kind of fundamental for
> the fix - but some of it might be due to the batch limit logic.
>
> I wouldn't have expected it to actually show up in any real loads, but:
>
>> in testcase: will-it-scale
>> test: page_fault3
>
> I assume that this test is doing a lot of mmap/munmap on dirty shared
> memory regions (both because of the regression, and because of the
> name of that test ;)

I have checked the source code of will-it-scale/page_fault3.  Yes, it
does exactly that; I have appended a rough sketch of its inner loop at
the end of this mail.

> So this is hopefully an extreme case.
>
> Now, it's likely that this particular case probably also triggers that
>
>                 /* No more batching if we have delayed rmaps pending */
>
> which means that the loops in between the TLB flushes will be smaller,
> since we don't batch up as many pages as we used to before we force a
> TLB (and rmap) flush and free them.
>
> If it's due to that batching issue it may be fixable - I'll think
> about this some more, but
>
>> Details are as below:
>
> The bug it fixes ends up meaning that we run that rmap removal code
> _after_ the TLB flush, and it looks like this (probably combined with
> the batching limit) then causes some nasty iTLB load issues:
>
>>   2291312 ±  2%   +1452.8%   35580378 ±  4%  perf-stat.i.iTLB-loads
>
> although it also does look like it's at least partly due to some irq
> timing issue (and/or bad NUMA/CPU migration luck):
>
>>    388169           +267.4%    1426305 ±  6%  vmstat.system.in
>>    161.37            +84.9%     298.43 ±  6%  perf-stat.ps.cpu-migrations
>>    172442 ±  4%      +26.9%     218745 ±  8%  perf-stat.ps.node-load-misses
>
> so it might be that some of the regression comes down to "bad luck" -
> it happened to run really nicely on that particular machine, and then
> the timing changes caused some random "phase change" to the load.
>
> The profile doesn't actually seem to show all that much more IPI
> overhead, so maybe these incidental issues are what then causes that
> big regression.

      0.00            +8.5        8.49 ±  5%  perf-profile.calltrace.cycles-pp.flush_tlb_func.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function

From the perf profile, the cycles spent in TLB flushing increase a lot.
So I guess it may be related?

> It would be lovely to hear if you see this on other machines and/or loads.

I will ask the 0-Day folks to check this.

Best Regards,
Huang, Ying

> Because if it's a one-off, it's probably best ignored. If it shows up
> elsewhere, I think that batching logic might need looking at.
>
>                   Linus
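
For reference, here is a rough sketch of the will-it-scale page_fault3
per-thread loop as I understand it: map a shared file mapping,
write-fault every page to dirty it, then unmap the whole thing, over
and over.  This is paraphrased from memory, so the mapping size, file
handling and names (page_fault3_loop, MEMSIZE) are just illustrative,
not the exact will-it-scale source.

    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/mman.h>

    /* Size is an assumption; the real test may use a different value. */
    #define MEMSIZE (128UL * 1024 * 1024)

    /* Hypothetical stand-in for the page_fault3 testcase() loop. */
    static void page_fault3_loop(unsigned long long *iterations)
    {
            char template[] = "/tmp/willitscale.XXXXXX";
            unsigned long pgsize = getpagesize();
            unsigned long i;
            int fd = mkstemp(template);

            if (fd < 0 || ftruncate(fd, MEMSIZE) < 0)
                    exit(1);
            unlink(template);

            /* Runs forever; the harness samples *iterations periodically. */
            for (;;) {
                    /* Shared file mapping, so every write dirties a page. */
                    char *p = mmap(NULL, MEMSIZE, PROT_READ | PROT_WRITE,
                                   MAP_SHARED, fd, 0);

                    if (p == MAP_FAILED)
                            exit(1);

                    /* One write fault per page. */
                    for (i = 0; i < MEMSIZE; i += pgsize) {
                            p[i] = 1;
                            (*iterations)++;
                    }

                    /*
                     * munmap() of a large dirty MAP_SHARED region is where
                     * the TLB flush and the (now delayed) rmap removal
                     * happen, i.e. the path the commit above changes.
                     */
                    munmap(p, MEMSIZE);
            }
    }

    int main(void)
    {
            unsigned long long iterations = 0;

            page_fault3_loop(&iterations);
            return 0;
    }

So every iteration ends with an munmap() of many dirty shared pages,
which is exactly the mmap/munmap pattern described above.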