On Fri, Apr 26, 2024 at 08:07:45AM -0700, Suren Baghdasaryan wrote:
> On Fri, Apr 26, 2024 at 7:00 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> > Intel's 0day got back to me with data and it's ridiculously good.
> > Headline figure: over 3x throughput improvement with vm-scalability
> > https://lore.kernel.org/all/202404261055.c5e24608-oliver.sang@xxxxxxxxx/
> >
> > I can't see why it's that good. It shouldn't be that good. I'm
> > seeing big numbers here:
> >
> >       4366 ± 2%    +565.6%      29061    perf-stat.overall.cycles-between-cache-misses
> >
> > and the code being deleted is only checking vma->vm_ops and
> > vma->anon_vma. Surely that cache line is referenced so frequently
> > during pagefault that deleting a reference here will make no difference
> > at all?
>
> That indeed looks overly good. Sorry, I didn't have a chance to run
> the benchmarks on my side yet because of the ongoing Android bootcamp
> this week.

No problem. Darn work getting in the way of having fun ;-)

> > I still don't understand why we have to take the mmap_sem less often.
> > Is there perhaps a VMA for which we have a NULL vm_ops, but don't set
> > an anon_vma on a page fault?
>
> I think the only path in either do_anonymous_page() or
> do_huge_pmd_anonymous_page() that skips calling anon_vma_prepare() is
> the "Use the zero-page for reads" here:
> https://elixir.bootlin.com/linux/latest/source/mm/memory.c#L4265. I
> didn't look into this particular benchmark yet but will try it out
> once I have some time to benchmark your change.

Yes, Liam and I had just brainstormed that as being a plausible
explanation too. I don't know how frequent it is to use anon memory
read-only. Presumably it must happen often enough that we've bothered
to implement the zero-page optimisation. But probably not nearly as
often as this benchmark makes it happen ;-)
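
For anyone who wants to poke at the read-only anon case from userspace, here
is a minimal sketch of the access pattern in question. It is not taken from
vm-scalability. The helper name read_fault_anon_pages() is made up for
illustration, and it assumes Linux with MAP_ANONYMOUS available. It maps
private anonymous memory and only ever reads from it, so each first touch is
a read fault on a VMA with NULL vm_ops. Whether those faults actually hit the
zero-page path is a kernel-side detail that this code cannot observe
directly.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from any benchmark): map npages of private
 * anonymous memory and read one byte from each page without ever
 * writing to the mapping.
 *
 * Returns the sum of the bytes read, which should be 0 because
 * untouched anonymous memory reads back as zeroes, or -1 if the
 * mapping could not be created.
 */
long read_fault_anon_pages(size_t npages)
{
	long page_size = sysconf(_SC_PAGESIZE);
	size_t len = npages * (size_t)page_size;
	volatile unsigned char *p;
	void *addr;
	long sum = 0;
	size_t i;

	addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (addr == MAP_FAILED)
		return -1;

	/* volatile so the compiler cannot drop the loads */
	p = addr;

	/* One load per page: each first touch is a read fault. */
	for (i = 0; i < npages; i++)
		sum += p[i * (size_t)page_size];

	munmap(addr, len);
	return sum;
}
```

Calling read_fault_anon_pages(1024) from a small main() and running it under
perf should be enough to compare fault behaviour with and without the patch,
though a real test would want many threads faulting in parallel.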