On Fri, Apr 26, 2024 at 08:32:06AM -0700, Suren Baghdasaryan wrote:
> On Fri, Apr 26, 2024 at 8:28 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> > > I think the only path in either do_anonymous_page() or
> > > do_huge_pmd_anonymous_page() that skips calling anon_vma_prepare() is
> > > the "Use the zero-page for reads" here:
> > > https://elixir.bootlin.com/linux/latest/source/mm/memory.c#L4265. I
> > > didn't look into this particular benchmark yet but will try it out
> > > once I have some time to benchmark your change.
> >
> > Yes, Liam and I had just brainstormed that as being a plausible
> > explanation too. I don't know how frequent it is to use anon memory
> > read-only. Presumably it must happen often enough that we've bothered
> > to implement the zero-page optimisation. But probably not nearly as
> > often as this benchmark makes it happen ;-)
>
> I also wonder if some of this improvement can be attributed to the
> last patch in your series
> (https://lore.kernel.org/all/20240426144506.1290619-5-willy@xxxxxxxxxxxxx/).
> I assume it was included in the 0day testing?

Patch 4 was where I expected to see the improvement too. But I think
what's going on is that this benchmark evaded all our hard work on
page fault scalability. Because it's read-only, it never assigned an
anon_vma and so all its page faults fell back to taking the mmap_sem.
So patch 4 will have no effect on this benchmark. The report from 0day
is pretty clear they bisected the performance improvement to patch 2.
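
The fallback described above can be illustrated with a small user-space
model in C. This is only a sketch under stated assumptions, not kernel
code: struct model_vma, lock_choice_early_check(),
lock_choice_delayed_check() and count_mmap_lock_faults() are all
hypothetical names invented for this example. The "early check" variant
models the behaviour the mail describes (an anonymous VMA with no
anon_vma sends every fault to the mmap lock). The "delayed check"
variant is an assumed alternative in which only faults that actually
need an anon_vma fall back; the thread itself does not spell out what
patch 2 changes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a VMA; not the kernel structure. */
struct model_vma {
    bool anonymous;     /* anonymous mapping (no backing file) */
    bool has_anon_vma;  /* has an anon_vma been assigned yet? */
};

enum fault_lock { FAULT_PER_VMA_LOCK, FAULT_MMAP_LOCK };

typedef enum fault_lock (*lock_choice_fn)(const struct model_vma *, bool);

/* Models the behaviour described in the mail: an anonymous VMA without
 * an anon_vma is rejected up front, whether the fault is a read or a write. */
enum fault_lock lock_choice_early_check(const struct model_vma *vma, bool write)
{
    (void)write;
    assert(vma != NULL);
    if (vma->anonymous && !vma->has_anon_vma)
        return FAULT_MMAP_LOCK;
    return FAULT_PER_VMA_LOCK;
}

/* Assumed alternative: fall back only when the fault needs an anon_vma.
 * A read fault served by the zero page does not need one. */
enum fault_lock lock_choice_delayed_check(const struct model_vma *vma, bool write)
{
    assert(vma != NULL);
    bool needs_anon_vma = vma->anonymous && write;
    if (needs_anon_vma && !vma->has_anon_vma)
        return FAULT_MMAP_LOCK;
    return FAULT_PER_VMA_LOCK;
}

/* Replays a sequence of faults (true = write, false = read) and counts
 * how many of them had to take the mmap lock. A write fault handled
 * under the mmap lock assigns the anon_vma; read faults never do. */
unsigned count_mmap_lock_faults(lock_choice_fn choose, struct model_vma *vma,
                                const bool *writes, size_t n)
{
    unsigned fallbacks = 0;

    for (size_t i = 0; i < n; i++) {
        if (choose(vma, writes[i]) == FAULT_MMAP_LOCK) {
            fallbacks++;
            if (vma->anonymous && writes[i])
                vma->has_anon_vma = true;
        }
    }
    return fallbacks;
}
```

In this model a read-only workload falls back on every fault under the
early check, because nothing ever assigns the anon_vma. A workload that
writes once falls back only for that first write. Under the assumed
delayed check, the read-only workload never falls back at all.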