On Mon, Dec 5, 2022 at 6:03 PM Huang, Ying <ying.huang@xxxxxxxxx> wrote:
>
> >
> > I assume that this test is doing a lot of mmap/munmap on dirty shared
> > memory regions (both because of the regression, and because of the
> > name of that test ;)
>
> I have checked the source code of will-it-scale/page_fault3. Yes, it
> exactly does that.

Heh. I took a look at that test-case, and yeah, it's just doing a 128MB
shared mapping, dirtying it one page at a time, and unmapping it in a
loop.

It doesn't even look like a very good benchmark for that, because the
_first_ time around the loop is very different: it has to actually
create the file extents. So the benchmark starts out testing something
different from what the steady state is.

But yeah, that's pretty much the worst possible case for this all, and
yes, I suspect it's more about the TLB batching than anything else.

And I think I see the issue. We actually have a reasonably big batch
size most of the time, but this benchmark triggers that dirty shared
page logic on every page, and that in turn means that we stop batching
immediately - even when we only have the initial tiny on-stack batch.

So instead of batching MAX_GATHER_BATCH pages at a time (roughly 500
pages per go), we end up batching just eight pages (MMU_GATHER_BUNDLE)
at a time. I didn't think of that degenerate case.

Let me think about this a while, but I think I'll have a patch for you
to test once I've dealt with a couple more pull requests.

                Linus
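
For reference, here is a minimal user-space sketch of the access pattern
described above. It is based only on the description in this mail, not on
the actual will-it-scale/page_fault3 source; the file name, iteration
count, and error handling are made up for illustration.

/* Sketch: 128MB shared file mapping, dirtied one page at a time, then
 * unmapped, in a loop - the pattern described in the mail above. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAPSIZE (128UL * 1024 * 1024)   /* 128MB shared mapping */

int main(void)
{
        long pagesize = sysconf(_SC_PAGESIZE);
        int fd = open("testfile", O_RDWR | O_CREAT, 0600); /* hypothetical name */

        if (fd < 0 || ftruncate(fd, MAPSIZE) < 0) {
                perror("setup");
                return 1;
        }

        for (int iter = 0; iter < 100; iter++) {        /* arbitrary count */
                char *map = mmap(NULL, MAPSIZE, PROT_READ | PROT_WRITE,
                                 MAP_SHARED, fd, 0);
                if (map == MAP_FAILED) {
                        perror("mmap");
                        return 1;
                }

                /* Dirty every page: each write faults in and dirties one
                 * page of the shared mapping. */
                for (unsigned long off = 0; off < MAPSIZE; off += pagesize)
                        map[off] = 1;

                /* The munmap() is where the mmu_gather / TLB-batching cost
                 * shows up: every page is a dirty shared page, which is the
                 * case that defeats the normal batching. */
                munmap(map, MAPSIZE);
        }

        close(fd);
        unlink("testfile");
        return 0;
}

To put rough numbers on the degenerate case: assuming 4KB pages, a 128MB
mapping is 32768 pages, so at roughly 500 pages per batch the unmap takes
on the order of 66 flush rounds, while at 8 pages per batch it takes 4096.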