On Sun, Jan 9, 2022 at 12:49 AM Nadav Amit <nadav.amit@xxxxxxxxx> wrote:
>
> I do not know whether it is a pure win, but there is a tradeoff.

Hmm. I guess only some serious testing would tell.

On x86, I'd be a bit worried about removing lazy TLB simply because even with ASID support there (called PCIDs by Intel for NIH reasons), the actual ASID space on x86 was at least originally very very limited.

Architecturally, x86 may expose 12 bits of ASID space, but iirc at least the first few implementations actually only internally had one or two bits, and hashed the 12 bits down to that internal very limited hardware TLB ID space.

We only use a handful of ASIDs per CPU on x86 partly for this reason (but also since there's no remote hardware TLB shootdown, there's no reason to have a bigger global ASID space, so ASIDs aren't _that_ common).

And I don't know how many non-PCID x86 systems (perhaps virtualized?) there might be out there.

But it would be very interesting to test some "disable lazy tlb" patch.

The main problem workloads tend to be IO, and I'm not sure how many of the automated performance tests would catch issues. I guess some threaded pipe ping-pong test (with each thread pinned to different cores) would show it.

And I guess there is some load that triggered the original powerpc patch by Nick&co, and that Andy has been using..

Anybody willing to cook up a patch and run some benchmarks? Perhaps one that basically just replaces "set ->mm to NULL" with "set ->mm to &init_mm" - so that the lazy TLB code is still *there*, but it never triggers..

I think it's mainly 'copy_thread()' in kernel/fork.c and the 'init_mm' initializer in mm/init-mm.c, but there's probably other things too that have that knowledge of the special "tsk->mm = NULL" situation.

              Linus
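
A minimal sketch of the kind of threaded pipe ping-pong test mentioned above: two threads pinned to different cores bounce a single byte over a pair of pipes and report round trips per second, as a rough proxy for context-switch / mm-switch overhead. The CPU numbers (0 and 1), the iteration count and the pthread_setaffinity_np() pinning are illustrative choices, not from the mail itself.

/*
 * Threaded pipe ping-pong microbenchmark (sketch).
 * Build: gcc -O2 -pthread pingpong.c -o pingpong
 */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

#define ITERATIONS 1000000

static int ab[2], ba[2];        /* pipe A->B and pipe B->A */

static void pin_to_cpu(int cpu)
{
        cpu_set_t set;

        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        if (pthread_setaffinity_np(pthread_self(), sizeof(set), &set))
                fprintf(stderr, "pthread_setaffinity_np(%d) failed\n", cpu);
}

static void *pong(void *arg)
{
        char c;

        pin_to_cpu(1);
        /* Echo every byte straight back to the other thread. */
        for (int i = 0; i < ITERATIONS; i++) {
                if (read(ab[0], &c, 1) != 1)
                        break;
                if (write(ba[1], &c, 1) != 1)
                        break;
        }
        return NULL;
}

int main(void)
{
        pthread_t thread;
        struct timespec start, end;
        char c = 'x';
        double secs;

        if (pipe(ab) || pipe(ba)) {
                perror("pipe");
                return 1;
        }

        pin_to_cpu(0);
        pthread_create(&thread, NULL, pong, NULL);

        clock_gettime(CLOCK_MONOTONIC, &start);
        for (int i = 0; i < ITERATIONS; i++) {
                /* One round trip: write to the pong thread, wait for the echo. */
                if (write(ab[1], &c, 1) != 1 || read(ba[0], &c, 1) != 1) {
                        perror("ping");
                        return 1;
                }
        }
        clock_gettime(CLOCK_MONOTONIC, &end);
        pthread_join(thread, NULL);

        secs = (end.tv_sec - start.tv_sec) + (end.tv_nsec - start.tv_nsec) / 1e9;
        printf("%d round trips in %.3f s (%.0f/s)\n",
               ITERATIONS, secs, ITERATIONS / secs);
        return 0;
}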
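
And purely as an illustration of the "->mm = &init_mm" idea, a rough, untested sketch against copy_mm() in kernel/fork.c, which is the helper that currently leaves kernel threads with a NULL ->mm. The reference counting and all of the other places that key off "tsk->mm == NULL" (exit, exec, the scheduler's lazy-TLB checks, mm/init-mm.c) are deliberately hand-waved here; those are exactly the "other things" caveat above.

--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ static int copy_mm(unsigned long clone_flags, struct task_struct *tsk)
 	oldmm = current->mm;
-	if (!oldmm)
+	if (!oldmm) {
+		/*
+		 * Kernel thread: give it a real reference to init_mm instead
+		 * of the special NULL ->mm that makes the scheduler take the
+		 * lazy-TLB "borrow the previous mm" path.  The matching
+		 * reference drop on exit, and every other test for a NULL
+		 * ->mm, still needs auditing.
+		 */
+		mmgrab(&init_mm);
+		tsk->mm = &init_mm;
+		tsk->active_mm = &init_mm;
 		return 0;
+	}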