Re: [PATCH 16/23] sched: Use lightweight hazard pointers to grab lazy mms

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Sat, 8 Jan 2022 11:22:48 -0800

On Sat, Jan 8, 2022 at 8:44 AM Andy Lutomirski <luto@xxxxxxxxxx> wrote:
>
> To improve scalability, this patch adds a percpu hazard pointer scheme to
> keep lazily-used mms alive.  Each CPU has a single pointer to an mm that
> must not be freed, and __mmput() checks the pointers belonging to all CPUs
> that might be lazily using the mm in question.

Ugh. This feels horribly fragile to me, and also looks like it makes
some common cases potentially quite expensive for machines with large
CPU counts if they don't do that mm_cpumask optimization - which in
turn feels quite fragile as well.

IOW, this just feels *complicated*.

And I think it's overly so. I get the strong feeling that we could
make the rules much simpler and more straightforward.

For example, how about we make the rules be

 - a lazy TLB mm reference requires that there's an actual active user
   of that mm (ie "mm_users > 0")

 - the last mm_users decrement (ie __mmput) forces a TLB flush, and
   that TLB flush must make sure that no lazy users exist (which I think
   it does already anyway).

Doesn't that seem like a really simple set of rules?
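
To make the two rules concrete, here is a minimal user-space sketch. It is
a toy model, not kernel code: every toy_* name is made up for illustration,
there is no locking or memory ordering, and the "TLB flush" is just a loop
that clears each CPU's lazy pointer.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NR_CPUS 4

/* Toy stand-in for struct mm_struct; only the user count is modelled. */
struct toy_mm {
	int mm_users;
};

/* Per-CPU lazy reference: the mm a CPU keeps borrowing (hypothetical). */
struct toy_mm *toy_lazy_mm[TOY_NR_CPUS];

/* Rule 1: a lazy reference may only be taken while mm_users > 0. */
int toy_lazy_grab(int cpu, struct toy_mm *mm)
{
	if (mm->mm_users <= 0)
		return -1;
	toy_lazy_mm[cpu] = mm;
	return 0;
}

/* Stand-in for the TLB flush: every CPU lazily using mm drops it. */
int toy_flush_and_drop_lazy(struct toy_mm *mm)
{
	int dropped = 0;

	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		if (toy_lazy_mm[cpu] == mm) {
			toy_lazy_mm[cpu] = NULL;
			dropped++;
		}
	}
	return dropped;
}

/*
 * Rule 2: the last mm_users decrement forces the flush.
 * Returns the number of lazy users dropped, or -1 if other users remain.
 */
int toy_mmput(struct toy_mm *mm)
{
	if (--mm->mm_users > 0)
		return -1;
	return toy_flush_and_drop_lazy(mm);
}
```

In this model, once toy_mmput() has dropped the last user, no CPU still
points at the mm and no new lazy reference can be taken, so there is nothing
left for a hazard pointer to protect.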

And the nice thing about it is that we *already* do that required TLB
flush in all normal circumstances. __mmput() already calls
exit_mmap(), and exit_mmap() already forces that TLB flush in every
normal situation.

So we might have to make sure that every architecture really does that
"drop lazy mms on TLB flush", and maybe add a flag to the existing
'struct mmu_gather tlb' to make sure that flush actually always
happens (even if the process somehow managed to unmap all of its VMAs
before exiting).
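
As a rough illustration of that flag idea, again as a user-space toy and not
the real 'struct mmu_gather': the struct, its fields and the force_flush
name are all hypothetical, chosen only to show the intended behaviour.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct mmu_gather; all field names are hypothetical. */
struct toy_gather {
	bool pages_unmapped;	/* something was actually unmapped */
	bool force_flush;	/* hypothetical flag: flush regardless */
	int flushes;		/* counts flushes so the effect is visible */
};

/* Finish the gather: flush if work was done, or if the flag demands it. */
void toy_gather_finish(struct toy_gather *tlb)
{
	if (tlb->pages_unmapped || tlb->force_flush)
		tlb->flushes++;
}
```

With force_flush set on the final teardown, the flush happens even when the
gather saw no unmapped pages, which is the corner case described above.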

Is there something silly I'm missing? Somebody pat me on the head, and
say "There, there, Linus, don't try to get involved with things you
don't understand.." and explain to me in small words.

                  Linus