Re: [PATCH 16/23] sched: Use lightweight hazard pointers to grab lazy mms

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Sat, 8 Jan 2022 20:38:02 -0800

On Sat, Jan 8, 2022 at 7:59 PM Andy Lutomirski <luto@xxxxxxxxxx> wrote:
>
> > Hmm. The x86 maintainers are on this thread, but they aren't even the
> > problem. Adding Catalin and Will to this, I think they should know
> > if/how this would fit with the arm64 ASID allocator.
> >
>
> Well, I am an x86 mm maintainer, and there is definitely a performance problem on large x86 systems right now. :)

Well, my point was that on x86, the complexities of the patch you
posted are completely pointless.

So on x86, you can just remove the mmgrab/mmdrop reference counts from
the lazy mm use entirely, and voila, that performance problem is gone.
We don't _need_ reference counting on x86 at all, if we just say that
the rule is that a lazy mm is always associated with a
honest-to-goodness live mm.

So on x86 - and any platform with the IPI model - there is no need for
hundreds of lines of complexity at all.

THAT is my point. Your patch adds complexity that buys you ABSOLUTELY NOTHING.

You then saying that the mmgrab/mmdrop is a performance problem is
just trying to muddy the water. You can just remove it entirely.

Now, I do agree that that depends on the whole "TLB IPI will get rid
of any lazy mm users on other cpus". So I agree that if you have
hardware TLB invalidation that then doesn't have that software
component to it, you need something else.

But my argument _then_ was that hardware TLB invalidation then needs
the hardware ASID thing to be useful, and the ASID management code
already effectively keeps track of "this ASID is used on other CPU's".
And that's exactly the same kind of information that your patch
basically added a separate percpu array for.

So I think that even for that hardware TLB shootdown case, your patch
only adds overhead.

And it potentially adds a *LOT* of overhead, if you replace an atomic
refcount with a "for_each_possible_cpu()" loop that has to do cmpxchg
things too.

Now, on x86, where we maintain that mm_cpumask, and as a result that
overhead is much lower - but we maintain that mm_cpumask exactly
*because* we do that IPI thing, so I don't think you can use that
argument in favor of your patch. When we do the IPI thing, your patch
is worthless overhead.

See?

Btw, you don't even need to really solve the arm64 TLB invalidate
thing - we could make the rule be that we only do the mmgrab/mmput at
all on platforms that don't do that IPI flush.

I think that's basically exactly what Nick Piggin wanted to do on powerpc, no?

But you hated that patch, for non-obvious reasons, and are now
introducing this new patch that is clearly non-optimal on x86.

So I think there's some intellectual dishonesty on your part here.

                Linus