Re: [MOCKUP] x86/mm: Lightweight lazy mm refcounting

Peter Zijlstra <peterz@xxxxxxxxxxxxx> · Thu, 3 Dec 2020 09:44:48 +0100

On Wed, Dec 02, 2020 at 09:25:51PM -0800, Andy Lutomirski wrote:

> power: same as ARM, except that the loop may be rather larger since
> the systems are bigger.  But I imagine it's still faster than Nick's
> approach -- a cmpxchg to a remote cacheline should still be faster than
> an IPI shootdown. 

While a single atomic might be cheaper than an IPI, the comparison
doesn't work out nicely. You do the xchg() on every unlazy, while the
IPI would be once per process exit.

So over the life of the process, it might do very many unlazies, adding
up to a total cost far in excess of what the single IPI would've been.
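
To put rough shape on that argument, here is a minimal sketch of the
break-even point. The struct, the function names and any costs plugged
into it are hypothetical, picked purely for illustration; nothing here
is measured, and it assumes atomic_ns is non-zero.

```c
/* Hypothetical per-operation costs in nanoseconds; not measurements. */
struct cost_model {
	unsigned long atomic_ns;	/* one remote-cacheline atomic, paid per unlazy */
	unsigned long ipi_ns;		/* one IPI shootdown, paid once at process exit */
};

/* Total cost of the atomic scheme after 'unlazies' unlazy operations. */
unsigned long total_atomic_cost(const struct cost_model *m,
				unsigned long unlazies)
{
	return m->atomic_ns * unlazies;
}

/*
 * Smallest number of unlazies at which the accumulated atomics cost
 * strictly more than the single IPI. Assumes m->atomic_ns != 0.
 */
unsigned long break_even_unlazies(const struct cost_model *m)
{
	return m->ipi_ns / m->atomic_ns + 1;
}
```

With made-up costs of 100ns per atomic and 2000ns per IPI, the atomics
lose from the 21st unlazy onwards. However cheap the atomic is relative
to the IPI, a process that unlazies often enough ends up on the wrong
side of that line.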

And while I appreciate all the work to get rid of the active_mm
accounting, the worry I have with pushing this all into arch code is
that it will be so very easy to get this subtly wrong.