> On Dec 3, 2020, at 2:13 PM, Nicholas Piggin <npiggin@xxxxxxxxx> wrote:
>
> Excerpts from Peter Zijlstra's message of December 3, 2020 6:44 pm:
>>> On Wed, Dec 02, 2020 at 09:25:51PM -0800, Andy Lutomirski wrote:
>>>
>>> power: same as ARM, except that the loop may be rather larger since
>>> the systems are bigger. But I imagine it's still faster than Nick's
>>> approach -- a cmpxchg to a remote cacheline should still be faster than
>>> an IPI shootdown.
>>
>> While a single atomic might be cheaper than an IPI, the comparison
>> doesn't work out nicely. You do the xchg() on every unlazy, while the
>> IPI would be once per process exit.
>>
>> So over the life of the process, it might do very many unlazies, adding
>> up to a total cost far in excess of what the single IPI would've been.
>
> Yeah this is the concern, I looked at things that add cost to the
> idle switch code and it gets hard to justify the scalability improvement
> when you slow these fundamental things down even a bit.

v2 fixes this and is generally much nicer. I’ll send it out in a couple hours.

>
> I still think working on the assumption that IPIs = scary expensive
> might not be correct. An IPI itself is, but you only issue them when
> you've left a lazy mm on another CPU which just isn't that often.
>
> Thanks,
> Nick
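
For anyone following along, Peter's point about per-unlazy cost versus a one-time IPI can be put as a rough break-even calculation. This is only a sketch: the function names are made up for illustration, and the nanosecond figures are placeholder assumptions, not measurements from any machine.

```python
def total_cost(per_event_ns, events):
    """Total time spent on an operation that costs per_event_ns each time."""
    return per_event_ns * events


def break_even_unlazies(ipi_cost_ns, xchg_cost_ns, ipis_at_exit=1):
    """Number of unlazy switches at which the accumulated per-unlazy
    atomics cost as much as the IPIs issued at process exit."""
    return (ipi_cost_ns * ipis_at_exit) / xchg_cost_ns


# Placeholder costs (assumptions for illustration only).
XCHG_NS = 100   # hypothetical cost of one xchg() on a remote cacheline
IPI_NS = 5000   # hypothetical cost of one IPI shootdown

threshold = break_even_unlazies(IPI_NS, XCHG_NS)
print(f"atomics cost more than the IPI after {threshold:.0f} unlazies")
```

Under those assumed numbers, a process that unlazies more often than the threshold over its lifetime pays more in atomics than the single exit-time IPI would have cost. The real ratio depends on the machine and on how often a lazy mm is actually left on another CPU.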