On Wed, Jun 16, 2021, at 6:37 PM, Nicholas Piggin wrote: > Excerpts from Andy Lutomirski's message of June 17, 2021 4:41 am: > > On 6/16/21 12:35 AM, Peter Zijlstra wrote: > >> On Wed, Jun 16, 2021 at 02:19:49PM +1000, Nicholas Piggin wrote: > >>> Excerpts from Andy Lutomirski's message of June 16, 2021 1:21 pm: > >>>> membarrier() needs a barrier after any CPU changes mm. There is currently > >>>> a comment explaining why this barrier probably exists in all cases. This > >>>> is very fragile -- any change to the relevant parts of the scheduler > >>>> might get rid of these barriers, and it's not really clear to me that > >>>> the barrier actually exists in all necessary cases. > >>> > >>> The comments and barriers in the mmdrop() hunks? I don't see what is > >>> fragile or maybe-buggy about this. The barrier definitely exists. > >>> > >>> And any change can change anything, that doesn't make it fragile. My > >>> lazy tlb refcounting change avoids the mmdrop in some cases, but it > >>> replaces it with smp_mb for example. > >> > >> I'm with Nick again, on this. You're adding extra barriers for no > >> discernible reason, that's not generally encouraged, seeing how extra > >> barriers is extra slow. > >> > >> Both mmdrop() itself, as well as the callsite have comments saying how > >> membarrier relies on the implied barrier, what's fragile about that? > >> > > > > My real motivation is that mmgrab() and mmdrop() don't actually need to > > be full barriers. The current implementation has them being full > > barriers, and the current implementation is quite slow. So let's try > > that commit message again: > > > > membarrier() needs a barrier after any CPU changes mm. There is currently > > a comment explaining why this barrier probably exists in all cases. The > > logic is based on ensuring that the barrier exists on every control flow > > path through the scheduler. It also relies on mmgrab() and mmdrop() being > > full barriers. > > > > mmgrab() and mmdrop() would be better if they were not full barriers. As a > > trivial optimization, mmgrab() could use a relaxed atomic and mmdrop() > > could use a release on architectures that have these operations. > > I'm not against the idea, I've looked at something similar before (not > for mmdrop but a different primitive). Also my lazy tlb shootdown series > could possibly take advantage of this, I might cherry pick it and test > performance :) > > I don't think it belongs in this series though. Should go together with > something that takes advantage of it. I’m going to see if I can get hazard pointers into shape quickly. > > Thanks, > Nick >