Excerpts from Andy Lutomirski's message of December 6, 2020 2:11 am:
> 
>> On Dec 5, 2020, at 12:00 AM, Nicholas Piggin <npiggin@xxxxxxxxx> wrote:
>> 
>> 
>> I disagree. Until now nobody following it noticed that the mm gets
>> un-lazied in other cases, because that was not too clear from the
>> code (only indirectly using non-standard terminology in the arch
>> support document).
> 
>> In other words, membarrier needs a special sync to deal with the case
>> when a kthread takes the mm.
> 
> I don’t think this is actually true. Somehow the x86 oddities about
> CR3 writes leaked too much into the membarrier core code and comments.
> (I doubt this is x86 specific. The actual x86 specific part seems to
> be that we can return to user mode without syncing the instruction
> stream.)
> 
> As far as I can tell, membarrier doesn’t care at all about laziness.
> Membarrier cares about rq->curr->mm. The fact that a cpu can switch
> its actual loaded mm without scheduling at all (on x86 at least) is
> entirely beside the point except insofar as it has an effect on
> whether a subsequent switch_mm() call serializes.

Core membarrier itself doesn't care about laziness, which is why the
membarrier flush should go in exit_lazy_tlb() or other x86 specific
code (at least until more architectures did the same thing and we moved
it into generic code). I just meant this non-serialising return as
documented in the membarrier arch enablement doc specifies the lazy tlb
requirement.

If an mm was lazy tlb for a kernel thread and then it becomes unlazy,
and if switch_mm is serialising but return to user is not, then you
need a serialising instruction somewhere before return to user. unlazy
is the logical place to add that, because the lazy tlb mm (i.e.,
switching to a kernel thread and back without switching mm) is what
opens the hole.
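To make the sequence above concrete, here is a toy userspace model of it in
C. It is only a sketch and not kernel code: struct cpu, the needs_sync flag,
the use_hook parameter and all function names are made up for illustration.
It assumes, as described above, that a real mm switch serialises, that
return to user does not, and that a sync-core request only interrupts CPUs
whose current task owns the target mm.

```c
#include <assert.h>   /* for callers that want to check the outcome */
#include <stdbool.h>

#define NO_MM 0       /* hypothetical: kernel threads have no mm of their own */

/* Hypothetical per-CPU state for the model. */
struct cpu {
	int  loaded_mm;   /* address space currently loaded in hardware */
	int  curr_mm;     /* mm of the running task, NO_MM for a kthread */
	bool lazy;        /* running a kthread on a borrowed (lazy) mm */
	bool needs_sync;  /* a sync-core request was missed on this CPU */
};

/* Model of a sync-core request for target_mm: the CPU is only
 * interrupted (and serialised) if its current task owns target_mm. */
static void membarrier_sync_core(struct cpu *c, int target_mm)
{
	if (c->curr_mm == target_mm)
		c->needs_sync = false;  /* interrupted, serialised */
	else if (c->loaded_mm == target_mm)
		c->needs_sync = true;   /* skipped while lazily holding the mm */
}

/* Model of a context switch. use_hook selects whether the un-lazy
 * point serialises, standing in for an exit_lazy_tlb()-style hook. */
static void context_switch(struct cpu *c, int next_mm, bool use_hook)
{
	if (next_mm == NO_MM) {
		c->lazy = true;         /* kthread borrows the loaded mm */
	} else {
		if (c->loaded_mm != next_mm) {
			c->loaded_mm = next_mm;  /* real mm switch */
			c->needs_sync = false;   /* assumed to serialise */
		} else if (c->lazy && use_hook) {
			c->needs_sync = false;   /* serialise when un-lazying */
		}
		c->lazy = false;
	}
	c->curr_mm = next_mm;
}

/* user task (mm 1) -> kthread -> sync-core request -> same user task.
 * Returns true if the CPU is serialised before returning to user. */
bool run_scenario(bool use_hook)
{
	struct cpu c = { .loaded_mm = 1, .curr_mm = 1 };

	context_switch(&c, NO_MM, use_hook);
	membarrier_sync_core(&c, 1);
	context_switch(&c, 1, use_hook);
	return !c.needs_sync;
}
```

In this model run_scenario(false) returns false (the request is missed and
nothing serialises before return to user), while run_scenario(true) returns
true because the un-lazy step serialises.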
> If we notify
> membarrier about x86’s asynchronous CR3 writes, then membarrier needs
> to understand what to do with them, which results in an unmaintainable
> mess in membarrier *and* in the x86 code.

How do you mean? exit_lazy_tlb is the opposite, core scheduler notifying
arch code about when an mm becomes not-lazy, and nothing to do with
membarrier at all even. It's a convenient hook to do your un-lazying.
I guess you could also do it by checking things in switch_mm and keeping
state in arch code, but I don't think that's necessarily the best place
to put it.

So membarrier code is unchanged (it cares that the serialise is done at
un-lazy time), core code is simpler (no knowledge of this membarrier
quirk and it already knows about lazy-tlb so the calls actually improve
the documentation), and x86 code I would argue becomes nicer (or no real
difference at worst) because you can move some exit lazy tlb handling to
that specific call rather than decipher it from switch_mm.

> 
> I’m currently trying to document how membarrier actually works, and
> hopefully this will result in untangling membarrier from mmdrop() and
> such.

That would be nice.

> 
> A silly part of this is that x86 already has a high quality
> implementation of most of membarrier(): flush_tlb_mm(). If you flush
> an mm’s TLB, we carefully propagate the flush to all threads, with
> attention to memory ordering. We can’t use this directly as an
> arch-specific implementation of membarrier because it has the annoying
> side effect of flushing the TLB and because upcoming hardware might be
> able to flush without guaranteeing a core sync. (Upcoming means Zen
> 3, but the Zen 3 implementation is sadly not usable by Linux.)
> 

A hardware broadcast TLB flush, you mean? What makes it unusable by
Linux out of curiosity?