Excerpts from Andy Lutomirski's message of September 3, 2021 3:11 pm:
> On 9/2/21 5:46 PM, Nicholas Piggin wrote:
>> Excerpts from Andrew Morton's message of September 3, 2021 8:53 am:
>>> On Thu, 2 Sep 2021 15:50:03 -0700 Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>>>
>>>> On Thu, Sep 2, 2021 at 3:29 PM Andy Lutomirski <luto@xxxxxxxxxx> wrote:
>>>>>
>>>>> This pile is:
>>>>>
>>>>> Nacked-by: Andy Lutomirski <luto@xxxxxxxxxx>
>>>>
>>>> Can you specify exactly the range you want me to drop?
>>>>
>>>> I assume it's the four patches 117-120, ie
>>>>
>>>> lazy tlb: introduce lazy mm refcount helper functions
>>>> lazy tlb: allow lazy tlb mm refcounting to be configurable
>>>> lazy tlb: shoot lazies, a non-refcounting lazy tlb option
>>>> powerpc/64s: enable MMU_LAZY_TLB_SHOOTDOWN
>>>>
>>>> but I just want to double-check before I do surgery on that series.
>>>
>>> Yes, those 4.
>>>
>>> Sorry, I missed that email thread...
>>
>> That's not reasonable. Andy has had complete misunderstandings about
>> the series, which seem to stem from the horrible hacks that have gone
>> into x86 and confused him.
>
> The horrible hacks in question are almost exclusively in core code.

No, they're in x86.

> Here's a brief summary of the situation.
>
> There's a messy interaction between mmget()/mmdrop() and membarrier.
> membarrier currently depends on some mmget() and mmdrop() calls to be
> full barriers.

Membarrier has had (and is improving, but still has) some complexity,
caused by its interaction with the _existing_ lazy-mm code in the tree.
The complexity is not with lazy-mm itself, and my series does not add
anything to the membarrier interaction. So I don't accept the criticism
that it has anything to do with membarrier complexity.

> You make membarrier keep working by putting an ifdef'd
> smp_mb() in the core scheduler.

Sure, and it's well commented: it replaces the smp_mb() provided by the
atomic refcount operation that membarrier relied on with an explicit
one. That's not a horrible hack.

> I clean up the code to make it work
> independently of smp_mb() and therefore save the cost of the
> unconditional barrier for non-membarrier-using programs.

Great. Nothing to do with this series though, which does not change
membarrier ordering. I can certainly help you rebase it on top of these
patches if you need.

> Your series adds an option MMU_LAZY_TLB_REFCOUNT=n for architectures to
> opt out of lazy TLB refcounting. This is simply wrong. Right now, the
> core scheduler provides current->active_mm and guarantees that
> current->active_mm always points to a live (possibly mm_users == 0 but
> definitely not freed) mm_struct. With MMU_LAZY_TLB_REFCOUNT=n,
> current->active_mm still exists, is still updated, but may point to
> freed memory.

Wrong. It does nothing of the sort. I told you this in the previous
discussion, and you obviously ignored me. You are just wrong, and you
can't actually point to where this happens. This criticism is invalid
too.

> I consider this unacceptable. A comment says "This can
> be disabled if the architecture ensures no CPUs are using an mm as a
> "lazy tlb" beyond its final refcount" -- that's nice, but saying "well,
> if you do this, you have to make sure you don't accidentally dereference
> that juicy dangling pointer we give you" is, in my book, a poor
> justification.

It's not a justification, it's a recipe for other archs which might
want to implement it!
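To make the recipe concrete: the helpers the first patch introduces are
just a wrapper around the lazy mm refcount that compiles away when an
arch selects the shootdown mode. A rough sketch of the idea (simplified,
not the exact patch text):

	/*
	 * Sketch of the lazy tlb mm refcount helpers. With
	 * MMU_LAZY_TLB_REFCOUNT=y this is exactly today's behaviour;
	 * without it, the refcount op goes away and an explicit
	 * smp_mb() preserves the full barrier that membarrier
	 * previously got for free from the atomic refcount operation.
	 */
	static inline void mmgrab_lazy_tlb(struct mm_struct *mm)
	{
		if (IS_ENABLED(CONFIG_MMU_LAZY_TLB_REFCOUNT))
			mmgrab(mm);
	}

	static inline void mmdrop_lazy_tlb(struct mm_struct *mm)
	{
		if (IS_ENABLED(CONFIG_MMU_LAZY_TLB_REFCOUNT))
			mmdrop(mm);
		else
			smp_mb();
	}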
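The arch's side of the bargain is then a shootdown at the point the mm
can no longer gain new lazy users: IPI any CPU that still has it as
->active_mm and make it switch to init_mm before the mm is freed. Again
a simplified sketch with approximate names, not the literal powerpc
patch:

	static void do_shoot_lazy_tlb(void *arg)
	{
		struct mm_struct *mm = arg;

		if (current->active_mm == mm) {
			WARN_ON_ONCE(current->mm);
			current->active_mm = &init_mm;
			switch_mm(mm, &init_mm, current);
		}
	}

	static void cleanup_lazy_tlbs(struct mm_struct *mm)
	{
		if (!IS_ENABLED(CONFIG_MMU_LAZY_TLB_SHOOTDOWN))
			return;
		/*
		 * IPI all CPUs that may be using this mm lazily, so no
		 * CPU can be left dereferencing it as ->active_mm once
		 * it is freed.
		 */
		on_each_cpu_mask(mm_cpumask(mm), do_shoot_lazy_tlb,
				 mm, 1);
	}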
> I have no particular objection to the actual shoot lazies part, except
> insofar as I think we can do even better (e.g. my patch). But 90% of
> the complexity of my WIP series is cleaning up the mess so that we can
> have a maintainable lazy mm mechanism instead of expanding the current
> hard-to-maintain part into three separate possible modes.

What actually happened is that when you ran out of (incorrect) technical
disputes like this confusion about the active_mm thing, you started
demanding that I massively rework core code that you don't understand so
that it matches the horrible mess that x86 has got itself into. I can
bring up quotes from previous threads.

> Maybe I'm holding my own patches to an excessively high standard.
>
>> My series doesn't affect x86 at all, and there's no reason why Andy's
>> series to improve x86 can't be merged later. But that half-finished
>> series he keeps threatening with has been sitting there for almost a
>> year now and has gone nowhere, while there have been no unresolved
>> technical objections to mine; it works, and it's simple and small.
>
> My series barely touches x86.

I'm talking about your previous insistence that my patch series remove
"active_mm" from core code because it doesn't match x86 internals, and
similar stupidity.

> The only "hack" is that x86 may have a
> CPU that has ->mm == NULL, ->active_mm != NULL, CR3 pointing to the init
> pgd, and mm_cpumask clear. I don't see why this is a problem other than
> being somewhat unusual. But x86 bare metal, like every architecture
> that can only flush the TLB using an IPI, can very efficiently shoot
> lazies, since it shoots the lazies anyway when tearing down pagetables,
> but actually enabling the config option with this series applied will
> result in ->active_mm pointing to freed memory. Ick.
>
>> I've kept trying to offer to help Andy with reviewing his stuff or
>> fixing the horrible x86 hacks, but nothing.
>
> I haven't finished it yet. Sorry.

No need to be sorry about that; it will be trivial to rebase on top of
my series, and I've even done a quick attempt at it. No problem at all.

Thanks,
Nick