Since v4, this fixes a kthread_use_mm refcounting bug and adds some comments in code and changelogs around the kthread_use_mm change in patch 1 (due to akpm's comment -- thanks). It also adds and improves comments in code, changelogs, and Kconfig options. The overall design is unchanged though.

Please merge.

This series has suffered some issues getting agreement, so I would like to address a few sticking points or misconceptions up front, which hopefully can result in constructive disagreement and actual actionable feedback.

* That the lazy mm scheme is complicated or bug prone.

  This is not true: the concept is trivial, and the core code is extremely simple and basically unchanged since Linus' active_mm email 20 years ago in 2.3 days. This series leaves the lazy tlb switching and ->active_mm semantics entirely unchanged. It does change the refcounting, but the effects are hidden under wrappers. Code outside those few places has nothing new to think about, except that it must specify _lazy_mm when refcounting this particular type of reference. This is not much of a problem, since lazy mm references never "escape" from specific switching sequences and become hard to track. Refs that go into the wider world are always normal ones (i.e., created by explicit mmgrab or kthread_use_mm).

* That membarrier code is complicated.

  This is true. My series changes exactly nothing to do with membarriers. My series is entirely about lazy mm, which had been virtually unchanged for many years before membarrier. The membarrier code takes advantage of memory ordering in the scheduler switch code that lazy mm refcounting was providing, so this series adds one commented, ifdef'd smp_mb() there to replace the refcount op being removed. That does not affect the ability to change membarrier code in future, because the refcounted path has to be accounted for here anyway.
  In other words, for any change to membarrier code that deals with the refcounted lazy mm path that exists today, dealing with the non-refcounted option as well is trivial.

* That active_mm should be removed from core code.

  I don't know how to address this other than to say it's not a good or well thought out idea. This is not happening, and it is certainly not related to my series, which does not change ->active_mm semantics at all.

* That this series provides an option for archs to enable which results in stale ->active_mm pointers, whereby it is up to the arch to ensure nothing dereferences those pointers.

  This is FUD. It has always been false. Archs that enable MMU_LAZY_TLB_SHOOTDOWN never, ever have stale ->active_mm pointers. If active_mm is non-NULL, then that gives exactly the same guarantees as you have today.

* That performance of IPIs or other things is a problem.

  I posted actual numbers showing this was not a concern, and listed some options that could reduce the cost further if needed. No numbers were ever posted to support the other side of the argument.

* That the series is a powerpc-specific thing.

  Untrue. I have trivial sparc and alpha conversions; those were the first two archs I looked at, because I have SMP qemu environments for them.

* That this series somehow prevents future changes or improvements.

  It doesn't.

* That the series is very complex, code is bad or has problems.

  Look at the patches. They seem pretty small and simple to me. I am happy to address specific issues that are pointed out though, and have done so.

* That x86 is relevant here.

  This series does not touch or affect x86 in any way. x86 has gone off and done its own horrendously complicated and under-documented thing with active_mm and the lazy mm concept, but that has been entirely hidden from core code by the arch context switching hooks. Core code continues to operate on the concept of ->mm and ->active_mm, and this series does not change that at all. x86 is no more or less divorced from that after the series.
  Nothing the series does constrains x86 or future changes to it. The option cannot be used immediately by x86, but there is no reason x86 could not be adapted to use it, or change its scheme to something else entirely. Where code can be adapted to be shared or made usable by x86, I have no problem with doing that.

If I've missed something or I've got anything wrong with the above, I'm happy to hear it.

Thanks,
Nick

Nicholas Piggin (4):
  lazy tlb: introduce lazy mm refcount helper functions
  lazy tlb: allow lazy tlb mm refcounting to be configurable
  lazy tlb: shoot lazies, a non-refcounting lazy tlb option
  powerpc/64s: enable MMU_LAZY_TLB_SHOOTDOWN

 Documentation/vm/active_mm.rst       |  6 ++++
 arch/Kconfig                         | 32 +++++++++++++++++
 arch/arm/mach-rpc/ecard.c            |  2 +-
 arch/powerpc/Kconfig                 |  1 +
 arch/powerpc/kernel/smp.c            |  2 +-
 arch/powerpc/mm/book3s64/radix_tlb.c |  4 +--
 fs/exec.c                            |  2 +-
 include/linux/sched/mm.h             | 20 +++++++++++
 kernel/cpu.c                         |  2 +-
 kernel/exit.c                        |  2 +-
 kernel/fork.c                        | 51 ++++++++++++++++++++++++++++
 kernel/kthread.c                     | 21 +++++++-----
 kernel/sched/core.c                  | 35 +++++++++++++------
 kernel/sched/sched.h                 |  4 ++-
 14 files changed, 158 insertions(+), 26 deletions(-)

--
2.23.0
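For readers new to the lazy mm scheme, here is a minimal userspace sketch (not kernel code) contrasting refcounted lazy tlb references with the shootdown option. It is built on simplifying assumptions: one mm_count per mm, a per-CPU active_mm array, no concurrency, and no memory ordering, so it says nothing about the smp_mb() added for membarrier. Every name in it (struct model_mm, model_mmgrab_lazy(), model_shoot_lazies(), switch_to_kthread(), run_scenario() and so on) is hypothetical and is not one of the helpers the patches add. The loop over every CPU in model_shoot_lazies() stands in for whatever IPI mechanism an arch actually uses.

```c
/*
 * Hypothetical userspace model of lazy tlb mm references. Not kernel code:
 * no concurrency, no memory ordering, no real freeing.
 */
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 4

struct model_mm {
	int mm_count;	/* models mm->mm_count */
	bool freed;	/* set instead of freeing memory */
};

static struct model_mm init_mm = { .mm_count = 1 };
static struct model_mm *active_mm[NR_CPUS];	/* per-CPU ->active_mm */
static bool lazy_refcount = true;	/* models the Kconfig choice */

/* Shootdown: move every CPU still lazily using @mm over to init_mm. */
static void model_shoot_lazies(struct model_mm *mm)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (active_mm[cpu] == mm)
			active_mm[cpu] = &init_mm;	/* stands in for an IPI */
}

static void model_mmgrab(struct model_mm *mm)
{
	mm->mm_count++;
}

static void model_mmdrop(struct model_mm *mm)
{
	if (--mm->mm_count == 0) {
		/* Lazy users hold no refs here, so shoot them down first. */
		if (!lazy_refcount)
			model_shoot_lazies(mm);
		mm->freed = true;
	}
}

/* Lazy variants: a real refcount op, or a no-op under shootdown. */
static void model_mmgrab_lazy(struct model_mm *mm)
{
	if (lazy_refcount)
		model_mmgrab(mm);
}

static void model_mmdrop_lazy(struct model_mm *mm)
{
	if (lazy_refcount)
		model_mmdrop(mm);
}

/* A kernel thread runs on @cpu and lazily keeps the previous user mm. */
static void switch_to_kthread(int cpu, struct model_mm *prev)
{
	active_mm[cpu] = prev;
	model_mmgrab_lazy(prev);
}

/* @cpu leaves that kernel thread for a user task whose mm is @next. */
static void switch_to_user(int cpu, struct model_mm *next)
{
	struct model_mm *old = active_mm[cpu];

	active_mm[cpu] = next;
	model_mmdrop_lazy(old);
}

/* True if no CPU's active_mm points at a freed mm. */
static bool no_stale_active_mm(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (active_mm[cpu] != NULL && active_mm[cpu]->freed)
			return false;
	return true;
}

/* Owner holds one ref, CPU 1 lazily borrows @mm, then the owner exits. */
static void run_scenario(bool refcount, struct model_mm *mm)
{
	lazy_refcount = refcount;
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		active_mm[cpu] = NULL;
	mm->mm_count = 1;
	mm->freed = false;
	switch_to_kthread(1, mm);
	model_mmdrop(mm);
}
```

In the model, after run_scenario(true, &mm) the mm survives the owner's drop because CPU 1 holds a lazy reference, and it is freed only when switch_to_user(1, &other) drops that reference. After run_scenario(false, &mm) the owner's drop frees it immediately, but CPU 1 has first been moved to init_mm. In neither case is a CPU left pointing at a freed mm.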