On Sat, Nov 28, 2020 at 7:54 PM Andy Lutomirski <luto@xxxxxxxxxx> wrote:
>
> On Sat, Nov 28, 2020 at 8:02 AM Nicholas Piggin <npiggin@xxxxxxxxx> wrote:
> >
> > On big systems, the mm refcount can become highly contended when doing
> > a lot of context switching with threaded applications (particularly
> > switching between the idle thread and an application thread).
> >
> > Abandoning lazy tlb slows switching down quite a bit in the important
> > user->idle->user cases, so instead implement a non-refcounted scheme
> > that causes __mmdrop() to IPI all CPUs in the mm_cpumask and shoot down
> > any remaining lazy ones.
> >
> > Shootdown IPIs are some concern, but they have not been observed to be
> > a big problem with this scheme (the powerpc implementation generated
> > 314 additional interrupts on a 144 CPU system during a kernel compile).
> > There are a number of strategies that could be employed to reduce IPIs
> > if they turn out to be a problem for some workload.
>
> I'm still wondering whether we can do even better.
>

Hold on a sec.. __mmput() unmaps VMAs, frees pagetables, and flushes
the TLB. On x86, this will shoot down all lazies as long as even a
single pagetable was freed. (Or at least it will if we don't have a
serious bug, but the code seems okay. We'll hit pmd_free_tlb, which
sets tlb->freed_tables, which will trigger the IPI.) So, on
architectures like x86, the shootdown approach should be free. The
only way it ought to have any excess IPIs is if we have CPUs in
mm_cpumask() that don't need IPI to free pagetables, which could
happen on paravirt.

Can you try to figure out why you saw any increase in IPIs? It would
be nice if we can make the new code unconditional.
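[Editor's note: the scheme under discussion can be illustrated with a
rough userspace sketch. This is not the kernel code; the names
(mmdrop_shootdown, shoot_lazy_ipi, lazy_mm) are invented for
illustration, the "IPI" is a plain function call, and all the real
concurrency, cpumask, and architecture details are elided. The idea it
models is only what the quoted commit message describes: lazy-TLB CPUs
hold no reference on the mm, so final teardown must visit every CPU in
mm_cpumask and clear any remaining lazy pointer.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS 8

struct mm {
	uint32_t cpumask;   /* CPUs that have run (or gone lazy on) this mm */
	int refcount;       /* real users only; lazy CPUs take no reference */
};

/* Per-CPU pointer to the mm borrowed for lazy TLB. */
static struct mm *lazy_mm[NR_CPUS];

/*
 * Simulated shootdown IPI handler: if this CPU is still lazily
 * referencing @mm, drop that reference so the mm can be freed.
 */
static void shoot_lazy_ipi(int cpu, struct mm *mm)
{
	if (lazy_mm[cpu] == mm)
		lazy_mm[cpu] = NULL;
}

/*
 * Non-refcounted teardown: instead of waiting for each lazy CPU to
 * mmdrop() at its next context switch, IPI every CPU in the cpumask
 * and shoot down any remaining lazy references before freeing the mm.
 */
static void mmdrop_shootdown(struct mm *mm)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (mm->cpumask & (1u << cpu))
			shoot_lazy_ipi(cpu, mm);
	/* ...now no CPU can touch mm; it would be freed here. */
}
```

The observation in the mail above is that on x86 this extra IPI is
usually not extra at all: the TLB flush that __mmput() already does for
freed pagetables reaches the same set of CPUs.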