On Sat, Jan 8, 2022 at 8:44 AM Andy Lutomirski <luto@xxxxxxxxxx> wrote:
>
> To improve scalability, this patch adds a percpu hazard pointer scheme to
> keep lazily-used mms alive. Each CPU has a single pointer to an mm that
> must not be freed, and __mmput() checks the pointers belonging to all CPUs
> that might be lazily using the mm in question.

Ugh. This feels horribly fragile to me, and also looks like it makes
some common cases potentially quite expensive for machines with large
CPU counts if they don't do that mm_cpumask optimization - which in
turn feels quite fragile as well.

IOW, this just feels *complicated*.

And I think it's overly so. I get the strong feeling that we could
make the rules much simpler and more straightforward.

For example, how about we make the rules be

 - a lazy TLB mm reference requires that there's an actual active user
   of that mm (ie "mm_users > 0")

 - the last mm_users decrement (ie __mmput) forces a TLB flush, and
   that TLB flush must make sure that no lazy users exist (which I think
   it does already anyway).

Doesn't that seem like a really simple set of rules?

And the nice thing about it is that we *already* do that required TLB
flush in all normal circumstances. __mmput() already calls
exit_mmap(), and exit_mm() already forces that TLB flush in every
normal situation.

So we might have to make sure that every architecture really does that
"drop lazy mms on TLB flush", and maybe add a flag to the existing
'struct mmu_gather tlb' to make sure that flush actually always
happens (even if the process somehow managed to unmap all vma's even
before exiting).

Is there something silly I'm missing? Somebody pat me on the head, and
say "There, there, Linus, don't try to get involved with things you
don't understand.." and explain to me in small words.

                Linus