On Thu, Jul 19, 2018 at 10:04:09AM -0700, Andy Lutomirski wrote:
> I added some more arch maintainers. The idea here is that, on x86 at
> least, task->active_mm and all its refcounting is pure overhead. When
> a process exits, __mmput() gets called, but the core kernel has a
> longstanding "optimization" in which other tasks (kernel threads and
> idle tasks) may have ->active_mm pointing at this mm. This is nasty,
> complicated, and hurts performance on large systems, since it requires
> extra atomic operations whenever a CPU switches between real users
> threads and idle/kernel threads.
>
> It's also almost completely worthless on x86 at least, since __mmput()
> frees pagetables, and that operation *already* forces a remote TLB
> flush, so we might as well zap all the active_mm references at the
> same time.

So I disagree that active_mm is complicated (the code is less than ideal
but that is actually fixable). And aside from the process exit case, it
does avoid CR3 writes when switching between user and kernel threads
(which can be far more often than exit if you have longer running
tasks).

Now agreed, recent x86 work has made that less important.

And I of course also agree that not doing those refcount atomics is
better.
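
For readers following the argument, the trade-off can be sketched with a toy C model. This is not kernel code: struct toy_mm, struct toy_task, struct toy_cpu and toy_switch() are hypothetical names invented for illustration, loaded_mm merely stands in for CR3, and the real scheduler path is considerably more involved. Under those assumptions, it shows a kernel thread borrowing the previous task's mm, which avoids a CR3 write but costs one atomic to take the reference and one to drop it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Toy model only; every name here is a simplified, hypothetical stand-in. */
struct toy_mm {
    atomic_int mm_count;        /* owner's reference plus lazy borrowers */
};

struct toy_task {
    struct toy_mm *mm;          /* NULL for kernel and idle threads */
    struct toy_mm *active_mm;   /* address space currently in use */
};

struct toy_cpu {
    struct toy_mm *loaded_mm;   /* stands in for the CR3 register */
    unsigned long cr3_writes;
    unsigned long refcount_atomics;
};

/* Switch from prev to next; prev->active_mm must be non-NULL. */
static void toy_switch(struct toy_cpu *cpu, struct toy_task *prev,
                       struct toy_task *next)
{
    struct toy_mm *oldmm = prev->active_mm;

    if (!next->mm) {
        /* Kernel thread: borrow prev's address space, no CR3 write. */
        next->active_mm = oldmm;
        atomic_fetch_add(&oldmm->mm_count, 1);
        cpu->refcount_atomics++;
    } else {
        /* User task: reload CR3 only if a different mm is loaded. */
        if (cpu->loaded_mm != next->mm) {
            cpu->loaded_mm = next->mm;
            cpu->cr3_writes++;
        }
        next->active_mm = next->mm;
    }

    if (!prev->mm) {
        /* Kernel thread switching out: drop the borrowed reference. */
        int old = atomic_fetch_sub(&oldmm->mm_count, 1);
        assert(old > 0);
        (void)old;
        prev->active_mm = NULL;
        cpu->refcount_atomics++;
    }
}
```

In this model, a user task -> kernel thread -> same user task round trip performs no CR3 write and two atomic refcount operations, while switching to a user task with a different mm performs one CR3 write and no refcount atomics.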