On Mon 29-07-19 17:42:20, Waiman Long wrote:
> On 7/29/19 5:21 PM, Rik van Riel wrote:
> > On Mon, 2019-07-29 at 17:07 -0400, Waiman Long wrote:
> >> It was found that a dying mm_struct where the owning task has exited
> >> can stay on as active_mm of kernel threads as long as no other user
> >> tasks run on those CPUs that use it as active_mm. This prolongs the
> >> life time of dying mm holding up some resources that cannot be freed
> >> on a mostly idle system.
> > On what kernels does this happen?
> >
> > Don't we explicitly flush all lazy TLB CPUs at exit
> > time, when we are about to free page tables?
>
> There are still a couple of calls that will be done until mm_count
> reaches 0:
>
> - mm_free_pgd(mm);
> - destroy_context(mm);
> - mmu_notifier_mm_destroy(mm);
> - check_mm(mm);
> - put_user_ns(mm->user_ns);
>
> These are not big items, but holding it off for a long time is still not
> a good thing.

It would be helpful to give a ballpark estimate of how much that
actually is. If we are talking about a few pages' worth of memory per
idle CPU in the worst case, then I am not sure we want to find an
elaborate way around that. We quite likely already have more than that
sitting in per-cpu caches in different subsystems. It is also quite
likely that large machines with many CPUs will have a lot of memory as
well.
-- 
Michal Hocko
SUSE Labs