On 7/30/19 3:24 AM, Michal Hocko wrote:
> On Mon 29-07-19 17:42:20, Waiman Long wrote:
>> On 7/29/19 5:21 PM, Rik van Riel wrote:
>>> On Mon, 2019-07-29 at 17:07 -0400, Waiman Long wrote:
>>>> It was found that a dying mm_struct where the owning task has exited
>>>> can stay on as active_mm of kernel threads as long as no other user
>>>> tasks run on those CPUs that use it as active_mm. This prolongs the
>>>> life time of dying mm holding up some resources that cannot be freed
>>>> on a mostly idle system.
>>> On what kernels does this happen?
>>>
>>> Don't we explicitly flush all lazy TLB CPUs at exit
>>> time, when we are about to free page tables?
>> There are still a couple of calls that will be done until mm_count
>> reaches 0:
>>
>> - mm_free_pgd(mm);
>> - destroy_context(mm);
>> - mmu_notifier_mm_destroy(mm);
>> - check_mm(mm);
>> - put_user_ns(mm->user_ns);
>>
>> These are not big items, but holding it off for a long time is still not
>> a good thing.
> It would be helpful to give a ball park estimation of how much that
> actually is. If we are talking about few pages worth of pages per idle
> cpu in the worst case then I am not sure we want to find an elaborate
> way around that. We are quite likely having more in per-cpu caches in
> different subsystems already. It is also quite likely that large
> machines with many CPUs will have a lot of memory as well.

I think they are relatively small. So I am not going to pursue it
further at this point.

Cheers,
Longman