Re: [PATCH v3] sched/core: Don't use dying mm as active_mm of kthreads

Michal Hocko <mhocko@xxxxxxxxxx> · Tue, 30 Jul 2019 09:24:39 +0200



On Mon 29-07-19 17:42:20, Waiman Long wrote:
> On 7/29/19 5:21 PM, Rik van Riel wrote:
> > On Mon, 2019-07-29 at 17:07 -0400, Waiman Long wrote:
> >> It was found that a dying mm_struct where the owning task has exited
> >> can stay on as active_mm of kernel threads as long as no other user
> >> tasks run on those CPUs that use it as active_mm. This prolongs the
> >> life time of dying mm holding up some resources that cannot be freed
> >> on a mostly idle system.
> > On what kernels does this happen?
> >
> > Don't we explicitly flush all lazy TLB CPUs at exit
> > time, when we are about to free page tables?
> 
> There are still a couple of calls that will be done until mm_count
> reaches 0:
> 
> - mm_free_pgd(mm);
> - destroy_context(mm);
> - mmu_notifier_mm_destroy(mm);
> - check_mm(mm);
> - put_user_ns(mm->user_ns);
> 
> These are not big items, but holding it off for a long time is still not
> a good thing.

It would be helpful to give a ballpark estimate of how much that
actually is. If we are talking about a few pages' worth of memory per
idle CPU in the worst case, then I am not sure we want to find an
elaborate way around that. We quite likely have more sitting in per-cpu
caches in different subsystems already. It is also quite likely that
large machines with many CPUs will have a lot of memory as well.
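
For a rough sense of scale, here is a back-of-the-envelope sketch of
that worst case. It assumes every idle CPU holds a different dying mm
as active_mm. The helper name and all inputs are hypothetical and for
illustration only; none of the numbers are measurements.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not kernel code: upper bound on memory held by
 * dying mms, assuming each idle CPU pins a distinct one as active_mm.
 * pages_per_mm and page_size are assumptions supplied by the caller.
 */
static uint64_t pinned_bytes(uint64_t idle_cpus, uint64_t pages_per_mm,
			     uint64_t page_size)
{
	return idle_cpus * pages_per_mm * page_size;
}
```

With assumed inputs of 256 idle CPUs, 4 pages per mm and 4 KiB pages,
pinned_bytes(256, 4, 4096) comes to 4 MiB.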
-- 
Michal Hocko
SUSE Labs