On Sat, Jul 27, 2019 at 01:10:47PM -0400, Waiman Long wrote:
> It was found that a dying mm_struct where the owning task has exited
> can stay on as active_mm of kernel threads as long as no other user
> tasks run on those CPUs that use it as active_mm. This prolongs the
> lifetime of the dying mm, holding up memory and other resources like
> swap space that cannot be freed.

Sure, but this has been so 'forever'; why is it a problem now?

> Fix that by forcing the kernel threads to use init_mm as the active_mm
> if the previous active_mm is dying.
>
> The determination of a dying mm is based on the absence of an owning
> task. The selection of the owning task only happens with the CONFIG_MEMCG
> option. Without that, there is no simple way to determine the life span
> of a given mm. So it falls back to the old behavior.
>
> Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
> ---
>  include/linux/mm_types.h | 15 +++++++++++++++
>  kernel/sched/core.c      | 13 +++++++++++--
>  mm/init-mm.c             |  4 ++++
>  3 files changed, 30 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 3a37a89eb7a7..32712e78763c 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -623,6 +623,21 @@ static inline bool mm_tlb_flush_nested(struct mm_struct *mm)
>  	return atomic_read(&mm->tlb_flush_pending) > 1;
>  }
>
> +#ifdef CONFIG_MEMCG
> +/*
> + * A mm is considered dying if there is no owning task.
> + */
> +static inline bool mm_dying(struct mm_struct *mm)
> +{
> +	return !mm->owner;
> +}
> +#else
> +static inline bool mm_dying(struct mm_struct *mm)
> +{
> +	return false;
> +}
> +#endif
> +
>  struct vm_fault;

Yuck. So people without memcg will still suffer the terrible 'whatever
it is this patch fixes'.
> /**
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 2b037f195473..923a63262dfd 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3233,13 +3233,22 @@ context_switch(struct rq *rq, struct task_struct *prev,
>  	 * Both of these contain the full memory barrier required by
>  	 * membarrier after storing to rq->curr, before returning to
>  	 * user-space.
> +	 *
> +	 * If mm is NULL and oldmm is dying (!owner), we switch to
> +	 * init_mm instead to make sure that oldmm can be freed ASAP.
>  	 */
> -	if (!mm) {
> +	if (!mm && !mm_dying(oldmm)) {
>  		next->active_mm = oldmm;
>  		mmgrab(oldmm);
>  		enter_lazy_tlb(oldmm, next);
> -	} else
> +	} else {
> +		if (!mm) {
> +			mm = &init_mm;
> +			next->active_mm = mm;
> +			mmgrab(mm);
> +		}
>  		switch_mm_irqs_off(oldmm, mm, next);
> +	}
>
>  	if (!prev->mm) {
>  		prev->active_mm = NULL;

Bah, I see we _still_ haven't 'fixed' that code. And you're making an
even bigger mess of it. Let me go find where that cleanup went.