On 7/29/19 10:27 AM, Peter Zijlstra wrote:
> On Mon, Jul 29, 2019 at 10:52:35AM +0200, Peter Zijlstra wrote:
>> On Sat, Jul 27, 2019 at 01:10:47PM -0400, Waiman Long wrote:
>>> It was found that a dying mm_struct where the owning task has exited
>>> can stay on as active_mm of kernel threads as long as no other user
>>> tasks run on those CPUs that use it as active_mm. This prolongs the
>>> life time of dying mm holding up memory and other resources like swap
>>> space that cannot be freed.
>> Sure, but this has been so 'forever', why is it a problem now?
>>
>>> Fix that by forcing the kernel threads to use init_mm as the active_mm
>>> if the previous active_mm is dying.
>>>
>>> The determination of a dying mm is based on the absence of an owning
>>> task. The selection of the owning task only happens with the CONFIG_MEMCG
>>> option. Without that, there is no simple way to determine the life span
>>> of a given mm. So it falls back to the old behavior.
>>>
>>> Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
>>> ---
>>>  include/linux/mm_types.h | 15 +++++++++++++++
>>>  kernel/sched/core.c      | 13 +++++++++++--
>>>  mm/init-mm.c             |  4 ++++
>>>  3 files changed, 30 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
>>> index 3a37a89eb7a7..32712e78763c 100644
>>> --- a/include/linux/mm_types.h
>>> +++ b/include/linux/mm_types.h
>>> @@ -623,6 +623,21 @@ static inline bool mm_tlb_flush_nested(struct mm_struct *mm)
>>>  	return atomic_read(&mm->tlb_flush_pending) > 1;
>>>  }
>>>
>>> +#ifdef CONFIG_MEMCG
>>> +/*
>>> + * A mm is considered dying if there is no owning task.
>>> + */
>>> +static inline bool mm_dying(struct mm_struct *mm)
>>> +{
>>> +	return !mm->owner;
>>> +}
>>> +#else
>>> +static inline bool mm_dying(struct mm_struct *mm)
>>> +{
>>> +	return false;
>>> +}
>>> +#endif
>>> +
>>>  struct vm_fault;
>> Yuck. So people without memcg will still suffer the terrible 'whatever
>> it is this patch fixes'.
> Also; why then not key off that owner tracking to free the resources
> (and leave the struct mm around) and avoid touching this scheduling
> hot-path ?

The resources are pinned by the reference count. Making a special case
will certainly mess up the existing code.

It is actually a problem for systems that are mostly idle. Only the
kernel->kernel case needs to be updated. If the CPUs aren't busy running
user tasks, a little bit more overhead shouldn't really hurt IMHO.

Cheers,
Longman
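
For illustration only, the kernel->kernel case described above can be
modelled in user space roughly as follows. This is a sketch under stated
assumptions, not the kernel/sched/core.c hunk of the patch (which is not
quoted in this message). The types struct mm and struct task, the count
field, and the helpers pick_active_mm() and context_switch_model() are
hypothetical stand-ins; giving init_mm_model an owner is also an
assumption of the model so that it is never treated as dying. Only
mm_dying() mirrors the quoted CONFIG_MEMCG variant.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures; not the real layouts. */
struct task;

struct mm {
	struct task *owner;	/* NULL once the owning task has exited */
	int count;		/* models the reference count pinning the mm */
};

struct task {
	struct mm *mm;		/* NULL for a kernel thread */
	struct mm *active_mm;	/* address space currently in use */
};

/* Model assumption: init_mm always has an owner, so it is never "dying". */
static struct task init_task_model;
static struct mm init_mm_model = { .owner = &init_task_model, .count = 1 };

/* Same test as the quoted CONFIG_MEMCG variant: no owner means dying. */
static bool mm_dying(const struct mm *mm)
{
	return mm->owner == NULL;
}

/* Hypothetical helper: choose the active_mm that next will run with. */
static struct mm *pick_active_mm(const struct task *prev,
				 const struct task *next)
{
	if (next->mm)
		return next->mm;	/* user task: uses its own mm */

	/* kernel->kernel: do not keep borrowing a dying mm */
	if (!prev->mm && mm_dying(prev->active_mm))
		return &init_mm_model;

	return prev->active_mm;		/* usual lazy borrowing */
}

/* Hypothetical model of the reference handling around a switch. */
static void context_switch_model(struct task *prev, struct task *next)
{
	struct mm *old = prev->active_mm;

	next->active_mm = pick_active_mm(prev, next);
	if (!next->mm)
		next->active_mm->count++;	/* kernel thread takes a reference */

	if (!prev->mm) {
		prev->active_mm = NULL;
		old->count--;			/* previous kernel thread drops its own */
	}
}
```

In this model, a dying mm whose last reference is held by a kernel thread
sees its count fall to zero at the next kernel->kernel switch, instead of
being handed from one kernel thread to the next until a user task happens
to run on that CPU.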