Re: [RFC PATCH 0/3] rework mmap-exit vs. oom_reaper handover

Michal Hocko <mhocko@xxxxxxxxxx> · Mon, 10 Sep 2018 18:44:11 +0200

On Tue 11-09-18 00:40:23, Tetsuo Handa wrote:
> On 2018/09/11 0:11, Michal Hocko wrote:
> > On Mon 10-09-18 23:59:02, Tetsuo Handa wrote:
> >> Thank you for proposing a patch.
> >>
> >> On 2018/09/10 21:55, Michal Hocko wrote:
> >>> diff --git a/mm/mmap.c b/mm/mmap.c
> >>> index 5f2b2b1..99bb9ce 100644
> >>> --- a/mm/mmap.c
> >>> +++ b/mm/mmap.c
> >>> @@ -3091,7 +3081,31 @@ void exit_mmap(struct mm_struct *mm)
> >>>  	/* update_hiwater_rss(mm) here? but nobody should be looking */
> >>>  	/* Use -1 here to ensure all VMAs in the mm are unmapped */
> >>>  	unmap_vmas(&tlb, vma, 0, -1);
> >>
> >> unmap_vmas() might involve hugepage path. Is it safe to race with the OOM reaper?
> >>
> >>   i_mmap_lock_write(vma->vm_file->f_mapping);
> >>   __unmap_hugepage_range_final(tlb, vma, start, end, NULL);
> >>   i_mmap_unlock_write(vma->vm_file->f_mapping);
> > 
> > We do not unmap hugetlb pages in the oom reaper.
> > 
> 
> But the OOM reaper can run while __unmap_hugepage_range_final() is in progress.
> Then, I worry an overlooked race similar to clearing VM_LOCKED flag.

But VM_HUGETLB is a persistent flag unlike VM_LOCKED IIRC.

> >>>  	tlb_finish_mmu(&tlb, 0, -1);
> >>>  
> >>>  	/*
> >>
> >> Also, how do you plan to give this thread enough CPU resources, for this thread might
> >> be SCHED_IDLE priority? Since this thread might not be a thread which is exiting
> >> (because this is merely a thread which invoked __mmput()), we can't use boosting
> >> approach. CPU resource might be given eventually unless schedule_timeout_*() is used,
> >> but it might be deadly slow if allocating threads keep wasting CPU resources.
> > 
> > This is OOM path which is glacial slow path. This is btw. no different
> > from any other low priority tasks sitting on a lot of memory trying to
> > release the memory (either by unmapping or exiting). Why should be this
> > particular case any different?
> > 
> 
> Not a problem if not under OOM situation. Since the OOM killer keeps wasting
> CPU resources until memory reclaim completes, we want to solve OOM situation
> as soon as possible.

OK, it seems that yet again we have a deep disagreement here. The point
of the OOM is to get the system out of the desperate situation. The
objective is to free up _some_ memory. Your QoS is completely off at
that moment.

> >> Also, why MMF_OOM_SKIP will not be set if the OOM reaper handed over?
> > 
> > The idea is that the mm is not visible to anybody (except for the oom
> > reaper) anymore. So MMF_OOM_SKIP shouldn't matter.
> > 
> 
> I think it absolutely matters. The OOM killer waits until MMF_OOM_SKIP is set
> on a mm which is visible via task_struct->signal->oom_mm .

Hmm, I have to re-read the exit path once again and see when we unhash
the task and how many dangerous things we do in the mean time. I might
have been overly optimistic and you might be right that we indeed have
to set MMF_OOM_SKIP after all.
-- 
Michal Hocko
SUSE Labs