Re: [RFC PATCH 0/3] rework mmap-exit vs. oom_reaper handover

Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> · Tue, 11 Sep 2018 00:40:23 +0900

On 2018/09/11 0:11, Michal Hocko wrote:
> On Mon 10-09-18 23:59:02, Tetsuo Handa wrote:
>> Thank you for proposing a patch.
>>
>> On 2018/09/10 21:55, Michal Hocko wrote:
>>> diff --git a/mm/mmap.c b/mm/mmap.c
>>> index 5f2b2b1..99bb9ce 100644
>>> --- a/mm/mmap.c
>>> +++ b/mm/mmap.c
>>> @@ -3091,7 +3081,31 @@ void exit_mmap(struct mm_struct *mm)
>>>  	/* update_hiwater_rss(mm) here? but nobody should be looking */
>>>  	/* Use -1 here to ensure all VMAs in the mm are unmapped */
>>>  	unmap_vmas(&tlb, vma, 0, -1);
>>
>> unmap_vmas() might involve hugepage path. Is it safe to race with the OOM reaper?
>>
>>   i_mmap_lock_write(vma->vm_file->f_mapping);
>>   __unmap_hugepage_range_final(tlb, vma, start, end, NULL);
>>   i_mmap_unlock_write(vma->vm_file->f_mapping);
> 
> We do not unmap hugetlb pages in the oom reaper.
> 

But the OOM reaper can run while __unmap_hugepage_range_final() is in progress.
So I worry about an overlooked race, similar to the one with clearing the
VM_LOCKED flag.

> 
>>
>>>  	tlb_finish_mmu(&tlb, 0, -1);
>>>  
>>>  	/*
>>
>> Also, how do you plan to give this thread enough CPU resources, for this thread might
>> be SCHED_IDLE priority? Since this thread might not be a thread which is exiting
>> (because this is merely a thread which invoked __mmput()), we can't use boosting
>> approach. CPU resource might be given eventually unless schedule_timeout_*() is used,
>> but it might be deadly slow if allocating threads keep wasting CPU resources.
> 
> This is OOM path which is glacial slow path. This is btw. no different
> from any other low priority tasks sitting on a lot of memory trying to
> release the memory (either by unmapping or exiting). Why should be this
> particular case any different?
> 

That is not a problem when we are not under an OOM situation. But since the
OOM killer keeps wasting CPU resources until memory reclaim completes, we want
to resolve the OOM situation as soon as possible.

>> Also, why MMF_OOM_SKIP will not be set if the OOM reaper handed over?
> 
> The idea is that the mm is not visible to anybody (except for the oom
> reaper) anymore. So MMF_OOM_SKIP shouldn't matter.
> 

I think it absolutely matters. The OOM killer waits until MMF_OOM_SKIP is set
on an mm which is visible via task_struct->signal->oom_mm.
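
To illustrate the concern, here is a minimal userspace model of the victim
check as I understand it from the task scan in the OOM killer
(oom_evaluate_task()). It is only a sketch, not kernel code: scan_task() and
enum oom_scan are hypothetical names made up for this example, the structs are
stripped down to the single field that matters here, and the bit number used
for MMF_OOM_SKIP is an assumption of the model.

```c
/* Userspace model only. Names mirror the kernel, but this is not kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MMF_OOM_SKIP 21	/* bit number is an assumption of this model */

/* Stripped-down stand-ins for the kernel structures. */
struct mm_struct { unsigned long flags; };
struct signal_struct { struct mm_struct *oom_mm; };
struct task_struct { struct signal_struct *signal; };

/* Hypothetical result type for this sketch. */
enum oom_scan {
	OOM_SCAN_CONSIDER,	/* not an OOM victim, normal candidate */
	OOM_SCAN_SKIP,		/* victim already handled, look at next task */
	OOM_SCAN_ABORT,		/* victim still pending, select nobody */
};

static bool test_bit(int nr, const unsigned long *addr)
{
	return (*addr >> nr) & 1UL;
}

static void set_bit(int nr, unsigned long *addr)
{
	*addr |= 1UL << nr;
}

/* Hypothetical helper modelling the per-task victim check. */
static enum oom_scan scan_task(const struct task_struct *task)
{
	assert(task != NULL && task->signal != NULL);

	/* signal->oom_mm is set only for a task that was chosen as a victim. */
	if (task->signal->oom_mm == NULL)
		return OOM_SCAN_CONSIDER;

	/* The victim's mm stays reachable here until MMF_OOM_SKIP is set. */
	if (test_bit(MMF_OOM_SKIP, &task->signal->oom_mm->flags))
		return OOM_SCAN_SKIP;

	return OOM_SCAN_ABORT;
}
```

In this model, a victim whose oom_mm never gets MMF_OOM_SKIP keeps returning
OOM_SCAN_ABORT, which is why I think the flag matters even when nobody else
can see the mm.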