On 2018/09/11 0:11, Michal Hocko wrote: > On Mon 10-09-18 23:59:02, Tetsuo Handa wrote: >> Thank you for proposing a patch. >> >> On 2018/09/10 21:55, Michal Hocko wrote: >>> diff --git a/mm/mmap.c b/mm/mmap.c >>> index 5f2b2b1..99bb9ce 100644 >>> --- a/mm/mmap.c >>> +++ b/mm/mmap.c >>> @@ -3091,7 +3081,31 @@ void exit_mmap(struct mm_struct *mm) >>> /* update_hiwater_rss(mm) here? but nobody should be looking */ >>> /* Use -1 here to ensure all VMAs in the mm are unmapped */ >>> unmap_vmas(&tlb, vma, 0, -1); >> >> unmap_vmas() might involve hugepage path. Is it safe to race with the OOM reaper? >> >> i_mmap_lock_write(vma->vm_file->f_mapping); >> __unmap_hugepage_range_final(tlb, vma, start, end, NULL); >> i_mmap_unlock_write(vma->vm_file->f_mapping); > > We do not unmap hugetlb pages in the oom reaper. > But the OOM reaper can run while __unmap_hugepage_range_final() is in progress. Then, I worry an overlooked race similar to clearing VM_LOCKED flag. > >> >>> tlb_finish_mmu(&tlb, 0, -1); >>> >>> /* >> >> Also, how do you plan to give this thread enough CPU resources, for this thread might >> be SCHED_IDLE priority? Since this thread might not be a thread which is exiting >> (because this is merely a thread which invoked __mmput()), we can't use boosting >> approach. CPU resource might be given eventually unless schedule_timeout_*() is used, >> but it might be deadly slow if allocating threads keep wasting CPU resources. > > This is OOM path which is glacial slow path. This is btw. no different > from any other low priority tasks sitting on a lot of memory trying to > release the memory (either by unmapping or exiting). Why should be this > particular case any different? > Not a problem if not under OOM situation. Since the OOM killer keeps wasting CPU resources until memory reclaim completes, we want to solve OOM situation as soon as possible. >> Also, why MMF_OOM_SKIP will not be set if the OOM reaper handed over? > > The idea is that the mm is not visible to anybody (except for the oom > reaper) anymore. So MMF_OOM_SKIP shouldn't matter. > I think it absolutely matters. The OOM killer waits until MMF_OOM_SKIP is set on a mm which is visible via task_struct->signal->oom_mm .