Thank you for proposing a patch. On 2018/09/10 21:55, Michal Hocko wrote: > diff --git a/mm/mmap.c b/mm/mmap.c > index 5f2b2b1..99bb9ce 100644 > --- a/mm/mmap.c > +++ b/mm/mmap.c > @@ -3091,7 +3081,31 @@ void exit_mmap(struct mm_struct *mm) > /* update_hiwater_rss(mm) here? but nobody should be looking */ > /* Use -1 here to ensure all VMAs in the mm are unmapped */ > unmap_vmas(&tlb, vma, 0, -1); unmap_vmas() might involve hugepage path. Is it safe to race with the OOM reaper? i_mmap_lock_write(vma->vm_file->f_mapping); __unmap_hugepage_range_final(tlb, vma, start, end, NULL); i_mmap_unlock_write(vma->vm_file->f_mapping); > - free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING); > + > + /* oom_reaper cannot race with the page tables teardown */ > + if (oom) > + down_write(&mm->mmap_sem); > + /* > + * Hide vma from rmap and truncate_pagecache before freeing > + * pgtables > + */ > + while (vma) { > + unlink_anon_vmas(vma); > + unlink_file_vma(vma); > + vma = vma->vm_next; > + } > + vma = mm->mmap; > + if (oom) { > + /* > + * the exit path is guaranteed to finish without any unbound > + * blocking at this stage so make it clear to the caller. > + */ > + mm->mmap = NULL; > + up_write(&mm->mmap_sem); > + } > + > + free_pgd_range(&tlb, vma->vm_start, vma->vm_prev->vm_end, > + FIRST_USER_ADDRESS, USER_PGTABLES_CEILING); Are you trying to inline free_pgtables() here? But some architectures are using hugetlb_free_pgd_range() which does more than free_pgd_range(). Are they really safe (with regard to memory allocation dependency and flags manipulation) ? > tlb_finish_mmu(&tlb, 0, -1); > > /* Also, how do you plan to give this thread enough CPU resources, for this thread might be SCHED_IDLE priority? Since this thread might not be a thread which is exiting (because this is merely a thread which invoked __mmput()), we can't use boosting approach. CPU resource might be given eventually unless schedule_timeout_*() is used, but it might be deadly slow if allocating threads keep wasting CPU resources. Also, why MMF_OOM_SKIP will not be set if the OOM reaper handed over?