On Thu, 29 Mar 2018 20:27:50 +0900 Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> wrote:

> Theoretically it is possible that an mm_struct with 60000+ vmas loops
> while potentially allocating memory, with mm->mmap_sem held for write
> by the current thread. Unless I overlooked that fatal_signal_pending()
> is checked somewhere in the loop, this is bad if the current thread was
> selected as an OOM victim, for the current thread will continue
> allocating from memory reserves while the OOM reaper is unable to
> reclaim memory.

All of which implies to me that this patch fixes a problem which is not
known to exist!

> But there is no point in continuing the loop from the beginning if the
> current thread has been killed. If there were a __GFP_KILLABLE flag (or
> something like memalloc_nofs_save()/memalloc_nofs_restore()), we could
> apply it to all allocations inside the loop. But since we don't have
> such a flag, this patch uses a fatal_signal_pending() check inside the
> loop.

Dumb question: if a thread has been oom-killed and then tries to
allocate memory, should the page allocator just fail the allocation
attempt?  I suppose there are all sorts of reasons why not :(

In which case, yes, setting a new
PF_MEMALLOC_MAY_FAIL_IF_I_WAS_OOMKILLED around such code might be a
tidy enough solution.  It would be a bit sad to add another test in the
hot path (should_fail_alloc_page()?), but geeze we do a lot of junk
already.

> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -440,6 +440,10 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
>  			continue;
>  		}
>  		charge = 0;
> +		if (fatal_signal_pending(current)) {
> +			retval = -EINTR;
> +			goto out;
> +		}
>  		if (mpnt->vm_flags & VM_ACCOUNT) {
>  			unsigned long len = vma_pages(mpnt);

I think a comment explaining why we're doing this would help.  Better
would be to add a new function "current_is_oom_killed()" or such, which
becomes self-documenting, because there are other reasons why a task
may have a fatal signal pending.
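
For illustration, a self-documenting helper along those lines might look
something like this (an untested sketch, not part of the posted patch;
routing it through tsk_is_oom_victim() is just one possible
implementation):

/*
 * Sketch only: say *why* we are bailing out of a long
 * mmap_sem-holding loop.  tsk_is_oom_victim() distinguishes an OOM
 * kill from the other reasons a fatal signal may be pending; a
 * simpler variant could keep using fatal_signal_pending(current)
 * and also cover a plain SIGKILL.
 */
static inline bool current_is_oom_killed(void)
{
	return tsk_is_oom_victim(current);
}

The hunk above would then read:

		charge = 0;
		/* The OOM reaper cannot reclaim from this mm while we
		 * hold mmap_sem for write, so give up early. */
		if (current_is_oom_killed()) {
			retval = -EINTR;
			goto out;
		}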
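
And to make the PF_MEMALLOC_MAY_FAIL_IF_I_WAS_OOMKILLED idea concrete,
a rough sketch (the flag value and the allocator hook point are purely
illustrative; a real flag would need a free bit audited against the
existing PF_* definitions in sched.h):

/* Hypothetical task flag -- the value is made up for illustration */
#define PF_MEMALLOC_MAY_FAIL_IF_I_WAS_OOMKILLED	0x02000000

	/* dup_mmap() would bracket the whole vma copy loop, so no
	 * per-allocation changes are needed inside it: */
	current->flags |= PF_MEMALLOC_MAY_FAIL_IF_I_WAS_OOMKILLED;
	/* ... existing vma copy loop ... */
	current->flags &= ~PF_MEMALLOC_MAY_FAIL_IF_I_WAS_OOMKILLED;

	/* and the page allocator, somewhere near the existing
	 * should_fail_alloc_page() test, would fail the allocation
	 * outright for an oom-killed task instead of letting it dip
	 * into memory reserves: */
	if (unlikely((current->flags & PF_MEMALLOC_MAY_FAIL_IF_I_WAS_OOMKILLED) &&
		     tsk_is_oom_victim(current)))
		return NULL;

That keeps the change to two flag sites plus one allocator test, at the
cost of the extra check in the hot path noted above.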