Tetsuo Handa wrote:
> Johannes Weiner wrote:
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 8e20f9c2fa5a..f77c58ebbcfa 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -2382,8 +2382,15 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
> >  		if (high_zoneidx < ZONE_NORMAL)
> >  			goto out;
> >  		/* The OOM killer does not compensate for light reclaim */
> > -		if (!(gfp_mask & __GFP_FS))
> > +		if (!(gfp_mask & __GFP_FS)) {
> > +			/*
> > +			 * XXX: Page reclaim didn't yield anything,
> > +			 * and the OOM killer can't be invoked, but
> > +			 * keep looping as per should_alloc_retry().
> > +			 */
> > +			*did_some_progress = 1;
> >  			goto out;
> > +		}
>
> Why do you omit out_of_memory() call for GFP_NOIO / GFP_NOFS allocations?

I can see "possible memory allocation deadlock in %s (mode:0x%x)" warnings
at kmem_alloc() in fs/xfs/kmem.c . I think commit 9879de7373fcfb46 "mm:
page_alloc: embed OOM killing naturally into allocation slowpath" introduced
a regression and below one is the fix.

--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2381,9 +2381,6 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
 		/* The OOM killer does not needlessly kill tasks for lowmem */
 		if (high_zoneidx < ZONE_NORMAL)
 			goto out;
-		/* The OOM killer does not compensate for light reclaim */
-		if (!(gfp_mask & __GFP_FS))
-			goto out;
 		/*
 		 * GFP_THISNODE contains __GFP_NORETRY and we never hit this.
 		 * Sanity check for bare calls of __GFP_THISNODE, not real OOM.

BTW, I think commit c32b3cbe0d067a9c "oom, PM: make OOM detection in the
freezer path raceless" opened a race window for
__alloc_pages_may_oom(__GFP_NOFAIL) allocation to fail when OOM killer is
disabled.
I think something like

--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -789,7 +789,7 @@ bool out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
 	bool ret = false;

 	down_read(&oom_sem);
-	if (!oom_killer_disabled) {
+	if (!oom_killer_disabled || (gfp_mask & __GFP_NOFAIL)) {
 		__out_of_memory(zonelist, gfp_mask, order, nodemask, force_kill);
 		ret = true;
 	}

is needed. But such change can race with up_write() and wait_event() in
oom_killer_disable(). While the comment of oom_killer_disable() says
"The function cannot be called when there are runnable user tasks because
the userspace would see unexpected allocation failures as a result.",
aren't there still kernel threads which might do __GFP_NOFAIL allocations?