On Wed, 4 Dec 2013, Johannes Weiner wrote: > However, the GFP_NOFS | __GFP_NOFAIL task stuck in the page allocator > may hold filesystem locks that could prevent a third party from > freeing memory and/or exiting, so we can not guarantee that only the > __GFP_NOFAIL task is getting stuck, it might well trap other tasks. > The same applies to open-coded GFP_NOFS allocation loops of course > unless they cycle the filesystem locks while looping. > Yup. I think we should do this: diff --git a/mm/page_alloc.c b/mm/page_alloc.c --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2631,6 +2631,11 @@ rebalance: pages_reclaimed)) { /* Wait for some write requests to complete then retry */ wait_iff_congested(preferred_zone, BLK_RW_ASYNC, HZ/50); + + /* Allocations that cannot fail must allocate from somewhere */ + if (gfp_mask & __GFP_NOFAIL) + alloc_flags |= ALLOC_HARDER; + goto rebalance; } else { /* so that it gets the same behavior as GFP_ATOMIC and is allowed to allocate from memory reserves (although not enough to totally deplete memory). We need to leave some memory reserves around in case another process with __GFP_FS invokes the oom killer and the victim needs memory to exit since the GFP_NOFS | __GFP_NOFAIL failure wasn't only because reclaim was limited due to !__GFP_FS. The only downside of this is that it might become harder in the future to ever make a case to remove __GFP_NOFAIL entirely since the behavior of the page allocator is changed with this and it's not equivalent to coding the retry directly in the caller. On a tangent, GFP_NOWAIT | __GFP_NOFAIL and GFP_ATOMIC | __GFP_NOFAIL actually allows allocations to fail. Nothing currently does that, but I wonder if we should do this for correctness: diff --git a/mm/page_alloc.c b/mm/page_alloc.c --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2535,17 +2535,19 @@ rebalance: } } - /* Atomic allocations - we can't balance anything */ - if (!wait) - goto nopage; - - /* Avoid recursion of direct reclaim */ - if (current->flags & PF_MEMALLOC) - goto nopage; - - /* Avoid allocations with no watermarks from looping endlessly */ - if (test_thread_flag(TIF_MEMDIE) && !(gfp_mask & __GFP_NOFAIL)) - goto nopage; + if (likely(!(gfp_mask & __GFP_NOFAIL))) { + /* Atomic allocations - we can't balance anything */ + if (!wait) + goto nopage; + + /* Avoid recursion of direct reclaim */ + if (current->flags & PF_MEMALLOC) + goto nopage; + + /* Avoid allocations without watermarks from looping forever */ + if (test_thread_flag(TIF_MEMDIE)) + goto nopage; + } /* * Try direct compaction. The first pass is asynchronous. Subsequent It can be likely() because the __GFP_NOFAIL restart from the first patch above will likely now succeed since there's access to memory reserves and we never actually get here but once for __GFP_NOFAIL. Thoughts on either patch? -- To unsubscribe from this list: send the line "unsubscribe cgroups" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html