Re: __GFP_NOFAIL and oom_killer_disabled?

Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> · Wed, 25 Feb 2015 20:22:22 +0900

Michal Hocko wrote:
> This commit hasn't introduced any behavior changes. GFP_NOFAIL
> allocations fail when OOM killer is disabled since beginning
> 7f33d49a2ed5 (mm, PM/Freezer: Disable OOM killer when tasks are frozen).

I thought that

-       out_of_memory(ac->zonelist, gfp_mask, order, ac->nodemask, false);
-       *did_some_progress = 1;
+       if (out_of_memory(ac->zonelist, gfp_mask, order, ac->nodemask, false))
+               *did_some_progress = 1;

in commit c32b3cbe0d067a9c "oom, PM: make OOM detection in the freezer
path raceless" introduced a code path which fails to set
*did_some_progress to non 0 value.

> "
> We haven't seen any bug reports since 2009 so I haven't marked the patch
> for stable. I have no problem to backport it to stable trees though if
> people think it is a good precaution.
> "

Until 3.18, GFP_NOFAIL for GFP_NOFS / GFP_NOIO did not fail with
oom_killer_disabled == true because of

----------
        if (!did_some_progress) {
                if (oom_gfp_allowed(gfp_mask)) {
                        if (oom_killer_disabled)
                                goto nopage;
			(...snipped...)
                        goto restart;
                }
        }
	(...snipped...)
	goto rebalance;
----------

and that might be the reason you did not see bug reports.
In 3.19, GFP_NOFAIL for GFP_NOFS / GFP_NOIO started to fail with
oom_killer_disabled == true because of

----------
        if (should_alloc_retry(gfp_mask, order, did_some_progress,
                                                pages_reclaimed)) {
                /*
                 * If we fail to make progress by freeing individual
                 * pages, but the allocation wants us to keep going,
                 * start OOM killing tasks.
                 */
                if (!did_some_progress) {
                        page = __alloc_pages_may_oom(gfp_mask, order, zonelist,
                                                high_zoneidx, nodemask,
                                                preferred_zone, classzone_idx,
                                                migratetype,&did_some_progress);
                        if (page)
                                goto got_pg;
                        if (!did_some_progress)
                                goto nopage;
                }
                /* Wait for some write requests to complete then retry */
                wait_iff_congested(preferred_zone, BLK_RW_ASYNC, HZ/50);
                goto retry;
	} else
----------

----------
static inline struct page *
__alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
        struct zonelist *zonelist, enum zone_type high_zoneidx,
        nodemask_t *nodemask, struct zone *preferred_zone,
        int classzone_idx, int migratetype, unsigned long *did_some_progress)
{
        struct page *page;

        *did_some_progress = 0;

        if (oom_killer_disabled)
                return NULL;
----------

and thus you might start seeing bug reports.

So, it is commit 9879de7373fc "mm: page_alloc: embed OOM killing naturally
into allocation slowpath" than commit c32b3cbe0d067a9c "oom, PM: make OOM
detection in the freezer path raceless" that introduced behavior changes?

> On Mon 23-02-15 22:03:25, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > What about something like the following?
> > 
> > I'm fine with whatever approaches as long as retry is guaranteed.
> > 
> > But maybe we can use memory reserves like below?
> 
> This sounds too risky to me and not really necessary. GFP_NOFAIL
> allocations shouldn't be called while the system is not running any
> tasks (aka from pm/device code). So we are primarily trying to help
> those nofail allocations which come from kernel threads and their retry
> will fail the suspend rather than blow up because of an unexpected
> allocation failure.

I meant "After all, don't we need to recheck after setting
oom_killer_disabled to true?" as "their retry will fail the suspend".

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>