On Thu, 17 Nov 2011, Minchan Kim wrote: > >> > diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c > >> > index fdd4263..01aa9b5 100644 > >> > --- a/kernel/power/suspend.c > >> > +++ b/kernel/power/suspend.c > >> > @@ -297,9 +297,11 @@ int enter_state(suspend_state_t state) > >> > goto Finish; > >> > > >> > pr_debug("PM: Entering %s sleep\n", pm_states[state]); > >> > + oom_killer_disable(); > >> > pm_restrict_gfp_mask(); > >> > error = suspend_devices_and_enter(state); > >> > pm_restore_gfp_mask(); > >> > + oom_killer_enable(); > >> > > >> > Finish: > >> > pr_debug("PM: Finishing wakeup.\n"); > >> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > >> > index 6e8ecb6..d8c31b7 100644 > >> > --- a/mm/page_alloc.c > >> > +++ b/mm/page_alloc.c > >> > @@ -2177,9 +2177,9 @@ rebalance: > >> > * running out of options and have to consider going OOM > >> > */ > >> > if (!did_some_progress) { > >> > - if ((gfp_mask & __GFP_FS) && !(gfp_mask & __GFP_NORETRY)) { > >> > - if (oom_killer_disabled) > >> > + if (oom_killer_disabled) > >> > goto nopage; > > > > You're allowing __GFP_NOFAIL allocations to fail. > > > >> > + if ((gfp_mask & __GFP_FS) && !(gfp_mask & __GFP_NORETRY)) { > >> > page = __alloc_pages_may_oom(gfp_mask, order, > >> > zonelist, high_zoneidx, > >> > nodemask, preferred_zone, > >> > > >> > >> I'd prefer something like this. The whole 'gfp_allowed_flags' thing was > >> designed to make GFP_KERNEL work during boot time where it's obviously safe to > >> do that. I really don't think that's going to work suspend cleanly. > >> > > > > Adding Rafael to the cc. > > > > This has been done since 2.6.34 and presumably has been working quite > > well. I don't have a specific objection to gfp_allowed_flags to be used > > outside of boot since it seems plausible that there are system-level > > contexts that would need different behavior in the page allocator and this > > does it effectively without major surgery or a slower fastpath. Suspend > > is using it just like boot does before irqs are enabled, so I don't have > > an objection to it. > > > > My point isn't using gfp_allowed_flags(maybe it's Pekka's concern) but > why adding new special case handling code like pm_suspended_storage. > I think we can handle the issue with oom_killer_disabled(but the naming is bad) > Ignore the name of the function that Mel is introducing, it's only related to suspend because that's the only thing that (currently) alters gfp_allowed_mask after boot. If something else were to clear __GFP_FS and __GFP_IO in the future, we'd simply need to rename the function. I was going to ask for a comment specifically about that, but I think it's proximity to the function that allows gfp_allowed_mask to be altered is sufficient. We'd really like to avoid the loop if __GFP_FS and __GFP_IO are not set. If oom_killer_disabled is set anytime that gfp_allowed_mask does not allow them for _all_ page allocations, then we could certainly replace the new pm_suspended_storage() check in should_alloc_retry() with a check on oom_killer_disabled. I'll let you and Rafael work out whether that can be done or not, I just see pm_restrict_gfp_mask() being called in kernel/power/hibernate.c and kernel/power/suspend.c whereas oom_killer_disable() is only called in kernel/power/process.c. oom_killer_disable() would be unnecessary if __GFP_FS were already cleared, so it would require some changes in the suspend code. I like Mel's patch because it's easily maintainable and only depends on the state of reclaim and the gfp flags being passed in, which is the direct cause of the infinite loop.