On Wed, 17 Feb 2010, KAMEZAWA Hiroyuki wrote:

> > On Wed, 17 Feb 2010, KAMEZAWA Hiroyuki wrote:
> >
> > > > > > I'll add this check to __alloc_pages_may_oom() for the !(gfp_mask &
> > > > > > __GFP_NOFAIL) path since we're all content with endlessly looping.
> > > > >
> > > > > Thanks. Yes endlessly looping is far preferable to randomly oopsing
> > > > > or corrupting memory.
> > > > >
> > > >
> > > > Here's the new patch for your consideration.
> > > >
> > >
> > > Then, can we take kdump in this endlessly looping situation ?
> > >
> > > panic_on_oom=always + kdump can do that.
> > >
> >
> > The endless loop is only helpful if something is going to free memory
> > external to the current page allocation: either another task with
> > __GFP_WAIT | __GFP_FS that invokes the oom killer, a task that frees
> > memory, or a task that exits.
> >
> > The most notable endless loop in the page allocator is the one when a task
> > has been oom killed, gets access to memory reserves, and then cannot find
> > a page for a __GFP_NOFAIL allocation:
> >
> > 	do {
> > 		page = get_page_from_freelist(gfp_mask, nodemask, order,
> > 			zonelist, high_zoneidx, ALLOC_NO_WATERMARKS,
> > 			preferred_zone, migratetype);
> >
> > 		if (!page && gfp_mask & __GFP_NOFAIL)
> > 			congestion_wait(BLK_RW_ASYNC, HZ/50);
> > 	} while (!page && (gfp_mask & __GFP_NOFAIL));
> >
> > We don't expect any such allocations to happen during the exit path, but
> > we could probably find some in the fs layer.
> >
> > I don't want to check sysctl_panic_on_oom in the page allocator because it
> > would start panicking the machine unnecessarily for the integrity
> > metadata GFP_NOIO | __GFP_NOFAIL allocation, for any
> > order > PAGE_ALLOC_COSTLY_ORDER, or for users who can't lock the zonelist
> > for oom kill that wouldn't have panicked before.
> >
>
> Then, why don't you check high_zoneidx in oom_kill.c?
>

out_of_memory() doesn't return a value to specify whether the page
allocator should retry the allocation or just return NULL; all of that
policy is kept in mm/page_alloc.c.

For high_zoneidx < ZONE_NORMAL, we want to fail the allocation when
!(gfp_mask & __GFP_NOFAIL) and call the oom killer when it's
__GFP_NOFAIL.
---
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1696,6 +1696,9 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
 		/* The OOM killer will not help higher order allocs */
 		if (order > PAGE_ALLOC_COSTLY_ORDER)
 			goto out;
+		/* The OOM killer does not needlessly kill tasks for lowmem */
+		if (high_zoneidx < ZONE_NORMAL)
+			goto out;
 		/*
 		 * GFP_THISNODE contains __GFP_NORETRY and we never hit this.
 		 * Sanity check for bare calls of __GFP_THISNODE, not real OOM.
@@ -1924,15 +1927,23 @@ rebalance:
 			if (page)
 				goto got_pg;
 
-			/*
-			 * The OOM killer does not trigger for high-order
-			 * ~__GFP_NOFAIL allocations so if no progress is being
-			 * made, there are no other options and retrying is
-			 * unlikely to help.
-			 */
-			if (order > PAGE_ALLOC_COSTLY_ORDER &&
-						!(gfp_mask & __GFP_NOFAIL))
-				goto nopage;
+			if (!(gfp_mask & __GFP_NOFAIL)) {
+				/*
+				 * The oom killer is not called for high-order
+				 * allocations that may fail, so if no progress
+				 * is being made, there are no other options and
+				 * retrying is unlikely to help.
+				 */
+				if (order > PAGE_ALLOC_COSTLY_ORDER)
+					goto nopage;
+				/*
+				 * The oom killer is not called for lowmem
+				 * allocations to prevent needlessly killing
+				 * innocent tasks.
+				 */
+				if (high_zoneidx < ZONE_NORMAL)
+					goto nopage;
+			}
 
 			goto restart;
 		}
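
For illustration only, the resulting policy for an allocation that made no
reclaim progress can be modelled in userspace roughly as below. This is a
sketch, not kernel code: the enum, GFP_NOFAIL_SKETCH, and no_progress_action()
are hypothetical stand-ins, and it assumes the allocation has already passed
the __GFP_FS / __GFP_NORETRY checks that gate the oom killer.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel definitions (illustrative only). */
enum zone_type_sketch { ZONE_DMA, ZONE_DMA32, ZONE_NORMAL, ZONE_HIGHMEM };
#define PAGE_ALLOC_COSTLY_ORDER 3
#define GFP_NOFAIL_SKETCH 0x1u	/* placeholder bit, not the real __GFP_NOFAIL */

enum alloc_action {
	ACTION_FAIL,		/* return NULL to the caller (goto nopage) */
	ACTION_OOM_AND_RETRY,	/* call the oom killer, then goto restart */
};

/*
 * Models the decision taken once reclaim made no progress and the
 * allocation is otherwise eligible for the oom killer.
 */
static enum alloc_action no_progress_action(unsigned int gfp_mask,
					    unsigned int order,
					    enum zone_type_sketch high_zoneidx)
{
	/* Sanity check on the sketch's own input range. */
	assert(high_zoneidx <= ZONE_HIGHMEM);

	if (!(gfp_mask & GFP_NOFAIL_SKETCH)) {
		/* High-order allocations that may fail: give up. */
		if (order > PAGE_ALLOC_COSTLY_ORDER)
			return ACTION_FAIL;
		/* Lowmem allocations that may fail: give up, kill nothing. */
		if (high_zoneidx < ZONE_NORMAL)
			return ACTION_FAIL;
	}
	/* Must-not-fail, or an ordinary low-order allocation. */
	return ACTION_OOM_AND_RETRY;
}
```

Under this model a GFP_DMA-style request without the no-fail bit returns
ACTION_FAIL, while the same request with the bit set still reaches the oom
killer and loops.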