On Fri 07-10-16 14:45:35, Minchan Kim wrote:
> After fixing the race of the highatomic page count, I still encounter
> OOM with much free memory reserved as highatomic.
>
> One of the reasons in my testing was that we unreserve free pages only
> if reclaim makes progress. Otherwise, we never get a chance to
> unreserve.
>
> Another problem after fixing it was that it doesn't guarantee
> unreserving every highatomic pageblock, because it releases just *a*
> pageblock, which could hold only a few free pages, so another context
> can easily steal it. As a result, the process stuck in direct reclaim
> can finally encounter OOM although there are free pages which could be
> unreserved.
>
> This patch changes the logic so that it unreserves pageblocks
> proportionally to no_progress_loops. IOW, in the first retry of
> reclaim, it will try to unreserve a pageblock. In the second retry of
> reclaim, it will try to unreserve 1/MAX_RECLAIM_RETRIES *
> reserved_pageblock, and finally all reserved pageblocks before the OOM.
>
> Signed-off-by: Minchan Kim <minchan@xxxxxxxxxx>
> ---
>  mm/page_alloc.c | 57 ++++++++++++++++++++++++++++++++++++++++++++-------------
>  1 file changed, 44 insertions(+), 13 deletions(-)

This sounds much more complex than it needs to be, IMHO. Why wouldn't
something as simple as the following work? Please note that I didn't
even try to compile this; it is just to give you an idea.
---
 mm/page_alloc.c | 26 ++++++++++++++++++++------
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 73f60ad6315f..e575a4f38555 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2056,7 +2056,8 @@ static void reserve_highatomic_pageblock(struct page *page, struct zone *zone,
  * intense memory pressure but failed atomic allocations should be easier
  * to recover from than an OOM.
  */
-static void unreserve_highatomic_pageblock(const struct alloc_context *ac)
+static bool unreserve_highatomic_pageblock(const struct alloc_context *ac,
+						bool force)
 {
 	struct zonelist *zonelist = ac->zonelist;
 	unsigned long flags;
@@ -2067,8 +2068,14 @@ static void unreserve_highatomic_pageblock(const struct alloc_context *ac)
 
 	for_each_zone_zonelist_nodemask(zone, z, zonelist, ac->high_zoneidx,
 								ac->nodemask) {
-		/* Preserve at least one pageblock */
-		if (zone->nr_reserved_highatomic <= pageblock_nr_pages)
+		if (!zone->nr_reserved_highatomic)
+			continue;
+
+		/*
+		 * Preserve at least one pageblock unless we are really running
+		 * out of memory
+		 */
+		if (!force && zone->nr_reserved_highatomic <= pageblock_nr_pages)
 			continue;
 
 		spin_lock_irqsave(&zone->lock, flags);
@@ -2102,10 +2109,12 @@ static void unreserve_highatomic_pageblock(const struct alloc_context *ac)
 			set_pageblock_migratetype(page, ac->migratetype);
 			move_freepages_block(zone, page, ac->migratetype);
 			spin_unlock_irqrestore(&zone->lock, flags);
-			return;
+			return true;
 		}
 		spin_unlock_irqrestore(&zone->lock, flags);
 	}
+
+	return false;
 }
 
 /* Remove an element from the buddy allocator from the fallback list */
@@ -3302,7 +3311,7 @@ __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
 	 * Shrink them them and try again
 	 */
 	if (!page && !drained) {
-		unreserve_highatomic_pageblock(ac);
+		unreserve_highatomic_pageblock(ac, false);
 		drain_all_pages(NULL);
 		drained = true;
 		goto retry;
@@ -3418,9 +3427,14 @@ should_reclaim_retry(gfp_t gfp_mask, unsigned order,
 	/*
 	 * Make sure we converge to OOM if we cannot make any progress
 	 * several times in the row.
+	 * Do last desperate attempt to throw high atomic reserves away
+	 * before we give up
 	 */
-	if (*no_progress_loops > MAX_RECLAIM_RETRIES)
+	if (*no_progress_loops > MAX_RECLAIM_RETRIES) {
+		if (unreserve_highatomic_pageblock(ac, true))
+			return true;
 		return false;
+	}
 
 	/*
 	 * Keep reclaiming pages while there is a chance this will lead
-- 
Michal Hocko
SUSE Labs
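
For reference, here is a minimal userspace sketch of the proportional
unreserve calculation described in the quoted changelog ("first retry:
one pageblock ... finally all reserved pageblocks before the OOM").
This is not the actual patch and not kernel code; the helper name and
the example numbers are made up for illustration, and only
MAX_RECLAIM_RETRIES mirrors the kernel constant.

/*
 * Hypothetical illustration of the proportional unreserve idea;
 * compile with: cc -o unreserve_sketch unreserve_sketch.c
 */
#include <stdio.h>

#define MAX_RECLAIM_RETRIES 16

/*
 * Return how many reserved highatomic pageblocks to release on this
 * reclaim retry: roughly no_progress_loops/MAX_RECLAIM_RETRIES of the
 * total, but at least one, so the final retry releases everything.
 */
static unsigned long pageblocks_to_unreserve(int no_progress_loops,
					     unsigned long reserved_pageblocks)
{
	unsigned long nr;

	if (!reserved_pageblocks)
		return 0;

	nr = reserved_pageblocks * no_progress_loops / MAX_RECLAIM_RETRIES;
	return nr ? nr : 1;
}

int main(void)
{
	unsigned long reserved = 8;	/* pretend 8 pageblocks are reserved */
	int loop;

	for (loop = 1; loop <= MAX_RECLAIM_RETRIES; loop++)
		printf("retry %2d -> unreserve up to %lu pageblock(s)\n",
		       loop, pageblocks_to_unreserve(loop, reserved));
	return 0;
}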