On Wed, May 18, 2016 at 09:51:58AM +0200, Vlastimil Babka wrote:
> On 05/17/2016 08:41 AM, Naoya Horiguchi wrote:
> >> @@ -2579,20 +2612,22 @@ struct page *buffered_rmqueue(struct zone *preferred_zone,
> >>  		struct list_head *list;
> >>  
> >>  		local_irq_save(flags);
> >> -		pcp = &this_cpu_ptr(zone->pageset)->pcp;
> >> -		list = &pcp->lists[migratetype];
> >> -		if (list_empty(list)) {
> >> -			pcp->count += rmqueue_bulk(zone, 0,
> >> -					pcp->batch, list,
> >> -					migratetype, cold);
> >> -			if (unlikely(list_empty(list)))
> >> -				goto failed;
> >> -		}
> >> +		do {
> >> +			pcp = &this_cpu_ptr(zone->pageset)->pcp;
> >> +			list = &pcp->lists[migratetype];
> >> +			if (list_empty(list)) {
> >> +				pcp->count += rmqueue_bulk(zone, 0,
> >> +						pcp->batch, list,
> >> +						migratetype, cold);
> >> +				if (unlikely(list_empty(list)))
> >> +					goto failed;
> >> +			}
> >>  
> >> -		if (cold)
> >> -			page = list_last_entry(list, struct page, lru);
> >> -		else
> >> -			page = list_first_entry(list, struct page, lru);
> >> +			if (cold)
> >> +				page = list_last_entry(list, struct page, lru);
> >> +			else
> >> +				page = list_first_entry(list, struct page, lru);
> >> +		} while (page && check_new_pcp(page));
> >
> > This causes infinite loop when check_new_pcp() returns 1, because the bad
> > page is still in the list (I assume that a bad page never disappears).
> > The original kernel is free from this problem because we do retry after
> > list_del(). So moving the following 3 lines into this do-while block solves
> > the problem?
> >
> >     __dec_zone_state(zone, NR_ALLOC_BATCH);
> >     list_del(&page->lru);
> >     pcp->count--;
> >
> > There seems no infinite loop issue in order > 0 block below, because bad pages
> > are deleted from free list in __rmqueue_smallest().
>
> Ooops, thanks for catching this, wish it was sooner...
>

Still not too late fortunately! Thanks Naoya for identifying this and
Vlastimil for fixing it.

> ----8<----
> From f52f5e2a7dd65f2814183d8fd254ace43120b828 Mon Sep 17 00:00:00 2001
> From: Vlastimil Babka <vbabka@xxxxxxx>
> Date: Wed, 18 May 2016 09:41:01 +0200
> Subject: [PATCH] mm, page_alloc: prevent infinite loop in buffered_rmqueue()
>
> In DEBUG_VM kernel, we can hit infinite loop for order == 0 in
> buffered_rmqueue() when check_new_pcp() returns 1, because the bad page is
> never removed from the pcp list. Fix this by removing the page before retrying.
> Also we don't need to check if page is non-NULL, because we simply grab it from
> the list which was just tested for being non-empty.
>
> Fixes: http://www.ozlabs.org/~akpm/mmotm/broken-out/mm-page_alloc-defer-debugging-checks-of-freed-pages-until-a-pcp-drain.patch
> Reported-by: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx>
> Signed-off-by: Vlastimil Babka <vbabka@xxxxxxx>

Reviewed-by: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>

-- 
Mel Gorman
SUSE Labs
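
For readers following along, here is a minimal userspace sketch of the loop shape the commit message describes: the page is unlinked from the list before it is checked, so a rejected page cannot be picked again on the retry. This is not the kernel code or the actual patch. The fake_* names are hypothetical stand-ins, a singly linked list replaces the kernel's list_head, and the rmqueue_bulk() refill and NR_ALLOC_BATCH accounting are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; only what the sketch needs. */
struct fake_page {
	struct fake_page *next;
	bool bad;	/* what check_new_pcp() would detect in the real code */
	int id;
};

/* Hypothetical stand-in for the per-cpu list and its page count. */
struct fake_pcp {
	struct fake_page *head;
	int count;
};

static void fake_pcp_push(struct fake_pcp *pcp, struct fake_page *page)
{
	page->next = pcp->head;
	pcp->head = page;
	pcp->count++;
}

/* Returns true when the page must be rejected (check_new_pcp() == 1). */
static bool fake_check_new_pcp(const struct fake_page *page)
{
	return page->bad;
}

/*
 * Take one usable page from the list. Each candidate is removed from the
 * list before the check, so every iteration shrinks the list and the loop
 * terminates even if all remaining pages are bad.
 */
static struct fake_page *fake_rmqueue_order0(struct fake_pcp *pcp)
{
	struct fake_page *page;

	do {
		if (!pcp->head)
			return NULL;	/* empty list: the "goto failed" case */

		page = pcp->head;
		pcp->head = page->next;	/* plays the role of list_del() */
		page->next = NULL;
		pcp->count--;
	} while (fake_check_new_pcp(page));

	return page;
}
```

With two bad pages queued ahead of a good one, fake_rmqueue_order0() drops both bad pages and returns the good one. If the unlink were moved after the loop, as in the quoted hunk, the first bad page would be selected on every iteration.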