Dear Bob,

Thank you so much for the careful review. Yes, it's a typo; I meant
zone->all_unreclaimable = 0.

You suggested adding the check in kswapd_shrink_zone(). Sorry, but I could
not find this function in kernel 3.4 or kernel 3.9. Is this function called
in direct reclaim? As I mentioned, this issue happens after the kswapd
thread has gone to sleep, so if the function is only called from kswapd,
I don't think it can help.

Thanks!

Best Regards
Lisa Du

-----Original Message-----
From: Bob Liu [mailto:lliubbo@xxxxxxxxx]
Sent: July 24, 2013 9:18
To: Lisa Du
Cc: linux-mm@xxxxxxxxx; Christoph Lameter; Mel Gorman
Subject: Re: Possible deadloop in direct reclaim?

On Tue, Jul 23, 2013 at 12:58 PM, Lisa Du <cldu@xxxxxxxxxxx> wrote:
> Dear Sir:
>
> Currently I have met a possible endless loop in direct reclaim. After
> running plenty of applications, the system gets into a state where memory
> is very fragmented, with only order-0 and order-1 memory left.
>
> Then one process requested an order-2 buffer, but it entered an endless
> direct reclaim. From my trace log, I can see this loop has already run over
> 200,000 times. Kswapd was first woken up and then went back to sleep, as it
> cannot rebalance memory of this order. But zone->all_unreclaimable remains 1.
>
> Although direct_reclaim returns no pages every time, as
> zone->all_unreclaimable = 1, it loops again and again, even when
> zone->pages_scanned also becomes very large. It blocks the process for a
> long time, until some watchdog thread detects this and kills the process.
> Although it's in __alloc_pages_slowpath, it's too slow, right? It may cost
> over 50 seconds or even more.

You must mean zone->all_unreclaimable = 0?

>
> I think this is not as expected, right? Can we also add the check below in
> the function all_unreclaimable() to terminate this loop?
>
> > @@ -2355,6 +2355,8 @@ static bool all_unreclaimable(struct zonelist *zonelist,
> >                         continue;
> >                 if (!zone->all_unreclaimable)
> >                         return false;
> > +               if (sc->nr_reclaimed == 0 && !zone_reclaimable(zone))
> > +                       return true;
>

How about replacing the check in kswapd_shrink_zone() instead?

@@ -2824,7 +2824,7 @@ static bool kswapd_shrink_zone(struct zone *zone,
        /* Account for the number of pages attempted to reclaim */
        *nr_attempted += sc->nr_to_reclaim;
-       if (nr_slab == 0 && !zone_reclaimable(zone))
+       if (sc->nr_reclaimed == 0 && !zone_reclaimable(zone))
                zone->all_unreclaimable = 1;
        zone_clear_flag(zone, ZONE_WRITEBACK);

I think the current check is wrong: reclaiming a slab doesn't mean a page
was reclaimed.

--
Regards,
--Bob
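
For illustration only, the difference between the two conditions discussed
above can be sketched in user space. This is a rough model, not kernel code:
the struct and function names ending in _model, _old and _new are
hypothetical, and the "scanned at least 6x the reclaimable pages" rule used
for zone_reclaimable_model() is an assumption about how zone_reclaimable()
behaves in kernels of this era.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for struct zone. */
struct zone_model {
	unsigned long pages_scanned;      /* pages scanned since the last free */
	unsigned long reclaimable_pages;  /* pages that could in theory be reclaimed */
	int all_unreclaimable;            /* flag under discussion in this thread */
};

/* Hypothetical, simplified stand-in for struct scan_control. */
struct scan_control_model {
	unsigned long nr_reclaimed;       /* pages actually reclaimed in this pass */
};

/* Assumed heuristic: a zone counts as reclaimable until it has been
 * scanned six times over without freeing anything. */
bool zone_reclaimable_model(const struct zone_model *zone)
{
	return zone->pages_scanned < zone->reclaimable_pages * 6;
}

/* Condition before the proposed change: keyed on slab objects freed. */
void mark_unreclaimable_old(struct zone_model *zone, unsigned long nr_slab)
{
	if (nr_slab == 0 && !zone_reclaimable_model(zone))
		zone->all_unreclaimable = 1;
}

/* Condition after the proposed change: keyed on pages reclaimed. */
void mark_unreclaimable_new(struct zone_model *zone,
			    const struct scan_control_model *sc)
{
	if (sc->nr_reclaimed == 0 && !zone_reclaimable_model(zone))
		zone->all_unreclaimable = 1;
}
```

In this model, a heavily scanned zone where the shrinker freed a few slab
objects (nr_slab > 0) but no page was reclaimed (nr_reclaimed == 0) is left
unmarked by the old condition and marked by the new one. Whether this helps
when kswapd is already asleep is a separate question that the model does not
address.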