Dear Christoph,

Thanks a lot for your comment. When this issue happened, I triggered a kernel panic and captured a kdump. From the kdump, I read the global variable pg_data_t contig_page_data. In that structure, the Normal zone shows order-0 nr_free = 18442 and order-1 nr_free = 367; nr_free is 0 for every other order.

Thanks!

Best Regards
Lisa Du

-----Original Message-----
From: Christoph Lameter [mailto:cl@xxxxxxxxx]
Sent: July 24, 2013 4:29
To: Lisa Du
Cc: linux-mm@xxxxxxxxx; Mel Gorman
Subject: Re: Possible deadloop in direct reclaim?

On Mon, 22 Jul 2013, Lisa Du wrote:

> I have run into a possible deadloop in direct reclaim. After running
> many applications, the system reached a state where memory was very
> fragmented: only order-0 and order-1 blocks were left.

Can you verify that with cat /proc/buddyinfo?

> Then one process requested an order-2 buffer, but it entered an endless
> direct reclaim loop. From my trace log, I can see this loop has already
> run more than 200,000 times. Kswapd was woken first and then went back
> to sleep, because it could not rebalance the zone for this order. But
> zone->all_unreclaimable remains 1. Although direct reclaim returns no
> pages every time, it loops again and again because
> zone->all_unreclaimable = 1, even after zone->pages_scanned has become
> very large. This blocks the process for a long time, until a watchdog
> thread detects it and kills the process. I know this is
> __alloc_pages_slowpath, but this is far too slow, right? It can take
> over 50 seconds or even longer.
> I don't think this is the expected behavior, right? Can we also add the
> check below to all_unreclaimable() to terminate this loop?
>
> @@ -2355,6 +2355,8 @@ static bool all_unreclaimable(struct zonelist *zonelist,
>  			continue;
>  		if (!zone->all_unreclaimable)
>  			return false;
> +		if (sc->nr_reclaimed == 0 && !zone_reclaimable(zone))
> +			return true;
>  	}

Mel?