Possible deadloop in direct reclaim?

Lisa Du <cldu@xxxxxxxxxxx> · Mon, 22 Jul 2013 21:58:17 -0700

Dear Sir:

I have run into a possible endless loop in direct reclaim. After running a large number of applications, the system reaches a state in which memory is heavily fragmented, with only order-0 and order-1 free pages left.

A process then requested an order-2 buffer and entered an endless direct reclaim loop. My trace log shows that this loop has already run more than 200,000 times. Kswapd was woken up first and then went back to sleep because it could not rebalance memory at this order. But zone->all_unreclaimable remains 1.

Direct reclaim returns no pages on every pass, but because zone->all_unreclaimable = 1, it loops again and again, even after zone->pages_scanned has become very large. This blocks the process for a long time, until a watchdog thread detects the stall and kills the process. I know this is __alloc_pages_slowpath, but this is too slow, right? It can take over 50 seconds or even more.

I don't think this is the expected behavior. Could we add the check below to all_unreclaimable() to terminate this loop?

 

@@ -2355,6 +2355,8 @@ static bool all_unreclaimable(struct zonelist *zonelist,
                        continue;
                if (!zone->all_unreclaimable)
                        return false;
+               if (sc->nr_reclaimed == 0 && !zone_reclaimable(zone))
+                       return true;
        }
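
The effect of the proposed check can be tried out in user space. The following is a minimal sketch, not kernel code: the toy_* types and functions are hypothetical stand-ins, a plain array loop replaces the kernel's zonelist iteration, and the threshold in toy_zone_reclaimable() (scanned fewer than six times the reclaimable pages) is my assumption about what zone_reclaimable() does in this kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct zone: only the fields the check reads. */
struct toy_zone {
    bool populated;
    int all_unreclaimable;            /* the flag discussed in this report */
    unsigned long pages_scanned;
    unsigned long reclaimable_pages;  /* assumed: pages reclaim could still free */
};

/* Hypothetical stand-in for struct scan_control. */
struct toy_scan_control {
    unsigned long nr_reclaimed;
};

/* Assumption: a zone counts as reclaimable until it has been scanned
 * six times over without success. */
static bool toy_zone_reclaimable(const struct toy_zone *zone)
{
    return zone->pages_scanned < zone->reclaimable_pages * 6;
}

/* Model of all_unreclaimable(); with_check enables the two added lines. */
static bool toy_all_unreclaimable(const struct toy_zone *zones, size_t nr_zones,
                                  const struct toy_scan_control *sc,
                                  bool with_check)
{
    for (size_t i = 0; i < nr_zones; i++) {
        const struct toy_zone *zone = &zones[i];

        if (!zone->populated)
            continue;
        if (!zone->all_unreclaimable)
            return false;
        /* Proposed check: nothing reclaimed and the zone is scanned out. */
        if (with_check && sc->nr_reclaimed == 0 && !toy_zone_reclaimable(zone))
            return true;
    }
    return true;
}

/* Small demonstration: one exhausted, flagged zone followed by an unflagged one. */
static void toy_demo(void)
{
    struct toy_zone zones[2] = {
        { true, 1, 6000, 100 },  /* flagged, scanned far past 6 * 100 */
        { true, 0, 0, 100 },     /* not flagged */
    };
    struct toy_scan_control sc = { 0 };

    assert(!toy_all_unreclaimable(zones, 2, &sc, false)); /* original: keep looping */
    assert(toy_all_unreclaimable(zones, 2, &sc, true));   /* with check: give up */
}
```

In this model the added check only changes the result when an earlier zone in the list is flagged and scanned out while a later zone is not flagged. When every zone is flagged, both versions already return true.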

BTW: I'm using kernel 3.4. I also searched kernel 3.9 and didn't see a fix for this issue. Has anyone else met this issue before? Any comments are welcome. Looking forward to your reply!

 

Thanks!

 

Best Regards

Lisa Du