Dear Sir,

I have run into a possible dead loop in direct reclaim. After running a
large number of applications, the system reaches a state where memory is
heavily fragmented: only order-0 and order-1 blocks are left. A process
then requests an order-2 buffer and enters an endless direct reclaim
loop. From my trace log, the loop has already run more than 200,000
times.

kswapd was woken up first and then went back to sleep because it could
not rebalance memory for this order. But zone->all_unreclaimable remains
1. Although direct reclaim returns no pages every time, since
zone->all_unreclaimable = 1 the loop repeats again and again, even after
zone->pages_scanned has become very large. This blocks the process for a
long time, until a watchdog thread detects it and kills the process. I
know this is __alloc_pages_slowpath, but it is too slow, right? It can
take more than 50 seconds. I do not think this is the expected behavior.

Can we add the check below to all_unreclaimable() to terminate this
loop?

@@ -2355,6 +2355,8 @@ static bool all_unreclaimable(struct zonelist *zonelist,
 			continue;
 		if (!zone->all_unreclaimable)
 			return false;
+		if (sc->nr_reclaimed == 0 && !zone_reclaimable(zone))
+			return true;
 	}

BTW:
I am using kernel 3.4. I also searched kernel 3.9 and did not see a
fix for this issue. Has anyone else run into it before?

Any comments are welcome. I look forward to your reply.

Thanks!

Best Regards,
Lisa Du