Possible deadloop in direct reclaim?

Dear Sir:

I have run into a possible endless loop in direct reclaim. After running a large number of applications, the system reaches a state where memory is heavily fragmented: only order-0 and order-1 free blocks are left.
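One way to confirm this kind of fragmentation is to look at /proc/buddyinfo, which lists the number of free blocks per order for each zone. The helper below is only a minimal sketch: the function name highest_free_order is hypothetical, and it assumes the usual "Node N, zone NAME count count ..." line layout.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Return the highest order that still has at least one free block in a
 * single line of /proc/buddyinfo, or -1 if the line has no free blocks
 * or does not look like a buddyinfo line.
 * (Hypothetical helper, not part of any kernel or libc API.)
 */
int highest_free_order(const char *line)
{
	char zone_name[32];
	int node;
	int consumed = 0;
	int order = 0;
	int highest = -1;
	const char *p;

	/* The line header looks like "Node 0, zone   Normal". */
	if (sscanf(line, "Node %d, zone %31s%n", &node, zone_name, &consumed) != 2)
		return -1;

	/* The remaining columns are free-block counts for order 0, 1, 2, ... */
	p = line + consumed;
	for (;;) {
		char *end;
		unsigned long nr_free = strtoul(p, &end, 10);

		if (end == p)	/* no more numbers on this line */
			break;
		if (nr_free > 0)
			highest = order;
		order++;
		p = end;
	}
	return highest;
}
```

Feeding it each line read from /proc/buddyinfo with fgets() gives one value per zone. A result of 1 for a zone would match the state described above, where an order-2 request cannot be satisfied from that zone's free lists.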

Then a process requested an order-2 buffer and entered an endless direct reclaim loop. From my trace log, I can see the loop had already run more than 200,000 times. Kswapd was woken up first and then went back to sleep because it could not rebalance memory for this order. But zone->all_unreclaimable remains 1.

Direct reclaim returns no pages every time, but because zone->all_unreclaimable = 1 it loops again and again, even after zone->pages_scanned has become very large. This blocks the process for a long time, until a watchdog thread detects it and kills the process. I know this is __alloc_pages_slowpath, but this is too slow, right? It can take over 50 seconds or even more.

I think this is not the expected behavior, right? Could we add the check below to the function all_unreclaimable() to terminate this loop?

 

@@ -2355,6 +2355,8 @@ static bool all_unreclaimable(struct zonelist *zonelist,
                        continue;
                if (!zone->all_unreclaimable)
                        return false;
+               if (sc->nr_reclaimed == 0 && !zone_reclaimable(zone))
+                       return true;
        }
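To make the effect of the added check easier to reason about, here is a small user-space model of the loop in all_unreclaimable(). It is a sketch under assumptions, not kernel code. The struct and function names ending in _model are hypothetical, and the populated-zone and cpuset checks of the real loop are left out. The threshold in zone_reclaimable_model (scanned pages below six times the reclaimable pages) is how I understand zone_reclaimable() in mm/vmscan.c of this kernel generation, so please treat it as an assumption too.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins; only the fields the checks read are modelled. */
struct zone_model {
	bool all_unreclaimable;          /* models zone->all_unreclaimable */
	unsigned long pages_scanned;     /* models zone->pages_scanned */
	unsigned long reclaimable_pages; /* models zone_reclaimable_pages(zone) */
};

struct scan_model {
	unsigned long nr_reclaimed;      /* models sc->nr_reclaimed */
};

/* Assumed heuristic: a zone stays "reclaimable" while it has been scanned
 * less than six times its reclaimable pages. */
static bool zone_reclaimable_model(const struct zone_model *z)
{
	return z->pages_scanned < z->reclaimable_pages * 6;
}

/* Walk the zones in zonelist order; with_patch toggles the proposed check. */
bool all_unreclaimable_model(const struct zone_model *zones, size_t nr,
			     const struct scan_model *sc, bool with_patch)
{
	for (size_t i = 0; i < nr; i++) {
		const struct zone_model *z = &zones[i];

		/* Existing check: a zone with the flag clear keeps reclaim going. */
		if (!z->all_unreclaimable)
			return false;

		/* Proposed check: stop once nothing was reclaimed and this
		 * zone has been scanned past the threshold. */
		if (with_patch && sc->nr_reclaimed == 0 &&
		    !zone_reclaimable_model(z))
			return true;
	}
	return true;
}
```

With two zones, where the first has the flag set and is scanned past the threshold and the second has the flag clear, the model returns false without the added check and true with it when sc->nr_reclaimed is 0. Note that with the check placed as in the diff, it is only reached for a zone whose all_unreclaimable flag is set, because a zone with the flag clear already returns false one line earlier.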

BTW: I'm using kernel 3.4. I also searched kernel 3.9 and didn't see a possible fix for this issue. Has anyone met this issue before? Any comments are welcome, and I look forward to your reply!

 

Thanks!

 

Best Regards

Lisa Du

 

