RE: [resend] [PATCH V3] mm: vmscan: fix do_try_to_free_pages() livelock

>-----Original Message-----
>From: Andrew Morton [mailto:akpm@xxxxxxxxxxxxxxxxxxxx]
>Sent: August 28, 2013 3:43
>To: Lisa Du
>Cc: Johannes Weiner; Michal Hocko; linux-mm@xxxxxxxxx; Minchan Kim; KOSAKI Motohiro; Mel Gorman; Christoph Lameter; Bob Liu;
>Neil Zhang; Russell King - ARM Linux; Aaditya Kumar; yinghan@xxxxxxxxxx; npiggin@xxxxxxxxx; riel@xxxxxxxxxx;
>kamezawa.hiroyu@xxxxxxxxxxxxxx
>Subject: Re: [resend] [PATCH V3] mm: vmscan: fix do_try_to_free_pages() livelock
>
>On Sun, 11 Aug 2013 18:46:08 -0700 Lisa Du <cldu@xxxxxxxxxxx> wrote:
>
>> This patch is based on KOSAKI's work, and I have added a little more
>> description; please refer to https://lkml.org/lkml/2012/6/14/74.
>>
>> I found that the system can enter a state where a zone has lots of
>> free pages, but only order-0 and order-1 ones, which means the zone
>> is heavily fragmented. A high-order allocation can then stall in the
>> direct reclaim path for a long time (e.g. 60 seconds), especially in
>> an environment with no swap and no compaction. This problem happened
>> on v3.4, but the issue seems to still exist in the current tree. The
>> reason is that do_try_to_free_pages() enters a livelock:
>>
>> kswapd goes to sleep if the zones have been fully scanned and are
>> still not balanced, because it thinks there is little point in trying
>> all over again; this avoids an infinite loop. Instead, it changes the
>> order from high-order to order-0, because kswapd considers order-0 the
>> most important. See commit 73ce02e9 for details. If the watermarks are
>> OK, kswapd goes back to sleep and may leave zone->all_unreclaimable = 0.
>> It assumes high-order users can still perform direct reclaim if they
>> wish.
>>
>> Direct reclaim keeps reclaiming for a high order that is not a
>> COSTLY_ORDER, without invoking the OOM killer, until kswapd turns on
>> zone->all_unreclaimable. This avoids a premature OOM kill, but it
>> means direct reclaim depends on kswapd to break this loop.
>>
>> In the worst case, direct reclaim may keep reclaiming pages forever
>> while kswapd sleeps forever, until something like a watchdog detects
>> the stall and finally kills the process, as described in:
>> http://thread.gmane.org/gmane.linux.kernel.mm/103737
>>
>> We can't turn on zone->all_unreclaimable from the direct reclaim path,
>> because that path doesn't take any lock, so doing so would be racy.
>> Thus this patch removes the zone->all_unreclaimable field completely
>> and recalculates the zone's reclaimable state every time.
>>
>> Note: we can't adopt the approach of having direct reclaim check
>> zone->pages_scanned directly while kswapd continues to use
>> zone->all_unreclaimable, because that is racy. Commit 929bea7c71
>> (vmscan: all_unreclaimable() use zone->all_unreclaimable as a name)
>> describes the details.
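The livelock described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not kernel code: `struct mock_zone`, `mock_zone_reclaimable()`, `direct_reclaim_old()` and `direct_reclaim_new()` are hypothetical names, the zone is reduced to three fields, and the 6x scanned-to-reclaimable ratio is assumed to match the threshold kswapd used before setting all_unreclaimable. The `max_passes` cap stands in for the watchdog that eventually kills the stuck process.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for struct zone. */
struct mock_zone {
	unsigned long pages_scanned;     /* pages scanned without freeing anything */
	unsigned long reclaimable_pages; /* stand-in for zone_reclaimable_pages() */
	bool all_unreclaimable;          /* old flag: only kswapd ever sets it */
};

/* Recalculated check (assumed 6x ratio): once the zone has been scanned
 * six times over without progress, it is treated as unreclaimable. */
static bool mock_zone_reclaimable(const struct mock_zone *zone)
{
	return zone->pages_scanned < zone->reclaimable_pages * 6;
}

/* One fruitless reclaim pass: pages are scanned but none are freed,
 * as in a fragmented zone with no swap and no compaction. */
static void scan_without_progress(struct mock_zone *zone, unsigned long batch)
{
	zone->pages_scanned += batch;
}

/* Old behaviour: retry until kswapd sets all_unreclaimable. With kswapd
 * asleep the flag never changes, so only max_passes ends the loop. */
static unsigned long direct_reclaim_old(struct mock_zone *zone,
					unsigned long max_passes)
{
	unsigned long passes = 0;

	while (!zone->all_unreclaimable && passes < max_passes) {
		scan_without_progress(zone, 32);
		passes++;
	}
	return passes;
}

/* New behaviour: reclaimability is recomputed on every pass, so direct
 * reclaim stops by itself once the zone looks exhausted. */
static unsigned long direct_reclaim_new(struct mock_zone *zone,
					unsigned long max_passes)
{
	unsigned long passes = 0;

	while (mock_zone_reclaimable(zone) && passes < max_passes) {
		scan_without_progress(zone, 32);
		passes++;
	}
	return passes;
}
```

For a zone with 64 reclaimable pages and 32 pages scanned per pass, direct_reclaim_old() runs until the max_passes cap, while direct_reclaim_new() stops after 12 passes, once pages_scanned reaches 6 * 64 = 384.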
>
>I did this to fix the build:
>
>--- a/mm/migrate.c~mm-vmscan-fix-do_try_to_free_pages-livelock-fix-2
>+++ a/mm/migrate.c
>@@ -1471,7 +1471,7 @@ static bool migrate_balanced_pgdat(struc
> 		if (!populated_zone(zone))
> 			continue;
>
>-		if (zone->all_unreclaimable)
>+		if (!zone_reclaimable(zone))
> 			continue;
>
> 		/* Avoid waking kswapd by allocating pages_to_migrate pages. */
>
>Please review and runtime test it?
This should be reasonable. I'm sorry, but I only have a v3.4
environment, and v3.4 doesn't have this function.
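Since I can't runtime-test it here, a simplified user-space model of the patched loop may help reason about the change. It is a sketch under assumptions: `struct mock_zone`, `mock_zone_reclaimable()` and `mock_migrate_balanced()` are hypothetical names, "populated" is modelled as `present_pages != 0`, the 6x ratio in the reclaimable check is assumed, and the watermark check that follows the "Avoid waking kswapd" comment in the diff is approximated by comparing free pages with the high watermark plus the pages to migrate.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified zone; field names are illustrative only. */
struct mock_zone {
	unsigned long present_pages;     /* 0 means the zone is not populated */
	unsigned long free_pages;
	unsigned long high_wmark;        /* stand-in for the high watermark */
	unsigned long pages_scanned;
	unsigned long reclaimable_pages; /* stand-in for zone_reclaimable_pages() */
};

/* Assumed recalculated check replacing the old all_unreclaimable flag. */
static bool mock_zone_reclaimable(const struct mock_zone *zone)
{
	return zone->pages_scanned < zone->reclaimable_pages * 6;
}

/* Same shape as the patched loop: skip empty zones and zones that
 * currently look unreclaimable, then accept the first zone that can
 * absorb nr_migrate_pages while staying above its high watermark.
 * Zones are walked from the highest index down. */
static bool mock_migrate_balanced(const struct mock_zone *zones, size_t nr_zones,
				  unsigned long nr_migrate_pages)
{
	for (size_t i = nr_zones; i-- > 0; ) {
		const struct mock_zone *zone = &zones[i];

		if (zone->present_pages == 0)
			continue;
		if (!mock_zone_reclaimable(zone))
			continue;
		/* Approximation of the watermark check (assumption). */
		if (zone->free_pages < zone->high_wmark + nr_migrate_pages)
			continue;
		return true;
	}
	return false;
}
```

With this shape, a zone that has been scanned heavily without progress is skipped exactly as one marked all_unreclaimable used to be, but the decision no longer depends on kswapd having woken up to set a flag.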