On Tue 13-10-15 00:25:53, Tetsuo Handa wrote:
[...]
> What is strange, the values printed by this debug printk() patch did not
> change as time went by. Thus, I think that this is not a problem of lack of
> CPU time for scanning pages. I suspect that there is a bug that nobody is
> scanning pages.
>
> ----------
> [ 66.821450] zone_reclaimable returned 1 at line 2646
> [ 66.823020] (ACTIVE_FILE=26+INACTIVE_FILE=10) * 6 > PAGES_SCANNED=32
> [ 66.824935] shrink_zones returned 1 at line 2706
> [ 66.826392] zones_reclaimable=1 at line 2765
> [ 66.827865] do_try_to_free_pages returned 1 at line 2938
> [ 67.102322] __perform_reclaim returned 1 at line 2854
> [ 67.103968] did_some_progress=1 at line 3301
> (...snipped...)
> [ 281.439977] zone_reclaimable returned 1 at line 2646
> [ 281.439977] (ACTIVE_FILE=26+INACTIVE_FILE=10) * 6 > PAGES_SCANNED=32
> [ 281.439978] shrink_zones returned 1 at line 2706
> [ 281.439978] zones_reclaimable=1 at line 2765
> [ 281.439979] do_try_to_free_pages returned 1 at line 2938
> [ 281.439979] __perform_reclaim returned 1 at line 2854
> [ 281.439980] did_some_progress=1 at line 3301

This is really interesting because even with reclaimable LRUs this low
we should eventually scan them enough times to convince
zone_reclaimable to fail. PAGES_SCANNED in your logs seems to be
constant, though (32 against a threshold of (26 + 10) * 6 = 216), which
suggests that somebody manages to free a page every time before we get
down to priority 0 and finally manage to scan something. This is pretty
much pathological behavior and I have a hard time imagining how that
would be possible, but it clearly shows that the zone_reclaimable
heuristic is not working properly.

I can see two options here. Either we teach zone_reclaimable to be less
fragile or we remove zone_reclaimable from shrink_zones altogether. Both
of them are risky because we have a long history of changes in this
area which caused other subtle behavior changes, but I guess that the
first option should be less fragile. What about the following patch?
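To make the arithmetic in the quoted log concrete, here is a minimal
userspace sketch of the pre-patch check. It assumes that a page freed
anywhere in the zone resets PAGES_SCANNED to zero and that each reclaim
pass scans 32 pages (a hypothetical figure picked to match the log).
struct zone_model, zone_reclaimable_old() and
passes_until_unreclaimable() are illustrative names, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-zone counters; not kernel code. */
struct zone_model {
	unsigned long file_lru_pages;	/* ACTIVE_FILE + INACTIVE_FILE; no swap, so anon is ignored */
	unsigned long pages_scanned;	/* stands in for NR_PAGES_SCANNED */
};

/* Pre-patch heuristic: reclaimable while scanned < 6 * reclaimable pages. */
static bool zone_reclaimable_old(const struct zone_model *z)
{
	return z->pages_scanned < z->file_lru_pages * 6;
}

/*
 * Run up to @passes reclaim passes, each scanning @scan_per_pass pages
 * without freeing anything from the LRU. If @foreign_free is true, a page
 * freed elsewhere resets the scan counter after every pass (the assumed
 * reset-on-free behaviour). Returns the index of the pass at which the
 * zone was first judged unreclaimable, or -1 if that never happened.
 */
static int passes_until_unreclaimable(struct zone_model *z,
				      unsigned long scan_per_pass,
				      int passes, bool foreign_free)
{
	assert(scan_per_pass > 0);
	for (int pass = 0; pass < passes; pass++) {
		/* Scan a batch; nothing on the LRU gets reclaimed. */
		z->pages_scanned += scan_per_pass;
		if (!zone_reclaimable_old(z))
			return pass;
		/* Someone else frees a page: the counter starts over. */
		if (foreign_free)
			z->pages_scanned = 0;
	}
	return -1;
}
```

With 36 file pages and no swap the threshold is 216, so without any
foreign frees the model declares the zone unreclaimable on the seventh
pass (index 6), whereas a free after every pass keeps the checked value
at 32 forever, which is the livelock pattern seen in the log.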
I am not happy about it because the condition is rather rough and a
deeper inspection is really needed to check all the call sites, but it
should be good for testing.
---
>From afe1c5ef4726b78f51e850ed93564b52f3c73905 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@xxxxxxxx>
Date: Tue, 13 Oct 2015 15:12:13 +0200
Subject: [PATCH] mm, vmscan: Make zone_reclaimable less fragile

zone_reclaimable considers a zone unreclaimable if we have scanned all
the reclaimable pages enough times since the last page was freed and
that still hasn't led to an allocation success. This can, however, lead
to a livelock/thrashing when a single freed page resets PAGES_SCANNED
while memory consumers are looping over small LRUs without making any
progress (e.g. the remaining pages on the LRU are dirty and all the
flushers are blocked) and failing to invoke the OOM killer because
zone_reclaimable would consider the zone reclaimable.

Tetsuo Handa has reported the following:

: [ 66.821450] zone_reclaimable returned 1 at line 2646
: [ 66.823020] (ACTIVE_FILE=26+INACTIVE_FILE=10) * 6 > PAGES_SCANNED=32
: [ 66.824935] shrink_zones returned 1 at line 2706
: [ 66.826392] zones_reclaimable=1 at line 2765
: [ 66.827865] do_try_to_free_pages returned 1 at line 2938
: [ 67.102322] __perform_reclaim returned 1 at line 2854
: [ 67.103968] did_some_progress=1 at line 3301
: (...snipped...)
: [ 281.439977] zone_reclaimable returned 1 at line 2646
: [ 281.439977] (ACTIVE_FILE=26+INACTIVE_FILE=10) * 6 > PAGES_SCANNED=32
: [ 281.439978] shrink_zones returned 1 at line 2706
: [ 281.439978] zones_reclaimable=1 at line 2765
: [ 281.439979] do_try_to_free_pages returned 1 at line 2938
: [ 281.439979] __perform_reclaim returned 1 at line 2854
: [ 281.439980] did_some_progress=1 at line 3301

In his case anon LRUs are not reclaimable because there is no swap
enabled.
It is not clear who frees a page that regularly, but it is clear that
no progress can be made and yet zone_reclaimable still considers the
zone reclaimable.

This patch makes zone_reclaimable less fragile by checking the number
of reclaimable pages (plus free pages) against the min watermark. It
doesn't make much sense to rely on a PAGES_SCANNED heuristic if there
are not enough reclaimable pages to get us over the min watermark.

Reported-by: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
Signed-off-by: Michal Hocko <mhocko@xxxxxxxx>
---
 mm/vmscan.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index c88d74ad9304..f16266e0af70 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -209,8 +209,14 @@ static unsigned long zone_reclaimable_pages(struct zone *zone)
 
 bool zone_reclaimable(struct zone *zone)
 {
+	unsigned long reclaimable = zone_reclaimable_pages(zone);
+	unsigned long free = zone_page_state(zone, NR_FREE_PAGES);
+
+	if (reclaimable + free < min_wmark_pages(zone))
+		return false;
+
 	return zone_page_state(zone, NR_PAGES_SCANNED) <
-		zone_reclaimable_pages(zone) * 6;
+		reclaimable * 6;
 }
 
 static unsigned long get_lru_size(struct lruvec *lruvec, enum lru_list lru)
-- 
2.5.1

-- 
Michal Hocko
SUSE Labs
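The check added by the patch can be modelled in userspace roughly as
follows. This is a sketch, not kernel code: struct zone_counters and
zone_reclaimable_patched() are hypothetical names whose fields stand in
for the kernel helpers named in the comments. The log above records only
the file LRU sizes and PAGES_SCANNED, so any free-page count or min
watermark plugged into this model is an assumption.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical snapshot of the values the patched check reads. */
struct zone_counters {
	unsigned long reclaimable;	/* zone_reclaimable_pages(zone) */
	unsigned long free;		/* zone_page_state(zone, NR_FREE_PAGES) */
	unsigned long min_wmark;	/* min_wmark_pages(zone) */
	unsigned long pages_scanned;	/* zone_page_state(zone, NR_PAGES_SCANNED) */
};

static bool zone_reclaimable_patched(const struct zone_counters *z)
{
	/* The model assumes the sum below cannot wrap around. */
	assert(z->reclaimable <= ULONG_MAX - z->free);

	/*
	 * Even reclaiming every reclaimable page could not lift the zone
	 * over its min watermark, so PAGES_SCANNED is not consulted.
	 */
	if (z->reclaimable + z->free < z->min_wmark)
		return false;

	/* Otherwise fall back to the original scan-count heuristic. */
	return z->pages_scanned < z->reclaimable * 6;
}
```

With 36 reclaimable pages, for instance, the new early return fires
whenever free pages are more than 36 short of the min watermark, no
matter how recently PAGES_SCANNED was reset; otherwise the result is the
same as with the old check.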