On Tue, Jun 19, 2012 at 5:05 AM, Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
> On Mon, Jun 18, 2012 at 09:47:31AM -0700, Ying Han wrote:
>> The function zone_reclaimable() marks zone->all_unreclaimable based on
>> per-zone pages_scanned and reclaimable_pages. If all_unreclaimable is
>> true, alloc_pages could go to OOM instead of getting stuck in page
>> reclaim.
>
> There is no zone->all_unreclaimable at this point, you removed it in
> the previous patch.

Ah, I forgot to update the commit log after applying the recent patch
from Kosaki.

>> In memcg kernel, cgroup under its softlimit is not targeted under
>> global reclaim. So we need to remove those pages from
>> reclaimable_pages, otherwise it will cause reclaim mechanism to get
>> stuck trying to reclaim from all_unreclaimable zone.
>
> Can't you check if zone->pages_scanned changed in between reclaim
> runs?
>
> Or sum up the scanned and reclaimable pages encountered while
> iterating the hierarchy during regular reclaim and then use those
> numbers in the equation instead of the per-zone counters?
>
> Walking the full global hierarchy in all the places where we check if
> a zone is reclaimable is a scalability nightmare.

I agree on that; I will explore that a bit more.
>> @@ -100,18 +100,36 @@ static __always_inline enum lru_list page_lru(struct page *page)
>>  	return lru;
>>  }
>>
>> +static inline unsigned long get_lru_size(struct lruvec *lruvec,
>> +					 enum lru_list lru)
>> +{
>> +	if (!mem_cgroup_disabled())
>> +		return mem_cgroup_get_lru_size(lruvec, lru);
>> +
>> +	return zone_page_state(lruvec_zone(lruvec), NR_LRU_BASE + lru);
>> +}
>> +
>>  static inline unsigned long zone_reclaimable_pages(struct zone *zone)
>>  {
>> -	int nr;
>> +	int nr = 0;
>> +	struct mem_cgroup *memcg;
>> +
>> +	memcg = mem_cgroup_iter(NULL, NULL, NULL);
>> +	do {
>> +		struct lruvec *lruvec = mem_cgroup_zone_lruvec(zone, memcg);
>>
>> -	nr = zone_page_state(zone, NR_ACTIVE_FILE) +
>> -		zone_page_state(zone, NR_INACTIVE_FILE);
>> +		if (should_reclaim_mem_cgroup(memcg)) {
>> +			nr += get_lru_size(lruvec, LRU_INACTIVE_FILE) +
>> +			      get_lru_size(lruvec, LRU_ACTIVE_FILE);
>
> Sometimes, the number of reclaimable pages DO include those of groups
> for which should_reclaim_mem_cgroup() is false: when the priority
> level is <= DEF_PRIORITY - 2, as you defined in 1/5!  This means that
> you consider pages you just scanned unreclaimable, which can result in
> the zone being unreclaimable after the DEF_PRIORITY - 2 cycle, no?

That is true, and I thought about it as well. I would add a priority
check here and only start considering those pages when the priority is
< DEF_PRIORITY - 2.

--Ying

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to
majordomo@xxxxxxxxx.  For more info on Linux MM, see: http://www.linux-mm.org/ .