On Mon, Aug 29, 2011 at 12:15 AM, Ying Han <yinghan@xxxxxxxxxx> wrote:
> On Thu, Aug 11, 2011 at 2:09 PM, Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
>>
>> On Thu, Aug 11, 2011 at 01:39:45PM -0700, Ying Han wrote:
>> > Please consider including the following patch in the next post. It
>> > fixes a crash in some of the tests where sc->mem_cgroup is NULL
>> > (global kswapd).
>> >
>> > diff --git a/mm/vmscan.c b/mm/vmscan.c
>> > index b72a844..12ab25d 100644
>> > --- a/mm/vmscan.c
>> > +++ b/mm/vmscan.c
>> > @@ -2768,7 +2768,8 @@ loop_again:
>> >  			 * Do some background aging of the anon list, to give
>> >  			 * pages a chance to be referenced before reclaiming.
>> >  			 */
>> > -			if (inactive_anon_is_low(zone, &sc))
>> > +			if (scanning_global_lru(&sc) &&
>> > +			    inactive_anon_is_low(zone, &sc))
>> >  				shrink_active_list(SWAP_CLUSTER_MAX, zone,
>> >  							&sc, priority, 0);
>>
>> Thanks! I completely overlooked this one and only noticed it after
>> changing the arguments to shrink_active_list().
>>
>> On memcg configurations, scanning_global_lru() will essentially never
>> be true again, so I moved the anon pre-aging to a separate function
>> that also does a hierarchy loop to pre-age the per-memcg anon lists.
>>
>> I hope to send out the next revision soon.
>
> Also, please consider folding in the following patch as well. It fixes
> the lru accounting for the root cgroup; without it we could easily
> trigger OOM while doing some swapoff tests.
>
> mm: fix the lru accounting for root cgroup.
>
> This patch is applied on top of:
> "
> mm: memcg-aware global reclaim
> Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx>
> "
>
> This patch fixes the lru accounting for the root cgroup.
>
> After the "memcg-aware global reclaim" patch, one of the changes is
> that lru pages are linked back to root. Under global memory pressure,
> we start from the root cgroup lru and walk through the memcg hierarchy
> of the system. For each memcg, we reclaim pages based on its lru size.
>
> However for the root cgroup, we used to not have a separate lru, and
> only the pages charged to root were counted as part of the root lru
> size. Without this patch, all the pages which are linked to the root
> lru but not charged to root, like swapcache readahead, are invisible
> to the page reclaim code, and we can easily hit OOM.
>
> After this patch, all the pages linked under the root lru are counted
> in the lru size, both Used and !Used.
>
> Signed-off-by: Hugh Dickins <hughd@xxxxxxxxxx>
> Signed-off-by: Ying Han <yinghan@xxxxxxxxxx>
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 5518f54..f6c5f29 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -888,19 +888,21 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
>  {
>  	struct page_cgroup *pc;
>  	struct mem_cgroup_per_zone *mz;
> +	struct mem_cgroup *mem;
>
>  	if (mem_cgroup_disabled())
>  		return;
>  	pc = lookup_page_cgroup(page);
> -	/* can happen while we handle swapcache. */
> -	if (!TestClearPageCgroupAcctLRU(pc))
> -		return;
> -	VM_BUG_ON(!pc->mem_cgroup);
> -	/*
> -	 * We don't check PCG_USED bit. It's cleared when the "page" is finally
> -	 * removed from global LRU.
> -	 */
> -	mz = page_cgroup_zoneinfo(pc->mem_cgroup, page);
> +
> +	if (TestClearPageCgroupAcctLRU(pc) || PageCgroupUsed(pc)) {
> +		VM_BUG_ON(!pc->mem_cgroup);
> +		mem = pc->mem_cgroup;
> +	} else {
> +		/* can happen while we handle swapcache. */
> +		mem = root_mem_cgroup;
> +	}
> +
> +	mz = page_cgroup_zoneinfo(mem, page);
>  	MEM_CGROUP_ZSTAT(mz, lru) -= 1;
>  	VM_BUG_ON(list_empty(&pc->lru));
>  	list_del_init(&pc->lru);
> @@ -961,22 +963,31 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru)
>  {
>  	struct page_cgroup *pc;
>  	struct mem_cgroup_per_zone *mz;
> +	struct mem_cgroup *mem;
>
>  	if (mem_cgroup_disabled())
>  		return;
>  	pc = lookup_page_cgroup(page);
>  	VM_BUG_ON(PageCgroupAcctLRU(pc));
> -	/*
> -	 * Used bit is set without atomic ops but after smp_wmb().
> -	 * For making pc->mem_cgroup visible, insert smp_rmb() here.
> -	 */
> -	smp_rmb();
> -	if (!PageCgroupUsed(pc))
> -		return;
>
> -	mz = page_cgroup_zoneinfo(pc->mem_cgroup, page);
> +	if (PageCgroupUsed(pc)) {
> +		/* Ensure pc->mem_cgroup is visible after reading PCG_USED. */
> +		smp_rmb();
> +		mem = pc->mem_cgroup;
> +		SetPageCgroupAcctLRU(pc);
> +	} else {
> +		/*
> +		 * If the page is no longer charged, add it to the
> +		 * root memcg's lru. Either it will be freed soon, or
> +		 * it will get charged again and the charger will
> +		 * relink it to the right list.
> +		 */
> +		mem = root_mem_cgroup;
> +	}
> +
> +	mz = page_cgroup_zoneinfo(mem, page);
>  	MEM_CGROUP_ZSTAT(mz, lru) += 1;
> -	SetPageCgroupAcctLRU(pc);
> +
>  	list_add(&pc->lru, &mz->lists[lru]);
>  }
>
> --Ying
>

And this patch fixes hierarchy_walk():

fix hierarchy_walk() to hold a reference to the first mem_cgroup

The first mem_cgroup returned from hierarchy_walk() is used to
terminate a round trip. However, no reference is held on it, so the
first mem_cgroup could be removed while the walk is in progress. This
patch includes the following changes:

1. Hold a reference on the first mem_cgroup during the walk.
2. Rename the variable "root" to "target"; we found "root" confusing
   in this context, given root_mem_cgroup. Better naming is welcome.

Signed-off-by: Ying Han <yinghan@xxxxxxxxxx>

 include/linux/memcontrol.h |   10 +++++++---
 mm/memcontrol.c            |   41 ++++++++++++++++++++++++-----------------
 mm/vmscan.c                |   10 +++++-----
 3 files changed, 36 insertions(+), 25 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 15b713b..4de12ca 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -104,8 +104,10 @@ extern void mem_cgroup_end_migration(struct mem_cgroup *mem,
 			struct page *oldpage, struct page *newpage);

 struct mem_cgroup *mem_cgroup_hierarchy_walk(struct mem_cgroup *,
+					     struct mem_cgroup *,
 					     struct mem_cgroup *);
-void mem_cgroup_stop_hierarchy_walk(struct mem_cgroup *, struct mem_cgroup *);
+void mem_cgroup_stop_hierarchy_walk(struct mem_cgroup *, struct mem_cgroup *,
+				    struct mem_cgroup *);

 /*
  * For memory reclaim.
@@ -332,13 +334,15 @@ mem_cgroup_get_reclaim_stat_from_page(struct page *page)
 	return NULL;
 }

-static inline struct mem_cgroup *mem_cgroup_hierarchy_walk(struct mem_cgroup *r,
+static inline struct mem_cgroup *mem_cgroup_hierarchy_walk(struct mem_cgroup *t,
+							   struct mem_cgroup *f,
							   struct mem_cgroup *m)
 {
 	return NULL;
 }

-static inline void mem_cgroup_stop_hierarchy_walk(struct mem_cgroup *r,
+static inline void mem_cgroup_stop_hierarchy_walk(struct mem_cgroup *t,
+						  struct mem_cgroup *f,
						  struct mem_cgroup *m)
 {
 }

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index f6c5f29..80b62aa 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1447,60 +1447,67 @@ u64 mem_cgroup_get_limit(struct mem_cgroup *memcg)

 /**
  * mem_cgroup_hierarchy_walk - iterate over a memcg hierarchy
- * @root: starting point of the hierarchy
+ * @target: starting point of the hierarchy
+ * @first: first node of the scan
  * @prev: previous position or NULL
  *
- * Caller must hold a reference to @root. While this function will
- * return @root as part of the walk, it will never increase its
+ * Caller must hold a reference to @target. While this function will
+ * return @target as part of the walk, it will never increase its
  * reference count.
  *
  * Caller must clean up with mem_cgroup_stop_hierarchy_walk() when it
  * stops the walk potentially before the full round trip.
  */
-struct mem_cgroup *mem_cgroup_hierarchy_walk(struct mem_cgroup *root,
+struct mem_cgroup *mem_cgroup_hierarchy_walk(struct mem_cgroup *target,
+					     struct mem_cgroup *first,
					     struct mem_cgroup *prev)
 {
-	struct mem_cgroup *mem;
+	struct mem_cgroup *mem = NULL;

 	if (mem_cgroup_disabled())
 		return NULL;

-	if (!root)
-		root = root_mem_cgroup;
+	if (!target)
+		target = root_mem_cgroup;
 	/*
 	 * Even without hierarchy explicitely enabled in the root
 	 * memcg, it is the ultimate parent of all memcgs.
 	 */
-	if (!(root == root_mem_cgroup || root->use_hierarchy))
-		return root;
-	if (prev && prev != root)
+	if (!(target == root_mem_cgroup || target->use_hierarchy))
+		return target;
+	if (prev && prev != target && prev != first)
 		css_put(&prev->css);
 	do {
-		int id = root->last_scanned_child;
+		int id = target->last_scanned_child;
 		struct cgroup_subsys_state *css;

 		rcu_read_lock();
-		css = css_get_next(&mem_cgroup_subsys, id + 1, &root->css, &id);
-		if (css && (css == &root->css || css_tryget(css)))
+		css = css_get_next(&mem_cgroup_subsys, id + 1, &target->css, &id);
+		if (css && (css == &target->css || css_tryget(css)))
 			mem = container_of(css, struct mem_cgroup, css);
 		rcu_read_unlock();
 		if (!css)
 			id = 0;
-		root->last_scanned_child = id;
+		target->last_scanned_child = id;
 	} while (!mem);
 	return mem;
 }

 /**
  * mem_cgroup_stop_hierarchy_walk - clean up after partial hierarchy walk
- * @root: starting point in the hierarchy
+ * @target: starting point in the hierarchy
+ * @first: first node of the scan
  * @mem: last position during the walk
  */
-void mem_cgroup_stop_hierarchy_walk(struct mem_cgroup *root,
+void mem_cgroup_stop_hierarchy_walk(struct mem_cgroup *target,
+				    struct mem_cgroup *first,
				    struct mem_cgroup *mem)
 {
-	if (mem && mem != root)
+	if (mem && mem != target)
 		css_put(&mem->css);
+
+	if (first && first != mem && first != target)
+		css_put(&first->css);
 }

 static unsigned long mem_cgroup_reclaim(struct mem_cgroup *mem,

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 3bcb212..aee958a 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1751,10 +1751,10 @@ static void shrink_zone(int priority, struct zone *zone,
			struct scan_control *sc)
 {
 	unsigned long nr_reclaimed_before = sc->nr_reclaimed;
-	struct mem_cgroup *root = sc->target_mem_cgroup;
-	struct mem_cgroup *first, *mem = NULL;
+	struct mem_cgroup *target = sc->target_mem_cgroup;
+	struct mem_cgroup *first, *mem;

-	first = mem = mem_cgroup_hierarchy_walk(root, mem);
+	first = mem = mem_cgroup_hierarchy_walk(target, NULL, NULL);
 	for (;;) {
 		unsigned long nr_reclaimed;

@@ -1765,11 +1765,11 @@ static void shrink_zone(int priority, struct zone *zone,
 		if (nr_reclaimed >= sc->nr_to_reclaim)
 			break;

-		mem = mem_cgroup_hierarchy_walk(root, mem);
+		mem = mem_cgroup_hierarchy_walk(target, first, mem);
 		if (mem == first)
 			break;
 	}
-	mem_cgroup_stop_hierarchy_walk(root, mem);
+	mem_cgroup_stop_hierarchy_walk(target, first, mem);
 }

--Ying