On Sat, Dec 12, 2015 at 10:18:55PM +0300, Vladimir Davydov wrote: > On Sat, Dec 12, 2015 at 11:45:40AM -0500, Johannes Weiner wrote: > > @@ -2425,21 +2425,6 @@ static bool shrink_zone(struct zone *zone, struct scan_control *sc, > > sc->nr_scanned - scanned, > > sc->nr_reclaimed - reclaimed); > > > > - /* > > - * Direct reclaim and kswapd have to scan all memory > > - * cgroups to fulfill the overall scan target for the > > - * zone. > > - * > > - * Limit reclaim, on the other hand, only cares about > > - * nr_to_reclaim pages to be reclaimed and it will > > - * retry with decreasing priority if one round over the > > - * whole hierarchy is not sufficient. > > - */ > > - if (!global_reclaim(sc) && > > - sc->nr_reclaimed >= sc->nr_to_reclaim) { > > - mem_cgroup_iter_break(root, memcg); > > - break; > > - } > > Dunno. I like it, because it's simple and clean, but I'm unsure: can't > it result in lags when performing memcg reclaim for deep hierarchies? > For global reclaim we have kswapd, which tries to keep the system within > bounds so as to avoid direct reclaim at all. Memcg lacks such thing, and > interleave walks looks like a good compensation for it. > > Alternatively, we could avoid taking reference to iter->position and > make use of css_released cgroup callback to invalidate reclaim > iterators. With this approach, upper level cgroups shouldn't receive > unfairly high pressure in comparison to their children. Something like > this, maybe? This is surprisingly simple, to the point where I'm asking myself if I miss something in this patch or if I missed something when I did weak references the last time. But I think the last time we didn't want to go through all iterator positions like we do here. It doesn't really matter, though, that's even performed from a work item. > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 87af26a24491..fcc5133210a0 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -859,14 +859,12 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root, > if (prev && reclaim->generation != iter->generation) > goto out_unlock; > > - do { > + while (1) { > pos = READ_ONCE(iter->position); > - /* > - * A racing update may change the position and > - * put the last reference, hence css_tryget(), > - * or retry to see the updated position. > - */ > - } while (pos && !css_tryget(&pos->css)); > + if (!pos || css_tryget(&pos->css)) > + break; > + cmpxchg(&iter->position, pos, NULL); > + } This cmpxchg() looks a little strange. Once tryget fails, the iterator should be clear soon enough, no? If not, a comment would be good here. > @@ -912,12 +910,7 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root, > } > > if (reclaim) { > - if (cmpxchg(&iter->position, pos, memcg) == pos) { > - if (memcg) > - css_get(&memcg->css); > - if (pos) > - css_put(&pos->css); > - } > + cmpxchg(&iter->position, pos, memcg); This looks correct. The next iteration or break will put the memcg, potentially free it, which will clear it from the iterator and then rcu-free the css. Anybody who sees a pointer set under the RCU lock can safely run css_tryget() against it. Awesome! Care to resend this with changelog? -- To unsubscribe from this list: send the line "unsubscribe stable" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html