Hello, Hugh.

On Wed, Feb 12, 2014 at 05:29:09PM -0800, Hugh Dickins wrote:
> Commit d8ad30559715 ("mm/memcg: iteration skip memcgs not yet fully
> initialized") is not bad, but Greg Thelen asks "Are barriers needed?"
>
> Yes, I'm afraid so: this makes it a little heavier than the original,
> but there's no point in guaranteeing that mem_cgroup_iter() returns only
> fully initialized memcgs, if we don't guarantee that the initialization
> is visible.
>
> If we move online_css()'s setting CSS_ONLINE after rcu_assign_pointer()
> (I don't see why not), we can reasonably rely on the smp_wmb() in that.
> But I can't find a pre-existing barrier at the mem_cgroup_iter() end,
> so add an smp_rmb() where __mem_cgroup_iter_next() returns non-NULL.

Hmmm.... so, CSS_ONLINE was never meant to be used outside cgroup
proper. The only guarantee that the css iterators make is that a css
which has finished its ->css_online() will be included in the
iteration, which implies that css's which haven't finished
->css_online() or have already gone past ->css_offline() may be
included in the iteration. In fact, it's impossible to achieve the
guarantee without such implications if we want to avoid synchronizing
everything using common locking, which we apparently can't do across
different controllers.

The expectation is that if a controller needs to distinguish fully
online css's, it will perform its own synchronization among its
online, offline and iterations, which can usually be achieved through
per-css synchronization.

There is asymmetry here due to the way css_tryget() behaves.
Unfortunately, I don't think it can be expanded to become symmetrical
for online testing without adding, say, a ->css_post_online() callback.

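
To make the "own marking plus per-css synchronization" idea concrete,
here is a minimal userspace sketch in C11. It is not kernel code and
not a proposed patch: struct memcg_model, model_css_online() and
model_iter_next() are hypothetical names made up for illustration. The
release store and acquire load stand in for the smp_wmb()/smp_rmb()
pairing discussed in the quoted text, applied to a controller-private
flag instead of CSS_ONLINE.

```c
/* Userspace C11 model only; every name below is hypothetical. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct memcg_model {
	int limit;               /* stands in for state set up while onlining */
	atomic_bool initialized; /* controller-private marker, not CSS_ONLINE */
};

/* Analogue of a controller's ->css_online(): finish setup, then publish. */
static void model_css_online(struct memcg_model *m, int limit)
{
	m->limit = limit;
	/* Release store: the write to ->limit is ordered before the marker. */
	atomic_store_explicit(&m->initialized, true, memory_order_release);
}

/*
 * Analogue of the iterator-side filter: return the first entry at or
 * after index 'start' whose marker is set, or NULL if there is none.
 */
static struct memcg_model *model_iter_next(struct memcg_model *v,
					    size_t n, size_t start)
{
	for (size_t i = start; i < n; i++) {
		/* Acquire load pairs with the release store above. */
		if (atomic_load_explicit(&v[i].initialized,
					 memory_order_acquire))
			return &v[i];
	}
	return NULL;
}
```

In this model, a reader that sees the marker set is also guaranteed to
see the ->limit value written before it, and entries that have not
been marked yet are simply skipped, without relying on any flag owned
by the core.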
So, the only thing that memcg can depend on while iterating is that it
will include all css's which finished ->css_online() and if memcg
wants to filter out the ones which haven't yet, it should do its own
marking in ->css_online() rather than depending on what cgroup core
does with the flags. That way, locking rules are a lot more evident in
each subsystem and we don't end up depending on cgroup internal
details which aren't immediately obvious.

Thanks.

-- 
tejun