On Tue 04-06-13 09:58:40, Johannes Weiner wrote: > On Tue, Jun 04, 2013 at 03:03:36PM +0200, Michal Hocko wrote: > > On Mon 03-06-13 17:44:37, Tejun Heo wrote: > > [...] > > > @@ -1218,9 +1218,18 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root, > > > * is alive. > > > */ > > > dead_count = atomic_read(&root->dead_count); > > > - smp_rmb(); > > > + > > > last_visited = iter->last_visited; > > > if (last_visited) { > > > + /* > > > + * Paired with smp_wmb() below in this > > > + * function. The pair guarantee that > > > + * last_visited is more current than > > > + * last_dead_count, which may lead to > > > + * spurious iteration resets but guarantees > > > + * reliable detection of dead condition. > > > + */ > > > + smp_rmb(); > > > if ((dead_count != iter->last_dead_count) || > > > !css_tryget(&last_visited->css)) { > > > last_visited = NULL; > > > > I originally had the barrier this way but Johannes pointed out it is not > > correct https://lkml.org/lkml/2013/2/11/411 > > " > > !> + /* > > !> + * last_visited might be invalid if some of the group > > !> + * downwards was removed. As we do not know which one > > !> + * disappeared we have to start all over again from the > > !> + * root. > > !> + * css ref count then makes sure that css won't > > !> + * disappear while we iterate to the next memcg > > !> + */ > > !> + last_visited = iter->last_visited; > > !> + dead_count = atomic_read(&root->dead_count); > > !> + smp_rmb(); > > ! > > !Confused about this barrier, see below. > > ! > > !As per above, if you remove the iter lock, those lines are mixed up. > > !You need to read the dead count first because the writer updates the > > !dead count after it sets the new position. That way, if the dead > > !count gives the go-ahead, you KNOW that the position cache is valid, > > !because it has been updated first. If either the two reads or the two > > !writes get reordered, you risk seeing a matching dead count while the > > !position cache is stale. > > " > > The original prototype code I sent looked like this: > > mem_cgroup_iter: > rcu_read_lock() > if atomic_read(&root->dead_count) == iter->dead_count: > smp_rmb() > if tryget(iter->position): > position = iter->position > memcg = find_next(postion) > css_put(position) > iter->position = memcg > smp_wmb() /* Write position cache BEFORE marking it uptodate */ > iter->dead_count = atomic_read(&root->dead_count) > rcu_read_unlock() > > iter->last_position is written, THEN iter->last_dead_count is written. > > So, yes, you "need to read the dead count" first to be sure > iter->last_position is uptodate. But iter->last_dead_count, not > root->dead_count. I should have caught this in the final submission > of your patch :( OK, right you are. I managed to confuse myself by the three dependencies here. dead_count -> last_visited -> last_dead_count. The first one is invalid because last_visited doesn't care about dead_count and that makes it much more clear now. > Tejun's patch is not correct, either. Something like this? Yes this looks saner and correct. Care to send a full patch? > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 010d6c1..92830fa 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -1199,7 +1199,6 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root, > > mz = mem_cgroup_zoneinfo(root, nid, zid); > iter = &mz->reclaim_iter[reclaim->priority]; > - last_visited = iter->last_visited; > if (prev && reclaim->generation != iter->generation) { > iter->last_visited = NULL; > goto out_unlock; > @@ -1217,14 +1216,20 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root, > * css_tryget() verifies the cgroup pointed to > * is alive. > */ > + last_visited = NULL; > dead_count = atomic_read(&root->dead_count); > - smp_rmb(); > - last_visited = iter->last_visited; > - if (last_visited) { > - if ((dead_count != iter->last_dead_count) || > - !css_tryget(&last_visited->css)) { > + if (dead_count == iter->last_dead_count) { > + /* > + * The writer below sets the position > + * pointer, then the dead count. > + * Ensure we read the updated position > + * when the dead count matches. > + */ > + smp_rmb(); > + last_visited = iter->last_visited; > + if (last_visited && > + !css_tryget(&last_visited->css)) > last_visited = NULL; > - } > } > } > -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>