Re: [PATCH 1/3] memcg: fix subtle memory barrier bug in mem_cgroup_iter()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue 04-06-13 09:58:40, Johannes Weiner wrote:
> On Tue, Jun 04, 2013 at 03:03:36PM +0200, Michal Hocko wrote:
> > On Mon 03-06-13 17:44:37, Tejun Heo wrote:
> > [...]
> > > @@ -1218,9 +1218,18 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
> > >  			 * is alive.
> > >  			 */
> > >  			dead_count = atomic_read(&root->dead_count);
> > > -			smp_rmb();
> > > +
> > >  			last_visited = iter->last_visited;
> > >  			if (last_visited) {
> > > +				/*
> > > +				 * Paired with smp_wmb() below in this
> > > +				 * function.  The pair guarantee that
> > > +				 * last_visited is more current than
> > > +				 * last_dead_count, which may lead to
> > > +				 * spurious iteration resets but guarantees
> > > +				 * reliable detection of dead condition.
> > > +				 */
> > > +				smp_rmb();
> > >  				if ((dead_count != iter->last_dead_count) ||
> > >  					!css_tryget(&last_visited->css)) {
> > >  					last_visited = NULL;
> > 
> > I originally had the barrier this way but Johannes pointed out it is not
> > correct https://lkml.org/lkml/2013/2/11/411
> > "
> > !> +			/*
> > !> +			 * last_visited might be invalid if some of the group
> > !> +			 * downwards was removed. As we do not know which one
> > !> +			 * disappeared we have to start all over again from the
> > !> +			 * root.
> > !> +			 * css ref count then makes sure that css won't
> > !> +			 * disappear while we iterate to the next memcg
> > !> +			 */
> > !> +			last_visited = iter->last_visited;
> > !> +			dead_count = atomic_read(&root->dead_count);
> > !> +			smp_rmb();
> > !
> > !Confused about this barrier, see below.
> > !
> > !As per above, if you remove the iter lock, those lines are mixed up.
> > !You need to read the dead count first because the writer updates the
> > !dead count after it sets the new position.  That way, if the dead
> > !count gives the go-ahead, you KNOW that the position cache is valid,
> > !because it has been updated first.  If either the two reads or the two
> > !writes get reordered, you risk seeing a matching dead count while the
> > !position cache is stale.
> > "
> 
> The original prototype code I sent looked like this:
> 
> mem_cgroup_iter:
> rcu_read_lock()
> if atomic_read(&root->dead_count) == iter->dead_count:
>   smp_rmb()
>   if tryget(iter->position):
>     position = iter->position
> memcg = find_next(postion)
> css_put(position)
> iter->position = memcg
> smp_wmb() /* Write position cache BEFORE marking it uptodate */
> iter->dead_count = atomic_read(&root->dead_count)
> rcu_read_unlock()
> 
> iter->last_position is written, THEN iter->last_dead_count is written.
> 
> So, yes, you "need to read the dead count" first to be sure
> iter->last_position is uptodate.  But iter->last_dead_count, not
> root->dead_count.  I should have caught this in the final submission
> of your patch :(

OK, right you are. I managed to confuse myself by the three dependencies
here. dead_count -> last_visited -> last_dead_count. The first one is
invalid because last_visited doesn't care about dead_count and that
makes it much more clear now.

> Tejun's patch is not correct, either.  Something like this?

Yes this looks saner and correct. Care to send a full patch?

> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 010d6c1..92830fa 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1199,7 +1199,6 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
>  
>  			mz = mem_cgroup_zoneinfo(root, nid, zid);
>  			iter = &mz->reclaim_iter[reclaim->priority];
> -			last_visited = iter->last_visited;
>  			if (prev && reclaim->generation != iter->generation) {
>  				iter->last_visited = NULL;
>  				goto out_unlock;
> @@ -1217,14 +1216,20 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
>  			 * css_tryget() verifies the cgroup pointed to
>  			 * is alive.
>  			 */
> +			last_visited = NULL;
>  			dead_count = atomic_read(&root->dead_count);
> -			smp_rmb();
> -			last_visited = iter->last_visited;
> -			if (last_visited) {
> -				if ((dead_count != iter->last_dead_count) ||
> -					!css_tryget(&last_visited->css)) {
> +			if (dead_count == iter->last_dead_count) {
> +				/*
> +				 * The writer below sets the position
> +				 * pointer, then the dead count.
> +				 * Ensure we read the updated position
> +				 * when the dead count matches.
> +				 */
> +				smp_rmb();
> +				last_visited = iter->last_visited;
> +				if (last_visited &&
> +				    !css_tryget(&last_visited->css))
>  					last_visited = NULL;
> -				}
>  			}
>  		}
>  

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]