Re: [RFC] [PATCH 2/7 v2] memcg: add memory barrier for checking account move.

On Tue 24-01-12 12:21:20, KAMEZAWA Hiroyuki wrote:
> On Mon, 23 Jan 2012 10:04:36 +0100
> Michal Hocko <mhocko@xxxxxxx> wrote:
> 
> > On Fri 20-01-12 10:08:44, Ying Han wrote:
> > > On Wed, Jan 18, 2012 at 6:17 PM, KAMEZAWA Hiroyuki
> > > <kamezawa.hiroyu@xxxxxxxxxxxxxx> wrote:
> > > > On Wed, 18 Jan 2012 13:37:59 +0100
> > > > Michal Hocko <mhocko@xxxxxxx> wrote:
> > > >
> > > >> On Wed 18-01-12 09:06:56, KAMEZAWA Hiroyuki wrote:
> > > >> > On Tue, 17 Jan 2012 16:26:35 +0100
> > > >> > Michal Hocko <mhocko@xxxxxxx> wrote:
> > > >> >
> > > >> > > On Fri 13-01-12 17:33:47, KAMEZAWA Hiroyuki wrote:
> > > >> > > > I think this bugfix is needed before going ahead. thoughts?
> > > >> > > > ==
> > > >> > > > From 2cb491a41782b39aae9f6fe7255b9159ac6c1563 Mon Sep 17 00:00:00 2001
> > > >> > > > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>
> > > >> > > > Date: Fri, 13 Jan 2012 14:27:20 +0900
> > > >> > > > Subject: [PATCH 2/7] memcg: add memory barrier for checking account move.
> > > >> > > >
> > > >> > > > At the start of move_account(), the source memcg's per-cpu
> > > >> > > > variable MEM_CGROUP_ON_MOVE is set. The page status update
> > > >> > > > routine checks it under rcu_read_lock(), but there is no
> > > >> > > > memory barrier. This patch adds one.
> > > >> > >
> > > >> > > OK, this would help to enforce that the CPU sees the current
> > > >> > > value, but what prevents us from racing with the value update
> > > >> > > without the lock? This is as racy as it was before, AFAICS.
> > > >> > >
> > > >> >
> > > >> > Hm, do I misunderstand?
> > > >> > ==
> > > >> >    update                     reference
> > > >> >
> > > >> >    CPU A                        CPU B
> > > >> >   set value                rcu_read_lock()
> > > >> >   smp_wmb()                smp_rmb()
> > > >> >                            read_value
> > > >> >                            rcu_read_unlock()
> > > >> >   synchronize_rcu().
> > > >> > ==
> > > >> > I expect:
> > > >> > If synchronize_rcu() is called before rcu_read_lock() => move_lock_xxx will be held.
> > > >> > If synchronize_rcu() is called after rcu_read_lock() => the update will be delayed.
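
For illustration, a minimal C sketch of the pairing in the diagram above;
the names on_move, move_lock, start_move() and page_stat_update() are
hypothetical stand-ins rather than the actual memcg code, and whether this
pairing is sufficient is exactly what the rest of the thread debates.
==
#include <linux/rcupdate.h>
#include <linux/spinlock.h>

static int on_move;                     /* stand-in for MEM_CGROUP_ON_MOVE */
static DEFINE_SPINLOCK(move_lock);      /* stand-in for move_lock_xxx */

/* updater, CPU A: start of move_account() */
static void start_move(void)
{
        on_move = 1;            /* set value */
        smp_wmb();              /* pairs with smp_rmb() in the reader */
        synchronize_rcu();      /* wait for pre-existing rcu read sections */
}

/* reader, CPU B: page status update path */
static void page_stat_update(void)
{
        bool locked = false;

        rcu_read_lock();
        smp_rmb();              /* pairs with smp_wmb() in the updater */
        if (on_move) {          /* read value */
                spin_lock(&move_lock);
                locked = true;
        }
        /* ... update the page statistics ... */
        if (locked)
                spin_unlock(&move_lock);
        rcu_read_unlock();
}
==
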
> > > >>
> > > >> Ahh, OK I can see it now. Readers are not that important because it is
> > > >> actually the updater who is delayed until all preexisting rcu read
> > > >> sections are finished.
> > > >>
> > > >> In that case, why do we need both barriers? spin_unlock is a full
> > > >> barrier, so maybe we just need smp_rmb before we read the value, to
> > > >> make sure that we do not get a stale value when we start the rcu read
> > > >> section after synchronize_rcu?
> > > >>
> > > >
> > > > I doubt it... Without a barrier, this case happens:
> > > >
> > > > ==
> > > >        update                  reference
> > > >        CPU A                   CPU B
> > > >        set value
> > > >        synchronize_rcu()       rcu_read_lock()
> > > >                                read_value <= find old value
> > > >                                rcu_read_unlock()
> > > >                                do no lock
> > > > ==
> > > 
> > > Hi Kame,
> > > 
> > > Can you help clarify the example above a bit more? Why does
> > > read_value get the old value after synchronize_rcu()?
> > 
> > AFAIU it is because rcu_read_unlock() doesn't force any memory barrier
> > and we synchronize only the updater (with synchronize_rcu()), so nothing
> > guarantees that the value set on CPU A is visible to CPU B.
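
Put in C terms, and continuing the earlier sketch (names still hypothetical),
the failing sequence described above could look like the following; the
concern in the thread is that neither rcu_read_lock()/rcu_read_unlock() nor
the plain load orders the read of on_move against the store on CPU A.
==
/* updater, CPU A */
static void start_move(void)
{
        on_move = 1;            /* plain store, no barrier */
        synchronize_rcu();      /* only waits for pre-existing readers */
        /* proceeds assuming every later reader will take move_lock */
}

/* reader, CPU B: rcu_read_lock() begins after synchronize_rcu() returned */
static bool need_move_lock(void)
{
        bool ret;

        rcu_read_lock();        /* does not imply a memory barrier */
        ret = on_move != 0;     /* the concern: this load may still see 0 */
        rcu_read_unlock();      /* does not imply a memory barrier either */
        return ret;             /* if false, stats are updated without move_lock */
}
==
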
> > 
> 
> Thank you. 
> 
> ...Finally, I'd like to convert this check to an atomic_t rather than the
> complicated per-cpu counter. Hmm, should I do it now?

I thought you wanted to avoid atomics, but you would need a read barrier
on the reader side because only atomics which change the state imply a
memory barrier IIRC. So it is a question whether atomic is really
simpler...
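
A rough, hypothetical sketch of that point (none of these names come from an
actual patch): switching the flag to an atomic_t does not by itself remove
the need for barriers, because atomic_inc() and atomic_read() are not
ordering operations.
==
#include <linux/atomic.h>

static atomic_t moving_account = ATOMIC_INIT(0);

/* updater */
static void start_move(void)
{
        atomic_inc(&moving_account);    /* atomic_inc() implies no barrier */
        smp_mb__after_atomic_inc();     /* writer-side barrier still needed */
        synchronize_rcu();
}

/* reader */
static bool account_is_moving(void)
{
        /*
         * atomic_read() is an ordinary load; if ordering matters here, an
         * explicit smp_rmb() would still be required, so it is not obvious
         * that atomic_t ends up simpler than the per-cpu flag.
         */
        return atomic_read(&moving_account) > 0;
}
==
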

> 
> Thanks,
> -Kame
> 
> 

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic