Re: using RCU to replace Locker in config for seastar version

i looked into seastar::shared_ptr<>, seastar::lw_shared_ptr<> and
std::weak_ptr<> in libstdc++. it's non-trivial to add weak_ptr
semantics to seastar::shared_ptr<> to get a C++11 standard
compliant implementation without re-implementing it and adding a separate
weak_count co-located with the "shared" count. what we actually need is
std::shared_ptr<> with _Lock_policy=_S_single, but by default libstdc++
uses atomic operations. we *can* optimize it by using a plain machine
word, since CAS and lock operations are much more expensive than a plain
memory access: as mentioned earlier, they need to follow the cache
coherence protocol to address the racing. but we don't need to bother
with this in seastar.

as implementing shared_ptr<>/weak_ptr<> is a non-trivial (but very
doable!) task, we can live with the atomic shared_ptr<> and
weak_ptr<> at this moment.
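fwiw, the "plain machine word" idea above can be sketched like this (hypothetical names, not seastar's actual code): a control block holding a non-atomic use_count with a co-located weak_count, which is safe only because seastar confines each pointer to a single core:

```cpp
// minimal sketch of a single-core refcounted pointer: plain long
// counters instead of std::atomic, mirroring _S_single lock policy.
// safe only if every copy stays on the core that created it.
template <typename T>
class local_shared_ptr {
  struct counts {
    long use_count = 1;   // owners of the object
    long weak_count = 1;  // weak refs, +1 collectively while owners exist
    T* object;
    explicit counts(T* p) : object(p) {}
  };
  counts* c_ = nullptr;

public:
  explicit local_shared_ptr(T* p) : c_(new counts(p)) {}
  local_shared_ptr(const local_shared_ptr& o) : c_(o.c_) {
    if (c_) ++c_->use_count;          // plain increment, no lock/CAS
  }
  local_shared_ptr& operator=(const local_shared_ptr&) = delete;
  ~local_shared_ptr() {
    if (c_ && --c_->use_count == 0) {
      delete c_->object;              // last owner destroys the object
      if (--c_->weak_count == 0)
        delete c_;                    // ...then the control block
    }
  }
  T* get() const { return c_ ? c_->object : nullptr; }
  long use_count() const { return c_ ? c_->use_count : 0; }
};
```

a weak_ptr counterpart would bump weak_count the same way; the point is only that both counters are ordinary words.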

On Fri, Jun 22, 2018 at 12:41 PM Liu, Chunmei <chunmei.liu@xxxxxxxxx> wrote:
>
> Thanks Kefu,
>
>     Will study seastar ptrs first.
>
>    Still have a question: how to handle a data structure such as an op seq, where each consumer needs to read it and increase it by 1? It seems the message-passing we discussed can't fit this situation.  Any idea on it?

could you be more specific? what exactly is op seq?

>
> Thanks!
> -Chunmei
>
>
> > -----Original Message-----
> > From: kefu chai [mailto:tchaikov@xxxxxxxxx]
> > Sent: Wednesday, June 20, 2018 10:37 PM
> > To: Liu, Chunmei <chunmei.liu@xxxxxxxxx>
> > Cc: Gregory Farnum <gfarnum@xxxxxxxxxx>; Sage Weil
> > <sage@xxxxxxxxxxxx>; Matt Benjamin <mbenjami@xxxxxxxxxx>; Kefu Chai
> > <kchai@xxxxxxxxxx>; The Esoteric Order of the Squid Cybernetic <ceph-
> > devel@xxxxxxxxxxxxxxx>
> > Subject: Re: using RCU to replace Locker in config for seastar version
> >
> > On Wed, Jun 13, 2018 at 3:39 AM Liu, Chunmei <chunmei.liu@xxxxxxxxx> wrote:
> > >
> > > Hi Greg,
> > >
> > >    I still have some questions, please see below.
> > >
> > > -----Original Message-----
> > > From: Gregory Farnum [mailto:gfarnum@xxxxxxxxxx]
> > > Sent: Sunday, June 10, 2018 10:58 AM
> > > To: Sage Weil <sage@xxxxxxxxxxxx>
> > > Cc: kefu chai <tchaikov@xxxxxxxxx>; Liu, Chunmei
> > > <chunmei.liu@xxxxxxxxx>; Matt Benjamin <mbenjami@xxxxxxxxxx>; Kefu
> > > Chai <kchai@xxxxxxxxxx>; The Esoteric Order of the Squid Cybernetic
> > > <ceph-devel@xxxxxxxxxxxxxxx>
> > > Subject: Re: using RCU to replace Locker in config for seastar version
> > >
> > > On Fri, Jun 8, 2018 at 5:29 PM, Liu, Chunmei <chunmei.liu@xxxxxxxxx> wrote:
> > > > Hi Greg,
> > > >
> > > >    How to use message-passing? each core maintain a local replication copy
> > of data structure and use message-passing to inform other cores update its own
> > local copy.   Or only one core can access data structure, the other cores should
> > get shared data structure through this core?
> > >
> > > Just as a first pass, in the case of the config structure it might be something
> > like:
> > > 1) Create new config struct in memory on "server" core
> > > 2) Use the "sharded_shared_ptr" I'll discuss below to give each core a
> > > reference to it
> > > 3) Send a message to the cores telling them this has happened
> > > 4) At a later time, clean up the previous config structure when all cores drop
> > their refcounts to zero.
> > >
> > > [liucm] you said clean up the previous config structure, does it mean when
> > modification happen, we need copy the data structure then update it?
> > > [liucm] local refcount means this core has users access the data structure,
> > global atomic refcount means there are cores access the data structure, right?
> > > [liucm] you said all cores drop their refcounts to zero, so it is local refcount,
> > how does server cores know it? Local core send message to server or local core
> > itself know it is enough?
> >
> > local core will send a message to the owner core.
> >
> > >  [liucm] if server core (or local core ?) check a core local refcount decrease to
> > zero, server core (or local core ?) decrease atomic global refcount?  Which core
> > do this work?
> >
> > i think i've explained this in the previous reply. it'd be the server
> > (owner) core who checks the local refcount.
> >
> > > [liucm] server core will check until global refcount to be zero then update the
> > data structure pointer to the new copy?  How to monitor the global refcount to
> > decrease to zero?
> >
> > i encourage you to refer to foreign_ptr<> in seastar.
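to make the two-level counting concrete, here is a tiny single-threaded model (all names are made up, and the plain queue stands in for seastar's submit_to() cross-core call) of a core dropping its local count to zero and messaging the owner core, which alone touches the global count:

```cpp
#include <deque>
#include <functional>

// hypothetical model of the scheme discussed above: each core keeps a
// plain local_refs word, and only when it drops to zero does it "send a
// message" to the owner core, which then decrements the global count.
// the queue stands in for seastar's cross-core submit_to().
std::deque<std::function<void()>> owner_queue;  // owner core's mailbox

struct global_state {
  int global_refs = 0;   // one per core holding any local refs
  bool freed = false;
  void drop() { if (--global_refs == 0) freed = true; }  // owner core only
};

struct core_view {
  global_state* g;
  int local_refs = 0;    // plain word, touched only by this core
  void acquire() {
    if (local_refs++ == 0)
      ++g->global_refs;  // simplification: registering would also be a message
  }
  void release() {
    if (--local_refs == 0)
      owner_queue.push_back([gs = g] { gs->drop(); });  // message, not CAS
  }
};

void run_owner_core() {  // owner core drains its mailbox
  while (!owner_queue.empty()) {
    owner_queue.front()();
    owner_queue.pop_front();
  }
}
```

the owner "monitors" the global count simply by being the only place it is ever decremented.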
> >
> > >
> > >
> > > Now, that looks an awful lot like RCU, which makes sense since it's a useful
> > basic algorithm. But we're avoiding trying to properly track accesses via a library
> > like liburcu as has been referenced. I like that both because it limits the number of
> > paradigms a Ceph developer needs to be able to work with, and also because
> > we've prototyped using liburcu before and found it made things *slower*.
> > > We can do something similar for the osd_map_cache, where local threads
> > keep their own map of epochs to pointers, with local integer ref counts, and
> > drop the global atomic count when the thread drops all users.
> > >
> > >
> > > On Sat, Jun 9, 2018 at 12:16 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> > > >> > > When I went through the data structures, it generally seemed
> > > >> > > like message-passing about data structure changes would be a
> > > >> > > better way to go than trying to employ any kind of real RCU
> > > >> > > library (or even the exact abstractions). We might maintain
> > > >> > > local pointers to constant structures with a per-core ref count
> > > >> > > to protect deletion, but proper
> > > >
> > > > Is there already a per-core ref-counting foo_ptr<> that does this?
> > > > (This being a core/thread-local refcount, and a global atomic
> > > > refcount?)  This seems useful in lots of places (probably most
> > > > places we use RefCountedObject now... things like OSDSession).
> > >
> > >
> > > Yeah, I don't think anything like this exists. But it'll be a useful tool,
> > *especially* when we start mixing in posix threads.
> > >
> > > Just to be clear, I'm thinking something like:
> > >
> > > class sharded_shared_pointer_owner<T> {
> > >   int local_ref_count;
> > >   root_pointer<T> {
> > >     atomic_t ref_count;
> > >     T *object;
> > >   }
> > >   root_pointer<T> *parent;
> > > }
> > >
> > > class sharded_shared_pointer<T> {
> > >   sharded_shared_pointer_owner<T> *parent;
> > > }
> > >
> > > Where copying the sharded_shared_pointer increments the local_ref_count,
> > and the sharded_shared_pointer_owner is used on copying between threads and
> > increments the root_pointer::ref_count.
> > >
> > > [liucm] I don't understand the above sentence, what you mean copying the
> > pointer here? Can you give a detail example?
> >
> > it's all about the semantics and the implementation of a typical smart_ptr<>. for
> > instance, the copy constructor should increment the local_ref_count.
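in other words, something along these lines (a hypothetical fleshing-out of the two types Greg sketched, not working seastar code): copying the per-core handle bumps only a plain local count, and only creating the per-core owner object touches the shared atomic:

```cpp
#include <atomic>

// hypothetical sketch: one atomic op per *core* (when the owner object
// is created), plain increments per *copy* within a core.
template <typename T>
struct root_pointer {
  std::atomic<int> ref_count{0};  // one count per participating core
  T* object = nullptr;
};

template <typename T>
struct sharded_shared_pointer_owner {
  int local_ref_count = 0;        // plain word, this core only
  root_pointer<T>* parent;
  explicit sharded_shared_pointer_owner(root_pointer<T>* r) : parent(r) {
    parent->ref_count.fetch_add(1);  // cross-core registration
  }
};

template <typename T>
class sharded_shared_pointer {
  sharded_shared_pointer_owner<T>* parent_;
public:
  explicit sharded_shared_pointer(sharded_shared_pointer_owner<T>* o)
      : parent_(o) { ++parent_->local_ref_count; }
  sharded_shared_pointer(const sharded_shared_pointer& o)
      : parent_(o.parent_) { ++parent_->local_ref_count; }  // cheap copy
  sharded_shared_pointer& operator=(const sharded_shared_pointer&) = delete;
  ~sharded_shared_pointer() { --parent_->local_ref_count; }
};
```

so within a core, copies never touch the atomic at all; only handing the pointer to another core (or dropping a core's last reference) does.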
> >
> > > [liucm] In above data structure, which one or part is used by server core?
> > Which one or part is used by other cores?  I guess root_pointer point to the
> > shared data structure which is only one copy in server core, and local ref_count
> > is each core's local variable, right?
> >
> > i think you can refer to the implementation of shared_ptr<> and lw_shared_ptr<>
> > in seastar/core/shared_ptr.hh, and foreign_ptr<> in seastar/core/sharded.hh.
> > actually, lw_shared_ptr<> is basically what we need when implementing
> > SharedLRU<> in Ceph. what is missing in seastar's weak_ptr<> and
> > lw_shared_ptr<>/shared_ptr<>, is the ability to construct a weak_ptr<> from a
> > shared_ptr<>, and to promote a weak_ptr<> to shared_ptr<>. and foreign_ptr<>
> > is what we need to share a given osdmap from its owner core to non-owner
> > cores. my plan is to re-implement a seastar variant of std::shared_ptr<> and
> > std::weak_ptr<>. so they will be lighter-weight than their standard library
> > counterparts in that they will use a plain machine-word integer for refcounting
> > instead of the atomic types.
> >
> > if RCU is not as performant as we expect, we can also apply the foreign_ptr<>
> > machinery to config, if we want to keep a single copy of config in OSD. to be
> > specific,
> >
> > 0. the owner core caches a map of settings: Config. it returns a
> > shared_ptr<ConfigProxy> upon request for config from any of the fibers, and
> > keeps track of this shared_ptr<ConfigProxy> using a weak_ptr<>. if the
> > ConfigProxy is destroyed, we should create a new instance of it upon request.
> > please note, we assume that a shared_ptr<> can be constructed from a
> > weak_ptr<>. this ability is not offered by seastar's shared_ptr<> at this moment.
> > 1. all non-owner fibers can only update settings using a submit_to() call to the
> > owner core.
> > 2. all fibers on the owner core trying to update settings should wait
> > on a seastar condition_variable if the weak_ptr<> is tracking some ConfigProxy,
> > which will be signaled when the ConfigProxy is destroyed.
> > 3. local consumers of Config should access it via shared_ptr<ConfigProxy>.
> > 4. foreign consumers of Config should access it via
> > foreign_ptr<shared_ptr<ConfigProxy>>.
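step 0 could be sketched with std::shared_ptr<>/std::weak_ptr<> (which, unlike seastar's pointers today, do support promoting a weak_ptr to a shared_ptr); ConfigProxy and the member names here are made up for illustration:

```cpp
#include <memory>

// hypothetical sketch of step 0: the owner core hands out one shared
// ConfigProxy and tracks it with a weak_ptr (which does not keep it
// alive), re-creating it only after all readers have dropped theirs.
struct Config { int osd_max = 0; };

struct ConfigProxy {
  explicit ConfigProxy(const Config* c) : conf(c) {}
  const Config* conf;
};

class OwnerCoreConfig {
  Config settings_;
  std::weak_ptr<ConfigProxy> tracker_;  // does not extend the proxy's life
public:
  std::shared_ptr<ConfigProxy> get() {
    if (auto p = tracker_.lock())       // promote weak -> shared if alive
      return p;
    auto p = std::make_shared<ConfigProxy>(&settings_);
    tracker_ = p;                       // start tracking the new instance
    return p;
  }
  bool proxy_alive() const { return !tracker_.expired(); }
  Config& mutable_settings() { return settings_; }  // owner-core updates only
};
```

a writer fiber on the owner core would wait (step 2) until proxy_alive() turns false before mutating settings_.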
> >
> > > -thanks!
> > >
> > > All names subject to change for better ones, of course.
> > > Another thought (I really don't know how these costs work out) is that
> > > when we drop the sharded_shared_pointer_owner local_ref_count to zero,
> > > we pass a message to the owner thread instead of directly
> > > manipulating the parent->ref_count atomic. It's hard to have a good
> > > intuition for those costs, and I know I don't! (The nice part about
> > > using pointer structures instead of direct access throughout the code
> > > is that it's of course easy to change the cross-core implementation as
> > > we experiment and get new data.) -Greg
> >
> >
> >
> > --
> > Regards
> > Kefu Chai



-- 
Regards
Kefu Chai


