Re: using RCU to replace Locker in config for seastar version

On Sat, Jun 9, 2018 at 8:30 AM Liu, Chunmei <chunmei.liu@xxxxxxxxx> wrote:
>
> Hi Greg,
>
>    How should we use message-passing? Should each core maintain a local replica of the data structure and use message-passing to tell the other cores to update their own local copies? Or should only one core own the data structure, with the other cores accessing the shared data through it?
>
> Thanks!
> -Chunmei
> -----Original Message-----
> From: ceph-devel-owner@xxxxxxxxxxxxxxx [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Matt Benjamin
> Sent: Friday, June 08, 2018 11:40 AM
> To: Gregory Farnum <gfarnum@xxxxxxxxxx>
> Cc: Kefu Chai <kchai@xxxxxxxxxx>; Sage Weil <sweil@xxxxxxxxxx>; Liu, Chunmei <chunmei.liu@xxxxxxxxx>; The Esoteric Order of the Squid Cybernetic <ceph-devel@xxxxxxxxxxxxxxx>
> Subject: Re: using RCU to replace Locker in config for seastar version
>
> That's what I would have thought, yes.  I thought the discussion was about RCU in the pthreaded codebase.  Dan Lambright prototyped that for one of the maps with liburcu, a good while ago.
>
> Matt
>
> On Fri, Jun 8, 2018 at 2:35 PM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> > Can anybody lay out a concrete use case for us employing real RCU in a
> > Seastar OSD?

i think we can use RCU to implement concurrent reading and updating of
config settings, since the settings can be versioned: a reader holds a
reference to one version of the settings while reading, and a writer
publishes a new version, then waits until all readers relinquish the
old version before reclaiming it.
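to make this concrete, here is a minimal sketch of the versioned-settings idea using only std::shared_ptr (not a real RCU library; all names here are invented for illustration). the shared_ptr reference count stands in for the grace period: the old snapshot is reclaimed automatically once its last reader drops the reference.

```cpp
#include <map>
#include <memory>
#include <string>

// immutable snapshot of the settings; readers never see it change
struct ConfigSnapshot {
  std::map<std::string, std::string> values;
};

class VersionedConfig {
  std::shared_ptr<const ConfigSnapshot> current_ =
      std::make_shared<const ConfigSnapshot>();

public:
  // read side: grab the current version; holding the shared_ptr keeps
  // that version alive for as long as the reader needs it
  std::shared_ptr<const ConfigSnapshot> read() const {
    return std::atomic_load(&current_);
  }

  // write side: copy the current snapshot, update the copy, then
  // publish it; the old snapshot is freed when its last reader is done
  void set(const std::string& key, const std::string& val) {
    auto snap = std::atomic_load(&current_);
    auto next = std::make_shared<ConfigSnapshot>(*snap);  // copy-on-write
    next->values[key] = val;
    std::atomic_store(&current_,
                      std::shared_ptr<const ConfigSnapshot>(std::move(next)));
  }
};
```

a reader that grabbed a snapshot before a write keeps seeing the old values, which is exactly the "holds a version" behaviour described above.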

if we go with the solution i proposed at the last CDM, we need to keep
a full copy of all OSD-related settings on each core. when a writer
changes a setting, the core serving the MCommand sends messages to all
cores so they update their local copies. this model is simpler, but it
is not space-efficient.
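a rough sketch of this replicated model (plain single-threaded C++ standing in for seastar; in real code the broadcast would be something like seastar::smp::invoke_on_all(), and all names here are invented):

```cpp
#include <map>
#include <string>
#include <vector>

using ConfigMap = std::map<std::string, std::string>;

// every "core" keeps a full local replica, so reads never leave the core
struct ShardedConfig {
  std::vector<ConfigMap> per_core;

  explicit ShardedConfig(unsigned ncores) : per_core(ncores) {}

  // read path: only touches the caller's local replica
  const ConfigMap& local(unsigned core) const { return per_core.at(core); }

  // write path: the core serving the MCommand "sends a message" to every
  // core so each one applies the same update to its own replica
  void broadcast_set(const std::string& key, const std::string& val) {
    for (auto& replica : per_core)
      replica[key] = val;  // stand-in for a cross-core message
  }
};
```

the space cost is visible here: n cores means n full copies of the settings.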

> > When I went through the data structures, it generally seemed like
> > message-passing about data structure changes would be a better way to
> > go than trying to employ any kind of real RCU library (or even the
> > exact abstractions). We might maintain local pointers to constant
> > structures with a per-core ref count to protect deletion, but proper

i think this model works better for osdmap caching, since the osdmap
is not updated very frequently in a healthy cluster: we can
update/retrieve the map using the message-passing machinery and keep a
single copy maintained by a configured core. but settings are read
constantly, so the reference count of a setting on a given core could
flip between 1 and 0 all the time (sometimes it could be greater than
1). that is why i don't think message-passing is an efficient way to
maintain the config settings.
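for the osdmap case, the single-owner model could look roughly like this (again plain C++ as a stand-in; get_map() would really be reached through a cross-core message such as seastar::smp::submit_to(), and the names are invented for illustration):

```cpp
#include <map>
#include <memory>

struct OSDMap {
  unsigned epoch;
};

// a single authoritative cache, living on one configured core
class OSDMapCache {
  std::map<unsigned, std::shared_ptr<const OSDMap>> maps_;

public:
  void insert(std::shared_ptr<const OSDMap> m) {
    unsigned e = m->epoch;
    maps_[e] = std::move(m);
  }

  // other cores would reach this through a cross-core message
  std::shared_ptr<const OSDMap> get_map(unsigned epoch) const {
    auto it = maps_.find(epoch);
    return it == maps_.end() ? nullptr : it->second;
  }

  // the OSD keeps pruning maps older than the given epoch
  void trim_before(unsigned epoch) {
    maps_.erase(maps_.begin(), maps_.lower_bound(epoch));
  }
};
```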

> > RCU would involve unpredictable access to out-of-core memory locations
> > (especially if we have multiple writers!), whereas if we stick with
> > the message-passing that Seastar expects, we get all the optimizations
> > that come from those very careful data structures.
> > -Greg
> >
> > On Fri, Jun 8, 2018 at 4:41 AM, kefu chai <tchaikov@xxxxxxxxx> wrote:
> >> On Fri, Jun 8, 2018 at 11:08 AM Sage Weil <sweil@xxxxxxxxxx> wrote:
> >>>
> >>> On Fri, 8 Jun 2018, Liu, Chunmei wrote:
> >>> > Hi Kefu,
> >>> >
> >>> >   For RCU, I understand the following facts: 1. Readers are never
> >>> > blocked, and multiple concurrent readers are supported.
> >>> > 2. A writer is blocked until all readers leave the read-side critical section. If there are multiple concurrent writers, a spin lock is needed to synchronize them.
> >>> >
> >>> > Are there multiple concurrent writers for the config or the OSDMap?
> >>>
> >>> Not really... and for config writes are rare.  I suspect we could
> >>> even get away with something that just allocates each new value on
> >>> the heap and updates a pointer, and then never bothers to free the
> >>> old values.  (Or maybe frees them after a ridiculous amount of time
> >>> has passed.)
> >>
> >> yeah, RCU supports concurrency between a single writer and
> >> multiple readers. in the case of config options, i think the
> >> writers would be the threads/reactors serving the MCommand sent
> >> from osd clients. so in theory, there could be multiple writers. but
> >> as Sage pointed out, writes are rare. i think we could use a spin
> >> lock to implement the exclusive lock.
> >>
> >>>
> >>> For the OSDMap cache, we have a constant stream of new maps coming
> >>> in and old maps getting pruned, so it's a bit trickier.
> >>
> >> yeah, both the MonClient and peer OSDs update the OSD with new maps,
> >> and the OSD keeps trimming unused maps from the cache.
> >>
> >>>
> >>> > Do you think it is acceptable in the seastar version, given that a
> >>> > spin lock is needed for multiple concurrent writers?
> >>
> >> i think it's fine as long as writes are relatively rare and the spin
> >> lock protects only very fast operations, like flipping a flag or
> >> setting a pointer.
> >>
> >>> >
> >>> > Thanks!
> >>> > -Chunmei
> >>> > --
> >>> > To unsubscribe from this list: send the line "unsubscribe
> >>> > ceph-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx
> >>> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>> >
> >>> >
> >>
> >>
> >>
> >> --
> >> Regards
> >> Kefu Chai
>
>
>
> --
>
> Matt Benjamin
> Red Hat, Inc.
> 315 West Huron Street, Suite 140A
> Ann Arbor, Michigan 48103
>
> http://www.redhat.com/en/technologies/storage
>
> tel.  734-821-5101
> fax.  734-769-8938
> cel.  734-216-5309



-- 
Regards
Kefu Chai