Re: [crimson] bluestore in an alien world

On Sat, Jul 27, 2019 at 5:26 AM Sam Just <sjust@xxxxxxxxxx> wrote:
>
> > - to run bluestore in the same process as crimson-osd, but we will
> > allocate some dedicated threads (and CPU cores) to it. we could use
> > ceph::thread::ThreadPool for this purpose. specifically, we will have
> > 3 ConfigProxy backends:
> >   1. the classic ConfigProxy used by classic OSD and other daemons and
> > command line utilities. the ConfigProxy normally resides in a global
> > CephContext.
> >   2. the ceph::common::ConfigProxy solely used by crimson OSD. it is
> > rewritten using seastar. it's a sharded service. normally we just
> > access the config proxy directly in crimson, like
> > 'local_conf().get_val<uint64_t>("name")' instead of using something
> > like 'cct->_conf.get_val<uint64_t>("name")'
> >   3. the ConfigProxy used by bluestore living in the alien world. its
> > interface will be exactly the same as the classic one, but it will
> > call into its crimson counterpart using the `seastar::alien::submit()`
> > call.
>
> I'm not sure this is quite right.  I think that the seastar config
> would have a reference over to the alien config machinery in order to
> inject config changes and do the initial setup, but the alien side
> needn't have a reference to the crimson one.

i was thinking about the implementation of ConfigProxy::get_val<>().
but yeah, if we 1) have a separate copy of ConfigValues on the alien
side, 2) let the alien side work in passive mode, and 3) use
ThreadPool::submit() to inject config changes into the alien's
ConfigProxy, that'd be a lot easier.
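
something like the following is what i have in mind. just a sketch:
alien_tp, alien_conf and set_config_values() are made-up names for
illustration, and it assumes the crimson side keeps the only writable
copy of the settings:

  #include <seastar/core/future.hh>

  // runs on the seastar side: push a snapshot of the new values into
  // the alien side's private copy, so the alien ConfigProxy can stay
  // completely passive.
  seastar::future<> propagate_config_change(ConfigValues values) {
    return alien_tp.submit([values = std::move(values)]() mutable {
      // executed on one of the dedicated alien threads
      alien_conf.set_config_values(std::move(values));
    });
  }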

>
> >   in addition to WITH_SEASTAR macro, we can introduce yet another
> > macro allowing us to call into the facilities offered by
> > crimson-common. and we can use an inline namespace to differentiate
> > the 2nd from the 3rd implementation, as they will need to be
> > co-located in the same process; without using different names, we'd
> > violate the ODR.
> > - to hide bluestore in a library which links against the ceph-common
> > library, but libbluestore won't expose any ceph-common symbols to
> > crimson-osd. we still need to figure out how to maintain the internal
> > state of ceph-common, as it is not quite self-contained: it needs to
> > access the logging, config and other facilities offered by
> > crimson-osd.
>
> The library option seems promising to me if we go this direction.  It
> can even export an interface which is entirely agnostic of the config
> machinery (maybe take a serialized representation of the config
> values?) and write to a different log file at first.

yeah, probably we just need a "keyhole" for updating the alien side's
config settings. this option is actually a variant of the previous
one; the only difference is that we need to use different namespaces
to differentiate the symbols in bluestore from those in
crimson-common.
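
to make the namespace idea concrete, here's a sketch (crimson_v and
alien_v are made-up names):

  // in crimson-common, guarded by the new macro:
  namespace ceph::common { inline namespace crimson_v {
    class ConfigProxy { /* seastar-based, sharded */ };
  }}

  // in the classic build linked into libbluestore:
  namespace ceph::common { inline namespace alien_v {
    class ConfigProxy { /* classic, mutex-protected */ };
  }}

each translation unit sees only one of the two definitions, so call
sites keep spelling ceph::common::ConfigProxy, yet the mangled symbols
differ, and the two copies can be co-located in one process without
violating the ODR.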

>
> > - to port rocksdb to seastar: to be specific, this approach will use
> > seastar's green threads to implement the Mutex, CondVar and Thread in
> > rocksdb, and implement all blocking calls using seastar's
> > counterparts. if this approach proves to be workable, the next
> > problem would be to upstream this change. and in the long run, the
> > rocksdb-backed bluestore will be replaced by seastore, if seastore is
> > capable of supporting relatively slow devices as well.
>
> I've started to look at your rocksdb port.  It does look like the
> parts we'd need to adapt are appropriately factored out in rocksdb,
> and I bet we'd get interest from upstream.  We might want to take
> their temperature sooner rather than later?  We'd also have to perform

good idea! will do so early tomorrow!

> essentially the same refactor in Bluestore in order to break the
> bluestore logic apart from the IO/blocking/locking portions.  I guess
> this exists in some form with the BlockDevice interface, but we'll
> also have to introduce something like rocksdb's lock replacement.
> This path would get us a much more cooperative (probably more
> performant as well, particularly in high density hosts) bluestore in
> the long run, so it might be worth the work.

thanks. your insights are inspiring!
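
fwiw, a very rough sketch of what the lock replacement could look
like on our side. it assumes the caller always runs inside a
seastar::thread (green thread), so .get() suspends the green thread
instead of blocking an OS thread:

  #include <seastar/core/semaphore.hh>

  // a drop-in for rocksdb's port::Mutex
  class Mutex {
    seastar::semaphore sem_{1};
   public:
    void Lock()   { sem_.wait(1).get(); }  // suspends the green thread
    void Unlock() { sem_.signal(1); }
    void AssertHeld() {}
  };

CondVar and Thread would be done in the same spirit, on top of
seastar::condition_variable and seastar::thread.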

>
> > - seastore: a completely rewritten object store backend targeting fast
> > NVMe devices. but it will take longer to get there.
>
> I think we're going to do this no matter what.  I think
> the alien/bluestore choice is about how we want to test crimson prior to
> developing seastore and possibly for handling devices inappropriate
> for seastore?

that's also my impression. the way i see it, that's just because we
haven't started scoping it or done a low-level design yet.

> -Sam



-- 
Regards
Kefu Chai
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx


