Re: [crimson] bluestore in an alien world

On Mon, Jul 29, 2019 at 9:19 PM kefu chai <tchaikov@xxxxxxxxx> wrote:
>
> On Sat, Jul 27, 2019 at 5:26 AM Sam Just <sjust@xxxxxxxxxx> wrote:
> >
> > > - to run bluestore in the same process of crimson-osd, but we will
> > > allocate some dedicated threads (and CPU cores) to it. we could use
> > > ceph::thread::ThreadPool for this purpose. for instance, we will have
> > > 3 ConfigProxy backends.
> > >   1. the classic ConfigProxy used by classic OSD and other daemons and
> > > command line utilities. the ConfigProxy normally resides in a global
> > > CephContext.
> > >   2. the ceph::common::ConfigProxy solely used by crimson OSD. it is
> > > rewritten using seastar. it's a sharded service. normally we just
> > > access the config proxy directly in crimson, like
> > > 'local_conf().get_val< uint64_t>("name")' instead of using something
> > > like 'cct->_conf.get_val<uint64_t>("name")'
> > >   3. the ConfigProxy used by bluestore living in the alien world. its
> > > interface will be exactly the same as the classic one, but it will
> > > call into its crimson counterpart using the `seastar::alien::submit()`
> > > call.
> >
> > I'm not sure this is quite right.  I think that the seastar config
> > would have a reference over to the alien config machinery in order to
> > inject config changes and do the initial setup, but the alien side
> > needn't have a reference to the crimson one.
>
> i was thinking about the implementation of ConfigProxy::get_val<>().
> but yeah, if we 1) have a separate copy of ConfigValues on the alien
> side, 2) let the alien side work in passive mode, and 3) use
> ThreadPool::submit() to inject config changes into the alien side's
> ConfigProxy, that'd be a lot easier.
>
> >
> > >   in addition to the WITH_SEASTAR macro, we can introduce yet
> > > another macro allowing us to call into the facilities offered by
> > > crimson-common. and we can use an inline namespace to differentiate
> > > the 2nd implementation from the 3rd, as they will need to be
> > > co-located in the same process, and without using different names
> > > we'd violate the ODR.
> > > - to hide bluestore in a library which links against the ceph-common
> > > library, but without libbluestore exposing any ceph-common symbols to
> > > crimson-osd. we still need to figure out how to maintain the internal
> > > state of ceph-common, as it is not quite self-contained: it needs to
> > > access the logging, config and other facilities offered by
> > > crimson-osd.
> >
> > The library option seems promising to me if we go this direction.  It
> > can even export an interface which is entirely agnostic of the config
> > machinery (maybe take a serialized representation of the config
> > values?) and write to a different log file at first.
>
> yeah, probably we just need a "keyhole" for updating the alien side's
> config settings. this option is actually a variant of the previous
> one; the only difference is that we need to use different namespaces
> to differentiate the symbols in bluestore from those in
> crimson-common.
>
> >
> > > - to port rocksdb to seastar: to be specific, this approach will use
> > > seastar's green threads to implement the Mutex, CondVar and Thread in
> > > rocksdb, and implement all blocking calls using seastar's
> > > counterparts. if this approach proves to be workable, the next
> > > problem would be to upstream this change. in the long run, the
> > > rocksdb-backed bluestore would be replaced by seastore, if seastore
> > > turns out to be capable of supporting relatively slow devices as well.
> >
> > I've started to look at your rocksdb port.  It does look like the
> > parts we'd need to adapt are appropriately factored out in rocksdb,
> > and I bet we'd get interest from upstream.  We might want to take
> > their temperature sooner rather than later?  We'd also have to perform
>
> good idea! will do so early tomorrow!
>
> > essentially the same refactor in Bluestore in order to break the
> > bluestore logic apart from the IO/blocking/locking portions.  I guess
> > this exists in some form with the BlockDevice interface, but we'll
> > also have to introduce something like rocksdb's lock replacement.
> > This path would get us a much more cooperative (probably more
> > performant as well, particularly in high density hosts) bluestore in
> > the long run, so it might be worth the work.
>
> thanks. your insights are inspiring!

the `env_seastar_test` test passed, so it kinda works. i also wrote a
post in https://www.facebook.com/groups/rocksdb.dev/ to get opinions
from the upstream community.

>
> >
> > > - seastore: a completely rewritten object store backend targeting fast
> > > NVMe devices. but it will take longer to get there.
> >
> > I think we're going to do this no matter what.  I think the
> > alien/bluestore choice is about how we want to test crimson prior to
> > developing seastore, and possibly about handling devices inappropriate
> > for seastore?
>
> that's also my impression. the way i see it, that's just because we
> haven't started scoping it or done a low-level design yet.
>
> > -Sam
>
>
>
> --
> Regards
> Kefu Chai



-- 
Regards
Kefu Chai
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx


