Re: [crimson] bluestore in an alien world

On Wed, Jul 24, 2019 at 6:21 AM Liu, Chunmei <chunmei.liu@xxxxxxxxx> wrote:
>
> Hi kefu,
>
>    Even if we use seastar::thread, I think we still need an #ifdef macro to build this code together, am I right? The same binary will still contain both blocking and non-blocking code.
>
>    Let's discuss it in the meeting.
>
> -Chunmei
>
>
> > -----Original Message-----
> > From: kefu chai [mailto:tchaikov@xxxxxxxxx]
> > Sent: Monday, July 22, 2019 7:37 AM
> > To: Liu, Chunmei <chunmei.liu@xxxxxxxxx>; dev@xxxxxxx
> > Subject: [crimson] bluestore in an alien world
> >
> > hi Chunmei,
> >
> > i am reviewing your change of
> > https://github.com/ceph/ceph/compare/master...liu-chunmei:ceph_seastar_alien_store.
> > it looks good in general. i think the simplest way to co-locate different versions
> > of alien-common, ceph-common and crimson-common is to introduce different
> > namespaces. because we need to have alien-common and crimson-common in
> > the same binary, and to have all of these three versions in the same repository.
> >
> > but this divergence concerns me, as it introduces yet another condition in the
> > shared infrastructure in our code base. and in the long run, this #ifdef won't go
> > away if we want to go this way, so i need to at least give it a try. what is "it"? to
> > port rocksdb to seastar. seastar offers "seastar::thread", which makes it
> > relatively simple to wrap the blocking calls with ucontext. and rocksdb offers an
> > abstraction machinery allowing one to port it to a new platform. and seastar is a
> > "platform" to some degree, i'd say.
> >
> > will update you guys with my progress and findings.

i am noting down the takeaways from the last crimson standup:

the problem we want to resolve:

- we already have two "common" libraries compiled from the same source
tree: ceph-common and crimson-common. because crimson is pretty much
single-threaded, all mutexes, atomics and other synchronization
primitives are defined as no-ops if WITH_SEASTAR is defined. and these
two libraries share the same set of symbol names.
- rocksdb is designed around the semantics of blocking calls, and it
does not support seastar at this moment. if we want to continue
using bluestore in crimson-osd, we will have to put bluestore in a
world where blocking calls are allowed. this world or environment
is dubbed the alien world from seastar's point of view.
- and crimson is targeting fast storage devices. the assumptions made
by rocksdb do not hold anymore there.
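
to illustrate the first point, here is a minimal sketch of how a mutex can collapse to a no-op behind a macro. the names `sketch`, `dummy_mutex` and `counter` are made up for this example and are not the actual ceph headers; only the WITH_SEASTAR macro comes from the discussion above.

```cpp
#include <mutex>

// hypothetical names; this is not the actual ceph/crimson code
namespace sketch {

// satisfies the Lockable requirements, but every operation does nothing.
// this is only safe when a single thread ever touches the guarded data.
struct dummy_mutex {
  void lock() noexcept {}
  bool try_lock() noexcept { return true; }
  void unlock() noexcept {}
};

#ifdef WITH_SEASTAR
using mutex = dummy_mutex;   // single-threaded reactor: no real locking
#else
using mutex = std::mutex;    // classic multi-threaded build
#endif

// shared code is written once against sketch::mutex
class counter {
  mutex lock_;
  int value_ = 0;
public:
  int increment() {
    std::lock_guard<mutex> guard{lock_};
    return ++value_;
  }
};

} // namespace sketch
```

the same source then yields two different `sketch::counter` definitions under one name depending on the macro, which is exactly why the two builds cannot simply be linked into one binary.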

some possible solutions

- to run bluestore in a separate process and use mmap to do the IPC
between crimson-osd and bluestore. we can use a ring buffer to manage
the shared memory, so the way crimson-osd talks to bluestore will be
quite like io_uring. if we have a customized allocator, we
can point the msgr to it to avoid memcpy. if we cannot
think of anything that could hurt the performance significantly, this
approach would be
- to run bluestore in the same process as crimson-osd, but we will
allocate some dedicated threads (and CPU cores) to it. we could use
ceph::thread::ThreadPool for this purpose. the shared facilities then
need more than one implementation. for instance, we will have
3 ConfigProxy backends:
  1. the classic ConfigProxy used by classic OSD and other daemons and
command line utilities. the ConfigProxy normally resides in a global
CephContext.
  2. the ceph::common::ConfigProxy solely used by crimson OSD. it is
rewritten using seastar, and it's a sharded service. normally we just
access the config proxy directly in crimson, like
'local_conf().get_val<uint64_t>("name")' instead of using something
like 'cct->_conf.get_val<uint64_t>("name")'
  3. the ConfigProxy used by bluestore living in the alien world. its
interface will be exactly the same as the classic one, but it will
call into its crimson counterpart using the `seastar::alien::submit_to()`
call.
  in addition to the WITH_SEASTAR macro, we can introduce yet another
macro allowing us to call into the facilities offered by
crimson-common. and we can use inline namespaces to differentiate the
2nd from the 3rd implementation, as they will need to be co-located in
the same process, and without using different names we'd violate ODR.
- to hide bluestore in a library which links against the ceph-common
library. the libbluestore won't expose any ceph-common symbols to
crimson-osd, but we need to figure out how to maintain the internal
state of ceph-common, as it is not quite self-contained in the sense
that it needs to access the logging, config and other facilities
offered by crimson-osd.
- to port rocksdb to seastar: to be specific, this approach will use
seastar's green threads to implement the Mutex, CondVar and Thread in
rocksdb, and implement all blocking calls using seastar's
counterparts. if this approach proves to be workable, the next
problem would be to upstream this change. and in the long run, the
rocksdb-backed bluestore will be replaced by seastore if seastore is
capable of supporting relatively slow devices as well.
- seastore: a completely rewritten object store backend targeting fast
NVMe devices. but it will take longer to get there.
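
for the first option, the ring buffer could look roughly like the single-producer/single-consumer sketch below. this is an assumption-laden illustration, not a design: the names `ring_sketch` and `spsc_ring` are made up, the buffer lives in ordinary process memory here, and placing it in an mmap'ed region shared by two processes would additionally require trivially copyable slots and lock-free atomics.

```cpp
#include <array>
#include <atomic>
#include <cstddef>

// hypothetical names; a sketch of the idea only
namespace ring_sketch {

// fixed-size queue for exactly one producer and one consumer.
// N must be a power of two so that index wrapping is a simple mask.
template <typename T, std::size_t N>
class spsc_ring {
  static_assert(N > 1 && (N & (N - 1)) == 0, "N must be a power of two");
  std::array<T, N> slots_{};
  std::atomic<std::size_t> head_{0};  // next slot to read, advanced by consumer
  std::atomic<std::size_t> tail_{0};  // next slot to write, advanced by producer
public:
  // producer side: returns false if the ring is full
  bool push(const T& v) {
    const std::size_t tail = tail_.load(std::memory_order_relaxed);
    if (tail - head_.load(std::memory_order_acquire) == N) {
      return false;
    }
    slots_[tail & (N - 1)] = v;
    tail_.store(tail + 1, std::memory_order_release);  // publish the slot
    return true;
  }
  // consumer side: returns false if the ring is empty
  bool pop(T& out) {
    const std::size_t head = head_.load(std::memory_order_relaxed);
    if (head == tail_.load(std::memory_order_acquire)) {
      return false;
    }
    out = slots_[head & (N - 1)];
    head_.store(head + 1, std::memory_order_release);  // release the slot
    return true;
  }
};

} // namespace ring_sketch
```

one such ring per direction (submission and completion) would give the io_uring-like shape mentioned above.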

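for the second option, the inline namespace trick could be sketched as follows. all names here (`odr_sketch`, the `crimson` and `alien` namespaces, the `backend()` method) are hypothetical. in a real build a macro would pick which namespace is inline for a given translation unit; this sketch hard-codes crimson as the default so that both variants can be shown side by side.

```cpp
#include <string>
#include <type_traits>

// hypothetical names; not the actual ceph/crimson headers
namespace odr_sketch {

// the inline namespace is transparent to callers: odr_sketch::ConfigProxy
// resolves to odr_sketch::crimson::ConfigProxy, but the mangled symbol
// names still carry "crimson".
inline namespace crimson {
struct ConfigProxy {
  std::string backend() const { return "crimson"; }
};
} // inline namespace crimson

// the alien variant has the same class name and interface, yet it is a
// distinct entity with distinct symbols, so both can live in one binary.
namespace alien {
struct ConfigProxy {
  std::string backend() const { return "alien"; }
};
} // namespace alien

} // namespace odr_sketch

static_assert(std::is_same<odr_sketch::ConfigProxy,
                           odr_sketch::crimson::ConfigProxy>::value,
              "unqualified name picks the inline namespace");
static_assert(!std::is_same<odr_sketch::crimson::ConfigProxy,
                            odr_sketch::alien::ConfigProxy>::value,
              "the two implementations are different types");
```

because the two classes never share a fully qualified name, linking them together does not run into the ODR problem described above.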

> >
> >
> > --
> > Regards
> > Kefu Chai




--
Regards
Kefu Chai