On Wed, Jul 24, 2019 at 6:21 AM Liu, Chunmei <chunmei.liu@xxxxxxxxx> wrote:
>
> Hi kefu,
>
> Even if we use seastar::thread, I think we still need an #ifdef macro to
> build this code together, am I right? The same binary would still contain
> both blocking and non-blocking code.
>
> Let's discuss it in the meeting.
>
> -Chunmei
>
> > -----Original Message-----
> > From: kefu chai [mailto:tchaikov@xxxxxxxxx]
> > Sent: Monday, July 22, 2019 7:37 AM
> > To: Liu, Chunmei <chunmei.liu@xxxxxxxxx>; dev@xxxxxxx
> > Subject: [crimson] bluestore in an alien world
> >
> > hi Chunmei,
> >
> > i am reviewing your change of
> > https://github.com/ceph/ceph/compare/master...liu-chunmei:ceph_seastar_alien_store.
> > it looks good in general. i think the simplest way to co-locate the
> > different versions of alien-common, ceph-common and crimson-common is to
> > introduce different namespaces, because we need to have alien-common and
> > crimson-common in the same binary, and to have all three versions in the
> > same repository.
> >
> > but this divergence concerns me, as it introduces yet another condition
> > in the shared infrastructure in our code base. and in the long run, this
> > #ifdef won't go away if we want to go this way, so i need to at least
> > give it a try. what is "it"? to port rocksdb to seastar. seastar offers
> > "seastar::thread", which makes it relatively simple to wrap the blocking
> > calls with ucontext. and rocksdb offers an abstraction machinery allowing
> > one to port it to a new platform. and seastar is a "platform" to some
> > degree, i'd say.
> >
> > will update you guys with my progress and findings.

i am noting down the takeaways from the last crimson standup:

the problem we want to resolve:

- we already have two "common" libraries compiled from the same source
  tree: ceph-common and crimson-common. because crimson is pretty much
  single-threaded, all mutex, atomic and other synchronization primitives
  are defined as no-ops if WITH_SEASTAR is defined.
  and these two libraries share the same set of symbol names.
- rocksdb is designed around the semantics of blocking calls, and it does
  not support seastar at this moment. if we want to continue using
  bluestore in crimson-osd, we will have to put bluestore in a world where
  blocking calls are allowed. this world or environment is dubbed the
  alien world from seastar's point of view.
- and crimson is targeting fast storage devices. the assumptions made by
  rocksdb do not hold there anymore.

some possible solutions:

- to run bluestore in a separate process and use mmap to do the IPC
  between crimson-osd and bluestore. we can use a ring buffer to manage the
  shared memory, so the way crimson-osd talks to bluestore will be quite
  like io_uring. if we can have a customized allocator, we can point the
  msgr to that allocator to avoid memcpy. if we cannot think of anything
  that could hurt the performance significantly, this approach would be
- to run bluestore in the same process as crimson-osd, but allocate some
  dedicated threads (and CPU cores) to it. we could use
  ceph::thread::ThreadPool for this purpose. for instance, we will have 3
  ConfigProxy backends:
  1. the classic ConfigProxy used by the classic OSD and other daemons and
     command line utilities. this ConfigProxy normally resides in a global
     CephContext.
  2. the ceph::common::ConfigProxy solely used by the crimson OSD. it is
     rewritten using seastar and is a sharded service. normally we just
     access the config proxy directly in crimson, like
     'local_conf().get_val<uint64_t>("name")', instead of using something
     like 'cct->_conf.get_val<uint64_t>("name")'.
  3. the ConfigProxy used by bluestore living in the alien world. its
     interface will be exactly the same as the classic one, but it will
     call into its crimson counterpart using the
     `seastar::alien::submit()` call.
  in addition to the WITH_SEASTAR macro, we can introduce yet another macro
  allowing us to call into the facilities offered by crimson-common.
  and we can use an inline namespace to differentiate the 2nd from the 3rd
  implementation, as they will need to be co-located in the same process,
  and without using different names we'd violate the ODR.
- to hide bluestore in a library which links against the ceph-common
  library. libbluestore won't expose any ceph-common symbols to
  crimson-osd. but we need to figure out how to maintain the internal
  state of ceph-common, as it is not quite self-contained in the sense
  that it needs to access the logging, config and other facilities offered
  by crimson-osd.
- to port rocksdb to seastar: to be specific, this approach will use
  seastar's green threads to implement the Mutex, CondVar and Thread in
  rocksdb, and implement all blocking calls using seastar's counterparts.
  if this approach proves to be workable, the next problem would be to
  upstream this change. and in the long run, the rocksdb-backed bluestore
  will be replaced by seastore if seastore is capable of supporting
  relatively slow devices as well.
- seastore: a completely rewritten object store backend targeting fast
  NVMe devices. but it will take longer to get there.

--
Regards
Kefu Chai
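
to illustrate the inline namespace idea from the second option above, here is a minimal sketch. it is not taken from the ceph tree: the `demo`, `crimson_impl` and `alien_impl` namespaces and the `flavor()` method are hypothetical names made up for this example. it only shows that two classes with the same name can live in one binary as distinct types, with the inline one reachable under the unqualified outer name.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

namespace demo {

// hypothetical stand-in for the crimson-side proxy. the namespace is
// inline, so plain `demo::ConfigProxy` resolves to this type.
inline namespace crimson_impl {
struct ConfigProxy {
  std::string flavor() const { return "crimson"; }
};
}  // namespace crimson_impl

// hypothetical stand-in for the alien-side proxy. same class name, but it
// lives in a different (non-inline) namespace, so it is a different type
// with a different mangled name, and the two do not collide at link time.
namespace alien_impl {
struct ConfigProxy {
  std::string flavor() const { return "alien"; }
};
}  // namespace alien_impl

}  // namespace demo

// the unqualified outer name picks the inline implementation...
static_assert(std::is_same<demo::ConfigProxy,
                           demo::crimson_impl::ConfigProxy>::value,
              "demo::ConfigProxy should be the inline (crimson) variant");
// ...and the two variants are distinct types.
static_assert(!std::is_same<demo::ConfigProxy,
                            demo::alien_impl::ConfigProxy>::value,
              "the two ConfigProxy variants must be distinct types");

// uses both variants side by side in the same translation unit.
std::string describe_both() {
  demo::ConfigProxy local;               // crimson variant via inline namespace
  demo::alien_impl::ConfigProxy alien;   // alien variant, named explicitly
  assert(local.flavor() != alien.flavor());
  return local.flavor() + "+" + alien.flavor();
}
```

in a real build, the assumption would be that each library is compiled from the same headers with a macro choosing which namespace is the inline one, so existing code keeps writing the unqualified name while the linker sees two different symbols.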