On Sat, Jul 27, 2019 at 5:26 AM Sam Just <sjust@xxxxxxxxxx> wrote:
>
> > - to run bluestore in the same process of crimson-osd, but we will
> > allocate some dedicated threads (and CPU cores) to it. we could use
> > ceph::thread::ThreadPool for this purpose. for instance, we will have
> > 3 ConfigProxy backends.
> > 1. the classic ConfigProxy used by classic OSD and other daemons and
> > command line utilities. the ConfigProxy normally resides in a global
> > CephContext.
> > 2. the ceph::common::ConfigProxy solely used by crimson OSD. it is
> > rewritten using seastar. it's a sharded service. normally we just
> > access the config proxy directly in crimson, like
> > 'local_conf().get_val<uint64_t>("name")' instead of using something
> > like 'cct->_conf.get_val<uint64_t>("name")'
> > 3. the ConfigProxy used by bluestore living in the alien world. its
> > interface will be exactly the same as the classic one, but it will
> > call into its crimson counterpart using the `seastar::alien::submit()`
> > call.
>
> I'm not sure this is quite right. I think that the seastar config
> would have a reference over to the alien config machinery in order to
> inject config changes and do the initial setup, but the alien side
> needn't have a reference to the crimson one.

i was thinking about the implementation of ConfigProxy::get_val<>().
but yeah, if we 1) have a separate copy of ConfigValue on the alien
side, 2) let the alien side work in passive mode, and 3) use
ThreadPool::submit() to inject config changes into alien's
ConfigProxy, that'd be a lot easier.

> > in addition to WITH_SEASTAR macro, we can introduce yet another
> > macro allowing us to call into the facilities offered by
> > crimson-common. and we can use inline namespace to differentiate the
> > 2nd from 3rd implementations. as they will need to be co-located in
> > the same process. and without using different names, we'd violate ODR.
>
> > - to hide bluestore in a library which links against ceph-common
> > library. but the libbluestore won't expose any ceph-common symbols to
> > crimson-osd. but we need to figure out how to maintain the internal
> > status of ceph-common. as it is not quite self-contained in the sense
> > that it needs to access the logging, config and other facilities
> > offered by crimson-osd.
>
> The library option seems promising to me if we go this direction. It
> can even export an interface which is entirely agnostic of the config
> machinery (maybe take a serialized representation of the config
> values?) and write to a different log file at first.

yeah, probably we just need a "keyhole" for updating the alien side's
config settings. this option is actually a variant of the previous
one. the only difference is that we need to use different namespaces
to differentiate the symbols in bluestore from those in
crimson-common.

> > - to port rocksdb to seastar: to be specific, this approach will use
> > seastar's green thread to implement the Mutex, CondVar and Thread in
> > rocksdb, and implement all blocking calls using seastar's
> > counterparts. if this approach is proved to be workable. the next
> > problem would be to upstream this change. and in a long run, the
> > rocksdb backed bluestore will be replaced by seastore if seastore is
> > capable of supporting relatively slow devices as well.
>
> I've started to look at your rocksdb port. It does look like the
> parts we'd need to adapt are appropriately factored out in rocksdb,
> and I bet we'd get interest from upstream. We might want to take
> their temperature sooner rather than later? We'd also have to perform

good idea! will do so early tomorrow!

> essentially the same refactor in Bluestore in order to break the
> bluestore logic apart from the IO/blocking/locking portions. I guess
> this exists in some form with the BlockDevice interface, but we'll
> also have to introduce something like rocksdb's lock replacement.
> This path would get us a much more cooperative (probably more
> performant as well, particularly in high density hosts) bluestore in
> the long run, so it might be worth the work.

thanks. your insights are inspiring!

> > - seastore: a completely rewritten object store backend targeting fast
> > NVMe devices. but it will take longer to get there.
>
> I think we're going to do this no matter what. I think
> alien/bluestore choice is about how we want to test crimson prior to
> developing seastore and possibly for handling devices inappropriate
> for seastore?

that's also my impression. the way i see it, it's just because we
haven't started scoping it or had a low level design.

> -Sam

--
Regards
Kefu Chai
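
p.s. to make the "passive" alien-side config a bit more concrete: the
idea is that the alien side keeps its own copy of the values and only
ever gets updated by jobs submitted from the crimson side. a rough
sketch in plain C++17 could look like the following. please note that
AlienConfig, AlienWorker and inject_and_read are hypothetical names
made up for illustration, std::thread is only a stand-in for
ceph::thread::ThreadPool, and none of this is the actual Ceph or
Seastar API.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <functional>
#include <map>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>

// Hypothetical alien-side config: owns a private copy of the values and
// never calls back into the crimson side (the "passive mode").
class AlienConfig {
public:
  void set(const std::string& name, std::uint64_t value) {
    std::lock_guard<std::mutex> l(lock_);
    values_[name] = value;
  }
  std::uint64_t get(const std::string& name, std::uint64_t fallback) const {
    std::lock_guard<std::mutex> l(lock_);
    auto it = values_.find(name);
    return it == values_.end() ? fallback : it->second;
  }
private:
  mutable std::mutex lock_;
  std::map<std::string, std::uint64_t> values_;
};

// Hypothetical one-thread worker standing in for the dedicated alien
// thread pool; submit() is the only way work gets into the alien world.
class AlienWorker {
public:
  AlienWorker() : thread_([this] { run(); }) {}
  ~AlienWorker() {
    {
      std::lock_guard<std::mutex> l(lock_);
      stopping_ = true;
    }
    cond_.notify_one();
    thread_.join();  // run() drains the queue before it returns
  }
  void submit(std::function<void()> job) {
    assert(job);  // an empty std::function would throw when invoked
    {
      std::lock_guard<std::mutex> l(lock_);
      jobs_.push(std::move(job));
    }
    cond_.notify_one();
  }
private:
  void run() {
    std::unique_lock<std::mutex> l(lock_);
    for (;;) {
      cond_.wait(l, [this] { return stopping_ || !jobs_.empty(); });
      if (jobs_.empty()) return;  // only reached when stopping
      auto job = std::move(jobs_.front());
      jobs_.pop();
      l.unlock();
      job();  // run the job without holding the queue lock
      l.lock();
    }
  }
  std::mutex lock_;
  std::condition_variable cond_;
  std::queue<std::function<void()>> jobs_;
  bool stopping_ = false;
  std::thread thread_;  // declared last: started after the other members
};

// Push one setting through the "keyhole" and return what the alien side
// sees once the worker has finished.
std::uint64_t inject_and_read(const std::string& name, std::uint64_t value) {
  AlienConfig conf;
  {
    AlienWorker worker;
    worker.submit([&conf, name, value] { conf.set(name, value); });
  }  // destructor drains pending jobs and joins the thread
  return conf.get(name, 0);
}
```

the point of the shape is that the dependency is one-way: the crimson
side knows about AlienConfig and pushes changes into it, while the
alien side only reads its local copy and needs no handle back into
crimson.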