On Mon, Jul 29, 2019 at 9:19 PM kefu chai <tchaikov@xxxxxxxxx> wrote:
>
> On Sat, Jul 27, 2019 at 5:26 AM Sam Just <sjust@xxxxxxxxxx> wrote:
> >
> > > - to run bluestore in the same process of crimson-osd, but we will
> > > allocate some dedicated threads (and CPU cores) to it. we could use
> > > ceph::thread::ThreadPool for this purpose. for instance, we will have
> > > 3 ConfigProxy backends.
> > > 1. the classic ConfigProxy used by classic OSD and other daemons and
> > > command line utilities. the ConfigProxy normally resides in a global
> > > CephContext.
> > > 2. the ceph::common::ConfigProxy solely used by crimson OSD. it is
> > > rewritten using seastar. it's a sharded service. normally we just
> > > access the config proxy directly in crimson, like
> > > 'local_conf().get_val<uint64_t>("name")' instead of using something
> > > like 'cct->_conf.get_val<uint64_t>("name")'
> > > 3. the ConfigProxy used by bluestore living in the alien world. its
> > > interface will be exactly the same as the classic one, but it will
> > > call into its crimson counterpart using the `seastar::alien::submit()`
> > > call.
> >
> > I'm not sure this is quite right. I think that the seastar config
> > would have a reference over to the alien config machinery in order to
> > inject config changes and do the initial setup, but the alien side
> > needn't have a reference to the crimson one.
>
> i was thinking about the implementation of ConfigProxy::get_val<>().
> but yeah, if we 1) have a separate copy of ConfigValue on the alien
> side, 2) let the alien side work in the passive mode, and 3) use the
> ThreadPool::submit() to inject config changes into alien's
> ConfigProxy, that'd be a lot easier.
>
> >
> > > in addition to WITH_SEASTAR macro, we can introduce yet another
> > > macro allowing us to call into the facilities offered by
> > > crimson-common. and we can use inline namespace to differentiate the
> > > 2nd from 3rd implementations.
> > > as they will need to be co-located in the same process. and without
> > > using different names, we'd violate ODR.
> > > - to hide bluestore in a library which links against ceph-common
> > > library. but the libbluestore won't expose any ceph-common symbols to
> > > crimson-osd. but we need to figure out how to maintain the internal
> > > status of ceph-common, as it is not quite self-contained in the sense
> > > that it needs to access the logging, config and other facilities
> > > offered by crimson-osd.
> >
> > The library option seems promising to me if we go this direction. It
> > can even export an interface which is entirely agnostic of the config
> > machinery (maybe take a serialized representation of the config
> > values?) and write to a different log file at first.
>
> yeah, probably we just need a "keyhole" for updating the alien side's
> config settings. this option actually is a variant of the previous
> one. the only difference is that we need to use different namespaces
> to differentiate the symbols in bluestore from those in
> crimson-common.
>
> >
> > > - to port rocksdb to seastar: to be specific, this approach will use
> > > seastar's green thread to implement the Mutex, CondVar and Thread in
> > > rocksdb, and implement all blocking calls using seastar's
> > > counterparts. if this approach proves to be workable, the next
> > > problem would be to upstream this change. and in the long run, the
> > > rocksdb backed bluestore will be replaced by seastore if seastore is
> > > capable of supporting relatively slow devices as well.
> >
> > I've started to look at your rocksdb port. It does look like the
> > parts we'd need to adapt are appropriately factored out in rocksdb,
> > and I bet we'd get interest from upstream. We might want to take
> > their temperature sooner rather than later? We'd also have to perform
>
> good idea! will do so early tomorrow!
>
> > essentially the same refactor in Bluestore in order to break the
> > bluestore logic apart from the IO/blocking/locking portions. I guess
> > this exists in some form with the BlockDevice interface, but we'll
> > also have to introduce something like rocksdb's lock replacement.
> > This path would get us a much more cooperative (probably more
> > performant as well, particularly in high density hosts) bluestore in
> > the long run, so it might be worth the work.
>
> thanks. your insights are inspiring!

the test of `env_seastar_test` passed, so it kinda works. and i also
wrote a post in https://www.facebook.com/groups/rocksdb.dev/ to get the
opinions from the upstream community.

>
> >
> > > - seastore: a completely rewritten object store backend targeting fast
> > > NVMe devices. but it will take longer to get there.
> >
> > I think we're going to do this no matter what. I think
> > alien/bluestore choice is about how we want to test crimson prior to
> > developing seastore and possibly for handling devices inappropriate
> > for seastore?
>
> that's also my impression. the way i see it, that's just because we
> haven't started scoping it or done a low level design.
>
> > -Sam
> >
>
>
> --
> Regards
> Kefu Chai

--
Regards
Kefu Chai
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx
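
for illustration only, here is a rough sketch of the "passive" alien-side config discussed above: a separate copy of the values that never calls back into crimson and is only updated through a narrow "keyhole" taking a serialized representation. everything in it is an assumption made for the example: `AlienConfig`, `apply_serialized()`, `get_u64()` and the `key=value;key=value` wire format are hypothetical names, not existing Ceph or Seastar APIs. in the real thing the update would be run on the alien side by a task submitted to the thread pool; the sketch just guards the copy with a plain mutex.

```cpp
// Sketch only: AlienConfig and its "key=value;..." format are hypothetical,
// not part of Ceph, crimson or Seastar.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <exception>
#include <map>
#include <mutex>
#include <optional>
#include <sstream>
#include <string>

// Passive copy of the config values living on the alien (bluestore) side.
// It holds no reference to the crimson side; it only receives updates.
class AlienConfig {
public:
  // The "keyhole": apply a serialized batch such as "a=1;b=2".
  // Entries without a key or without '=' are skipped.
  // Returns the number of distinct keys applied.
  std::size_t apply_serialized(const std::string& blob) {
    std::map<std::string, std::string> parsed;
    std::istringstream in(blob);
    std::string entry;
    while (std::getline(in, entry, ';')) {
      const auto eq = entry.find('=');
      if (eq == std::string::npos || eq == 0) {
        continue;  // malformed entry, ignore it
      }
      parsed[entry.substr(0, eq)] = entry.substr(eq + 1);
    }
    // Parse first, then take the lock only for the actual update.
    std::lock_guard<std::mutex> guard(lock_);
    for (const auto& [key, value] : parsed) {
      values_[key] = value;
    }
    return parsed.size();
  }

  // Read a value as an unsigned integer; empty if the option is unknown
  // or its stored text cannot be parsed as a number.
  std::optional<std::uint64_t> get_u64(const std::string& name) const {
    std::lock_guard<std::mutex> guard(lock_);
    const auto it = values_.find(name);
    if (it == values_.end()) {
      return std::nullopt;
    }
    try {
      return std::stoull(it->second);
    } catch (const std::exception&) {
      return std::nullopt;
    }
  }

private:
  mutable std::mutex lock_;
  std::map<std::string, std::string> values_;
};

// Example usage; the option name is made up for the sketch.
inline void example_usage() {
  AlienConfig conf;
  conf.apply_serialized("example_option=4");
  assert(conf.get_u64("example_option") == 4u);
}
```

the point of the design is the direction of the dependency: the crimson side pushes changes in, the alien side only reads its own copy, so bluestore code never needs to know about the seastar reactor.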