On Thu, Aug 1, 2019 at 3:44 PM kefu chai <tchaikov@xxxxxxxxx> wrote:
>
> On Mon, Jul 29, 2019 at 9:19 PM kefu chai <tchaikov@xxxxxxxxx> wrote:
> >
> > On Sat, Jul 27, 2019 at 5:26 AM Sam Just <sjust@xxxxxxxxxx> wrote:
> > >
> > > > - to run bluestore in the same process as crimson-osd, but we will
> > > > allocate some dedicated threads (and CPU cores) to it. we could
> > > > use ceph::thread::ThreadPool for this purpose. for instance, we
> > > > will have 3 ConfigProxy backends:
> > > > 1. the classic ConfigProxy used by the classic OSD and other
> > > > daemons and command-line utilities. this ConfigProxy normally
> > > > resides in a global CephContext.
> > > > 2. the ceph::common::ConfigProxy used solely by the crimson OSD.
> > > > it is rewritten using seastar, and it is a sharded service.
> > > > normally we just access the config proxy directly in crimson,
> > > > like 'local_conf().get_val<uint64_t>("name")', instead of using
> > > > something like 'cct->_conf.get_val<uint64_t>("name")'.
> > > > 3. the ConfigProxy used by bluestore, living in the alien world.
> > > > its interface will be exactly the same as the classic one, but it
> > > > will call into its crimson counterpart using the
> > > > `seastar::alien::submit()` call.
> > >
> > > I'm not sure this is quite right. I think that the seastar config
> > > would have a reference over to the alien config machinery in order
> > > to inject config changes and do the initial setup, but the alien
> > > side needn't have a reference to the crimson one.
> >
> > i was thinking about the implementation of ConfigProxy::get_val<>().
> > but yeah, if we 1) have a separate copy of the ConfigValues on the
> > alien side, 2) let the alien side work in passive mode, and 3) use
> > ThreadPool::submit() to inject config changes into the alien side's
> > ConfigProxy, that'd be a lot easier.
> >
> > > > in addition to the WITH_SEASTAR macro, we can introduce yet
> > > > another macro allowing us to call into the facilities offered by
> > > > crimson-common. and we can use an inline namespace to
> > > > differentiate the 2nd implementation from the 3rd, as they will
> > > > need to be co-located in the same process; without using
> > > > different names, we'd violate the ODR.
> > > > - to hide bluestore in a library which links against the
> > > > ceph-common library, but where libbluestore won't expose any
> > > > ceph-common symbols to crimson-osd. we need to figure out how to
> > > > maintain the internal state of ceph-common, though, as it is not
> > > > quite self-contained, in the sense that it needs to access the
> > > > logging, config and other facilities offered by crimson-osd.
> > >
> > > The library option seems promising to me if we go this direction.
> > > It can even export an interface which is entirely agnostic of the
> > > config machinery (maybe take a serialized representation of the
> > > config values?) and write to a different log file at first.
> >
> > yeah, probably we just need a "keyhole" for updating the alien
> > side's config settings. this option actually is a variant of the
> > previous one; the only difference is that we need to use different
> > namespaces to differentiate the symbols in bluestore from those in
> > crimson-common.
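to make the "passive mode" idea concrete, here is a minimal sketch of
what i have in mind for the keyhole. all of the names here
(AlienConfigProxy, replace()) are made up, and a plain string map
stands in for the real ConfigValues machinery:

    #include <map>
    #include <memory>
    #include <mutex>
    #include <string>

    // stand-in for ceph's ConfigValues: just string key/values here.
    struct ConfigValues {
      std::map<std::string, std::string> kv;
    };

    // the passive, alien-side proxy: it owns a private snapshot and
    // holds no reference back to the crimson side.
    class AlienConfigProxy {
      mutable std::mutex lock;
      std::shared_ptr<const ConfigValues> values =
        std::make_shared<const ConfigValues>();
    public:
      // the "keyhole": crimson injects a complete new snapshot.
      void replace(std::shared_ptr<const ConfigValues> v) {
        std::lock_guard l{lock};
        values = std::move(v);
      }
      // readers only ever touch the local snapshot -- no cross-reactor
      // call, so bluestore threads can call this freely.
      std::string get_val(const std::string& key) const {
        std::lock_guard l{lock};
        auto i = values->kv.find(key);
        return i == values->kv.end() ? std::string{} : i->second;
      }
    };

the crimson side would build a fresh snapshot in its config observer
and push it in with something like ThreadPool::submit([&proxy, snap] {
proxy.replace(snap); }); the alien side never calls back into the
reactor.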
> > >
> > > > - to port rocksdb to seastar: to be specific, this approach will
> > > > use seastar's green threads to implement the Mutex, CondVar and
> > > > Thread in rocksdb, and implement all blocking calls using
> > > > seastar's counterparts. if this approach proves to be workable,
> > > > the next problem would be to upstream this change. and in the
> > > > long run, the rocksdb-backed bluestore will be replaced by
> > > > seastore, if seastore is capable of supporting relatively slow
> > > > devices as well.
> > >
> > > I've started to look at your rocksdb port. It does look like the
> > > parts we'd need to adapt are appropriately factored out in rocksdb,
> > > and I bet we'd get interest from upstream. We might want to take
> > > their temperature sooner rather than later? We'd also have to
> > > perform
> >
> > good idea! will do so early tomorrow!
> >
> > > essentially the same refactor in Bluestore in order to break the
> > > bluestore logic apart from the IO/blocking/locking portions. I
> > > guess this exists in some form with the BlockDevice interface, but
> > > we'll also have to introduce something like rocksdb's lock
> > > replacement. This path would get us a much more cooperative
> > > (probably more performant as well, particularly in high-density
> > > hosts) bluestore in the long run, so it might be worth the work.
> >
> > thanks. your insights are inspiring!
>
> the test of `env_seastar_test` passed, so it kinda works. and i also
> wrote a post in https://www.facebook.com/groups/rocksdb.dev/ to get
> the opinions from the upstream community.

just a quick update. i was testing the seastar port of rocksdb, and
its performance does not look promising in comparison with that of
classic rocksdb:

db_bench --benchmarks="fillseq"

seastar rocksdb:
  fillseq : 390.297 micros/op 2562 ops/sec; 0.3 MB/s
classic rocksdb:
  fillseq : 30.836 micros/op 32429 ops/sec; 3.6 MB/s

i will try to understand this discrepancy. if it turns out to be a
dead end, we will have to focus on one of the options above, unless we
have a concrete plan for seastore.

> > > >
> > > > - seastore: a completely rewritten object store backend targeting
> > > > fast NVMe devices. but it will take longer to get there.
> > >
> > > I think we're going to do this no matter what. I think the
> > > alien/bluestore choice is about how we want to test crimson prior
> > > to developing seastore, and possibly for handling devices
> > > inappropriate for seastore?
> >
> > that's also my impression. the way i see it, that's only because we
> > haven't started scoping it or put together a low-level design.
> >
> > > -Sam
> >
> > --
> > Regards
> > Kefu Chai
>
> --
> Regards
> Kefu Chai

--
Regards
Kefu Chai
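p.s. for the curious, the core of the port is to rebuild rocksdb's
port::Mutex and port::CondVar on top of seastar primitives, so that
"blocking" suspends the green thread (a seastar::thread) rather than
the reactor. roughly like the sketch below -- this is just the shape
of the idea, not the actual code in the branch:

    #include <seastar/core/condition-variable.hh>
    #include <seastar/core/semaphore.hh>

    // only usable inside a seastar::thread, where future::get() is
    // allowed to suspend the current green thread.
    class Mutex {
      seastar::semaphore sem{1};      // binary semaphore as a mutex
    public:
      void Lock()   { sem.wait(1).get(); }
      void Unlock() { sem.signal(1); }
    };

    class CondVar {
      seastar::condition_variable cv;
      Mutex* mu;
    public:
      explicit CondVar(Mutex* m) : mu(m) {}
      void Wait() {
        mu->Unlock();
        cv.wait().get();              // suspend until signalled
        // (the real port has to close the lost-wakeup window between
        // Unlock() and wait(); glossed over here.)
        mu->Lock();
      }
      void Signal()    { cv.signal(); }
      void SignalAll() { cv.broadcast(); }
    };

rocksdb's Thread maps onto seastar::thread in the same fashion, and
the blocking calls in the Env get the same treatment.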