On Wed, Mar 28, 2018 at 9:43 AM, Josh Durgin <jdurgin@xxxxxxxxxx> wrote:
> Hi Lisa, your presentation last week at Cephalocon was quite convincing.
>
> Recordings aren't available yet, so perhaps you can share your slides.

Here are the slides:
https://drive.google.com/file/d/1WC0id77KWLNVllsEcJCgRgEQ-Xzvzqx8/view?usp=sharing

> For those who weren't there, Lisa tested many configurations of rocksdb
> with bluestore to attempt to keep the pg log out of level 0 in rocksdb,
> and thus avoid a large source of write amplification.
>
> None of these tunings were successful, so the conclusion was that the pg
> log ought to be stored outside of rocksdb.
>
> Lisa, what are your thoughts on how to store the pg log?
>
> For historical reference, it was moved into leveldb originally to make
> it easier to program against correctly [0], but the current PGLog code
> has grown too complex despite that.

I once wondered whether we could just put the pg log in standalone log
files. Read performance is not critical, since the log is only read when
an OSD node recovers. The idea is to keep the other metadata in RocksDB
and store the pg log in standalone journal files (with no transaction
spanning the other metadata and the pg log).

But then I noticed that we can't tell which OSD has the latest data if
the 3 OSD nodes holding the same pgs all fail during a write request.
Some OSDs may have the updated data and others the old data, while none
of them has the pg log entry appended. In that case we would need to
compare the full objects.

Another method I am investigating is whether RocksDB can handle just the
pg log in FIFO fashion. That means we need to handle it per pg. It
requires changes in RocksDB, and each pg log entry would be written at
most twice (once to the RocksDB log, and once to level 0).

Any suggestions?
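To make the failure case with standalone journal files concrete, here is
a minimal sketch in Python. It is a toy model and not Ceph code: Replica,
write, crash_at and log_head are hypothetical names, and the only
assumption carried over from the text is that the object update and the
pg log append are two separate steps with no transaction around them.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Replica:
    """Hypothetical stand-in for one OSD's copy of a pg (not Ceph code)."""
    name: str
    obj_version: int = 0                              # object data / other metadata
    pg_log: List[int] = field(default_factory=list)   # standalone journal file

    def write(self, version: int, crash_at: Optional[str] = None) -> None:
        # Step 1: update the object. Nothing ties this step to step 2.
        if crash_at == "before_data":
            return
        self.obj_version = version
        # Step 2: append the entry to the standalone pg log.
        if crash_at == "before_log":
            return
        self.pg_log.append(version)

    def log_head(self) -> int:
        # Newest version recorded in the pg log, or 0 if the log is empty.
        return self.pg_log[-1] if self.pg_log else 0


replicas = [Replica("osd.0"), Replica("osd.1"), Replica("osd.2")]

# Write 1 completes on every replica.
for r in replicas:
    r.write(1)

# Write 2: all three replicas fail during the request, at different points.
replicas[0].write(2, crash_at="before_log")
replicas[1].write(2, crash_at="before_log")
replicas[2].write(2, crash_at="before_data")

log_heads = {r.name: r.log_head() for r in replicas}
obj_versions = {r.name: r.obj_version for r in replicas}
print("log heads:      ", log_heads)
print("object versions:", obj_versions)
```

In this model every log head agrees, yet the object versions differ, so
the log alone cannot say which replica is newest and the objects
themselves would have to be compared.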
>
> Josh
>
> [0]
> https://github.com/ceph/ceph/commit/1ef94200e9bce5e0f0ac5d1e563421a9d036c203
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
Best wishes
Lisa