On Mon, 4 May 2015, Mark Nelson wrote:
> On 05/01/2015 07:33 PM, Sage Weil wrote:
> > Ok, I think I figured out what was going on.  The db->submit_transaction()
> > call (from _txc_finish_io) was blocking when there was a
> > submit_transaction_sync() in progress.  This was making me hit a ceiling
> > of about 80 iops on my slow disk.  When I moved that into _kv_sync_thread
> > (just prior to the submit_transaction_sync() call) it jumps up to 300+
> > iops.
> >
> > I pushed that to wip-newstore.
> >
> > Further, if I drop the O_DSYNC, it goes up another 50% or so.  It'll take
> > a bit more coding to effectively batch the (implicit) fdatasync from the
> > O_DSYNC up, though, and capture some of that.  Next!
> >
> > sage
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> >
>
> Ran through a bunch of tests on 0c728ccc over the weekend:
>
> http://nhm.ceph.com/newstore/5d96fe6f_vs_0c728ccc.pdf
>
> The good news is that sequential writes on spinning disks are looking
> significantly better!  We went from 40x slower than filestore for small
> sequential IO to only about 30-40% slower and we become faster than filestore
> at 64kb+ IO sizes.
>
> 128kb-2MB sequential writes with data on spinning disk and rocksdb on SSD
> regressed.  Newstore is no longer really any faster than filestore for those
> IO sizes.  We saw something similar for random IO, where spinning disk only
> results improved and spinning disk + rocksdb on SSD regressed.
>
> With everything on SSD, we saw small sequential writes improve and nearly all
> random writes regress.  Not sure how much these regressions are due to
> 0c728ccc vs other commits yet.

That's surprising!
I pushed a commit that makes this tunable,

	newstore sync submit transaction = false (default)

Can you see if setting that to true (effectively reverting my last change)
fixes the ssd regression?

It may also be that this is a simple locking issue that we can fix in
rocksdb.  Again, the behavior I saw was that the db->submit_transaction()
call would block until the sync commit (from kv_sync_thread) finished.  I
would expect rocksdb to be more careful about that, so maybe there is
something else funny/subtle going on.

sage
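
For anyone following along, here is a rough toy model of the two code paths the tunable switches between. It is only a sketch, not the NewStore code: FakeDB, KvSyncer, and queue_transaction are hypothetical names, and the single lock inside FakeDB is an assumption standing in for whatever makes submit_transaction() wait behind a sync commit. With sync_submit = true the submit happens in the I/O completion path, where it can contend with a commit in progress; with false it happens in the sync thread, just before the sync commit.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the key/value store; NOT the RocksDB or Ceph API.
struct FakeDB {
  std::mutex lock;       // assumption: one lock shared by both calls
  int submitted = 0;     // transactions handed to the store
  int sync_commits = 0;  // durable (synchronous) commits issued

  void submit_transaction(int /*txn*/) {
    std::lock_guard<std::mutex> g(lock);  // waits if a sync commit holds the lock
    ++submitted;
  }
  void submit_transaction_sync() {
    std::lock_guard<std::mutex> g(lock);
    ++sync_commits;
  }
};

// Hypothetical model of a kv sync thread with the tunable from this mail.
class KvSyncer {
 public:
  KvSyncer(FakeDB& db, bool sync_submit)
      : db_(db), sync_submit_(sync_submit), worker_([this] { run(); }) {}
  ~KvSyncer() { stop(); }

  // Called from the I/O completion path (the role _txc_finish_io plays above).
  void queue_transaction(int txn) {
    if (sync_submit_) db_.submit_transaction(txn);  // old behavior: submit here
    {
      std::lock_guard<std::mutex> g(mu_);
      queue_.push_back(txn);
    }
    cv_.notify_one();
  }

  // Drains anything still queued, then joins the worker. Safe to call twice.
  void stop() {
    {
      std::lock_guard<std::mutex> g(mu_);
      if (stop_) return;
      stop_ = true;
    }
    cv_.notify_one();
    worker_.join();
  }

 private:
  void run() {
    std::unique_lock<std::mutex> l(mu_);
    for (;;) {
      cv_.wait(l, [this] { return stop_ || !queue_.empty(); });
      if (queue_.empty()) return;  // only reachable once stop_ is set
      std::deque<int> batch;
      batch.swap(queue_);          // take everything queued so far
      l.unlock();
      if (!sync_submit_)           // new behavior: submit just before the sync
        for (int txn : batch) db_.submit_transaction(txn);
      db_.submit_transaction_sync();  // one sync commit covers the whole batch
      l.lock();
    }
  }

  FakeDB& db_;
  const bool sync_submit_;
  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<int> queue_;
  bool stop_ = false;
  std::thread worker_;  // declared last so the other members exist before it starts
};
```

In either mode, queuing N transactions and then calling stop() should leave db.submitted equal to N and db.sync_commits somewhere between 1 and N, since one sync commit can cover a whole batch. The model says nothing about actual iops; it only shows where the submit call sits relative to the sync commit.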