On 05/01/2015 07:33 PM, Sage Weil wrote:
Ok, I think I figured out what was going on. The db->submit_transaction()
call (from _txc_finish_io) was blocking when there was a
submit_transaction_sync() in progress. This was making me hit a ceiling
of about 80 iops on my slow disk. When I moved that call into
_kv_sync_thread (just prior to the submit_transaction_sync() call), it
jumped up to 300+ iops.
I pushed that to wip-newstore.
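
For anyone following along, here's a minimal, self-contained sketch of
the pattern described above (not the actual wip-newstore code):
completions just queue their KV transactions instead of submitting them,
and a single _kv_sync_thread-style loop drains the queue, submits each
transaction asynchronously, then issues one submit_transaction_sync() to
commit the whole batch. The Transaction struct and the
submit_transaction*() stubs are placeholders standing in for the
KeyValueDB interface, and committing the batch via a sync submit of its
last transaction is my reading of the change, not a quote from the code:

// sketch of batching async submits behind one sync submit
#include <condition_variable>
#include <cstdio>
#include <deque>
#include <mutex>
#include <thread>

struct Transaction { int id; };  // placeholder for a KV transaction

// Stand-ins for KeyValueDB::submit_transaction{,_sync}().
void submit_transaction(const Transaction& t) {
  // queues into the backend's WAL/memtable; no sync here
  std::printf("async submit %d\n", t.id);
}
void submit_transaction_sync(const Transaction& t) {
  // one synchronous commit covers everything submitted before it
  std::printf("sync submit %d\n", t.id);
}

std::mutex kv_lock;
std::condition_variable kv_cond;
std::deque<Transaction> kv_queue;
bool kv_stop = false;

// What _txc_finish_io now does: enqueue, never block on a sync commit.
void queue_kv(Transaction t) {
  std::lock_guard<std::mutex> l(kv_lock);
  kv_queue.push_back(t);
  kv_cond.notify_one();
}

void kv_sync_thread() {
  std::unique_lock<std::mutex> l(kv_lock);
  while (!kv_stop || !kv_queue.empty()) {
    if (kv_queue.empty()) { kv_cond.wait(l); continue; }
    std::deque<Transaction> batch;
    batch.swap(kv_queue);          // grab everything queued so far
    l.unlock();
    // submit all but the last asynchronously...
    for (size_t i = 0; i + 1 < batch.size(); ++i)
      submit_transaction(batch[i]);
    // ...and let one sync submit commit the whole batch
    submit_transaction_sync(batch.back());
    l.lock();
  }
}

int main() {
  std::thread kv(kv_sync_thread);
  for (int i = 0; i < 8; ++i)
    queue_kv(Transaction{i});
  {
    std::lock_guard<std::mutex> l(kv_lock);
    kv_stop = true;
  }
  kv_cond.notify_one();
  kv.join();
}

The point is that the cost of the synchronous commit is amortized over
however many transactions queued up while the previous sync was in
flight, instead of serializing every submitter behind it.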
Further, if I drop the O_DSYNC, it goes up another 50% or so. It'll take
a bit more coding to batch up the (implicit) fdatasync that the O_DSYNC
was doing and capture some of that gain, though. Next!
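
A minimal POSIX sketch of that batching idea (again, not NewStore code,
just the shape of it): with O_DSYNC every write() carries an implicit
data sync, so instead open without it, do plain buffered writes, and
cover a whole batch with a single explicit fdatasync(). The file path
and batch size here are arbitrary:

// one fdatasync per batch instead of one implicit sync per write
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>
#include <cstring>

int main() {
  // no O_DSYNC: writes hit the page cache and return quickly
  int fd = open("/tmp/batch-sync-demo", O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd < 0) { perror("open"); return 1; }

  char buf[4096];
  memset(buf, 'x', sizeof(buf));

  // queue up a batch of writes...
  for (int i = 0; i < 16; ++i) {
    if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf)) {
      perror("write");
      return 1;
    }
  }

  // ...then pay for a single fdatasync covering all of them
  if (fdatasync(fd) < 0) { perror("fdatasync"); return 1; }

  close(fd);
  return 0;
}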
sage
Ran through a bunch of tests on 0c728ccc over the weekend:
http://nhm.ceph.com/newstore/5d96fe6f_vs_0c728ccc.pdf
The good news is that sequential writes on spinning disks are looking
significantly better! We went from 40x slower than filestore for small
sequential IO to only about 30-40% slower, and newstore becomes faster
than filestore at 64KB+ IO sizes.
128KB-2MB sequential writes with data on spinning disk and rocksdb on
SSD regressed; newstore is no longer really any faster than filestore
at those IO sizes. We saw something similar for random IO, where the
spinning-disk-only results improved but spinning disk + rocksdb on SSD
regressed.
With everything on SSD, we saw small sequential writes improve and
nearly all random writes regress. We're not sure yet how much of these
regressions are due to 0c728ccc versus other commits.
Mark