On 05/01/2015 07:33 PM, Sage Weil wrote:
Ok, I think I figured out what was going on. The db->submit_transaction()
call (from _txc_finish_io) was blocking when there was a
submit_transaction_sync() in progress. This was making me hit a ceiling
of about 80 iops on my slow disk. When I moved that call into
_kv_sync_thread (just prior to the submit_transaction_sync() call), it
jumped up to 300+ iops.
I pushed that to wip-newstore.
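
For anyone following along, here's a minimal, self-contained sketch of
the pattern described above (not the actual wip-newstore code):
completions just queue their KV transactions instead of submitting them,
and a single _kv_sync_thread-style loop drains the queue, submits each
transaction asynchronously, then issues one submit_transaction_sync() to
commit the whole batch. The Transaction struct and the
submit_transaction*() stubs are placeholders standing in for the
KeyValueDB interface, and committing the batch via a sync submit of its
last transaction is my reading of the change, not a quote from the code:

// sketch of batching async submits behind one sync submit
#include <condition_variable>
#include <cstdio>
#include <deque>
#include <mutex>
#include <thread>

struct Transaction { int id; };  // placeholder for a KV transaction

// Stand-ins for KeyValueDB::submit_transaction{,_sync}().
void submit_transaction(const Transaction& t) {
  // queues into the backend's WAL/memtable; no sync here
  std::printf("async submit %d\n", t.id);
}
void submit_transaction_sync(const Transaction& t) {
  // one synchronous commit covers everything submitted before it
  std::printf("sync submit %d\n", t.id);
}

std::mutex kv_lock;
std::condition_variable kv_cond;
std::deque<Transaction> kv_queue;
bool kv_stop = false;

// What _txc_finish_io now does: enqueue, never block on a sync commit.
void queue_kv(Transaction t) {
  std::lock_guard<std::mutex> l(kv_lock);
  kv_queue.push_back(t);
  kv_cond.notify_one();
}

void kv_sync_thread() {
  std::unique_lock<std::mutex> l(kv_lock);
  while (!kv_stop || !kv_queue.empty()) {
    if (kv_queue.empty()) { kv_cond.wait(l); continue; }
    std::deque<Transaction> batch;
    batch.swap(kv_queue);          // grab everything queued so far
    l.unlock();
    // submit all but the last asynchronously...
    for (size_t i = 0; i + 1 < batch.size(); ++i)
      submit_transaction(batch[i]);
    // ...and let one sync submit commit the whole batch
    submit_transaction_sync(batch.back());
    l.lock();
  }
}

int main() {
  std::thread kv(kv_sync_thread);
  for (int i = 0; i < 8; ++i)
    queue_kv(Transaction{i});
  {
    std::lock_guard<std::mutex> l(kv_lock);
    kv_stop = true;
  }
  kv_cond.notify_one();
  kv.join();
}

The point is that the cost of the synchronous commit is amortized over
however many transactions queued up while the previous sync was in
flight, instead of serializing every submitter behind it.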
Further, if I drop the O_DSYNC, it goes up another 50% or so. It'll take
a bit more coding to batch up the (implicit) fdatasync that the O_DSYNC
was doing and capture some of that gain, though. Next!
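
A minimal POSIX sketch of that batching idea (again, not NewStore code,
just the shape of it): with O_DSYNC every write() carries an implicit
data sync, so instead open without it, do plain buffered writes, and
cover a whole batch with a single explicit fdatasync(). The file path
and batch size here are arbitrary:

// one fdatasync per batch instead of one implicit sync per write
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>
#include <cstring>

int main() {
  // no O_DSYNC: writes hit the page cache and return quickly
  int fd = open("/tmp/batch-sync-demo", O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd < 0) { perror("open"); return 1; }

  char buf[4096];
  memset(buf, 'x', sizeof(buf));

  // queue up a batch of writes...
  for (int i = 0; i < 16; ++i) {
    if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf)) {
      perror("write");
      return 1;
    }
  }

  // ...then pay for a single fdatasync covering all of them
  if (fdatasync(fd) < 0) { perror("fdatasync"); return 1; }

  close(fd);
  return 0;
}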
sage
Ran through a bunch of tests on 0c728ccc over the weekend:
http://nhm.ceph.com/newstore/5d96fe6f_vs_0c728ccc.pdf
The good news is that sequential writes on spinning disks are looking
significantly better! We went from 40x slower than filestore for small
sequential IO to only about 30-40% slower, and newstore becomes faster
than filestore at 64KB+ IO sizes.
128KB-2MB sequential writes with data on spinning disk and rocksdb on
SSD regressed; newstore is no longer really any faster than filestore
at those IO sizes. We saw something similar for random IO, where the
spinning-disk-only results improved but spinning disk + rocksdb on SSD
regressed.
With everything on SSD, we saw small sequential writes improve and
nearly all random writes regress. We're not sure yet how much of these
regressions are due to 0c728ccc versus other commits.
Mark