Re: newstore performance update

Mark Nelson <mnelson@xxxxxxxxxx> · Tue, 05 May 2015 12:43:11 -0500

On 05/04/2015 01:08 PM, Sage Weil wrote:
On Mon, 4 May 2015, Mark Nelson wrote:
On 05/01/2015 07:33 PM, Sage Weil wrote:

Ran through a bunch of tests on 0c728ccc over the weekend:

http://nhm.ceph.com/newstore/5d96fe6f_vs_0c728ccc.pdf

The good news is that sequential writes on spinning disks are looking
significantly better!  We went from 40x slower than filestore for small
sequential IO to only about 30-40% slower, and we become faster than
filestore at 64KB+ IO sizes.

128KB-2MB sequential writes with data on spinning disk and rocksdb on SSD
regressed.  Newstore is no longer really any faster than filestore for those
IO sizes.  We saw something similar for random IO, where the
spinning-disk-only results improved and the spinning disk + rocksdb on SSD
results regressed.

With everything on SSD, we saw small sequential writes improve and nearly all
random writes regress.  Not sure how much these regressions are due to
0c728ccc vs other commits yet.

That's surprising!  I pushed a commit that makes this tunable,

  newstore sync submit transaction = false (default)

Can you see if setting that to true (effectively reverting my last change)
fixes the SSD regression?
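
For anyone reproducing the test, a minimal ceph.conf sketch of that override. The option name and value come from the message above; placing it under the [osd] section is an assumption, not something stated in this thread:

```
[osd]
    # Assumption: set in the [osd] section of ceph.conf.
    # true reverts to the earlier synchronous submit behavior;
    # false is the default.
    newstore sync submit transaction = true
```

Removing the line, or setting it back to false, returns to the default.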

It may also be that this is a simple locking issue that we can fix in
rocksdb.  Again, the behavior I saw was that the db->submit_transaction()
call would block until the sync commit (from kv_sync_thread) finished.
I would expect rocksdb to be more careful about that, so maybe there is
something else funny/subtle going on.

sage

Ok, ran through new SSD tests and wasn't able to reproduce the poor
random write performance from 0c728ccc.

http://nhm.ceph.com/newstore/sync_submit_transaction.pdf

Haven't dug into the blktrace or collectl data yet to see if there are 
any interesting differences, but I'll try to look at that later if I get 
a bit of free time.

The good news is that sync submit transaction = false seems to make a
pretty noticeable improvement with 8c8c5903 on an SSD-backed newstore
OSD.  At small IO sizes we appear to be doing better than filestore for
both random and sequential IO.  Interestingly, random writes still appear
to be faster than sequential writes when everything is on SSD!

It looks like the big remaining issue now is 64KB+ writes on SSD.

Mark