Earlier in newstore/bluestore development we tested with the
rocksdb instance (and just the rocksdb WAL) on SSDs. At the time it did
help, but bluestore performance has improved dramatically since then so
we'll need to retest. SSDs shouldn't really help with large writes
anymore (bluestore is already avoiding the journal/wal write penalty for
large writes!). I suspect putting rocksdb on the SSD might still help
with small reads and writes to an extent, especially with large numbers
of objects.
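For anyone who wants to experiment with that, splitting the rocksdb DB/WAL out to SSD partitions should just be a couple of ceph.conf options. A rough sketch is below; the exact option names and the /dev/sdb1 / /dev/sdb2 partitions are my assumption of how it's wired up, so double-check them against your build:

[osd]
# assumed: two partitions on the SSD, one for the rocksdb DB and one for its WAL
bluestore block db path = /dev/sdb1
bluestore block wal path = /dev/sdb2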
We haven't done any testing with bcache/dm-cache with bluestore yet,
though Ben England has started looking at dm-cache with
filestore and I imagine will at some point look at bluestore as well.
It will be interesting to see how things work out!
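For reference, a minimal bcache setup in front of one OSD disk might look roughly like the sketch below. This is untested here, the device names are hypothetical, and it assumes bcache-tools is installed:

# SSD partition as the cache device, HDD as the backing device
make-bcache -C /dev/nvme0n1p1 -B /dev/sdb
# if udev doesn't register the devices automatically:
echo /dev/nvme0n1p1 > /sys/fs/bcache/register
echo /dev/sdb > /sys/fs/bcache/register
# the combined device shows up as /dev/bcache0 and can be handed to ceph-disk
ceph-disk prepare /dev/bcache0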
Mark
On 03/14/2016 12:00 PM, Stillwell, Bryan wrote:
Mark,
Since most of us already have existing clusters that use SSDs for
journals, has there been any testing of converting that hardware over to
using BlueStore and re-purposing the SSDs as a block cache (like using
bcache)?
To me this seems like it would be a good combination for a typical RBD
cluster.
Thanks,
Bryan
On 3/14/16, 10:52 AM, "ceph-users on behalf of Mark Nelson"
<ceph-users-bounces@xxxxxxxxxxxxxx on behalf of mnelson@xxxxxxxxxx> wrote:
Hi Folks,
We are actually in the middle of doing some bluestore testing/tuning for
the upstream jewel release as we speak. :) These are (so far) pure HDD
tests using 4 nodes with 4 spinning disks and no SSDs.
Basically, on the write side it's looking fantastic, and that's an area
we really wanted to improve, so that's great. On the read side, we are
working on getting sequential read performance up for certain IO sizes.
We are more dependent on client-side readahead with bluestore since
there is no underlying filesystem below the OSDs helping us out. This
usually isn't a problem in practice since there should be readahead on
the VM, but when testing with fio using the RBD engine you should
probably enable client-side RBD readahead:
rbd readahead disable after bytes = 0
rbd readahead max bytes = 4194304
Again, this probably only matters when directly using librbd.
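For what it's worth, a minimal fio job file for the RBD engine looks something like the sketch below. The pool and image names are placeholders, and the image has to be created beforehand (e.g. with "rbd create"):

[global]
ioengine=rbd
clientname=admin
pool=rbd
# hypothetical pre-created test image
rbdname=fio-test
rw=read
bs=4M
iodepth=32
runtime=60
time_based

[seq-read]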
The other question is whether to use buffered reads by default in
bluestore, i.e. setting:
"bluestore default buffered read = true"
That's what we are working on testing now. I've included the ceph.conf
used for these tests and also a link to some of our recent results.
Please download it and open it in LibreOffice, as Google's preview
isn't showing the graphs.
Here's how the legend is setup:
Hammer-FS: Hammer + Filestore
6dba7fd-BS (No RBD RA): Master + Fixes + Bluestore
6dba7fd-BS (4M RBD RA): Master + Fixes + Bluestore + 4M RBD Read Ahead
c1e41afb-FS: Master + Filestore + new journal throttling + Sam's tuning
https://drive.google.com/file/d/0B2gTBZrkrnpZMl9OZ18yS3NuZEU/view?usp=sharing
Mark
On 03/14/2016 11:04 AM, Kenneth Waegeman wrote:
Hi Stefan,
We are also interested in bluestore, but have not looked into it yet.
We tried keyvaluestore before, and that could be enabled by setting the
osd objectstore value.
And in this ticket http://tracker.ceph.com/issues/13942 I see:
[global]
enable experimental unrecoverable data corrupting features = *
bluestore fsck on mount = true
bluestore block db size = 67108864
bluestore block wal size = 134217728
bluestore block size = 5368709120
osd objectstore = bluestore
So I guess this could work for bluestore too.
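If you do try it, one quick way to confirm an OSD really came up with bluestore should be to ask it over the admin socket on the OSD's host (osd.0 is just an example id):

ceph daemon osd.0 config get osd_objectstore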
Very curious to hear what you see stability- and performance-wise :)
Cheers,
Kenneth
On 14/03/16 16:03, Stefan Lissmats wrote:
Hello everyone!
I think the new bluestore sounds great and would like to try it out in
my test environment, but I didn't find much on how to use it. I finally
managed to test it, though, and it really looks promising
performance-wise.
If anyone has more information or guides for bluestore, please tell me
where to find them.
I thought I would share how I managed to get a new Jewel cluster with
bluestore-based OSDs to work.
What I found so far is that ceph-disk can create new bluestore OSDs
(but not ceph-deploy, please correct me if I'm wrong), and I need to
have "enable experimental unrecoverable data corrupting features =
bluestore rocksdb" in the [global] section of ceph.conf.
After that I can create new OSDs with:
ceph-disk prepare --bluestore /dev/sdg
So I created a cluster with ceph-deploy without any OSDs and then
used ceph-disk on the hosts to create the OSDs.
Pretty simple in the end but it took me a while to figure that out.
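To summarize the whole sequence (the /dev/sdg device is just the example from above, and the explicit activate step may not be needed if udev picks the new partitions up automatically):

# in ceph.conf, [global] section:
#   enable experimental unrecoverable data corrupting features = bluestore rocksdb
ceph-disk prepare --bluestore /dev/sdg
# only if the OSD doesn't come up on its own:
ceph-disk activate /dev/sdg1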
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com