Re: bluestore - wal,db on faster devices?

Mark Nelson <mnelson@xxxxxxxxxx> · Wed, 8 Nov 2017 13:45:46 -0600

Hi Wolfgang,

You've got the right idea.  RBD is probably going to benefit less since 
you have a small number of large objects and little extra OMAP data. 
Having the allocation and object metadata on flash certainly shouldn't 
hurt, and you should still have less overhead for small (<64k) writes. 
With RGW however you also have to worry about bucket index updates 
during writes and that's a big potential bottleneck that you don't need 
to worry about with RBD.

Mark

On 11/08/2017 01:01 PM, Wolfgang Lendl wrote:
Hi Mark,

thanks for your reply!
I'm a big fan of keeping things simple - this means that there has to be
a very good reason to put the WAL and DB on a separate device otherwise
I'll keep it collocated (and simpler).

as far as I understood - putting the WAL,DB on a faster (than hdd)
device makes more sense in cephfs and rgw environments (more metadata) -
and less sense in rbd environments - correct?

br
wolfgang

On 11/08/2017 02:21 PM, Mark Nelson wrote:
Hi Wolfgang,

In bluestore the WAL serves sort of a similar purpose to filestore's
journal, but bluestore isn't dependent on it for guaranteeing
durability of large writes.  With bluestore you can often get higher
large-write throughput than with filestore when using HDD-only or
flash-only OSDs.

Bluestore also stores allocation, object, and cluster metadata in the
DB.  That, in combination with the way bluestore stores objects,
dramatically improves behavior during certain workloads.  A big one is
creating millions of small objects as quickly as possible.  In
filestore, PG splitting has a huge impact on performance and tail
latency.  Bluestore is much better just on HDD, and putting the DB and
WAL on flash makes it better still since metadata no longer is a
bottleneck.

Bluestore does have a couple of shortcomings vs filestore currently.
The allocator is not as good as XFS's and can fragment more over time.
There is no server-side readahead so small sequential read performance
is very dependent on client-side readahead.  There's still a number of
optimizations to various things ranging from threading and locking in
the shardedopwq to pglog and dup_ops that potentially could improve
performance.

I have a blog post that we've been working on that explores some of
these things but I'm still waiting on review before I publish it.

Mark

On 11/08/2017 05:53 AM, Wolfgang Lendl wrote:
Hello,

it's clear to me getting a performance gain from putting the journal on
a fast device (ssd,nvme) when using filestore backend.
it's not when it comes to bluestore - are there any resources,
performance test, etc. out there how a fast wal,db device impacts
performance?

br
wolfgang

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com