Re: BlueStore questions about workflow and performance

Hi Mark, great to hear from you!

On Tue, Oct 3, 2017 at 9:16 AM Mark Nelson <mnelson@xxxxxxxxxx> wrote:


On 10/03/2017 07:59 AM, Alex Gorbachev wrote:
> Hi Sam,
>
> On Mon, Oct 2, 2017 at 6:01 PM Sam Huracan <nowitzki.sammy@xxxxxxxxx
> <mailto:nowitzki.sammy@xxxxxxxxx>> wrote:
>
>     Can anyone help me?
>
>     On Oct 2, 2017 17:56, "Sam Huracan" <nowitzki.sammy@xxxxxxxxx
>     <mailto:nowitzki.sammy@xxxxxxxxx>> wrote:
>
>         Hi,
>
>         I'm reading this document:
>          http://storageconference.us/2017/Presentations/CephObjectStore-slides.pdf
>
>         I have 3 questions:
>
>         1. Does BlueStore write data (to the raw block device) and metadata
>         (to RocksDB) simultaneously or sequentially?
>
>         2. In my opinion, BlueStore's performance cannot match FileStore
>         with an SSD journal, because writing to the raw disk is slower than
>         writing through a buffer (that is the purpose of the buffer). What
>         do you think?
>
>         3. Does placing the RocksDB and RocksDB WAL on an SSD enhance only
>         write performance, only read performance, or both?
>
>         Hoping for your answer,
>
>
> I am researching the same thing, but recommend you look
> at http://ceph.com/community/new-luminous-bluestore
>
> Also search for information on the BlueStore cache to answer some of these
> questions.  My test Luminous cluster so far is not as performant as I would
> like, but I have not yet put a serious effort into tuning it, and it does
> seem stable.
>
> Hth, Alex

Hi Alex,

If you see anything specific please let us know.  There are a couple of
corner cases where bluestore is likely to be slower than filestore
(specifically small sequential reads/writes with no client-side cache or
read-ahead).  I've also seen some cases where filestore has higher read
throughput potential (4 MB sequential reads with multiple NVMe drives per
OSD node).  In many other cases bluestore is faster (and sometimes much
faster) than filestore in our tests.  Writes in general tend to be faster,
and high-volume object creation is much faster with much lower tail
latencies (filestore really suffers in this test due to PG splitting).
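
Good to know about the small sequential I/O corner case.  One thing worth double-checking on our iSCSI/NFS gateway VMs is the client read-ahead on the mapped RBDs, roughly like this (just a sketch, assuming krbd-mapped devices; /dev/rbd0 is a placeholder):

    # current read-ahead, in 512-byte sectors
    blockdev --getra /dev/rbd0

    # kernel read-ahead in KB for the same device; raise it to test
    cat /sys/block/rbd0/queue/read_ahead_kb
    echo 4096 > /sys/block/rbd0/queue/read_ahead_kb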

I have two pretty well tuned filestore Jewel clusters running SATA HDDs on dedicated hardware.  For the Luminous cluster, I wanted to do a POC on a fully meshed VMware setup (trendy moniker: hyperconverged), using only SSDs, Luminous, and BlueStore.  Our workloads are unusual in that RBDs are exported via iSCSI or NFS back to VMware and consumed by e.g. Windows VMs (we support healthcare and corporate business systems), or by Linux VMs directly from Ceph.

What I have done so far is dedicate a hardware JBOD with an Areca HBA (you turned me on to those a few years ago :) to each OSD VM.  Using 6 Smartstorage SSD OSDs per OSD VM, with 3 of these VMs in total and 2x 20 Gb shared network uplinks, I am getting about a third of the performance of my hardware Jewel cluster with 24 Lenovo enterprise SATA drives, measured as 4k block reads and writes with a single stream and with 32 parallel streams.
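
For anyone wanting to reproduce the numbers, a run along these lines should be representative (an illustrative sketch only, not the exact job; the pool and image names are placeholders, and the write side is the same command with --rw=randwrite):

    # 4k random reads, queue depth 1
    fio --ioengine=rbd --clientname=admin --pool=rbd --rbdname=bench-img \
        --rw=randread --bs=4k --iodepth=1 \
        --time_based --runtime=60 --name=4k-qd1

    # same test with 32 I/Os in flight
    fio --ioengine=rbd --clientname=admin --pool=rbd --rbdname=bench-img \
        --rw=randread --bs=4k --iodepth=32 \
        --time_based --runtime=60 --name=4k-qd32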

Definitely not apples to apples, so I plan to experiment with the BlueStore cache.  One question: does BlueStore distinguish between SSD and HDD based on the CRUSH device class assignment?
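
I am asking because the cache experiments I had in mind start from something like the following in ceph.conf, if I am reading the Luminous options correctly (the sizes are a first guess for this hardware, not a recommendation):

    [osd]
    # cache size used when BlueStore treats the OSD's data device as SSD
    bluestore cache size ssd = 4294967296
    # cache size used when it treats the device as rotational (HDD)
    bluestore cache size hdd = 1073741824

"ceph osd crush class ls" and "ceph osd df tree" at least show which class each OSD was assigned, but I am not sure that is what the ssd/hdd variants key off of.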

I will also check the effect of giving more RAM and CPU cores to the OSD VMs, as well as adding spindles and trying different JBODs.
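
To see where the OSD memory actually goes during those changes, the per-OSD memory pools and performance counters can be watched over the admin socket, along these lines (osd.0 is just an example ID):

    # per-pool memory usage inside the OSD, including the bluestore caches
    ceph daemon osd.0 dump_mempools
    # full performance counter dump, including the bluestore section
    ceph daemon osd.0 perf dump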

Thank you for reaching out.  

Regards,
Alex



Mark

--
Alex Gorbachev
Storcium
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
