No. What test parameters (iodepth/file size/numjobs) would make sense for 3 nodes / 27 OSDs @ 4TB?

- Rado

-----Original Message-----
From: Mark Nelson [mailto:mnelson@xxxxxxxxxx]
Sent: Thursday, November 16, 2017 10:56 AM
To: Milanov, Radoslav Nikiforov <radonm@xxxxxx>; David Turner <drakonstein@xxxxxxxxx>
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re: Bluestore performance 50% of filestore

Did you happen to have a chance to try with a higher io depth?

Mark

On 11/16/2017 09:53 AM, Milanov, Radoslav Nikiforov wrote:
> FYI
>
> Having a 50GB block.db made no difference to the performance.
>
> - Rado
>
> From: David Turner [mailto:drakonstein@xxxxxxxxx]
> Sent: Tuesday, November 14, 2017 6:13 PM
> To: Milanov, Radoslav Nikiforov <radonm@xxxxxx>
> Cc: Mark Nelson <mnelson@xxxxxxxxxx>; ceph-users@xxxxxxxxxxxxxx
> Subject: Re: Bluestore performance 50% of filestore
>
> I'd probably say 50GB to leave some extra space over-provisioned.
> 50GB should definitely prevent any DB operations from spilling over to the HDD.
>
> On Tue, Nov 14, 2017, 5:43 PM Milanov, Radoslav Nikiforov <radonm@xxxxxx> wrote:
>
> Thank you,
>
> These are 4TB OSDs and they might become full someday; I'll try a 60GB db partition, sized for the max OSD capacity.
>
> - Rado
>
> From: David Turner [mailto:drakonstein@xxxxxxxxx]
> Sent: Tuesday, November 14, 2017 5:38 PM
> To: Milanov, Radoslav Nikiforov <radonm@xxxxxx>
> Cc: Mark Nelson <mnelson@xxxxxxxxxx>; ceph-users@xxxxxxxxxxxxxx
> Subject: Re: Bluestore performance 50% of filestore
>
> You have to configure the size of the db partition in the config file for the cluster. If your db partition is 1GB, then I can all but guarantee that you're using your HDD for your blocks.db very quickly into your testing. There have been multiple threads recently about what size the db partition should be, and it seems to be based on how many objects your OSD is likely to have on it. The recommendation has been to err on the side of bigger. If you're running 10TB OSDs and anticipate filling them up, then you probably want closer to an 80GB+ db partition. That's why I asked how full your cluster was and how large your HDDs are.
>
> Here's a link to one of the recent ML threads on this topic:
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-September/020822.html
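>
> For example, something along these lines in ceph.conf before the OSDs are created; the option name is from memory and the ~60GB value is only an illustration, so double-check against the docs for your release:
>
> [osd]
> # db partition size picked up when a new bluestore OSD is created (~60GB here, purely as an example)
> bluestore_block_db_size = 64424509440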
>
> On Tue, Nov 14, 2017 at 4:44 PM Milanov, Radoslav Nikiforov <radonm@xxxxxx> wrote:
>
> The block-db partition is the default 1GB (is there a way to modify this? Journals are 5GB in the filestore case) and usage is low:
>
> [root@kumo-ceph02 ~]# ceph df
> GLOBAL:
>     SIZE        AVAIL      RAW USED     %RAW USED
>     100602G     99146G        1455G          1.45
> POOLS:
>     NAME              ID     USED       %USED     MAX AVAIL     OBJECTS
>     kumo-vms           1     19757M      0.02        31147G        5067
>     kumo-volumes       2       214G      0.18        31147G       55248
>     kumo-images        3       203G      0.17        31147G       66486
>     kumo-vms3         11     45824M      0.04        31147G       11643
>     kumo-volumes3     13     10837M         0        31147G        2724
>     kumo-images3      15     82450M      0.09        31147G       10320
>
> - Rado
>
> From: David Turner [mailto:drakonstein@xxxxxxxxx]
> Sent: Tuesday, November 14, 2017 4:40 PM
> To: Mark Nelson <mnelson@xxxxxxxxxx>
> Cc: Milanov, Radoslav Nikiforov <radonm@xxxxxx>; ceph-users@xxxxxxxxxxxxxx
> Subject: Re: Bluestore performance 50% of filestore
>
> How big was your blocks.db partition for each OSD and what size are your HDDs? Also, how full is your cluster? It's possible that your blocks.db partition wasn't large enough to hold the entire db and it had to spill over onto the HDD, which would definitely impact performance.
>
> On Tue, Nov 14, 2017 at 4:36 PM Mark Nelson <mnelson@xxxxxxxxxx> wrote:
>
> How big were the writes in the Windows test and how much concurrency was there?
>
> Historically bluestore does pretty well for us with small random writes, so your write results surprise me a bit. I suspect it's the low queue depth. Sometimes bluestore does worse with reads, especially if readahead isn't enabled on the client.
>
> Mark
>
> On 11/14/2017 03:14 PM, Milanov, Radoslav Nikiforov wrote:
>
> Hi Mark,
>
> Yes, RBD is in writeback, and the only thing that changed was converting the OSDs to bluestore. These are 7200 rpm drives with triple replication. I also get the same results (bluestore 2 times slower) testing continuous writes on a 40GB partition in a Windows VM, with a completely different tool.
>
> Right now I'm going back to filestore for the OSDs, so additional tests are possible if that helps.
>
> - Rado
>
> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Mark Nelson
> Sent: Tuesday, November 14, 2017 4:04 PM
> To: ceph-users@xxxxxxxxxxxxxx
> Subject: Re: Bluestore performance 50% of filestore
>
> Hi Radoslav,
>
> Is RBD cache enabled and in writeback mode? Do you have client side readahead?
>
> Both are doing better for writes than you'd expect from the native performance of the disks, assuming they are typical 7200RPM drives and you are using 3X replication (~150 IOPS * 27 / 3 = ~1350 IOPS). Given the small file size, I'd expect that you might be getting better journal coalescing in filestore.
>
> Sadly I imagine you can't do a comparison test at this point, but I'd be curious how it would look if you used libaio with a high iodepth and a much bigger partition to do random writes over.
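>
> For instance, something along these lines; the ioengine/iodepth/size/numjobs values are purely illustrative, not a tuned recommendation for your cluster:
>
> fio --name fio_test_file --direct=1 --ioengine=libaio --iodepth=32 --rw=randwrite --bs=4k --size=10G --numjobs=4 --time_based --runtime=180 --group_reporting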
>
> Mark
>
> On 11/14/2017 01:54 PM, Milanov, Radoslav Nikiforov wrote:
>
>> Hi,
>>
>> We have a 3-node, 27-OSD cluster running Luminous 12.2.1.
>>
>> In the filestore configuration there are 3 SSDs used for the journals of the 9 OSDs on each host (1 SSD holds 3 journal partitions for 3 OSDs). I've converted filestore to bluestore by wiping 1 host at a time and waiting for recovery. The SSDs now contain the block-db, again with one SSD serving 3 OSDs.
>>
>> The cluster is used as storage for OpenStack.
>>
>> Running fio on a VM in that OpenStack reveals bluestore performance almost twice as slow as filestore:
>>
>> fio --name fio_test_file --direct=1 --rw=randwrite --bs=4k --size=1G --numjobs=2 --time_based --runtime=180 --group_reporting
>>
>> fio --name fio_test_file --direct=1 --rw=randread --bs=4k --size=1G --numjobs=2 --time_based --runtime=180 --group_reporting
>>
>> Filestore
>>
>> write: io=3511.9MB, bw=19978KB/s, iops=4994, runt=180001msec
>> write: io=3525.6MB, bw=20057KB/s, iops=5014, runt=180001msec
>> write: io=3554.1MB, bw=20222KB/s, iops=5055, runt=180016msec
>>
>> read : io=1995.7MB, bw=11353KB/s, iops=2838, runt=180001msec
>> read : io=1824.5MB, bw=10379KB/s, iops=2594, runt=180001msec
>> read : io=1966.5MB, bw=11187KB/s, iops=2796, runt=180001msec
>>
>> Bluestore
>>
>> write: io=1621.2MB, bw=9222.3KB/s, iops=2305, runt=180002msec
>> write: io=1576.3MB, bw=8965.6KB/s, iops=2241, runt=180029msec
>> write: io=1531.9MB, bw=8714.3KB/s, iops=2178, runt=180001msec
>>
>> read : io=1279.4MB, bw=7276.5KB/s, iops=1819, runt=180006msec
>> read : io=773824KB, bw=4298.9KB/s, iops=1074, runt=180010msec
>> read : io=1018.5MB, bw=5793.7KB/s, iops=1448, runt=180001msec
>>
>> - Rado

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com