Re: Bluestore disk colocation using NVRAM, SSD and SATA


 





On 09/21/2017 03:19 AM, Maged Mokhtar wrote:
On 2017-09-21 10:01, Dietmar Rieder wrote:

Hi,

I'm in the same situation (NVMEs, SSDs, SAS HDDs) and asked myself the
same questions.
For now I decided to use the NVMEs as wal and db devices for the SAS
HDDs, and on the SSDs I colocate wal and db.

However, I'm still wondering whether I should change the default sizes
of wal and db, and if so, to what size.

Dietmar

On 09/21/2017 01:18 AM, Alejandro Comisario wrote:
But for example, on the same server I have 3 disk technologies to
deploy pools on: SSD, SAS and SATA.
The NVMEs were bought with only the journals for SATA and SAS in mind,
since the journals for the SSDs were colocated.

But now, in exactly the same scenario, should I rely on the NVME for the
SSD pool? Is there that much of a gain compared with colocating block.*
on the same SSD?

best.

On Wed, Sep 20, 2017 at 6:36 PM, Nigel Williams
<nigel.williams@xxxxxxxxxxx> wrote:

    On 21 September 2017 at 04:53, Maximiliano Venesio
    <massimo@xxxxxxxxxxx> wrote:

        Hi guys, I'm reading different documents about bluestore, and
        none of them recommends using NVRAM to store the bluefs db.
        Nevertheless, the official documentation says that it is better
        to put the block.db on the faster device.


    Likely not mentioned since no one has yet had the opportunity to
    test it.

        So how should I deploy using bluestore, regarding where I
        should put block.wal and block.db?


    block.* would be best on your NVRAM device, like this:

    ceph-deploy osd create --bluestore c0osd-136:/dev/sda --block-wal /dev/nvme0n1 --block-db /dev/nvme0n1

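When several data disks share one fast device, the same command has to be repeated once per disk. The following is a minimal sketch that only assembles such command lines as strings; it uses just the flags quoted above. The helper name and the second disk (/dev/sdb) are made up for illustration, and whether the tool partitions a shared device correctly for several OSDs is an assumption to check against your ceph-deploy version.

```python
import shlex


def ceph_deploy_osd_commands(host, data_devices, fast_device):
    """Build one ceph-deploy command line per data device.

    Only the flags quoted in the message above are used. The host and
    device names passed in are placeholders, not recommendations.
    """
    commands = []
    for data_dev in data_devices:
        args = [
            "ceph-deploy", "osd", "create", "--bluestore",
            f"{host}:{data_dev}",
            "--block-wal", fast_device,
            "--block-db", fast_device,
        ]
        # shlex.join quotes arguments where needed, so the result is
        # safe to paste into a shell.
        commands.append(shlex.join(args))
    return commands


# /dev/sdb is a hypothetical second data disk.
for line in ceph_deploy_osd_commands(
    "c0osd-136", ["/dev/sda", "/dev/sdb"], "/dev/nvme0n1"
):
    print(line)
```

Nothing is executed here; the strings are printed so they can be reviewed before being run by hand.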


    _______________________________________________
    ceph-users mailing list
    ceph-users@xxxxxxxxxxxxxx <mailto:ceph-users@xxxxxxxxxxxxxx>
<mailto:ceph-users@xxxxxxxxxxxxxx <mailto:ceph-users@xxxxxxxxxxxxxx>>
    http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
    <http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com>




--
*Alejandro Comisario*
*CTO | NUBELIU*
E-mail: alejandro@xxxxxxxxxxx
Cell: +54 9 11 3770 1857
www.nubeliu.com






My guess for the wal: you are dealing with a 2-step io operation, so if
the wal is colocated on your SSDs, your iops for small writes will be
halved. The decision is whether to add a small NVME as wal for 4 or 5
(large) SSDs, which would double their iops for small io sizes. This is
not the case for the db.
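The halving argument can be written as a toy model. This is only a sketch of the reasoning above, assuming every small write costs two device writes when the wal shares the SSD and one when the wal lives elsewhere. The function name and the 20,000 iops figure are hypothetical, not measurements.

```python
def small_write_iops(device_write_iops: float, wal_colocated: bool) -> float:
    """Rough effective small-write iops of one SSD-backed OSD.

    Assumption: with a colocated wal each client write hits the device
    twice (wal, then data); with the wal on another device it hits the
    SSD once.
    """
    device_writes_per_op = 2 if wal_colocated else 1
    return device_write_iops / device_writes_per_op


ssd_iops = 20_000  # hypothetical raw small-write iops of the SSD
print(small_write_iops(ssd_iops, wal_colocated=True))   # 10000.0
print(small_write_iops(ssd_iops, wal_colocated=False))  # 20000.0
```

The model ignores caching, batching and the NVME's own limits, so treat the factor of two as an upper bound on the gain.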

For wal size: 512 MB is recommended (the ceph-disk default).

For db size: a "few" GB; probably 10GB is a good number. I guess we will
hear more in the future.
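As a quick check of what these sizes mean for the fast device, here is a small sketch that adds up the space needed when one NVME serves several OSDs. It takes the 512 MB wal and 10GB db figures from above, treats MB/GB as MiB/GiB for simplicity, and uses a made-up function name and an example count of 5 OSDs.

```python
def fast_device_needed_mib(num_osds: int,
                           wal_mib: int = 512,
                           db_mib: int = 10 * 1024) -> int:
    """Space needed on the shared fast device, in MiB.

    Defaults follow the sizes suggested above (512 MB wal, 10 GB db),
    interpreted as MiB/GiB. Partition alignment overhead is ignored.
    """
    return num_osds * (wal_mib + db_mib)


needed = fast_device_needed_mib(5)  # 5 OSDs is an example value
print(needed, "MiB =", needed / 1024, "GiB")  # 53760 MiB = 52.5 GiB
```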

There's a pretty good chance that if you are writing out lots of small RGW or rados objects you'll blow past 10GB of metadata once rocksdb space-amp is factored in. I can pretty routinely do it when writing out millions of rados objects per OSD. Bluestore will then switch to writing metadata out to the block disk, and in this case that might not be that bad of a transition (NVMe to SSD). If you have spare room, you might as well give the DB partition whatever you have available on the device.

A harder question is how much fast storage to buy for the WAL/DB. It's not straightforward, and rocksdb can be tuned in various ways to favor reducing space, write, or read amplification, but not all 3 at once. Right now we are likely favoring reducing write-amplification over space/read amp, but one could imagine that with a small amount of incredibly fast storage it might be better to favor reducing space-amp.
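To make the "give the DB whatever is left" advice concrete, here is a rough sketch. It splits a fast device between the OSDs after reserving a 512 MiB wal for each, and compares the result with a crude metadata estimate. The function names, the 400 GiB device, the 1 KiB of metadata per object and the space-amp factor of 2.0 are all placeholders chosen for illustration, not measured values.

```python
MIB = 1024 * 1024


def db_share_mib(device_mib: int, num_osds: int, wal_mib: int = 512) -> int:
    """DB partition size per OSD when the DB gets all remaining space."""
    remaining = device_mib - num_osds * wal_mib
    if remaining <= 0:
        raise ValueError("fast device is too small for the wal partitions alone")
    return remaining // num_osds


def metadata_estimate_mib(num_objects: int,
                          bytes_per_object: int,
                          space_amp: float) -> float:
    """Crude metadata footprint: objects x bytes per object x space-amp."""
    return num_objects * bytes_per_object * space_amp / MIB


# All inputs below are hypothetical placeholders.
db_mib = db_share_mib(device_mib=400 * 1024, num_osds=5)
meta_mib = metadata_estimate_mib(10_000_000, 1024, 2.0)

print(db_mib)                # 81408
print(meta_mib > 10 * 1024)  # True: past a 10 GiB DB partition
print(meta_mib > db_mib)     # False: fits in the larger share
```

With real numbers for your workload, the same comparison indicates whether metadata is likely to spill over to the block device.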

Mark


Maged Mokhtar



