Re: Optimise Setup with Bluestore

Hi Mehmet!

On 08/16/2017 11:12 AM, Mehmet wrote:
:( no suggestions or recommendations on this?

On 14 August 2017 16:50:15 CEST, Mehmet <ceph@xxxxxxxxxx> wrote:

    Hi friends,

    my current hardware setup per OSD node is as follows:

    # 3 OSD-Nodes with
    - 2x Intel(R) Xeon(R) CPU E5-2603 v3 @ 1.60GHz ==> 12 Cores, no
    Hyper-Threading
    - 64GB RAM
    - 12x 4TB HGST 7K4000 SAS2 (6Gb/s) Disks as OSDs
    - 1x INTEL SSDPEDMD400G4 (Intel DC P3700 NVMe) as Journaling Device for
    12 Disks (20G Journal size)
    - 1x Samsung SSD 840/850 Pro only for the OS

    # and 1x OSD Node with
    - 1x Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz (10 Cores 20 Threads)
    - 64GB RAM
    - 23x 2TB TOSHIBA MK2001TRKB SAS2 (6Gb/s) Disks as OSDs
    - 1x SEAGATE ST32000445SS SAS2 (6Gb/s) Disk as OSD
    - 1x INTEL SSDPEDMD400G4 (Intel DC P3700 NVMe) as Journaling Device for
    24 Disks (15G Journal size)
    - 1x Samsung SSD 850 Pro only for the OS

The single P3700 for 23 spinning disks is pushing it. The P3700s have high write endurance, but judging by the model number that is the 400GB version? If you are doing a lot of writes you might wear it out fairly quickly, and it is a single point of failure for the entire node (if it dies, a lot of data dies with it). Unbalanced setups like this also tend to be trickier to get performing well.
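If you do keep that layout, it is probably worth keeping an eye on the drive's wear indicators, e.g. with nvme-cli (the device path is just an example):

    # health log: "percentage_used" tracks rated write endurance,
    # "data_units_written" counts lifetime writes (units of 512,000 bytes)
    nvme smart-log /dev/nvme0n1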


    As you can see, I am using one NVMe device (Intel DC P3700 NVMe – 400G),
    partitioned, for all spinning disks on each OSD node.

    When "Luminous" is available (as the next LTS) I plan to switch from
    "filestore" to "bluestore" 😊

    As far as I have read, bluestore consists of:
    - "the device" (the data device)
    - "block-DB": a device that stores RocksDB metadata
    - "block-WAL": a device that stores the RocksDB "write-ahead journal"

    Which setup would be useful in my case?
    I would set up the disks via "ceph-deploy".

Typically we recommend something like a 1-2GB WAL partition per OSD on the NVMe drive and use the remaining space for the DB. If you run out of DB space, bluestore will start storing KV data on the spinning disks instead. I suspect this is still the advice you will want to follow, though at some point having that many WAL and DB partitions on a single NVMe may become a bottleneck. Something like 63K sequential writes to heavily fragmented objects might be worth testing, but in most cases I suspect DB and WAL on the NVMe will still be faster.
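To make that concrete, here is a minimal sketch of what one OSD could look like on your 12-disk nodes, assuming ceph-deploy >= 2.0 (ceph-volume based); the device names (/dev/nvme0n1, /dev/sdb) and host name (osd-node-1) are placeholders, and the partition sizes are only examples (roughly 400GB / 12 OSDs ≈ 33GB each):

    # carve one WAL and one DB partition per OSD out of the NVMe
    sgdisk --new=1:0:+2G  --change-name=1:"ceph block.wal osd.0" /dev/nvme0n1
    sgdisk --new=2:0:+30G --change-name=2:"ceph block.db osd.0"  /dev/nvme0n1
    # ...repeat for the remaining OSDs on the node...

    # create the bluestore OSD: data on the spinner, WAL/DB on the NVMe
    ceph-deploy osd create \
        --data /dev/sdb \
        --block-wal /dev/nvme0n1p1 \
        --block-db /dev/nvme0n1p2 \
        osd-node-1

Bluestore is the default in that ceph-deploy version, so no extra flag is needed; if you end up on the older ceph-disk based ceph-deploy the option names differ, so check ceph-deploy osd create --help first.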


    Thanks in advance for your suggestions!
    - Mehmet



_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
