Re: Looking for the best way to utilize 1TB NVMe added to the host with 8x3TB HDD OSDs

Ashley Merrick <singapore@xxxxxxxxxxxxxx> · Fri, 20 Sep 2019 15:09:10 +0800

Placing it as a journal or BlueStore DB/WAL device will mostly help with writes, but by the sound of it you want to increase read performance? Also, how important is the data on this Ceph cluster?

If you place it as a journal or DB/WAL device, any failure of it will take down every OSD that depends on it and cause total data loss, so I would strongly advise against this unless the cluster is purely for testing and total data loss is not an issue.
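To illustrate why a shared device turns into total loss here, a minimal sketch using only the numbers from the original post (8 OSDs, JErasure 2+1, failure domain = osd). The function name and the worst-case model are my own simplification, not anything Ceph exposes:

```python
# Illustrative worst-case model, not a Ceph API.
# With failure domain = osd, each of the k+m chunks of an object sits on a
# different OSD, so losing n OSDs can destroy up to min(n, k+m) chunks.

def survives(k: int, m: int, osds_lost: int) -> bool:
    """Return True if an object is still recoverable in the worst case."""
    chunks_lost = min(osds_lost, k + m)
    return chunks_lost <= m  # EC k+m tolerates at most m lost chunks

K, M = 2, 1          # JErasure 2+1 from the original post
TOTAL_OSDS = 8       # 8 x 3TB HDD OSDs

# One HDD dies: only that OSD is lost.
print("single HDD failure survivable:", survives(K, M, osds_lost=1))

# The NVMe holding every OSD's DB/WAL dies: all OSDs are lost together.
print("shared NVMe failure survivable:", survives(K, M, osds_lost=TOTAL_OSDS))
```

The pool is laid out to ride through one OSD failure, but a single NVMe carrying the DB/WAL for all eight OSDs fails them all at once.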

In that case it is worth upgrading to BlueStore by rebuilding each OSD with its DB/WAL on an SSD partition. You can do this one OSD at a time, but there is no migration path, so you would need to wait for the data to rebuild after each OSD change before moving on to the next.
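A rough sketch of what the one-OSD-at-a-time rebuild could look like, written as a dry-run planner that only builds command strings and runs nothing. The device path (/dev/sdb), the LV name (nvme_vg/db-0) and the 4% DB sizing are assumptions for illustration: the 4% figure is a rule of thumb often quoted for BlueStore block.db, not a hard requirement, so check the command sequence and sizing against the documentation for your release before using any of it.

```python
from typing import List

def db_size_gb(hdd_gb: int, percent: int = 4) -> int:
    """DB partition size as a percentage of the data disk (assumed rule of thumb)."""
    return hdd_gb * percent // 100

def rebuild_plan(osd_id: int, data_dev: str, db_lv: str) -> List[str]:
    """Shell commands to rebuild one OSD as BlueStore with its DB on the NVMe.

    Nothing is executed here; the caller reviews and runs the commands.
    """
    return [
        f"ceph osd out {osd_id}",
        # Wait until the data has been moved off this OSD.
        f"while ! ceph osd safe-to-destroy {osd_id}; do sleep 60; done",
        f"systemctl stop ceph-osd@{osd_id}",
        f"ceph osd destroy {osd_id} --yes-i-really-mean-it",
        f"ceph-volume lvm zap {data_dev} --destroy",
        f"ceph-volume lvm create --bluestore --data {data_dev} "
        f"--block.db {db_lv} --osd-id {osd_id}",
    ]

# Sizing check with decimal units: 8 x 3000 GB HDDs, one 1000 GB NVMe.
per_osd = db_size_gb(3000)
print(f"DB per OSD: {per_osd} GB, total: {8 * per_osd} GB of 1000 GB")

# Hypothetical device and LV names for the first OSD only.
for cmd in rebuild_plan(0, "/dev/sdb", "nvme_vg/db-0"):
    print(cmd)
```

After each OSD comes back, wait until the cluster reports all PGs active+clean before starting on the next one, otherwise you are stacking a second outage on top of an unfinished recovery.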

If you need to make sure your data is safe, then you're really limited to using it as a read-only cache, but I think even then, with most setups a failure of the disk would take all OSDs offline until you manually removed it from the read-only cache.
bcache/dm-cache may handle this automatically, but it is still a risk that I personally wouldn't want to take.

Also, the best option really depends on what you use Ceph for and the I/O activity you expect.

---- On Fri, 20 Sep 2019 14:56:12 +0800 Wladimir Mutel <mwg@xxxxxxxxx> wrote ----

    Dear everyone, 

    Last year I set up an experimental Ceph cluster (still single node, 
failure domain = osd, MB Asus P10S-M WS, CPU Xeon E3-1235L, RAM 64 GB, 
HDDs WD30EFRX, Ubuntu 18.04, now with kernel 5.3.0 from Ubuntu mainline 
PPA and Ceph 14.2.4 from download.ceph.com/debian-nautilus/dists/bionic 
). I set up JErasure 2+1 pool, created some RBDs using that as data pool 
and exported them by iSCSI (using tcmu-runner, gwcli and associated 
packages). But with HDD-only setup their performance was less than 
stellar, not saturating even 1Gbit Ethernet on RBD reads. 

    This year my experiment was funded with Gigabyte PCIe NVMe 1TB SSD 
(GP-ASACNE2100TTTDR). Now it is plugged in the MB and is visible as a 
storage device to lsblk. Also I can see its 4 interrupt queues in 
/proc/interrupts, and its transfer measured by hdparm -t is about 2.3GB/sec. 

    And now I want to ask your advice on how to best include it into this 
already existing setup. Should I allocate it for OSD journals and 
databases ? Is there a way to reconfigure existing OSD in this way 
without destroying and recreating it ? Or are there plans to ease this 
kind of migration ? Can I add it as a write-absorbing cache to 
individual RBD images ? To individual block devices at the level of 
bcache/dm-cache ? What about speeding up RBD reads ? 

    I would appreciate your opinions and recommendations. 
    (just want to warn you that in this situation I don't have financial 
option of going full-SSD) 

    Thank you all in advance for your response 
_______________________________________________ 
ceph-users mailing list 
ceph-users@xxxxxxxxxxxxxx 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
