Correct, in a large cluster that's no problem.
I was talking about Wladimir's setup, where they are running a single node with a failure domain of OSD. A failure there would mean the loss of all OSDs and all data.
---- On Sun, 22 Sep 2019 03:42:52 +0800 solarflow99 <solarflow99@xxxxxxxxx> wrote ----
now my understanding is that an NVMe drive is recommended to help speed up bluestore. If it were to fail then those OSDs would be lost, but assuming there is 3x replication and enough OSDs I don't see the problem here. There are other scenarios where a whole server might be lost; it doesn't mean the total loss of the cluster.

On Sat, Sep 21, 2019 at 5:27 AM Ashley Merrick <singapore@xxxxxxxxxxxxxx> wrote:
_______________________________________________
Placing it as a Journal / Bluestore DB/WAL will help mostly with writes; by the sounds of it you want to increase read performance? How important is the data on this CEPH cluster?
If you place it as a Journal DB/WAL, any failure of it will cause total data loss, so I would very much advise against this unless it is purely for testing and total data loss is not an issue.
In that case it is worth upgrading to BlueStore by rebuilding each OSD and placing the DB/WAL on an SSD partition. You can do this one OSD at a time, but there is no migration path, so you would need to wait for the data to rebuild after each OSD change before moving on to the next.
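For reference, rebuilding a single OSD with its DB/WAL on a separate NVMe/SSD partition might look roughly like the sketch below. This is only an illustration: the OSD id (3) and the device names /dev/sdb and /dev/nvme0n1p1 are placeholders, and the commands obviously only run against a live Ceph cluster.

```shell
# Hypothetical rebuild of OSD 3 with its DB/WAL on an NVMe partition.
# OSD id and device names are examples only -- substitute your own.
ceph osd out 3
systemctl stop ceph-osd@3
ceph osd purge 3 --yes-i-really-mean-it
ceph-volume lvm zap /dev/sdb --destroy
ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1
# Wait for the cluster to return to HEALTH_OK before touching the next OSD:
ceph -s
```

Repeating this one OSD at a time, and waiting for recovery in between, is what keeps the rebuild safe on a replicated or EC pool with enough redundancy.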
If you need to make sure your data is safe then you're really limited to using it as a read-only cache, but I think even then most setups would take all OSDs offline until you manually removed the failed disk from the read-only cache.
bcache/dm-cache may handle that failure automatically, but it is still a risk that I personally wouldn't want to take.
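If bcache is attempted anyway, running it in writethrough mode at least keeps every write on the backing HDD, so losing the cache device should lose no data. A minimal sketch (device names are placeholders; requires bcache-tools and a kernel with bcache):

```shell
# Hypothetical bcache setup: an NVMe partition caching an HDD.
# Writethrough mode keeps all data on the backing device, so a
# cache-device failure loses no data (only the cached read speedup).
make-bcache -C /dev/nvme0n1p2          # format the cache device
make-bcache -B /dev/sdc                # format the backing device
# Attach the cache set (UUID printed by make-bcache -C) to the bcache device:
echo <cache-set-uuid> > /sys/block/bcache0/bcache/attach
echo writethrough > /sys/block/bcache0/bcache/cache_mode
```

The OSD would then be built on /dev/bcache0 instead of the raw HDD.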
Also, it really depends on your use for CEPH and the expected I/O activity as to what the best option may be.

---- On Fri, 20 Sep 2019 14:56:12 +0800 Wladimir Mutel <mwg@xxxxxxxxx> wrote ----

Dear everyone,
Last year I set up an experimental Ceph cluster (still single node,
failure domain = osd, MB Asus P10S-M WS, CPU Xeon E3-1235L, RAM 64 GB,
HDDs WD30EFRX, Ubuntu 18.04, now with kernel 5.3.0 from Ubuntu mainline
PPA and Ceph 14.2.4 from download.ceph.com/debian-nautilus/dists/bionic
). I set up a JErasure 2+1 pool, created some RBDs using it as a data pool
and exported them by iSCSI (using tcmu-runner, gwcli and associated
packages). But with this HDD-only setup their performance was less than
stellar, not even saturating 1 Gbit Ethernet on RBD reads.
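For anyone wanting to reproduce a layout like the one described, the pool and image creation might look roughly like this. Pool names, PG counts, and the image name/size are illustrative placeholders, not the actual values from my cluster:

```shell
# Hypothetical recreation of the described layout: a 2+1 erasure-coded
# data pool plus a replicated metadata pool for RBD.
ceph osd erasure-code-profile set ec21 k=2 m=1 crush-failure-domain=osd
ceph osd pool create ecpool 64 64 erasure ec21
ceph osd pool set ecpool allow_ec_overwrites true   # required for RBD on EC
ceph osd pool create rbdmeta 64 64 replicated
rbd pool init rbdmeta
rbd create --size 100G --data-pool ecpool rbdmeta/testimage
```

RBD keeps its metadata in the replicated pool and places the data objects in the EC pool via --data-pool; allow_ec_overwrites is what makes RBD on an erasure-coded pool possible at all.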
This year my experiment was funded with Gigabyte PCIe NVMe 1TB SSD
(GP-ASACNE2100TTTDR). Now it is plugged into the MB and is visible as a
storage device to lsblk. I can also see its 4 interrupt queues in
/proc/interrupts, and its transfer rate measured by hdparm -t is about 2.3 GB/sec.
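The checks described above can be repeated with the following commands (the device name /dev/nvme0n1 is assumed; yours may differ):

```shell
# Verify the NVMe device is visible and measure raw read throughput.
lsblk /dev/nvme0n1              # device and partitions visible to the kernel
grep -i nvme /proc/interrupts   # interrupt queues assigned to the drive
hdparm -t /dev/nvme0n1          # buffered sequential read benchmark
```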
And now I want to ask your advice on how best to include it in this
already existing setup. Should I allocate it for OSD journals and
databases? Is there a way to reconfigure an existing OSD this way
without destroying and recreating it? Or are there plans to ease this
kind of migration? Can I add it as a write-absorbing cache to
individual RBD images? To individual block devices at the level of
bcache/dm-cache? What about speeding up RBD reads?
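(On the reconfigure-without-recreating question: as far as I know, Nautilus ships ceph-bluestore-tool, whose bluefs-bdev-new-db command can attach a new DB device to an existing BlueStore OSD in place. A hedged sketch only; the OSD id and target device are placeholders, and the OSD must be stopped first:

```shell
# Hypothetical in-place DB migration for OSD 0 onto an NVMe partition.
systemctl stop ceph-osd@0
ceph-bluestore-tool bluefs-bdev-new-db \
    --path /var/lib/ceph/osd/ceph-0 \
    --dev-target /dev/nvme0n1p1
systemctl start ceph-osd@0
```

I have not tried this myself, so please verify against the ceph-bluestore-tool documentation for your exact release before running it.)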
I would appreciate reading your opinions and recommendations.
(just want to warn you that in this situation I don't have financial
option of going full-SSD)
Thank you all in advance for your response
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com