Re: Separate BlueStore WAL/DB: best scenario?


 



On 21/03/2018 at 11:48, Ronny Aasen wrote:
On 21/03/2018 at 11:27, Hervé Ballans wrote:
Hi all,

I have a question regarding a possible scenario for putting both the WAL and DB on a separate SSD device, for an OSD node composed of 22 OSDs (1.8 TB 10k SAS HDDs).

I'm considering two options (at about the same price):

- add 2 Write Intensive SAS SSDs (10 DWPD)

- or add a single 800 GB NVMe SSD (the minimum capacity currently on the market!)

In both cases, that's a lot of partitions on each SSD, especially with the second option, where we would have 44 partitions (22 WAL and 22 DB)!

Is this setup workable (I mean in terms of I/O speed), or is it a bad idea despite the high PCIe bus transfer rate?

I just want to discuss throughput performance here, not data integrity on the node in case of an SSD failure...

Thanks in advance for your advice,


If you put the WAL and DB on the same device anyway, there is no real benefit to having a separate partition for each. The reason you can split them is for when they live on different devices, e.g. the DB on an SSD but the WAL on NVRAM. It is easier to just colocate the WAL and DB in a single partition, since in your case they live on the same device anyway.
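For anyone scripting this, here is a minimal sketch (Python wrapping ceph-volume) of creating a BlueStore OSD with its DB on a fast-device partition; with no separate --block.wal given, the WAL ends up alongside the DB. The device paths are examples only, not recommendations.

#!/usr/bin/env python3
"""Minimal sketch: create one BlueStore OSD with its DB (and hence its WAL)
on a partition of a fast device. Device paths below are examples only."""
import subprocess

def create_osd(data_dev: str, db_dev: str) -> None:
    # With only --block.db given, BlueStore keeps the WAL together with the
    # DB on the fast device, so no separate --block.wal partition is needed.
    subprocess.run(
        ["ceph-volume", "lvm", "create", "--bluestore",
         "--data", data_dev,
         "--block.db", db_dev],
        check=True,
    )

if __name__ == "__main__":
    create_osd("/dev/sda", "/dev/nvme0n1p1")  # hypothetical devices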

If you put too many OSDs' DBs on the same SSD, you may end up with the SSD being the bottleneck. Four OSD DBs per SSD has been a "golden rule" on the mailing list for a while; an NVMe device can possibly host a few more.
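As a back-of-the-envelope illustration of how thinly the shared device gets spread, using the 800 GB NVMe and 22 OSDs from the original question:

# Rough sizing if a single 800 GB NVMe device hosts the colocated DB+WAL
# of all 22 OSDs (numbers taken from the thread, not a recommendation).
nvme_capacity_gb = 800
num_osds = 22

per_osd_gb = nvme_capacity_gb / num_osds
print(f"~{per_osd_gb:.1f} GB of DB+WAL space per OSD")  # ~36.4 GB
print(f"each OSD also gets roughly 1/{num_osds} of the device's IOPS")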

But the bottleneck is only one part of the problem: when the NVMe device holding the 22 DB partitions dies, it takes down 22 OSDs at once, which will be a huge pain for your cluster (depending on how large it is...). I would spread the DBs across more devices to reduce both the bottleneck and the failure domain in this situation.

Hi Ronny,

Thank you for your clear answer.
OK on putting both the WAL and DB in the same partition; I didn't have that information, but it does indeed look like the better option in my case (in particular if I choose the fastest device, i.e. NVMe*).

I plan to have 6 OSD nodes (with the same configuration on each), but I don't know yet whether I will use replicated (x3) or erasure-coded (4+2?) pools. In either case, I could possibly accept losing a node for a limited time (replacement of the journal disk + OSD reconfiguration).

But you're right, I will start with a configuration where the DBs are spread across at least 2 fast disks.
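A sketch of that layout, assuming a ceph-volume version whose batch mode accepts --db-devices (all device paths are hypothetical): it spreads the DB/WAL slices of the 22 data disks evenly across two fast devices, 11 OSDs per DB device.

#!/usr/bin/env python3
"""Sketch: provision 22 BlueStore OSDs with their DB/WAL spread across two
fast devices via ceph-volume batch mode. All device paths are examples."""
import subprocess

data_disks = [f"/dev/sd{chr(ord('a') + i)}" for i in range(22)]  # sda..sdv
db_devices = ["/dev/nvme0n1", "/dev/nvme1n1"]

# Batch mode carves up the DB devices itself, so one invocation covers
# the whole node (11 DB/WAL slices per fast device in this example).
subprocess.run(
    ["ceph-volume", "lvm", "batch", "--bluestore",
     *data_disks,
     "--db-devices", *db_devices,
     "--yes"],
    check=True,
)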

Regards,
Hervé
* Just for information, I am looking closely at the Samsung PM1725 NVMe PCIe SSD. The (theoretical) technical specifications seem interesting, especially the IOPS: up to 750K IOPS for random reads and 120K IOPS for random writes...
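Divided naively across the 22 OSDs of one node, those spec-sheet numbers work out roughly as follows (a crude upper bound, ignoring queueing and mixed workloads):

# Per-OSD share of the PM1725's headline figures if one card serves all
# 22 OSDs (spec-sheet values quoted above, not measured results).
rand_read_iops, rand_write_iops, num_osds = 750_000, 120_000, 22

print(f"~{rand_read_iops // num_osds} random-read IOPS per OSD")   # ~34090
print(f"~{rand_write_iops // num_osds} random-write IOPS per OSD") # ~5454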


kind regards
Ronny Aasen

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





