On 21/03/2018 at 11:48, Ronny Aasen wrote:
On 21 March 2018 at 11:27, Hervé Ballans wrote:
Hi all,
I have a question regarding a possible scenario for putting both the WAL
and the DB on a separate SSD device, for an OSD node composed of 22 OSDs
(SAS 10k HDDs, 1.8 TB each).
I'm thinking of 2 options (at about the same price):
- add 2 write-intensive SAS SSDs (10 DWPD)
- or add a single 800 GB NVMe SSD (the minimum capacity currently on the
market!)
In both cases, that's a lot of partitions on each SSD, especially with
the second option, where we would have 44 partitions (22 WAL and 22 DB)!
Is this solution workable (I mean in terms of I/O speed), or is it
unsafe despite the high PCIe bus transfer rate?
I just want to talk here about throughput performance, not about data
integrity on the node in case of SSD crashes...
Thanks in advance for your advice,
If you put the WAL and DB on the same device anyway, there is no real
benefit to having a partition for each. The reason you can split them
up is for when you have them on different devices, e.g. DB on SSD but
WAL on NVRAM. It is easier to just colocate WAL and DB into the same
partition, since they live on the same device in your case anyway.
If you have too many OSDs' DBs on the same SSD, you may end up with the
SSD being the bottleneck. Four OSDs' DBs per SSD has been a "golden
rule" on the mailing list for a while; for NVRAM you can possibly have
some more.
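Just to make that concrete, here is a back-of-the-envelope sketch. The
throughput figures and the WAL/DB write fraction below are assumptions
for illustration only, not measurements from your hardware:

# Sketch: can one SSD absorb the WAL/DB traffic of 22 HDD OSDs?
# All figures are assumptions for illustration, not measurements.
num_osds = 22
hdd_write_mbps = 150        # assumed sustained write per 10k SAS HDD
wal_db_fraction = 0.5       # assumed share of client writes hitting WAL/DB
ssd_write_mbps = 1800       # assumed sustained write of one write-intensive SSD

wal_db_load = num_osds * hdd_write_mbps * wal_db_fraction
print(f"WAL/DB load from {num_osds} OSDs: ~{wal_db_load:.0f} MB/s")
print(f"Single SSD write budget: ~{ssd_write_mbps} MB/s")
print("SSD is the bottleneck" if wal_db_load > ssd_write_mbps
      else "SSD keeps up (on paper)")

With different workload assumptions the balance tips quickly, which is
why the "4 OSDs per SSD" rule of thumb is conservative.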
But the bottleneck is only one part of the problem. When the NVMe device
holding the 22 DB partitions dies, it brings down 22 OSDs at once, which
will be a huge pain on your cluster (depending on how large it is...).
I would spread the DBs across more devices to reduce both the bottleneck
and the failure domain in this situation.
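A rough illustration of the failure-domain side, using your disk sizes
(again just arithmetic, no cluster specifics assumed beyond what you
gave):

# Sketch: OSD capacity taken down at once when a shared DB device dies.
osd_size_tb = 1.8
for osds_per_db_device in (22, 11, 4):
    lost_tb = osds_per_db_device * osd_size_tb
    print(f"{osds_per_db_device} OSDs per DB device -> one device failure "
          f"takes out ~{lost_tb:.1f} TB of OSD capacity at once")

All of that capacity has to be re-replicated or rebuilt elsewhere before
the cluster is fully healthy again.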
Hi Ronny,
Thank you for your clear answer.
OK, I'll put both WAL and DB on the same partition; I didn't have this
information, but indeed it seems more attractive in my case (in
particular if I choose the fastest device, i.e. NVMe*).
I plan to have 6 OSD nodes (same configuration for each), but I don't
know yet whether I will use replication (x3) or erasure coding (4+2?)
pools.
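For what it's worth, here is the quick capacity comparison I did for the
two pool types (it ignores near-full ratios and BlueStore overhead, so
the real usable numbers will be lower):

# Sketch: usable capacity of 6 nodes x 22 x 1.8 TB, replica 3 vs EC 4+2.
nodes, osds_per_node, osd_tb = 6, 22, 1.8
raw_tb = nodes * osds_per_node * osd_tb
replica = 3
ec_k, ec_m = 4, 2
print(f"raw capacity: {raw_tb:.0f} TB")
print(f"replica x{replica}:   {raw_tb / replica:.0f} TB usable")
print(f"EC {ec_k}+{ec_m}:       {raw_tb * ec_k / (ec_k + ec_m):.0f} TB usable")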
Also, in both cases, I could possibly accept the loss of a node for a
limited time (replacement of the journal disk + OSD reconfiguration).
But you're right, I will start with a configuration where I spread the
DBs across at least 2 fast disks.
Regards,
Hervé
* Just for information, I am looking closely at the Samsung PM1725 NVMe
PCIe SSD. The (theoretical) technical specifications seem interesting,
especially regarding IOPS: up to 750K IOPS for random read and 120K IOPS
for random write...
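Divided across 22 OSDs, those datasheet figures would still leave a
decent per-OSD margin (simple arithmetic on the datasheet numbers, not a
benchmark):

# Sketch: per-OSD share of the PM1725 datasheet IOPS if 22 OSDs share it.
read_iops, write_iops, num_osds = 750_000, 120_000, 22
print(f"per-OSD share: ~{read_iops // num_osds} read IOPS, "
      f"~{write_iops // num_osds} write IOPS")
# For comparison, a 10k SAS HDD typically delivers on the order of
# a few hundred IOPS.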
kind regards
Ronny Aasen
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com