Re: Building ceph clusters with 8TB SSD drives?


 



The main reason to use SSDs is typically to improve IOPS for small writes, but for that workload most (if not all) consumer SSDs we have tested perform badly in Ceph.

The reason is that Ceph requires SYNC writes, and since consumer SSDs (and now even some cheap datacenter ones) lack capacitors for power-loss protection, they cannot use the volatile caches that give them (semi-fake) good performance on desktops.
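
If you want to see this effect yourself, here is a minimal sketch (assuming Linux and a throwaway test file on the drive in question - the path below is just a placeholder) that times 4 KiB O_DSYNC writes, which is roughly the kind of durable small write BlueStore demands of its WAL device:

import os, time

PATH = "/mnt/test-ssd/syncwrite.bin"  # placeholder: a file on the SSD under test
BLOCK = b"\0" * 4096                  # 4 KiB writes, similar to small WAL writes
N = 1000

# O_DSYNC means each write() returns only once the data is durable, so a
# drive without power-loss protection cannot hide behind its volatile cache.
fd = os.open(PATH, os.O_WRONLY | os.O_CREAT | os.O_DSYNC, 0o600)
t0 = time.monotonic()
for _ in range(N):
    os.write(fd, BLOCK)
elapsed = time.monotonic() - t0
os.close(fd)

print(f"{N / elapsed:.0f} sync write IOPS, {elapsed / N * 1000:.2f} ms average latency")

Consumer drives that look great in normal desktop benchmarks often drop to a few hundred or a few thousand IOPS here, while drives with power-loss protection stay fast.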

If that sounds bad, you should be even more careful if you shop around until you find a cheap drive that performs well - because there have historically been consumer drives that lie and acknowledge a sync even though the data is still only in volatile memory rather than safe :-)

The Samsung PM883 is likely one of the cheapest drives you can still fully trust - at least if your application is not highly write-intensive.

Now, having said that, we have had pretty good experience with a way to partly cheat around these limitations: since we have large servers with mixed HDDs, we also have 2-3 NVMe Samsung PM983 M.2 drives per server on PCIe cards for the DB/WAL. It seems to work remarkably well to do this for consumer SSDs too, i.e. let each 4TB el cheapo SATA SSD (we used Samsung 860) use a ~100GB DB/WAL partition on an NVMe drive. This gives very nice low latencies in rados benchmarks, although they are still ~50% higher than with proper enterprise SSDs.
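
For reference, that OSD layout can be created with ceph-volume; here is a minimal sketch of the idea (the device names and partition sizes are placeholders for whatever your hardware actually looks like):

import subprocess

# Placeholder device names - substitute your own.
SATA_SSDS = ["/dev/sdb", "/dev/sdc"]               # cheap SATA SSDs used as OSD data devices
DB_PARTS  = ["/dev/nvme0n1p1", "/dev/nvme0n1p2"]   # ~100GB partitions on the NVMe for DB/WAL

for data_dev, db_dev in zip(SATA_SSDS, DB_PARTS):
    # BlueStore keeps its RocksDB (and with it the WAL) on the faster NVMe
    # partition while the bulk data stays on the SATA SSD.
    subprocess.run(
        ["ceph-volume", "lvm", "create",
         "--bluestore",
         "--data", data_dev,
         "--block.db", db_dev],
        check=True,
    )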

Caveats:

- Think about balancing IOPS. If you have 10 SSD OSDs sharing a single NVMe WAL device you will likely be limited by the NVMe (see the back-of-the-envelope sketch after this list).
- If the NVMe drive dies, all the corresponding OSDs die with it.
- This might work for read-intensive applications, but if you try it for write-intensive applications you will wear out the consumer SSDs (check their write endurance).
- You will still see latency and bandwidth fluctuate, and consumer SSDs will periodically throttle.
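
On the first point, the check is easy to do; the numbers below are assumptions for illustration, not measurements:

# Assumed (not measured) numbers, purely for illustration.
nvme_sync_write_iops = 100_000   # what the shared NVMe DB/WAL device sustains for sync writes
osds_per_nvme        = 10        # SATA SSD OSDs sharing that one NVMe
ssd_capable_iops     = 15_000    # small-write IOPS one SATA SSD OSD could otherwise deliver

# Every small write is journalled on the shared NVMe first, so its IOPS
# budget is split across all OSDs behind it.
per_osd_share = nvme_sync_write_iops / osds_per_nvme
print(f"Each OSD gets ~{per_osd_share:.0f} WAL IOPS from the shared NVMe")
if per_osd_share < ssd_capable_iops:
    print("With these numbers the NVMe, not the SATA SSDs, is the small-write bottleneck")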


In comparison, even the relatively cheap PM883 "just works" at constant high bandwidth close to the bus limit, and the latency is a constant low fraction of a millisecond in Ceph.

In summary, while somewhat possible, I don't think it's worth the hassle/risk/complex setup with consumer drives. If I absolutely had to, I would at least avoid the cheapest QVO models - and if you don't put the WAL on a better device, I predict you'll regret it once you start doing benchmarks in RADOS.
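
If you want to reproduce that kind of benchmark, a small-write run with rados bench is enough to show the difference (the pool name here is just a placeholder):

import subprocess

POOL = "testpool"   # placeholder pool name for benchmarking

# 60-second small-write benchmark: 4 KiB objects, 16 concurrent ops.
# Small sync-heavy writes are exactly where consumer SSDs without a
# separate WAL device fall apart.
subprocess.run(
    ["rados", "bench", "-p", POOL, "60", "write", "-b", "4096", "-t", "16"],
    check=True,
)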

Cheers,

Erik









