Re: Choosing suitable SSD for Ceph cluster

Is this a reply to Paul’s message from 11 months ago?

https://bit.ly/32oZGlR

The PM1725b is interesting in that it has explicitly configurable durability vs capacity, which may be even more effective than user-level short-stroking / underprovisioning.


> 
> Hi. How do you say 883DCT is faster than 970 EVO?
> I saw the specifications and 970 EVO has higher IOPS than 883DCT!
> Can you please tell why 970 EVO act lower than 883DCT?

The thread above explains that.  Basically it’s not as simple as “faster”. IOPS describe behavior along one axis, under a certain workload, for a certain length of time.  Subtle factors:

* With increasing block size, queue depth, and operation rate / duration, some less-robust drives will exhibit cliffing, where their performance falls off dramatically:


------------------
                  |
                  |________
-----------------------------------

(that may or may not render usefully; your MUA may vary)

Or they may lose your data when there’s a power event.
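
To see cliffing for yourself you generally have to run long enough to blow through whatever fast cache the drive has. A rough sketch of the idea in Python below; the target path, size, and runtime are made up, and the iops-log filename can differ slightly between fio versions, so adjust to taste:

    # Rough sketch: run a long, steady random-write and watch IOPS over time
    # to catch cliffing once any fast cache / fresh-drive headroom is exhausted.
    # TARGET, size, and runtime are placeholders -- point it at a scratch
    # device or file you can destroy.
    import csv
    import glob
    import subprocess

    TARGET = "/tmp/fio-cliff-test.bin"   # hypothetical scratch file

    subprocess.run([
        "fio", "--name=cliff", f"--filename={TARGET}", "--size=16G",
        "--rw=randwrite", "--bs=4k", "--iodepth=32",
        "--ioengine=libaio", "--direct=1",
        "--time_based", "--runtime=1800",   # 30 min; long enough to hit the cliff on many drives
        "--write_iops_log=cliff",           # per-second IOPS samples ...
        "--log_avg_msec=1000",              # ... averaged over 1s windows
    ], check=True)

    # Log columns are roughly: time_ms, value (IOPS), direction, block size, offset.
    log_path = sorted(glob.glob("cliff_iops*.log"))[0]   # exact name varies by fio version
    samples = [int(row[1]) for row in csv.reader(open(log_path)) if row]

    first, last = samples[:60], samples[-60:]
    print("first minute avg IOPS:", sum(first) // max(len(first), 1))
    print("last  minute avg IOPS:", sum(last) // max(len(last), 1))

A drive that cliffs will show a big gap between the first and last minute; a robust drive won’t.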

* Is IOPS what you’re really concerned with?  As your OSD nodes are increasingly saturated by parallel requests (or if you’re overly aggressive with your PG ratio), you may see more IOPS / throughput, at the cost of latency going down the drain.  This may be acceptable for RGW bucket data, but maybe not for bucket indexes, and for sure not for RBD volumes.
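
A back-of-the-envelope way to see that tradeoff: Little’s law says outstanding IOs = IOPS * latency, so once a device (or OSD) saturates, every extra unit of queue depth buys almost no extra IOPS and shows up almost entirely as added latency. A toy illustration with completely made-up numbers:

    # Toy model of Little's law (outstanding IOs = IOPS * latency) for a device
    # that saturates around 200k IOPS -- all numbers are made up.
    SATURATION_IOPS = 200_000

    def achieved_iops(queue_depth, iops_per_qd=20_000):
        # crude: IOPS scale with queue depth until the device saturates
        return min(queue_depth * iops_per_qd, SATURATION_IOPS)

    for qd in (1, 4, 8, 16, 32, 64, 128):
        iops = achieved_iops(qd)
        avg_latency_ms = qd / iops * 1000   # Little's law rearranged: latency = QD / IOPS
        print(f"QD={qd:>3}  IOPS={iops:>7,}  avg latency={avg_latency_ms:5.2f} ms")

Past the saturation point the extra queue depth only inflates latency, which is exactly the RBD-vs-RGW tradeoff above.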

* The nature of the workload can dramatically affect performance (the sketch after this list sweeps some of these knobs)

** block size
** queue depth
** r/w mix
** sync
** phoon (phase of the moon)
** etc
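
Here’s a hedged sketch of sweeping those knobs with fio and pulling IOPS and p99 latency out of its JSON output. The fio flags are standard, but the JSON field layout (clat_ns vs clat) differs between fio versions, and the target file and case matrix are just placeholders; the full sweep takes a while:

    # Hedged sketch: sweep block size, queue depth, r/w mix, and sync with fio,
    # pulling write IOPS and p99 completion latency from the JSON output.
    import itertools
    import json
    import subprocess

    TARGET = "/tmp/fio-sweep.bin"   # use a real scratch device for meaningful numbers

    def run_case(bs, iodepth, rwmixread, sync):
        cmd = [
            "fio", "--name=sweep", f"--filename={TARGET}", "--size=4G",
            "--ioengine=libaio", "--direct=1",
            "--rw=randrw", f"--rwmixread={rwmixread}",
            f"--bs={bs}", f"--iodepth={iodepth}",
            f"--fsync={1 if sync else 0}",      # fsync=1 approximates sync-write behavior
            "--time_based", "--runtime=60",
            "--output-format=json",
        ]
        out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
        write = json.loads(out)["jobs"][0]["write"]
        p99_ns = write.get("clat_ns", {}).get("percentile", {}).get("99.000000")
        return write["iops"], p99_ns

    for bs, qd, mix, sync in itertools.product(
            ["4k", "64k", "1m"], [1, 16, 64], [0, 70], [False, True]):
        iops, p99_ns = run_case(bs, qd, mix, sync)
        p99_ms = p99_ns / 1e6 if p99_ns else float("nan")
        print(f"bs={bs:>4} qd={qd:>3} read%={mix:>3} sync={sync}: "
              f"write IOPS={iops:,.0f}  p99={p99_ms:.2f} ms")

The sync=True / 4k cases are the ones that separate drives with real power-loss protection from the rest.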

This is one thing that (hopefully) distinguishes “enterprise” drives from “consumer” drives.  There’s one “enterprise” drive (now EOL) that turned out to develop UREs and dramatically increased latency when presented with an actual enterprise Ceph workload (vs. a desktop workload). I fought that for a year and found that older drives actually fared better than newer ones, though the vendor denied any engineering or process change.  Consider the total cost of saving a few bucks on cheap drives that appear *on paper* to have attractive marketing specs, vs. the nightmares you will face and the other things you won’t have time to work on if you’re consumed with pandemic drive failures.

Look up the firmware update history for performance issues with the various 840/860 EVO models, even when used on desktops; that’s not to say the 970 does or doesn’t exhibit the same or similar issues.  Consider whether you want to risk your corporate/production data, applications, and users on desktop-engineered drives.

In the end, you really need to buy or borrow eval drives and measure how they perform under both benchmarks and real workloads.  And Ceph mon / OSD service is *not* the same as any FIO or other benchmark tool load.

https://github.com/louwrentius/fio-plot

is a delightfully visual tool that shows the IOPS / BW / latency tradeoffs

Ideally one would compare FIO benchmarks across drives and also provision multiple models on a given system, slap OSDs on them, throw your real workload at them, and after at least a month gather drive/OSD iops/latency/bw metrics for each and compare them.  I’m not aware of a simple tool to manage this process, though I’d love one.
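
One rough starting point might be to snapshot `ceph osd perf` on a schedule and bucket the numbers by drive model afterwards. A sketch; the JSON key names are from memory and differ between Ceph releases, so verify against your cluster:

    # Rough sketch: snapshot per-OSD commit/apply latency from `ceph osd perf`
    # into a CSV for later comparison across drive models.  The JSON key names
    # ("osdstats", "osd_perf_infos", "perf_stats") are from memory and vary
    # between Ceph releases -- check them against your cluster first.
    import csv
    import json
    import subprocess
    import time

    def snapshot(path="osd_perf_history.csv"):
        out = subprocess.run(
            ["ceph", "osd", "perf", "--format=json"],
            capture_output=True, text=True, check=True,
        ).stdout
        data = json.loads(out)
        # some releases nest the list under "osdstats", others don't
        infos = data.get("osdstats", data).get("osd_perf_infos", [])
        now = int(time.time())
        with open(path, "a", newline="") as fh:
            writer = csv.writer(fh)
            for info in infos:
                stats = info.get("perf_stats", {})
                writer.writerow([now, info.get("id"),
                                 stats.get("commit_latency_ms"),
                                 stats.get("apply_latency_ms")])

    if __name__ == "__main__":
        snapshot()   # run from cron every few minutes for the month

Then join the OSD ids against drive model (e.g. `ceph device ls` on recent releases, or your inventory system) and compare the per-model latency distributions.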

ymmocv
— aad





