Re: Building ceph clusters with 8TB SSD drives?

Hi Matt,

Yes, we've experimented a bit with consumer SSDs, and also done some
benchmarks.

The main reason to use SSDs is typically to improve IOPS for small writes,
since even HDDs will usually give you quite good aggregate bandwidth as
long as you have enough of them - but for high-IOPS usage, most (if not
all) consumer SSDs we have tested perform badly in Ceph, in particular for
writes.
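
To put rough numbers on that (the per-drive figures below are just ballpark
assumptions on my part, not measurements from our cluster):

# Aggregate bandwidth vs. random-write IOPS, back of the envelope.
# All per-drive figures are assumed ballpark values - adjust for your hardware.
n_hdd = 24
hdd_seq_mb_s = 180       # sequential MB/s per 7200 rpm HDD (assumed)
hdd_write_iops = 150     # random 4k write IOPS per HDD (assumed)

n_ssd = 8
ssd_seq_mb_s = 500       # sequential MB/s per SATA SSD (assumed)
ssd_sync_iops = 20000    # sustained sync-write IOPS per decent DC SSD (assumed)

print(f"HDDs: {n_hdd * hdd_seq_mb_s / 1000:.1f} GB/s aggregate, {n_hdd * hdd_write_iops} write IOPS")
print(f"SSDs: {n_ssd * ssd_seq_mb_s / 1000:.1f} GB/s aggregate, {n_ssd * ssd_sync_iops} write IOPS")

The two pools come out roughly comparable on raw bandwidth, but one to two
orders of magnitude apart on small synchronous writes - and the latter is
what hurts in Ceph.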

The reason for this is that Ceph requires sync writes, and since consumer
SSDs (and now even some cheap datacenter ones) don't have capacitors for
power-loss protection, they cannot use the volatile caches that give them
(semi-fake) good performance on desktops. If that sounds bad, you should be
even more careful if you shop around until you find a cheap drive that
appears to perform well - there have historically been consumer drives that
lie and acknowledge a sync even though the data is still only in volatile
memory rather than safely on flash :-)
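
If you want to see the effect before buying a pile of drives, a crude
stand-in for the small synchronous writes an OSD generates is simply timing
O_DSYNC writes yourself. A minimal sketch (Linux-only; the path is just a
placeholder for a mount point on the drive you're testing - fio is the
proper tool, but the idea is the same):

import os, time

path = "/mnt/testdrive/synctest.bin"   # placeholder: any file on the drive under test
count = 1000
block = b"\0" * 4096

# O_DSYNC forces every write to be durable before it returns, which is
# exactly what exposes drives without power-loss protection.
fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_DSYNC, 0o600)
t0 = time.time()
for _ in range(count):
    os.write(fd, block)
elapsed = time.time() - t0
os.close(fd)
os.remove(path)

print(f"{count / elapsed:.0f} sync write IOPS, "
      f"{1000 * elapsed / count:.2f} ms per 4k sync write")

A drive with proper power-loss protection typically handles tens of
thousands of these per second; consumer drives often drop to a few hundred
- or report suspiciously good numbers because they acknowledge the sync
before the data is actually safe.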

The Samsung PM883 is one relatively cheap drive that we've been quite happy
with - at least if your application is not highly write-intensive. If it is
write-intensive, you might need the higher-endurance SM883 (or similar from
other vendors).

Now, having said that, we have had pretty decent experience with a way to
partly cheat around these limitations: since we have a good dozen large
servers with mixed HDDs, we also have 2-3 NVMe Samsung PM983 M.2 drives per
server on PCIe cards for the DB/WAL of those OSDs. It seems to work
remarkably well to do this for consumer SSDs too, i.e. let each 4 TB
el-cheapo SATA SSD (we used the Samsung 860) use a ~100 GB DB/WAL partition
on an NVMe drive. This gives very nice low latencies in rados benchmarks,
although they are still ~50% higher than with proper enterprise SSDs.
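
The sizing arithmetic for that layout is trivial, but here is a little
sketch anyway - the NVMe capacity and the headroom fraction are assumptions
on my part, the ~100 GB per OSD is simply what we use:

# How many SATA OSDs can share one NVMe drive for DB/WAL?
nvme_capacity_gb = 960      # assumed usable capacity of the NVMe device
db_wal_per_osd_gb = 100     # ~100 GB DB/WAL per 4 TB SATA OSD, as above
headroom = 0.10             # assumed headroom for spillover and levelling

usable_gb = nvme_capacity_gb * (1 - headroom)
max_osds = int(usable_gb // db_wal_per_osd_gb)
print(f"A {nvme_capacity_gb} GB NVMe can hold DB/WAL for ~{max_osds} OSDs")

Keep the first caveat below in mind, though: cramming too many OSDs onto
one NVMe just moves the IOPS bottleneck (and the failure domain) there.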


Caveats:

- Think about balancing IOPS. If you have 10 SSD OSDs sharing a single NVMe
WAL device, you will likely be limited by the NVMe IOPS instead.
- If the NVMe drive dies, all the corresponding OSDs die with it.
- This might work for read-intensive applications, but if you try it for
write-intensive applications you will wear out the consumer SSDs (check
their write endurance - see the rough arithmetic after this list).
- When doing rados benchmarks, you will still see latency/bandwidth
fluctuate and periodically throttle to almost zero for consumer SSDs,
presumably because they are busy flushing some sort of intermediate storage.
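
As a rough illustration of the endurance point - the TBW rating below is an
assumption from memory for a 4 TB consumer drive, so check the datasheet
for whatever model you're actually considering:

# Back-of-the-envelope write endurance for a consumer SSD used as an OSD.
capacity_tb = 4.0
rated_tbw = 2400.0        # assumed vendor TBW rating - check the datasheet
warranty_years = 5.0      # assumed warranty period

days = warranty_years * 365
dwpd = rated_tbw / (capacity_tb * days)
avg_mb_s = rated_tbw * 1e6 / (days * 86400)
print(f"~{dwpd:.2f} drive writes per day, ~{avg_mb_s:.0f} MB/s sustained average")

That works out to roughly a third of a drive write per day, or ~15 MB/s of
sustained writes - and remember that Ceph replication and write
amplification sit on top of that. Higher-endurance drives like the SM883
are rated for a few full drive writes per day instead.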


In comparison, even the relatively cheap PM883 "just works" at constant
high bandwidth close to the bus limit, and the latency stays at a low
fraction of a millisecond in Ceph.

In summary, while somewhat possible, I simply don't think it's worth the
hassle/risk/complex setup with consumer drives (and god knows I can be a
cheap bastard at times ;-). But if you absolutely have to, I would at least
avoid the very cheapest QVO models (note that the QVO models have a
sustained write bandwidth of only 80-160 MB/s - that's like a magnetic
spinner!) - and if you don't put the WAL on a better device, I predict
you'll regret it once you start running RADOS benchmarks.

Cheers,

Erik


On Fri, May 7, 2021 at 10:11 PM Matt Larson <larsonmattr@xxxxxxxxx> wrote:

> Is anyone trying Ceph clusters containing larger (4-8TB) SSD drives?
>
> 8TB SSDs are described here (
>
> https://www.anandtech.com/show/16136/qlc-8tb-ssd-review-samsung-870-qvo-sabrent-rocket-q
> ) and make use of QLC NAND flash memory to reach those costs and capacities.
> Currently, the 8TB Samsung 870 SSD is $800/ea at some online retail stores.
>
> SATA form-factor SSDs can reach read/write rates of 560/520 MB/s, which, while
> not as great as NVMe drives, is still several times faster than 7200 RPM drives.
> SSDs now appear to have much lower failure rates than HDDs in 2021 (
>
> https://www.techspot.com/news/89590-backblaze-latest-storage-reliability-figures-add-ssd-boot.html
> ).
>
> Are there any major caveats to considering working with larger SSDs for
> data pools?
>
> Thanks,
>   Matt
>
> --
> Matt Larson, PhD
> Madison, WI  53705 U.S.A.
>


-- 
Erik Lindahl <erik.lindahl@xxxxxxxxx>
Professor of Biophysics, Dept. Biochemistry & Biophysics, Stockholm
University
Science for Life Laboratory, Box 1031, 17121 Solna, Sweden
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


