Re: Hardware for new OSD nodes.

> Also, any thoughts/recommendations on 12TB OSD drives?  For price/capacity this is a good size for us

Last I checked, HDD prices seemed roughly linear with capacity from 10-16 TB.  Remember to include the cost of the drive bay, i.e. the cost of the chassis, the RU(s) it takes up, power, switch ports, etc.
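
To make that concrete, here's a back-of-the-envelope sketch in Python.  Every number in it is a made-up placeholder, not a quote; plug in your own pricing:

def burdened_cost_per_tb(drive_price, drive_tb, bays_per_chassis,
                         chassis_price, per_node_network_and_power):
    """Spread the fixed per-node overhead across the drive bays,
    then add it to the raw drive cost."""
    overhead_per_bay = (chassis_price + per_node_network_and_power) / bays_per_chassis
    return (drive_price + overhead_per_bay) / drive_tb

# Hypothetical 12 TB LFF HDD in a 12-bay 2U node
print(burdened_cost_per_tb(drive_price=250, drive_tb=12, bays_per_chassis=12,
                           chassis_price=6000, per_node_network_and_power=1500))

# Hypothetical 16 TB drive in the same chassis: the fixed per-bay overhead is
# amortized over more TB, so the burdened $/TB usually comes down.
print(burdened_cost_per_tb(drive_price=360, drive_tb=16, bays_per_chassis=12,
                           chassis_price=6000, per_node_network_and_power=1500))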

I’ll guess you’re talking LFF HDDs here and a 2U server?  You also don’t say how many nodes total, which affects blast-radius decisions.

>  has the option of 4 x U.2 NVMe bays - each with 4 PCIe lanes, (and 8 SAS bays)

Think about what that would do to your total $/TB, including the chassis, CPU, switch ports, rack space, etc.  Check whether those bays are NVMe-only or tri-mode.

If you do go with NVMe for WAL+DB, four drives is overkill.  Performance-wise, assuming you use quality NVMe drives and not some consumer-grade crap, you’re going to see sharply diminishing returns after just one.  Or you could mirror two, as someone else described.
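
For a sense of scale, here's a rough Python sketch of carving a single shared NVMe into DB volumes for a node's HDD OSDs.  The sizes are hypothetical, and the "roughly 4% of the data device" figure is only the commonly cited rule of thumb for block.db, not a hard requirement:

# One shared NVMe split into equal DB LVs for the node's HDD OSDs.
# All sizes here are assumptions for illustration.
NVME_GB = 1600        # hypothetical enterprise NVMe
HDD_TB = 12
HDDS_PER_NODE = 8     # e.g. the 8 SAS bays mentioned above

db_per_osd_gb = NVME_GB / HDDS_PER_NODE
guideline_db_gb = HDD_TB * 1000 * 0.04   # ~4% rule of thumb for block.db

print(f"DB LV per OSD:                ~{db_per_osd_gb:.0f} GB")
print(f"~4% guideline for {HDD_TB} TB HDD:  ~{guideline_db_gb:.0f} GB")
# If the LV you can afford comes in well under what the metadata actually
# needs, that shortfall is where BlueFS spillover onto the slow device shows up.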

But really, consider the hassles of maintaining partitions and mappings as drives fail.  When an HDD fails and you need to re-use its metadata partition for the replacement OSD, you have to be very careful on a shared device to re-use the original partition and not another OSD’s.

Honestly, depending on your use-case, consider whether 24x SFF SATA SSDs might be cost-competitive once you factor in that hassle, the time you’ll spend waiting for HDDs to backfill, etc.  Again depending on your undisclosed use-case, all-NVMe can with careful choices also be surprisingly cost-effective, and if your data is cold, QLC is an option.  With system vendors still pushing expensive RAID HBAs (they must have high margins), you could easily save $600 per chassis just by not having one, and you’re no longer monitoring and dealing with the BBU/supercap, etc.

>  but I'm wondering if my BlueFS spillovers are resulting from using drives that are too big.  I also thought I might have seen some comments about cutting large drives into multiple OSDs - could that be?

Don’t cut anything slower than an NVMe drive into more than one OSD.  Seeks and IOPS are an HDD’s bottlenecks, and slicing/dicing isn’t going to work any magic.
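
Back-of-the-envelope, assuming a 7200 rpm spindle is good for something on the order of 150 random IOPS (a ballpark assumption, not a measurement):

# The spindle's seek budget doesn't grow when you split it into more OSDs.
HDD_IOPS = 150   # ballpark for a 7200 rpm HDD, assumption only
HDD_TB = 12

print(f"One 12 TB OSD:  ~{HDD_IOPS} IOPS for the whole OSD, "
      f"~{HDD_IOPS / HDD_TB:.0f} IOPS per TB")
print(f"Two 6 TB OSDs:  ~{HDD_IOPS // 2} IOPS each on the same spindle, "
      "plus extra seek contention between them")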

imho, ymmv

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



