> Oh, well what I was going to do was just use SATA HBAs on PowerEdge R740s because we don't really care about performance

That is important context.

> as this is just used as a copy point for backups/archival but the current Ceph cluster we have [Which is based on HDDs attached to Dell RAID controllers with each disk in RAID-0 and works just fine for us]

The H330? You can set the passthrough / JBOD / HBA personality and avoid the RAID-0 dance.

> is on EL7 and that is going to be EOL soon. So I thought it might be better on the new cluster to use HBAs instead of having the OSDs just be single disk RAID-0 volumes because I am pretty sure that's the least good scenario, even though it has been working for us for like 8 years now.

See above.

> So I asked on the list for recommendations and also read on the website and it really sounds like the only "right way" to run Ceph is by directly attaching disks to a motherboard

That isn't quite what I meant. If one is speccing out *new* hardware:

* HDDs are a false economy
* SATA / SAS SSDs hobble performance for little or no cost savings over NVMe
* RAID HBAs are fussy and a waste of money in 2023

> I had thought that HBAs were okay before

By HBA I suspect you mean a non-RAID HBA?

> but I am probably confusing that with ZFS/BSD or some other equally hyperspecific requirement.

ZFS indeed prefers as little as possible between it and the drives. The benefits for Ceph are not identical but very congruent.

> The other note was about how using NVMe seems to be the only right way now too.

If we predicate that HDDs are a dead end, then that leaves us with SAS/SATA SSD vs NVMe SSD.

SAS is all but dead, and carries a price penalty. SATA SSDs are steadily declining in the market; 5-10 years from now I suspect that no more than one manufacturer of enterprise-class SATA SSDs will remain. The future is PCIe.

SATA SSDs don't save any money over NVMe SSDs, and additionally require some sort of HBA, be it an add-in card or on the motherboard. SATA and NVMe SSDs use the same NAND, just with a different interface.

> I would've rather just stuck to SATA but I figured if I was going to have to buy all new servers that direct attach the SATA ports right off the motherboards to a backplane

On-board SATA chips may be relatively weak, but I don't know much about current implementations.

> I may as well do it with NVMe (even though the price of the media will be a lot higher).

NVMe SSDs shouldn't cost significantly more than SATA SSDs. Hint: certain tier-one chassis manufacturers mark both the fsck up. You can get a better warranty and better pricing by buying drives from a VAR.

> It would be cool if someone made NVMe drives that were cost competitive and had similar performance to hard drives (meaning, not super expensive but not lightning fast either) because the $/GB on datacenter NVMe drives like Kioxia, etc is still pretty far away from what it is for HDDs (obviously).

It's a trap! Which is to say, the $/GB really isn't that far away, and in fact once you step back from the unit economics of the drive in isolation to TCO, the HDDs often turn out to be *more* expensive. Pore through this:

https://www.snia.org/forums/cmsi/programs/TCOcalc

* $/IOPS are higher for any HDD compared to NAND
* HDDs are available up to what, 22TB these days? With the same tired SATA interface as when they were 2TB. That's rather a bottleneck. We see HDD clusters limiting themselves to 8-10TB HDDs all the time; in fact AIUI RHCS stipulates no larger than 10TB. Feed that into the equation and the TCO changes a bunch.
* HDDs not only hobble steady-state performance, but under duress (expansion, component failure, etc.) the impact to client operations will be higher and recovery to the desired redundancy will take much longer. I've seen a cluster, especially when using EC, take *4 weeks* to weight an 8TB HDD OSD up or down. Consider the operational cost and risk of that. The SNIA calc has a performance multiplier that accounts for this.
* A SATA chassis is stuck with SATA; 5-10 years from now that will be increasingly limiting, especially if you go with LFF drives.
* RUs cost money. A 1U LFF server can hold what, at most 88TB raw when using HDDs? With 60TB SSDs (*) one can fit 600TB of raw space into the same RU; see the rough sketch below.

(*) If they meet your needs
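Since the RU-density and rebuild points above are really just arithmetic, here is a minimal back-of-the-envelope sketch in Python. Every number in it (bay counts, drive capacities, backfill rates) is an assumption I picked for illustration, not a measurement or a quote; swap in your own figures, and use the SNIA calculator above for an actual TCO model.

#!/usr/bin/env python3
# Back-of-the-envelope comparison of HDD vs NVMe for a Ceph copy-point cluster.
# Every number below is an illustrative assumption -- substitute real quotes
# and the SNIA TCO calculator for anything load-bearing.

# --- raw capacity per rack unit (assumed chassis layouts) ---
hdd_bays_per_1u, hdd_tb = 4, 22        # assumed 1U LFF server with 22TB SATA HDDs
nvme_bays_per_1u, nvme_tb = 10, 60     # assumed 1U NVMe server with 60TB SSDs

hdd_raw_per_ru = hdd_bays_per_1u * hdd_tb      # 88 TB raw per RU
nvme_raw_per_ru = nvme_bays_per_1u * nvme_tb   # 600 TB raw per RU

# --- naive best-case backfill time for a single replaced OSD ---
# Assumed sustained backfill rates; real-world rates are lower, especially
# with EC and client load competing for the same devices.
hdd_backfill_mib_s = 60        # assumption: what a busy SATA HDD can sustain
nvme_backfill_mib_s = 1500     # assumption: a throttled NVMe backfill rate

def backfill_hours(osd_tb: float, rate_mib_s: float) -> float:
    """Hours to re-replicate one OSD's worth of data at a sustained rate."""
    mib = osd_tb * 1e12 / 2**20
    return mib / rate_mib_s / 3600

print(f"raw TB per RU:   HDD {hdd_raw_per_ru}   vs   NVMe {nvme_raw_per_ru}")
print(f"8TB HDD OSD backfill (ideal):  {backfill_hours(8, hdd_backfill_mib_s):.0f} h")
print(f"8TB NVMe OSD backfill (ideal): {backfill_hours(8, nvme_backfill_mib_s):.1f} h")

Note that even the idealized HDD figure (roughly 35 hours for a single 8TB OSD at that assumed rate) presumes an uncontended drive; under EC and live client load, real clusters fall far short of it, which is how you end up at the multi-week reweights mentioned above.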
>
> Anyway thanks.
> -Drew
>
>
> -----Original Message-----
> From: Robin H. Johnson <robbat2@xxxxxxxxxx>
> Sent: Sunday, January 14, 2024 5:00 PM
> To: ceph-users@xxxxxxx
> Subject: Re: recommendation for barebones server with 8-12 direct attach NVMe?
>
> On Fri, Jan 12, 2024 at 02:32:12PM +0000, Drew Weaver wrote:
>> Hello,
>>
>> So we were going to replace a Ceph cluster with some hardware we had
>> laying around using SATA HBAs but I was told that the only right way
>> to build Ceph in 2023 is with direct attach NVMe.
>>
>> Does anyone have any recommendation for a 1U barebones server (we just
>> drop in ram disks and cpus) with 8-10 2.5" NVMe bays that are direct
>> attached to the motherboard without a bridge or HBA for Ceph
>> specifically?
> If you're buying new, Supermicro would be my first choice for vendor based on experience.
> https://www.supermicro.com/en/products/nvme
>
> You said 2.5" bays, which makes me think you have existing drives.
> There are models to fit that, but if you're also considering new drives, you can get further density in E1/E3.
>
> The only caveat is that you will absolutely want to put a better NIC in these systems, because 2x10G is easy to saturate with a pile of NVMe.
>
> --
> Robin Hugh Johnson
> Gentoo Linux: Dev, Infra Lead, Foundation President & Treasurer
> E-Mail   : robbat2@xxxxxxxxxx
> GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
> GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx