Re: recommendation for barebones server with 8-12 direct attach NVMe?

By “RBD for cloud”, do you mean general-purpose VM / container volumes on which a filesystem is usually built?  Or large archive / backup volumes that are read and written sequentially without much concern for latency or throughput?

How many of those ultra-dense chassis are in a cluster?  Are all 60 drives off a single HBA?

I’ve experienced RGW clusters built from 4x 90-slot ultra-dense chassis, each of which had 2x server trays, so effectively 2x 45-slot chassis bound together.  The bucket pool was EC 3,2 or 4,2.  The motherboard was ... odd, the sort of thing a certain chassis vendor had a thing for at a certain point in time.  With only 12 DIMM slots each, they were chronically short on RAM, and the single HBA was a bottleneck.  Performance was acceptable for the use-case ... at first.  As the cluster filled up and got busier, that was no longer the case.  And these were drives capped at 8TB.  Not all slots were filled, at least initially.
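For reference, a bucket data pool in that shape is typically created along these lines.  A minimal sketch: the profile name and PG counts are illustrative, and k=4, m=2 corresponds to the EC 4,2 layout above:

    # Hypothetical profile name; k=4,m=2 matches the EC 4,2 layout above
    ceph osd erasure-code-profile set rgw-ec-42 k=4 m=2 crush-failure-domain=host
    # PG count of 1024 is purely illustrative; size it to your cluster
    ceph osd pool create default.rgw.buckets.data 1024 1024 erasure rgw-ec-42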

The index pool was on separate 1U servers with SATA SSDs.
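Steering the index pool onto those SSDs is usually a matter of a device-class CRUSH rule, something like the following sketch (the rule name is made up):

    # Replicated rule restricted to the "ssd" device class, host failure domain
    ceph osd crush rule create-replicated index-ssd default host ssd
    # Point the RGW index pool at it
    ceph osd pool set default.rgw.buckets.index crush_rule index-ssd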

There were hotspots, usually relatively small objects that clients hammered on.  A single OSD restarting and recovering would tank the API; we found it better to destroy and redeploy it.  Expanding capacity faster than data was coming in was a challenge, as we had to throttle the heck out of the backfill to avoid rampant slow requests and API impact.
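The throttling amounts to clamping the usual recovery knobs, and the destroy-and-redeploy flow keeps the OSD ID so the CRUSH map doesn’t churn.  A rough sketch, with illustrative values and placeholder IDs/devices:

    # Clamp backfill/recovery concurrency; the values here are illustrative
    ceph config set osd osd_max_backfills 1
    ceph config set osd osd_recovery_max_active 1
    ceph config set osd osd_recovery_sleep_hdd 0.2   # seconds of sleep between recovery ops on HDDs

    # Destroy and redeploy a struggling OSD in place (123 and /dev/sdX are placeholders)
    ceph osd destroy 123 --yes-i-really-mean-it
    ceph-volume lvm create --osd-id 123 --data /dev/sdX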

QLC with a larger number of OSD node failure domains was a net win, in that RAS (reliability, availability, serviceability) was dramatically increased, and expensive engineer-hours weren’t soaked up fighting performance and availability issues.

YMMV, especially if one’s organization has unreasonably restrictive purchasing policies, row after row of empty DC racks, etc.  I’ve suffered LFF spinners (just 3 / 4 TB) misused for OpenStack Cinder and Glance: Filestore with (wince) colocated journals* on 3R pools.  EC for RBD was not yet a thing, else we would have been forced to make it even worse.  The stated goal of the person who specced the hardware was for every instance to have the performance of its own 5400 RPM HDD.  Three fallacies there:  1) that anyone would consider that acceptable, 2) that it would be sustainable during heavy usage or backfill/recovery, and especially 3) that 450 / 3 = 2000.  It was just miserable.  I suspect that your use-case is different.  If spinners work for your purposes and you don’t need IOPS or the ability to provision SSDs down the road, more power to you.




* Which tickled a certain HDD mfg’s design flaws in a manner that substantially risked data availability and durability, in turn costing the organization hundreds of thousands of dollars and earning it substantial user dissatisfaction.
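To spell out fallacy 3) with a back-of-envelope — reading the 2000 as the intended instance count, and assuming a generous ~75 random IOPS per spindle:

    450 spindles x ~75 IOPS           ≈ 33,750 raw IOPS
    ÷ 3  (each write lands on 3 OSDs) ≈ 11,250 aggregate write IOPS
    ÷ ~2,000 instances                ≈ 5-6 write IOPS per instance

Nowhere near a dedicated spindle each, and that’s before recovery or backfill takes its cut.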

> 
> These kinds of statements make me at least ask questions. Dozens of 14TB HDDs have worked reasonably well for us for four years of RBD for cloud, and hundreds of 16TB HDDs have satisfied our requirements for two years of RGW operations, such that we are deploying 22TB HDDs in the next batch. It remains to be seen how well 60 disk SAS-attached JBOD chassis work, but we believe we have an effective use case.

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



