Hello,

On Mon, 23 Apr 2018 17:43:03 +0200 Florian Florensa wrote:

> Hello everyone,
>
> I am in the process of designing a Ceph cluster that will contain
> only SSD OSDs, and I was wondering how I should size my CPUs.

Several threads about this around here, but first things first.
Any specifics about the storage needs, i.e. do you think you need the
SSDs primarily for bandwidth or for IOPS? Lots of smallish writes, or
large reads/writes?

> The cluster will only be used for block storage.
> The OSDs will be Samsung PM863 (2TB or 4TB, this will be determined

I assume PM863a, the non-"a" model seems to be gone.
And that's a 1.3 DWPD drive; with a collocated journal or lots of small
writes and a collocated WAL/DB it will be half of that. So run the
numbers and make sure this is actually a good fit in the endurance area.
Of course, depending on your needs, journals or WAL/DB on higher
endurance NVMes might be a much better fit anyway.

> when we will set the total volumetry in stone), and it will be in 2U
> 24-SSD servers

How many servers are you thinking about? The fact that you're willing to
double the SSD size but not the number of servers suggests you're
thinking about a small number of them. And while dense servers will save
you space and money, more and smaller servers are generally a better fit
for Ceph, not least when considering failure domains (typically a host).

> Those servers will probably be either Supermicro 2029U-E1CR4T or
> Supermicro 2028R-E1CR48L.
> I've read quite a lot of documentation regarding hardware choices, and
> I can't find a 'guideline' for OSDs on SSD with collocated journal.

If this is a new cluster, that would be collocated WAL/DB and Bluestore.
Never mind my misgivings about Bluestore, at this point in time you
probably don't want to deploy a new cluster with filestore, unless you
have very specific needs and know what you're doing.

> I was leaning towards either dual 'Xeon Gold 6146' or dual 'Xeon
> 2699v4' for the CPUs, depending on the chassis.

The first one is a much better fit for the "a fast core for each OSD"
philosophy needed for low latency and high IOPS. The second is simply
overkill: 24 real cores will do, and for extreme cases I'm sure I could
still whip up a fio setting that saturates the 44 real cores of the
second setup.

Of course, dual CPU configurations like this come with a potential
latency penalty for NUMA misses. Unfortunately Supermicro hasn't
released my suggested Epyc based Ceph storage node (yet?): a single
socket 1U (or a 2U twin) with 10x 2.5" bays, up to 2 of them NVMe
capable. But even dual CPU Epyc based systems have a clear speed
advantage when it comes to NUMA misses, thanks to the socket
interconnect (Infinity Fabric).

Do consider this alternative setup:
https://www.supermicro.com.tw/Aplus/system/1U/1123/AS-1123US-TR4.cfm
with either 8 SSDs and 2 NVMes or 10 SSDs, and either 2x Epyc 7251
(adequate core ratio and speed, cheap) or 2x Epyc 7351 (massive
overkill, but still 1/4 of the Intel price tag). The unreleased
AS-2123US-TN24R25 with 2x Epyc 7351 might be a good fit as well.

> For the network part, I was thinking of using two dual-port ConnectX-4
> Lx from Mellanox per server.
>
Going to what kind of network/switches?

> If anyone has some ideas/thoughts/pointers, I would be glad to hear them.
>
RAM: you'll need a lot of it, even more with Bluestore given the current
caching. I'd say 1GB per TB of storage as usual, plus 1-2GB extra per OSD.
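To make the endurance and RAM numbers concrete, here's a rough
back-of-the-envelope sketch in Python. It's purely illustrative: the
1.92TB capacity, 24 drives per node and one OSD per SSD are placeholder
assumptions, so plug in your own figures.

#!/usr/bin/env python3
# Back-of-the-envelope sizing sketch. DWPD and the rough halving from a
# collocated WAL/DB are taken from the discussion above; drive count and
# capacity below are made-up placeholders.

def endurance_budget_tb_per_day(capacity_tb, dwpd, collocated_wal_db=True):
    """Client writes one drive can absorb per day, in TB."""
    # A collocated WAL/DB roughly doubles device writes, halving the budget.
    amplification = 2.0 if collocated_wal_db else 1.0
    return capacity_tb * dwpd / amplification

def ram_estimate_gb(total_storage_tb, num_osds, per_osd_extra_gb=2):
    """Rule of thumb: 1GB per TB of storage plus 1-2GB extra per OSD."""
    return total_storage_tb + num_osds * per_osd_extra_gb

if __name__ == "__main__":
    # Hypothetical node: 24x 1.92TB PM863a (1.3 DWPD), one OSD per SSD.
    per_drive = endurance_budget_tb_per_day(1.92, 1.3)
    print(f"Write budget per drive: {per_drive:.2f} TB/day")
    print(f"Write budget per 24-drive node: {24 * per_drive:.1f} TB/day")
    print(f"RAM estimate per node: ~{ram_estimate_gb(24 * 1.92, 24):.0f} GB")

If the resulting TB/day is anywhere near your expected client write
load, that's the point where WAL/DB on higher endurance NVMes (as
mentioned above) starts to look a lot more attractive.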
> Regards,
>
> Florian

--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Rakuten Communications