Re: 6 Node cluster with 24 SSD per node: Hardware planning / agreement

Hi, comments inline.

> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Denny Fuchs
> Sent: 04 October 2016 14:43
> To: ceph-users@xxxxxxxxxxxxxx
> Subject:  6 Node cluster with 24 SSD per node: Hardware planning / agreement
> 
> Hello,
> 
> we are brand new to Ceph and are planning it as our future storage for KVM/LXC VMs, as a replacement for our Xen / DRBD / Pacemaker /
> Synology (NFS) stuff.
> 
> 
> We have two goals:
> 
> * High availability
> * Low latency for our transaction services

How low? See below regarding CPUs.

> * For later: replication to a different datacenter connected via 10Gb/s FC
> 
> 
> Our services are:
> 
> * Web application as frontend
> * Database (Sybase / MariaDB Galera) as backend
> 
> All needed for doing transactions
> 
> 
> All we are planning is, at this time, more than we need, but as future development and a replacement for our old hardware and
> software, we want the best we can get for our (approved) money :-)
> 
> So, here we are:
> 
> Starting with a six-node OSD cluster, where the nodes are not only doing OSD work but also holding the mon services. We want to
> store data only via the API, so a separate metadata server isn't needed, if I understand the documentation right.

The metadata server is only for CephFS (the distributed filesystem); for direct librados library calls or RBD (block devices) you
need only mons and OSDs.
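
For illustration, a minimal python-rados sketch (assuming the python-rados
bindings are installed, a readable client keyring is on the node, and the
default 'rbd' pool exists). Nothing in this path ever contacts an MDS:

import rados

# Connect using the cluster config; the client asks the mons for the cluster
# map and then talks directly to the OSDs.
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()

ioctx = cluster.open_ioctx('rbd')                # any existing pool will do
ioctx.write_full('hello-object', b'hello ceph')  # object goes straight to the OSDs
print(ioctx.read('hello-object'))

ioctx.close()
cluster.shutdown()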

> 
> 
> The first test hardware is:
> 
> * Motherboard: Asus Z10PR-D16
> ** https://www.asus.com/de/Commercial-Servers-Workstations/Z10PRD16/specifications/
> 
> * CPU: 2 x E5-2620v4
> * RAM: 4 x 32GB DDR4 2400MHz
> 
> * Chassis: RSC-2AT0-80PG-SA3C-0BL-A
> ** http://www.aicipc.com/ProductSKU.aspx?ref=RSC-2AT
> ** Edition without Expander
> 
> * SAS: 1 x 9305-24i
> ** http://www.avagotech.com/products/server-storage/host-bus-adapters/sas-9305-24i#specifications
> 
> * Storage NIC: 1 x Infiniband MCX314A-BCCT
> ** I read that the ConnectX-3 Pro is better supported than the ConnectX-4, and it's a bit cheaper
> ** Switch: 2 x Mellanox SX6012 (56Gb/s)
> ** Active FC cables
> ** Maybe VPI is nice to have, but unsure.
> 
> * Production NIC: 1 x Intel X520 dual port SFP+
> ** Each port connected to one of the HP 2920's 10Gb/s ports via 802.3ad
> 
> All nodes are cross-connected to every switch, so if one switch goes down, a second path is available.

Isn't that a 1Gb switch with a couple of 10G modules? Any reason you can't get a pure 10G switch?

> 
> 
> * Disk:
> ** Storage: 24 x Crucial MX300 250GB (maybe for production 12x SSD /
> 12x big SATA disks)

I would be very careful about using these; they are not enterprise SSDs. I would go for either the S3610 or, if you will be doing
mainly reads, the S3510.

> ** OSD journal: 1 x Intel SSD DC P3700 PCIe

That will not be enough to journal 24 SSDs. Or is it just for the SATA disks, with the SSDs keeping their journals on themselves?
In that case it will be fine.
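
Rough numbers behind that, as a back-of-the-envelope sketch (the per-device
throughput figures below are assumptions for illustration, not benchmarks):

# With filestore every write lands in the journal first, so a shared journal
# device has to absorb the combined write rate of all OSDs behind it.
num_ssds = 24
ssd_write_mb_s = 500      # assumed sustained write rate per SATA SSD
nvme_write_mb_s = 1800    # assumed sustained write rate of one P3700 (capacity dependent)

aggregate_mb_s = num_ssds * ssd_write_mb_s
print("OSDs could sink ~%d MB/s, one journal device absorbs ~%d MB/s"
      % (aggregate_mb_s, nvme_write_mb_s))
# ~12000 MB/s vs ~1800 MB/s: a single NVMe journal caps the node's write throughput.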

> 
> 
> One of the hardest parts was choosing the chassis with or without an active expander,
> so that we can use a "cheaper" HBA, like an 8i model or something else.
> Also whether we want/need a full RAID controller like the MegaRAID
> SAS-9361-8i, because of the battery and cache. But it seems that it isn't
> really needed in our case. Sure, the cache is one of the benefits, but
> maybe it is more complicated than a plain HBA.

Yeah, RAID controllers can sometimes increase performance slightly due to write-back cache, but they can also get overwhelmed and
end up being slower. Especially with SSDs you are probably best off with a plain HBA.

> 
> 
> From the Ceph point of view, we want two OSD nodes to be able to go down in
> a worst-case scenario while keeping our business up (a bit slower is OK,
> and expected). Also, when the nodes come back, we are not down because of
> the replication traffic ;-)
> 
> 
> The OS would be Proxmox 4.x (based on Debian Jessie) with Hammer or
> Jewel, but WITHOUT ANY VMs on it. We want to keep the systems in one
> hand :-)

Why are you going to run Proxmox with no VMs just for Ceph? What's wrong with just Ubuntu or Debian?

> 
> 
> So we want to know whether the hardware should be OK even when running the mon
> servers on the same HW as the OSDs. We know that every OSD should
> own a core; the 2620v4 has 8 cores, 16 with HT, so in sum we have 32
> logical CPUs per OSD node, which should be fine, .... I think ....

I would pay less attention to the number of cores vs. OSDs; instead look at the total number of GHz and the number of IOPS you require.
I have been doing some testing recently and have come up with a figure of around 1MHz per IO. I will be writing up a blog article with
more details in the near future.
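
To make that concrete, a quick sketch applied to the proposed 2 x E5-2620v4
nodes (treat the result as an order-of-magnitude estimate only):

# Core count and base clock are from the E5-2620v4 spec sheet; the
# ~1MHz-per-IO figure is the rough rule of thumb above.
cores_per_node = 2 * 8        # 2 x E5-2620v4, 8 physical cores each
base_clock_mhz = 2100         # 2.1GHz base clock
mhz_per_io = 1.0

total_mhz = cores_per_node * base_clock_mhz
est_iops = total_mhz / mhz_per_io
print("~%d MHz per node -> very roughly %d IOPS per node" % (total_mhz, est_iops))
# ~33600 MHz, i.e. on the order of 30-35k IOPS per node before other limits bite.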

If you need a low number of IOs but with low latency, I would go with a smaller number of very fast cores (3.5GHz+). Otherwise,
if you think you will be generating hundreds of thousands of IOs, you probably want more cores and will have to accept the increased
latency of slower cores as a compromise.
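
A hedged illustration of that trade-off (the per-IO cycle count is an
assumption derived from the ~1MHz-per-IO figure, not a measurement):

# The CPU work per IO is roughly a fixed number of cycles, so a faster clock
# finishes each IO sooner, while total throughput scales with cores x clock.
cycles_per_io = 1000000       # implied by ~1MHz per IO; assumption only

for clock_ghz in (2.1, 3.5):
    cpu_ms_per_io = cycles_per_io / (clock_ghz * 1e9) * 1000.0
    print("%.1f GHz core: ~%.2f ms of CPU time per IO" % (clock_ghz, cpu_ms_per_io))
# 2.1GHz gives ~0.48ms, 3.5GHz gives ~0.29ms, trimming the CPU share of the
# per-IO latency by roughly 40%.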

> 
> 
> It would be very helpful if someone could take a short look at our list, in case
> there is a component we shouldn't buy for the production side of life :-)
> 
> 
> cu denny
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


