Re: 6 Node cluster with 24 SSD per node: Hardware planning / agreement

Hi,

thanks for taking a look :-)

On 04.10.2016 16:11, Nick Fisk wrote:

We have two goals:

* High availability
* Short latency for our transaction services

How low? See below re CPUs.

As low as possible without doing crazy stuff. We are thinking of putting the database on Ceph too, instead of on local SSDs in separate servers.


via the API, so a separate metadata server isn't needed, if I understand the documentation correctly.

The metadata server is only for CephFS (the distributed filesystem). For
direct librados library calls or RBD (block devices) you need only mons
and OSDs.

perfect :-)
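
For reference, a minimal sketch of such an MDS-less client using the Python rados/rbd bindings (the conffile path, pool name and image name below are just placeholders):

import rados
import rbd

# Connect using only the mon addresses from ceph.conf -- no MDS involved.
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
try:
    ioctx = cluster.open_ioctx('rbd')                        # assumed pool name
    try:
        rbd.RBD().create(ioctx, 'test-image', 4 * 1024**3)   # 4 GiB image
        image = rbd.Image(ioctx, 'test-image')
        image.write(b'hello ceph', 0)    # block I/O goes straight to the OSDs
        image.close()
    finally:
        ioctx.close()
finally:
    cluster.shutdown()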


All nodes are cross-connected to both switches, so if one switch goes down, a second path is available.

Isn't that a 1Gbit switch with a couple of 10G modules? Any reason you
can't get a pure 10G switch?

Right. The reason is that we already have them, with stacking modules and dual power supplies, so we only need the 10Gbit backplane modules. That's it.


* Disk:
** Storage: 24 x Crucial MX300 250GB (maybe for production 12x SSD / 12x big SATA disks)

I would be very careful about using these. They are not enterprise
SSDs. I would go for either the S3610, or the S3510 if you will be doing
mainly reads.

There was a long, long discussion (also here on the list ...). I would also prefer enterprise SSDs, but they are too expensive. Maybe for storage we could use the Samsung 850 Pro series, or whatever fits in the same price range. Personally I would use SSDs with power loss protection, so the Intel S3510 / S37xx would also fit and is on a second buy list.
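
One way to check candidate drives before buying: measure single-threaded 4k O_DSYNC write performance, which is roughly the pattern the OSD journal generates. A minimal sketch (the test path, runtime and the IOPS expectations in the comments are assumptions):

import os
import time

# Quick check of single-threaded 4k O_DSYNC write performance -- roughly the
# write pattern the OSD journal sees. Consumer SSDs without power loss
# protection tend to collapse to a few hundred IOPS here, while DC-class
# drives (S3610/S3700/P3700) typically stay in the thousands.
TEST_FILE = '/mnt/ssd-under-test/journal-test.bin'   # placeholder path
BLOCK = b'\0' * 4096
SECONDS = 10

fd = os.open(TEST_FILE, os.O_WRONLY | os.O_CREAT | os.O_DSYNC, 0o600)
count = 0
start = time.time()
while time.time() - start < SECONDS:
    os.write(fd, BLOCK)
    count += 1
os.close(fd)
os.unlink(TEST_FILE)

elapsed = time.time() - start
print("%.0f sync write IOPS, %.2f ms average latency"
      % (count / elapsed, 1000.0 * elapsed / count))

A drive that stays in the thousands of IOPS here is usually a safer journal candidate than one that collapses to a few hundred.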


** OSD journal: 1 x Intel SSD DC P3700 PCIe

That will not be enough to journal 24x SSDs. Or is this just for the
SATA disks, and the SSDs have no separate journals? In which case it will
be fine.

Hmm, OK, I was nearly sure this question would come up ..... Yes, it would be for all journals. Unless you would say we don't need it, because we put each OSD's journal on the OSD itself ...

We would use the 400GB DC P3700 PCIe edition for the journals.
Otherwise, reading between the lines, we would need two of them to carry the journals of all the SSD drives.
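
A quick back-of-the-envelope check of that, using vendor spec sheet numbers (approximate and purely illustrative):

# Rough journal bandwidth check (vendor spec numbers, treat as approximate):
ssd_write_mb_s   = 510     # Crucial MX300 250GB sequential write (approx.)
num_ssds         = 24
p3700_write_mb_s = 1080    # Intel DC P3700 400GB sequential write (approx.)

aggregate = ssd_write_mb_s * num_ssds
print("Aggregate SSD write capacity: %d MB/s" % aggregate)
print("One P3700 covers about %.0f%% of that"
      % (100.0 * p3700_write_mb_s / aggregate))
# => every client write passes through the journal first, so a single P3700
#    caps the node at roughly 1GB/s of writes; two of them (or journals on
#    the SSDs themselves) relieve that bottleneck.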

really needed in our case. Sure, the cache is one of the benefits, but
maybe it is more complicated than a plain HBA.

Yeah, RAID controllers can sometimes increase performance slightly due
to the write-back cache, but they can also get overwhelmed and
end up being slower. Especially with SSDs you are probably best off with a plain HBA.

great to hear :-)

The OS would be Proxmox 4.x (based on Debian Jessie) with Hammer or
Jewel, but WITHOUT ANY VMs on it. We want to keep the systems in one
hand :-)

Why are you going to run Proxmox with no VMs just for Ceph? What's
wrong with just Ubuntu or Debian?

Proxmox would become our main hypervisor, and Ceph is built-in technology there with all the tooling that is needed. So in the end we have 6 OSD nodes and 4 hypervisors, all under the "umbrella" of Proxmox, and documentation and maintenance are much easier because everything is based on one platform.

So we want to know whether the hardware would also be OK with the mon
servers running on the same hardware as the OSDs. We know that every OSD
should have its own core; the 2620v4 has 8 cores, 16 with HT, so in total
we have 32 logical CPUs per OSD node, which should be fine .... I think ....

I would pay less attention to the number of cores vs. OSDs; instead,
look at the total number of GHz and the number of IOPS you require.
I have been doing some testing recently and have come up with a figure of
around 1MHz per IO. I will be writing up a blog article with
more details in the near future.

If you need a low number of IOs but with low latency, I would go with
a lower number of very fast cores (3.5GHz+). Otherwise,
if you think you will be generating hundreds of thousands of IOs, then you
probably want more cores and will have to accept the increased
latency from the slower cores as a compromise.

That is extremely nice to know!! Most of the documentation talks about core counts, not plain MHz.
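
Putting that rule of thumb against the CPUs proposed above (base clock from the Intel spec, only physical cores counted; whether HT threads should count is an open assumption):

# Rough sizing with the ~1MHz-per-IO rule of thumb above (estimate only):
cores_per_node = 16        # 2x E5-2620 v4, 8 physical cores each
base_clock_mhz = 2100      # E5-2620 v4 base clock

mhz_per_node = cores_per_node * base_clock_mhz
print("~%d MHz per node -> ~%d IOPS per node" % (mhz_per_node, mhz_per_node))
print("6 nodes -> ~%d IOPS cluster-wide (before replication overhead)"
      % (6 * mhz_per_node))
# => roughly 33,600 IOPS per node as a ceiling estimate; for latency-critical
#    pools, fewer but faster (3.5GHz+) cores would help more than extra cores.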


thank you for the comments :-)

cu denny


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


