Re: Cores/Memory/GHz recommendation for SSD based OSD servers

> 
> 
> On Thursday, April 2, 2015, Nick Fisk <nick@xxxxxxxxxx> wrote:
> I'm probably going to get shot down for saying this...but here goes.
> 
> As a very rough guide, think of it as needing around 10MHz for every IO;
> whether that IO is 4k or 4MB, it uses roughly the same amount of CPU, as
> most of the CPU usage is around Ceph data placement rather than the actual
> reads/writes to disk.
> 
> That piece of information is, by far, one of the most helpful things I've ever
> read on this list regarding hardware configuration. Thanks for sharing that!

That was just a finger-in-the-air figure, taking the recommended 1GHz per OSD and an HDD doing 100 IOPS, so please don't take it as gospel. If you want accurate figures, I would suggest building a one-node, one-SSD-OSD cluster and playing with CPU scaling. In theory you should then be able to plot the results on a graph and derive quite an accurate figure.
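
As a rough sketch of that experiment (in Python, with entirely made-up sample numbers), you could fit a line through (IOPS, available MHz) pairs collected at different CPU frequency caps; the slope is your MHz-per-IO figure:

import numpy as np

# Hypothetical (total_mhz_available, measured_iops) pairs from a
# one-node, one-SSD-OSD test run at different CPU frequency caps.
samples = [
    (4000, 380),
    (8000, 790),
    (12000, 1210),
    (16000, 1580),
]

mhz, iops = zip(*samples)
# Least-squares fit: mhz ~= slope * iops + intercept
slope, intercept = np.polyfit(iops, mhz, 1)
print("~%.1f MHz per IO, ~%.0f MHz fixed overhead" % (slope, intercept))

With real measurements in place of the placeholders, the slope should land near whatever your hardware's true per-IO cost is.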

> 
> That calculation comes close to my cluster's max IOPS. I've seen just over 11k
> IOPS (under ideal conditions, with short bursts of IO); the 10MHz calculation
> says 12k IOPS.
> 
> For the record, my cluster is 6 osd nodes, each node has:
> 2x 4-core, 2.5GHz CPUs
> 32GB RAM
> 7x 3.5" 7.2k rpm 2TB disks (one per OSD)
> RAID card with 1GB write-back cache w/ BBU
> 2x 40Gb NICs
> No SSD journals
> 
> What effect does replication have on the 10MHz/IO number, in your
> experience?  My 11k IOPS was achieved with 2x replication.  I've seen over
> 10k IOPS with 3x replication. Typically, I can get 2k-3k IOPS with long
> sequential IO patterns.

There are multiple answers to this, really. The first is that adding replicas increases the serial latency of each write request, since all copies need to acknowledge before the write completes, so faster GHz per core can reduce that impact. The second is that each additional copy requires CPU to place the data; I'm not sure how that scales, but I would imagine it's reasonably linear, so maybe ~10MHz per copy for writes. For reads there should be no real overhead.

But when you are taking replicas into account, you are talking about total GHz across the whole cluster, not per node or even per OSD.
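
To show where the 12k figure above comes from, here is the back-of-envelope arithmetic for the cluster described above, as a Python sketch (the 10MHz-per-IO constant is the rough estimate from earlier in this thread, not a measured value):

nodes = 6
cores_per_node = 2 * 4        # 2x quad-core CPUs per node
mhz_per_core = 2500           # 2.5GHz

total_mhz = nodes * cores_per_node * mhz_per_core   # 120,000 MHz cluster-wide
backend_iops = total_mhz / 10.0                     # ~12,000 IOs placed per second
print("~%d backend IOPS" % backend_iops)

# Every replica of a write has to be placed, so client-visible write
# IOPS roughly divide by the replica count; reads are largely unaffected.
for replicas in (2, 3):
    print("%dx replication -> ~%d client write IOPS" % (replicas, backend_iops / replicas))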

> 
> I'm getting my budget ready for next quarter, so I've been trying to decide
> how to spend money to best improve Ceph performance.
> 
> To improve long sequential write IO, I've been debating adding a PCI flash
> accelerator card to each OSD node vs. just adding another 6 OSD nodes. The
> cost is about the same.
> 

It's a tough call; it really depends on what sort of performance you need. If you need lots of IO at high queue depths (think lots of VMs), you would probably be better off scaling horizontally. By that I mean more OSD nodes with cheap, slowish cores (2-2.5GHz) and more S3700-style SSDs.

If you need the absolute best performance for single-threaded workloads (think OLTP DBs), you want the fastest SSDs you can get, paired with the fastest cores you can get, to reduce per-IO latency. But at that point I would also start to consider SSD caching on RBDs, or not using Ceph at all.
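
One way to make the queue-depth distinction concrete is Little's law (throughput is roughly queue depth divided by per-IO latency); this framing is mine rather than something stated in the thread, and the latencies below are illustrative assumptions:

def iops(queue_depth, latency_ms):
    # Little's law: sustained IOPS = outstanding IOs / per-IO latency
    return queue_depth / (latency_ms / 1000.0)

print(iops(1, 2.0))    # QD1 at 2ms per write -> 500 IOPS, purely latency-bound
print(iops(1, 1.0))    # halve the latency (faster cores/SSDs) -> 1000 IOPS
print(iops(64, 2.0))   # QD64 at the same latency -> 32000 IOPS; scale out instead

At queue depth 1 the only lever is latency, which is why fast cores and fast SSDs matter there; at high queue depths you can buy the same throughput with more OSDs instead.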

> 
> I can nearly saturate 12x 2.1GHz cores with a single SSD, doing 4k IOs at high
> queue depths.
> 
> Which brings us back to your original question: rather than asking how much
> CPU for X amount of SSDs, how many IOs do you require out of your cluster?




_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




