Optimal OSD Configuration for 45 drives?

On Fri, 25 Jul 2014 13:31:34 +1000 Matt Harlum wrote:

> Hi,
> 
> I've purchased a couple of 45Drives enclosures and would like to figure
> out the best way to configure these for ceph?
> 
That's the second time within a month that somebody has mentioned these
45 drive chassis.
Would you mind elaborating which enclosures these are precisely?

I'm wondering especially about the backplane, as 45 is such an odd number.

Also if you don't mind, specify "a couple" and what your net storage
requirements are.

In fact, read this before continuing:
---
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg11011.html
---

> Mainly I was wondering if it was better to set up multiple raid groups
> and then put an OSD on each rather than an OSD for each of the 45 drives
> in the chassis? 
> 
Steve already toed the conservative Ceph party line here, so let me give
you some alternative views and options on top of that and recap what I
wrote in the thread above.

In addition to his links, read this:
---
https://objects.dreamhost.com/inktankweb/Inktank_Hardware_Configuration_Guide.pdf
---

Let's go from cheap and cheerful to "comes with racing stripes".

1) All spinning rust, all the time. Plunk in 45 drives, as JBOD behind the
cheapest (and densest) controllers you can get. Having the journal on the
disks will halve their performance, but you just wanted the space and are
not that pressed for IOPS. 
The best you can expect per node with this setup is something around 2300
IOPS with normal (7200RPM) disks.

2) Same as 1), but use controllers with a large HW cache (4GB Areca comes
to mind) in JBOD (or 45 times RAID0) mode. 
This will alleviate some of the thrashing problems, particularly if you're
expecting high IOPS to come in short bursts.

3) Ceph Classic, basically what Steve wrote. 
32 HDDs, 8 SSDs for journals (you do NOT want an uneven spread of journals).
This will give you a sustainable 3200 IOPS, but of course the journals on
SSDs not only avoid all that thrashing about on the disk but also allow for
coalescing of writes, so this is going to be the fastest solution so far.
Of course you will need 3 of these at minimum for acceptable redundancy,
unlike 4) which just needs a replication level of 2.

4) The anti-cephalopod. See my reply from a month ago in the link above.
All the arguments apply, it very much depends upon your use case and
budget. In my case the higher density, lower cost and ease of maintaining
the cluster were well worth the lower IOPS.

5) We can improve upon 3) by using HW cached controllers of course. And
hey, you did need to connect those drive bays somehow anyway. ^o^ 
Maybe even squeeze some more out of it by having the SSD controller
separate from the HDD one(s).
This is as fast (IOPS) as it comes w/o going to full SSD.
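
For what it's worth, the IOPS numbers in 1) and 3) can be reproduced with
a back-of-the-envelope sketch like the one below. The ~100 IOPS per
7200RPM disk is my assumption here (it is what 3200 IOPS over 32 HDDs
works out to), and the function names are made up for illustration, so
treat this as a rough estimate and not a benchmark.

```python
# Assumed figure, not stated outright in the post: roughly 100 random
# IOPS per 7200RPM disk (3200 IOPS / 32 HDDs in option 3).
IOPS_PER_HDD = 100


def node_iops(hdds, journal_on_disk):
    """Rough sustained write IOPS for one node (hypothetical helper)."""
    # With the journal on the same spindle every write lands on the disk
    # twice, which halves the usable IOPS.
    penalty = 2 if journal_on_disk else 1
    return hdds * IOPS_PER_HDD // penalty


def hdds_per_journal_ssd(hdds, ssds):
    """HDDs sharing each journal SSD; rejects an uneven spread."""
    if hdds % ssds != 0:
        raise ValueError("uneven spread of journals across SSDs")
    return hdds // ssds


print(node_iops(45, journal_on_disk=True))    # option 1: 45 HDDs, journal on disk
print(node_iops(32, journal_on_disk=False))   # option 3: 32 HDDs, SSD journals
print(hdds_per_journal_ssd(32, 8))            # journals per SSD in option 3
```

Option 1 comes out at 2250, which is where "something around 2300" comes
from, and option 3 at 3200 with 4 journals per SSD.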


Networking:
Any of the setups above will saturate a single 10Gb/s link (roughly
1GB/s), as Steve noted.
In fact 3) to 5) will be able to write up to 4GB/s in theory based on the
HDDs' sequential performance, but that is unlikely to be seen in real life.
And of course your maximum write speed is based on the speed of the SSDs.
So for example with 3) you would want those 8 SSDs to have write speeds of
about 250MB/s, giving you 2GB/s max write.
Which in turn means 2 10Gb/s links at least, up to 4 if you want
redundancy and/or a separation of public and cluster network.
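
The link count above is simple arithmetic; a minimal sketch follows. It
uses the same rounding as above (one 10Gb/s link carries about 1GB/s),
and the ~125MB/s sequential rate per HDD is an assumption on my part that
matches the 4GB/s figure for 32 disks. The helper names are hypothetical.

```python
import math

SSD_WRITE_MBPS = 250   # per journal SSD, from the example for 3)
HDD_SEQ_MBPS = 125     # assumed per-HDD sequential rate (4GB/s over 32 HDDs)
LINK_MBPS = 1000       # one 10Gb/s link, rounded to 1GB/s as in the text


def max_write_mbps(hdds, ssds):
    """Node write ceiling: the slower of the HDD and SSD journal totals."""
    return min(hdds * HDD_SEQ_MBPS, ssds * SSD_WRITE_MBPS)


def links_needed(write_mbps, redundant=False):
    """10Gb/s links needed to carry write_mbps, doubled when redundant."""
    links = math.ceil(write_mbps / LINK_MBPS)
    return links * 2 if redundant else links


ceiling = max_write_mbps(hdds=32, ssds=8)
print(ceiling)                                # MB/s, limited by the SSDs
print(links_needed(ceiling))                  # minimum number of links
print(links_needed(ceiling, redundant=True))  # with redundancy/separation
```

With 32 HDDs and 8 SSDs the journals are the limit at 2000MB/s, which
gives 2 links at minimum and 4 when doubled up.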

RAM:
The more, the merrier. 
It's relatively cheap and avoiding having to actually read from the disks
will make your write IOPS so much happier.

CPU:
You'll want something like Steve recommended for 3), I'd go with 2 8-core
CPUs actually, so you have some oomph to spare for the OS, IRQ handling,
etc. With 4) and only 4 actual OSDs, about half of that will be fine, with
the expectation of Ceph code improvements.

Mobo:
You're fine for overall PCIe bandwidth, even w/o going to PCIe v3. 
But you might have up to 3 HBAs/RAID cards and 2 network cards, so make
sure you can get this all into appropriate slots.

Regards,

Christian
-- 
Christian Balzer        Network/Systems Engineer                
chibi at gol.com   	Global OnLine Japan/Fusion Communications
http://www.gol.com/

