On Fri, 25 Jul 2014 13:31:34 +1000 Matt Harlum wrote:

> Hi,
>
> I've purchased a couple of 45Drives enclosures and would like to figure
> out the best way to configure these for ceph?
>
That's the second time within a month somebody mentions these 45-drive
chassis. Would you mind elaborating which enclosures these are precisely?
I'm wondering especially about the backplane, as 45 is such an odd number.

Also, if you don't mind, specify "a couple" and what your net storage
requirements are. In fact, read this before continuing:
---
https://www.mail-archive.com/ceph-users at lists.ceph.com/msg11011.html
---

> Mainly I was wondering if it was better to set up multiple raid groups
> and then put an OSD on each rather than an OSD for each of the 45 drives
> in the chassis?
>
Steve already toed the conservative Ceph party line here; let me give you
some alternative views and options on top of that and recap what I wrote
in the thread above.

In addition to his links, read this:
---
https://objects.dreamhost.com/inktankweb/Inktank_Hardware_Configuration_Guide.pdf
---

Let's go from cheap and cheerful to "comes with racing stripes".

1) All spinning rust, all the time.
Plunk in 45 drives as JBOD behind the cheapest (and densest) controllers
you can get. Having the journal on the disks will halve their write
performance, but you just wanted the space and are not that pressed for
IOPS. The best you can expect per node with this setup is something
around 2300 IOPS with normal (7200RPM) disks.

2) Same as 1), but use controllers with a large HW cache (the 4GB Areca
comes to mind) in JBOD (or 45 times RAID0) mode. This will alleviate some
of the thrashing problems, particularly if you expect the high IOPS to
come in short bursts.

3) Ceph Classic, basically what Steve wrote. 32 HDDs, 8 SSDs for journals
(you do NOT want an uneven spread of journals). This will give you a
sustainable 3200 IOPS, and of course the journals on SSDs not only avoid
all that thrashing about on the disks but also allow for coalescing of
writes, so this is going to be the fastest solution so far.
Of course you will need 3 of these at minimum for acceptable redundancy,
unlike 4), which just needs a replication level of 2.

4) The anti-cephalopod.
See my reply from a month ago in the link above. All the arguments apply;
it very much depends upon your use case and budget. In my case the higher
density, lower cost and ease of maintaining the cluster were well worth
the lower IOPS.

5) We can improve upon 3) by using HW cached controllers, of course. And
hey, you did need to connect those drive bays somehow anyway. ^o^
Maybe even squeeze some more out of it by having the SSD controller
separate from the HDD one(s). This is as fast (IOPS) as it comes without
going to full SSD.

Networking:
Any of the setups above will saturate a single 10Gb/s (aka 1GB/s) link,
as Steve noted. In fact 3) to 5) would be able to write up to 4GB/s in
theory based on the HDDs' sequential performance, but that is unlikely to
be seen in real life. And of course your maximum write speed is limited
by the speed of the SSDs. So for example with 3) you would want those 8
SSDs to have write speeds of about 250MB/s each, giving you 2GB/s max
write. Which in turn means at least 2 10Gb/s links, up to 4 if you want
redundancy and/or a separation of public and cluster networks; see the
quick sizing sketch below.

RAM:
The more, the merrier. It's relatively cheap, and avoiding having to
actually read from the disks will make your write IOPS so much happier.
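To put rough numbers on the options and the network sizing above, here is
a quick back-of-the-envelope sketch (Python). The ~100 IOPS per 7200RPM
disk is only a rule-of-thumb assumption; the 250MB/s per journal SSD and
~1GB/s usable per 10Gb/s link are the figures used in this mail.
---
# Rough sizing numbers for the setups above. Assumptions and rules of
# thumb, not measurements.
HDD_IOPS = 100            # assumed write IOPS of a 7200RPM disk
SSD_WRITE_MBS = 250       # assumed sustained write per journal SSD, MB/s
LINK_USABLE_GBS = 1.0     # usable GB/s per 10Gb/s link

# 1) 45 OSDs with the journal on the same disk: write IOPS roughly halved.
print(f"1) 45 HDDs, journal on disk:  ~{45 * HDD_IOPS // 2} write IOPS")

# 3) 32 OSDs with journals on 8 SSDs: full HDD IOPS available for data.
print(f"3) 32 HDDs + 8 SSD journals:  ~{32 * HDD_IOPS} write IOPS")

# Network for 3): writes are capped by the aggregate journal SSD bandwidth.
max_write_gbs = 8 * SSD_WRITE_MBS / 1000.0
links = int(-(-max_write_gbs // LINK_USABLE_GBS))   # ceiling division
print(f"   journal write ceiling:     ~{max_write_gbs:.1f} GB/s")
print(f"   10Gb/s links needed:       {links} "
      f"(x2 for redundancy and/or a separate cluster network)")
---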
CPU:
You'll want something like Steve recommended for 3); I'd go with 2 8-core
CPUs actually, so you have some oomph to spare for the OS, IRQ handling,
etc. With 4) and its actual 4 OSDs, about half of that will be fine,
especially with the expected Ceph code improvements.

Mobo:
You're fine for overall PCIe bandwidth, even without going to PCIe v3.
But you might have up to 3 HBAs/RAID cards and 2 network cards, so make
sure you can get all of this into appropriate slots.

Regards,

Christian
-- 
Christian Balzer        Network/Systems Engineer
chibi at gol.com        Global OnLine Japan/Fusion Communications
http://www.gol.com/
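P.S.: To back up the Mobo remark, a quick sanity check along the same
lines as the sketch above. The per-lane throughput (~0.5 GB/s for PCIe
2.0), the x8 card widths and the 150MB/s per-disk sequential figure are
assumptions, not the specs of any particular board or card:
---
# Worst-case slot/bandwidth check: 3 x8 HBAs + 2 x8 NICs on PCIe 2.0.
# Per-lane throughput and the x8 widths are assumed, adjust to your HW.
PCIE2_LANE_GBS = 0.5          # ~0.5 GB/s usable per PCIe 2.0 lane
HDD_SEQ_MBS = 150             # assumed sequential MB/s per 7200RPM disk

lanes = 3 * 8 + 2 * 8         # three HBAs/RAID cards plus two NICs, x8 each
slot_bw_gbs = lanes * PCIE2_LANE_GBS
disk_bw_gbs = 45 * HDD_SEQ_MBS / 1000.0

print(f"PCIe lanes needed (worst case): {lanes}")
print(f"PCIe 2.0 bandwidth there:       ~{slot_bw_gbs:.0f} GB/s")
print(f"45 HDDs sequential:             ~{disk_bw_gbs:.1f} GB/s"
      " (plenty of headroom)")
---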