On 07/25/2014 02:54 AM, Christian Balzer wrote:
> On Fri, 25 Jul 2014 13:31:34 +1000 Matt Harlum wrote:
>
>> Hi,
>>
>> I've purchased a couple of 45Drives enclosures and would like to figure
>> out the best way to configure these for ceph?
>>
> That's the second time within a month somebody mentions these 45-drive
> chassis.
> Would you mind elaborating which enclosures these are precisely?

I'm guessing the Supermicro SC847E26:
http://www.supermicro.com/products/chassis/4U/847/SC847E26-RJBOD1.cfm

>
> I'm wondering especially about the backplane, as 45 is such an odd number.
>
> Also if you don't mind, specify "a couple" and what your net storage
> requirements are.
>
> In fact, read this before continuing:
> ---
> https://www.mail-archive.com/ceph-users@lists.ceph.com/msg11011.html
> ---
>
>> Mainly I was wondering if it was better to set up multiple raid groups
>> and then put an OSD on each rather than an OSD for each of the 45 drives
>> in the chassis?
>>
> Steve already toed the conservative Ceph party line here, so let me give
> you some alternative views and options on top of that and recap what I
> wrote in the thread above.
>
> In addition to his links, read this:
> ---
> https://objects.dreamhost.com/inktankweb/Inktank_Hardware_Configuration_Guide.pdf
> ---
>
> Let's go from cheap and cheerful to "comes with racing stripes".
>
> 1) All spinning rust, all the time. Plunk in 45 drives, as JBOD behind the
> cheapest (and densest) controllers you can get. Having the journal on the
> disks will halve their performance, but you just wanted the space and are
> not that pressed for IOPS.
> The best you can expect per node with this setup is something around 2300
> IOPS with normal (7200RPM) disks.
>
> 2) Same as 1), but use controllers with a large HW cache (4GB Areca comes
> to mind) in JBOD (or 45 times RAID0) mode.
> This will alleviate some of the thrashing problems, particularly if you're
> expecting high IOPS to be in short bursts.
>
> 3) Ceph Classic, basically what Steve wrote.
> 32 HDDs, 8 SSDs for journals (you do NOT want an uneven spread of journals).
> This will give you a sustainable 3200 IOPS, but of course the journals on
> SSDs not only avoid all that thrashing about on the disk but also allow for
> coalescing of writes, so this is going to be the fastest solution so far.
> Of course you will need 3 of these at minimum for acceptable redundancy,
> unlike 4) which just needs a replication level of 2.
>
> 4) The anti-cephalopod. See my reply from a month ago in the link above.
> All the arguments apply; it very much depends upon your use case and
> budget. In my case the higher density, lower cost and ease of maintaining
> the cluster were well worth the lower IOPS.
>
> 5) We can improve upon 3) by using HW cached controllers of course. And
> hey, you did need to connect those drive bays somehow anyway. ^o^
> Maybe even squeeze some more out of it by having the SSD controller
> separate from the HDD one(s).
> This is as fast (IOPS) as it comes w/o going to full SSD.
>
>
> Networking:
> Any of the setups above will saturate a single 10Gb/s (aka 1GB/s) link, as
> Steve noted.
> In fact 3) to 5) will be able to write up to 4GB/s in theory based on the
> HDDs' sequential performance, but that is unlikely to be seen in real life.
> And of course your maximum write speed is capped by the speed of the SSDs.
> So for example with 3) you would want those 8 SSDs to have write speeds of
> about 250MB/s, giving you 2GB/s max write.
> Which in turn means at least two 10Gb/s links, up to 4 if you want
> redundancy and/or a separation of public and cluster network.
>
> RAM:
> The more, the merrier.
> It's relatively cheap, and avoiding having to actually read from the disks
> will make your write IOPS so much happier.
>
> CPU:
> You'll want something like Steve recommended for 3); I'd go with two
> 8-core CPUs actually, so you have some oomph to spare for the OS, IRQ
> handling, etc.
> With 4) and actually 4 OSDs, about half of that will be fine, with the
> expectation of Ceph code improvements.
>
> Mobo:
> You're fine for overall PCIe bandwidth, even w/o going to PCIe v3.
> But you might have up to 3 HBAs/RAID cards and 2 network cards, so make
> sure you can get this all into appropriate slots.
>
> Regards,
>
> Christian
>
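The per-node numbers quoted above (around 2300 IOPS for option 1, 3200 IOPS for option 3, 2GB/s of SSD journal bandwidth needing at least two 10Gb/s links) follow from simple arithmetic. The sketch below reproduces it. It assumes roughly 100 random write IOPS per 7200RPM HDD (inferred from 3200 IOPS / 32 HDDs, not a measured figure) and about 1000MB/s usable per 10Gb/s link, as the text does; the function names are purely illustrative.

```python
# Back-of-envelope per-node sizing for the setups discussed in the thread.
# ASSUMPTION: ~100 random write IOPS per 7200RPM HDD (inferred from 3200 / 32).
HDD_IOPS = 100


def node_write_iops(hdds: int, journal_on_hdd: bool) -> int:
    """Estimated sustainable write IOPS for one node."""
    total = hdds * HDD_IOPS
    # With the journal on the same disk every write hits the disk twice,
    # roughly halving the usable IOPS.
    return total // 2 if journal_on_hdd else total


def ssd_journal_write_mb_s(ssds: int, ssd_write_mb_s: int) -> int:
    """With SSD journals all writes pass through the SSDs, capping throughput."""
    return ssds * ssd_write_mb_s


def links_needed(write_mb_s: int, link_gbit: int = 10) -> int:
    """Links required to carry write_mb_s.

    ASSUMPTION: ~100MB/s usable per Gb/s, i.e. 10Gb/s ~= 1GB/s as in the text.
    """
    per_link_mb_s = link_gbit * 100
    return -(-write_mb_s // per_link_mb_s)  # ceiling division


opt1 = node_write_iops(45, journal_on_hdd=True)   # option 1: 45 HDDs, journals on disk
opt3 = node_write_iops(32, journal_on_hdd=False)  # option 3: 32 HDDs, SSD journals
max_write = ssd_journal_write_mb_s(8, 250)        # option 3: 8 SSDs at ~250MB/s
print(opt1, opt3, max_write, links_needed(max_write))
```

Option 1 comes out at 2250 IOPS, which the mail rounds to "around 2300". The link count covers client writes only; separating public and cluster networks, or adding redundancy, roughly doubles it, which is where the "up to 4" comes from.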
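Option 3 insists on an even spread of journals: 32 HDD OSDs over 8 SSDs means 4 journals per SSD. One plausible reason is that the busiest SSD becomes the write bottleneck, and a failed journal SSD takes every OSD journaling on it down with it, so no single SSD should carry more than its share. The sketch below shows a round-robin assignment that enforces this. The device names (`hdd00`, `ssd0`, ...) and the `assign_journals` helper are hypothetical illustrations, not Ceph tooling.

```python
from collections import Counter


def assign_journals(osd_disks, journal_ssds):
    """Map each OSD data disk to a journal SSD round-robin.

    Refuses layouts where the OSD count does not divide evenly, since that
    would leave some SSDs carrying more journals than others.
    """
    if len(osd_disks) % len(journal_ssds) != 0:
        raise ValueError("OSD count does not divide evenly across journal SSDs")
    return {disk: journal_ssds[i % len(journal_ssds)]
            for i, disk in enumerate(osd_disks)}


# Hypothetical device names for the 32 HDD + 8 SSD layout of option 3.
hdds = [f"hdd{i:02d}" for i in range(32)]
ssds = [f"ssd{i}" for i in range(8)]
mapping = assign_journals(hdds, ssds)

# Every SSD should end up with the same number of journals (4 here).
print(Counter(mapping.values()))
```

Round-robin is only one way to get an even spread; in practice you might also want to pair SSDs and HDDs that sit on the same controller.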