On 07/25/2014 12:04 PM, Christian Balzer wrote:
> On Fri, 25 Jul 2014 07:24:26 -0500 Mark Nelson wrote:
>
>> On 07/25/2014 02:54 AM, Christian Balzer wrote:
>>> On Fri, 25 Jul 2014 13:31:34 +1000 Matt Harlum wrote:
>>>>
>>>> Hi,
>>>>
>>>> I've purchased a couple of 45Drives enclosures and would like to
>>>> figure out the best way to configure these for ceph?
>>>>
>>> That's the second time within a month somebody mentions these
>>> 45-drive chassis.
>>> Would you mind elaborating which enclosures these are precisely?
>>
>> I'm guessing the Supermicro SC847E26:
>>
>> http://www.supermicro.com/products/chassis/4U/847/SC847E26-RJBOD1.cfm
>>
> Le Ouch!
>
> They really must be getting desperate for high density chassis that
> are not top loading at Supermicro.
>
> Well, if I read that link and the actual manual correctly, the most
> one can hope to get from this is 48Gb/s (2 mini-SAS with 4 lanes
> each), which is short of what 45 regular HDDs can dish out (or take
> in).
> And that's ignoring the inherent deficiencies when dealing with port
> expanders.
>
> Either way, a head for this kind of enclosure would need pretty much
> all the things mentioned before: a low-density (8 lanes) but
> high-performance, large-cache controller and definitely SSDs for
> journals.
>
> There must be some actual threshold, but my gut feeling tells me that
> something slightly less dense, where you don't have to get another
> case for the head, might turn out cheaper.
> Especially if a 1U head (RAID/HBA and network cards) and space for
> journal SSDs doesn't cut it.

Personally I'm a much bigger fan of the SC847A. No expanders in the
backplane, 36 3.5" bays with the MB integrated. It's a bit old at this
point and the FatTwin nodes can go denser (both in terms of nodes and
drives), but I've been pretty happy with it as a performance test
platform. It's really nice having the drives directly connected to the
controllers. Having 4-5 controllers in 1 box is a bit tricky though.
The FatTwin Hadoop nodes are a bit nicer in that regard.
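For what it's worth, a rough back-of-the-envelope on the expander
bandwidth point above (the 6Gb/s-per-lane and ~150MB/s-per-drive
figures below are just ballpark assumptions, not measurements):

    # Illustrative only: expander uplink vs. aggregate drive bandwidth
    lanes = 2 * 4                     # 2 mini-SAS uplinks, 4 lanes each
    lane_gbps = 6.0                   # assuming SAS2, 6Gb/s per lane
    uplink_gbps = lanes * lane_gbps   # = 48 Gb/s into the backplane

    drives = 45
    drive_mb_s = 150                  # assumed sequential MB/s, 7200RPM disk
    drives_gbps = drives * drive_mb_s * 8 / 1000.0   # ~54 Gb/s aggregate

    print("uplink %.0f Gb/s vs. drives ~%.0f Gb/s" % (uplink_gbps, drives_gbps))

(There's a similar sketch of the journal/network sizing arithmetic at
the bottom of this mail.)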
Mark

>
> Christian
>
>>>
>>> I'm wondering especially about the backplane, as 45 is such an odd
>>> number.
>>>
>>> Also if you don't mind, specify "a couple" and what your net storage
>>> requirements are.
>>>
>>> In fact, read this before continuing:
>>> ---
>>> https://www.mail-archive.com/ceph-users at lists.ceph.com/msg11011.html
>>> ---
>>>
>>>> Mainly I was wondering if it was better to set up multiple raid
>>>> groups and then put an OSD on each rather than an OSD for each of
>>>> the 45 drives in the chassis?
>>>>
>>> Steve already toed the conservative Ceph party line here; let me give
>>> you some alternative views and options on top of that, and recap
>>> what I wrote in the thread above.
>>>
>>> In addition to his links, read this:
>>> ---
>>> https://objects.dreamhost.com/inktankweb/Inktank_Hardware_Configuration_Guide.pdf
>>> ---
>>>
>>> Let's go from cheap and cheerful to "comes with racing stripes".
>>>
>>> 1) All spinning rust, all the time. Plunk in 45 drives, as JBOD
>>> behind the cheapest (and densest) controllers you can get. Having
>>> the journal on the disks will halve their performance, but you just
>>> wanted the space and are not that pressed for IOPS.
>>> The best you can expect per node with this setup is something around
>>> 2300 IOPS with normal (7200RPM) disks.
>>>
>>> 2) Same as 1), but use controllers with a large HW cache (4GB Areca
>>> comes to mind) in JBOD (or 45 times RAID0) mode.
>>> This will alleviate some of the thrashing problems, particularly if
>>> you're expecting high IOPS to come in short bursts.
>>>
>>> 3) Ceph Classic, basically what Steve wrote.
>>> 32 HDDs, 8 SSDs for journals (you do NOT want an uneven spread of
>>> journals). This will give you a sustainable 3200 IOPS, and of course
>>> the journals on SSDs not only avoid all that thrashing about on the
>>> disks but also allow for coalescing of writes, so this is going to
>>> be the fastest solution so far. Of course you will need 3 of these
>>> at minimum for acceptable redundancy, unlike 4), which just needs a
>>> replication level of 2.
>>>
>>> 4) The anti-cephalopod. See my reply from a month ago in the link
>>> above. All the arguments apply; it very much depends upon your use
>>> case and budget. In my case the higher density, lower cost and ease
>>> of maintaining the cluster were well worth the lower IOPS.
>>>
>>> 5) We can improve upon 3) by using HW cached controllers of course.
>>> And hey, you did need to connect those drive bays somehow anyway. ^o^
>>> Maybe even squeeze some more out of it by having the SSD controller
>>> separate from the HDD one(s).
>>> This is as fast (IOPS) as it comes w/o going to full SSD.
>>>
>>>
>>> Networking:
>>> Either of the setups above will saturate a single 10Gb/s aka 1GB/s
>>> link, as Steve noted.
>>> In fact 3) to 5) will be able to write up to 4GB/s in theory based
>>> on the HDDs' sequential performance, but that is unlikely to be seen
>>> in real life. And of course your maximum write speed is based on the
>>> speed of the SSDs. So for example with 3) you would want those 8
>>> SSDs to have write speeds of about 250MB/s, giving you 2GB/s max
>>> write.
>>> Which in turn means 2 10Gb/s links at least, up to 4 if you want
>>> redundancy and/or a separation of public and cluster network.
>>>
>>> RAM:
>>> The more, the merrier.
>>> It's relatively cheap, and avoiding having to actually read from the
>>> disks will make your write IOPS so much happier.
>>>
>>> CPU:
>>> You'll want something like Steve recommended for 3); I'd go with 2
>>> 8-core CPUs actually, so you have some oomph to spare for the OS,
>>> IRQ handling, etc. With 4) and its 4 actual OSDs, about half of that
>>> will be fine, with the expectation of Ceph code improvements.
>>>
>>> Mobo:
>>> You're fine for overall PCIe bandwidth, even w/o going to PCIe v3.
>>> But you might have up to 3 HBAs/RAID cards and 2 network cards, so
>>> make sure you can get all of this into appropriate slots.
>>>
>>> Regards,
>>>
>>> Christian
>>>
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users at lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
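P.S. For anyone sizing along at home, a quick sketch of the arithmetic
behind the IOPS and network numbers quoted above. The ~100 IOPS per
7200RPM disk and ~250MB/s per journal SSD figures are rough
assumptions, not benchmarks:

    # Rough sizing sketch; the per-device figures are assumptions.
    hdd_iops = 100                  # assumed random IOPS for a 7200RPM disk

    # Option 1: 45 disks with co-located journals -> roughly half the IOPS
    print(45 * hdd_iops / 2)        # ~2250, i.e. "around 2300 IOPS"

    # Option 3: 32 disks with journals on 8 SSDs -> full disk IOPS
    print(32 * hdd_iops)            # ~3200 IOPS

    # Network: 8 journal SSDs writing ~250MB/s each caps sustained writes
    ssd_mb_s = 250
    max_write_gb_s = 8 * ssd_mb_s / 1000.0   # ~2 GB/s
    links_10g = max_write_gb_s / 1.0         # ~1 GB/s usable per 10Gb/s link
    print("%.1f GB/s -> about %d x 10Gb/s links" % (max_write_gb_s, links_10g))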