Optimal OSD Configuration for 45 drives?

On Fri, 25 Jul 2014 07:24:26 -0500 Mark Nelson wrote:

> On 07/25/2014 02:54 AM, Christian Balzer wrote:
> > On Fri, 25 Jul 2014 13:31:34 +1000 Matt Harlum wrote:
> >
> >> Hi,
> >>
> >> I've purchased a couple of 45Drives enclosures and would like to
> >> figure out the best way to configure these for ceph?
> >>
> > That's the second time within a month somebody mentions these 45 drive
> > chassis.
> > Would you mind elaborating which enclosures these are precisely?
> 
> I'm guessing the supermicro SC847E26:
> 
> http://www.supermicro.com/products/chassis/4U/847/SC847E26-RJBOD1.cfm
> 
Le Ouch!

They really must be getting desperate at Supermicro for high-density
chassis that are not top loading.

Well, if I read that link and the actual manual correctly, the most one
can hope to get out of this is 48Gb/s (2 mini-SAS uplinks with 4 lanes
each), which is short of what 45 regular HDDs can dish out (or take in).
And that's ignoring the inherent deficiencies when dealing with port
expanders.
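
Quick back-of-the-envelope in Python (ballpark assumptions on my part:
6Gb/s per SAS2 lane, ~150MB/s sustained sequential per 7200RPM drive):

# Dual x4 mini-SAS uplink vs. 45 spinners, rough numbers only.
SAS_LANE_GBPS = 6        # SAS2 lane speed in Gbit/s (assumed)
LANES = 2 * 4            # two mini-SAS ports, 4 lanes each
HDD_MBPS = 150           # sequential MB/s per 7200RPM HDD (assumed)
HDDS = 45

uplink_mbps = SAS_LANE_GBPS * LANES * 1000 / 8   # ~6000 MB/s before overhead
drives_mbps = HDD_MBPS * HDDS                    # ~6750 MB/s aggregate
print(f"uplink ~{uplink_mbps:.0f} MB/s vs drives ~{drives_mbps} MB/s")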

Either way, a head for this kind of enclosure would need pretty much all
the things mentioned before: a low-density (8 lanes) but high-performance,
large-cache controller, and definitely SSDs for the journals.

There must be some actual break-even point, but my gut feeling tells me
that something slightly less dense, where you don't have to get another
case for the head, might turn out cheaper.
Especially if a 1U head (RAID/HBA and network cards) with space for
journal SSDs doesn't cut it.

Christian

> >
> > I'm wondering especially about the backplane, as 45 is such an odd
> > number.
> >
> > Also if you don't mind, specify "a couple" and what your net storage
> > requirements are.
> >
> > In fact, read this before continuing:
> > ---
> > https://www.mail-archive.com/ceph-users@lists.ceph.com/msg11011.html
> > ---
> >
> >> Mainly I was wondering if it was better to set up multiple raid groups
> >> and then put an OSD on each rather than an OSD for each of the 45
> >> drives in the chassis?
> >>
> > Steve already toed the conservative Ceph party line here; let me give
> > you some alternative views and options on top of that and recap
> > what I wrote in the thread above.
> >
> > In addition to his links, read this:
> > ---
> > https://objects.dreamhost.com/inktankweb/Inktank_Hardware_Configuration_Guide.pdf
> > ---
> >
> > Let's go from cheap and cheerful to "comes with racing stripes".
> >
> > 1) All spinning rust, all the time. Plunk in 45 drives, as JBOD behind
> > the cheapest (and densest) controllers you can get. Having the journal
> > on the disks will halve their performance, but you just wanted the
> > space and are not that pressed for IOPS.
> > The best you can expect per node with this setup is something around
> > 2300 IOPS with normal (7200RPM) disks.
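> >
> > (Back-of-the-envelope only, assuming the usual rule of thumb of ~100
> > write IOPS per 7200RPM drive:)
> >
> > HDDS = 45
> > IOPS_PER_HDD = 100     # rule-of-thumb figure for 7200RPM spinners
> > JOURNAL_PENALTY = 2    # journal on the same disk writes everything twice
> >
> > node_write_iops = HDDS * IOPS_PER_HDD / JOURNAL_PENALTY
> > print(node_write_iops) # ~2250, hence the roughly 2300 above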
> >
> > 2) Same as 1), but use controllers with a large HW cache (4GB Areca
> > comes to mind) in JBOD (or 45 times RAID0) mode.
> > This will alleviate some of the thrashing problems, particularly if
> > you're expecting the high IOPS to come in short bursts.
> >
> > 3) Ceph Classic, basically what Steve wrote.
> > 32 HDDs, 8 SSDs for journals (you do NOT want an uneven spread of
> > journals). This will give you sustained 3200 IOPS, but of course the
> > journals on SSDs not only avoid all that thrashing about on the disk
> > but also allow for coalescing of writes, so this is going to be the
> > fastest solution so far. Of course you will need 3 of these at minimum
> > for acceptable redundancy, unlike 4) which just needs a replication
> > level of 2.
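> >
> > (Same rule-of-thumb sketch, ~100 IOPS per 7200RPM drive assumed:)
> >
> > HDDS = 32
> > SSDS = 8
> > IOPS_PER_HDD = 100
> >
> > journals_per_ssd = HDDS // SSDS           # 4 each, i.e. an even spread
> > node_write_iops = HDDS * IOPS_PER_HDD     # no on-disk journal penalty
> > print(journals_per_ssd, node_write_iops)  # 4 3200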
> >
> > 4) The anti-cephalopod. See my reply from a month ago in the link
> > above. All the arguments apply; it very much depends upon your use
> > case and budget. In my case the higher density, lower cost and ease of
> > maintaining the cluster were well worth the lower IOPS.
> >
> > 5) We can improve upon 3) by using HW cached controllers of course. And
> > hey, you did need to connect those drive bays somehow anyway. ^o^
> > Maybe even squeeze some more out of it by having the SSD controller
> > separate from the HDD one(s).
> > This is as fast (IOPS) as it comes w/o going to full SSD.
> >
> >
> > Networking:
> > Any of the setups above will saturate a single 10Gb/s (roughly 1GB/s)
> > link, as Steve noted.
> > In fact 3) to 5) will be able to write up to 4GB/s in theory based on
> > the HDDs' sequential performance, but that is unlikely to be seen in
> > real life. And of course your maximum write speed is bounded by the
> > speed of the SSDs. So for example with 3) you would want those 8 SSDs
> > to have write speeds of about 250MB/s, giving you 2GB/s max write.
> > Which in turn means at least two 10Gb/s links, up to 4 if you want
> > redundancy and/or a separation of public and cluster network.
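> >
> > (The link math, assuming ~250MB/s sequential write per journal SSD and
> > roughly 1GB/s usable per 10Gb/s link:)
> >
> > import math
> >
> > SSDS = 8
> > SSD_WRITE_MBPS = 250      # assumed per-SSD sequential write speed
> > LINK_USABLE_MBPS = 1000   # ~1GB/s usable from a 10Gb/s link
> >
> > max_journal_mbps = SSDS * SSD_WRITE_MBPS                # 2000 MB/s
> > links = math.ceil(max_journal_mbps / LINK_USABLE_MBPS)  # 2 at minimum
> > print(max_journal_mbps, links)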
> >
> > RAM:
> > The more, the merrier.
> > It's relatively cheap, and avoiding having to actually read from the
> > disks will make your write IOPS so much happier.
> >
> > CPU:
> > You'll want something like Steve recommended for 3); I'd actually go
> > with two 8-core CPUs, so you have some oomph to spare for the OS, IRQ
> > handling, etc. With 4) and its mere 4 OSDs, about half of that will be
> > fine, especially with the expected Ceph code improvements.
> >
> > Mobo:
> > You're fine for overall PCIe bandwidth, even w/o going to PCIe v3.
> > But you might have up to 3 HBAs/RAID cards and 2 network cards, so make
> > sure you can get all of this into appropriate slots.
> >
> > Regards,
> >
> > Christian
> >
> 
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


-- 
Christian Balzer        Network/Systems Engineer                
chibi at gol.com   	Global OnLine Japan/Fusion Communications
http://www.gol.com/

