> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Gurvinder Singh
> Sent: 04 September 2015 08:57
> To: Wang, Warren <Warren_Wang@xxxxxxxxxxxxxxxxx>; Mark Nelson <mnelson@xxxxxxxxxx>; ceph-users@xxxxxxxxxxxxxx
> Subject: Re: high density machines
>
> On 09/04/2015 02:31 AM, Wang, Warren wrote:
> > In the minority on this one. We have a number of the big SM 72-drive units with 40 GbE. Definitely not as fast as even the 36-drive units, but it isn't awful for our average mixed workload. We can exceed all available performance with some workloads, though.
> >
> > So while we can't extract all the performance out of the box, as long as we don't max out on performance, the cost is very appealing.
>
> I am wondering how big a cost difference you have seen between the SM 72-drive unit and, let's say, http://www.supermicro.com/products/system/1U/6017/SYS-6017R-73THDP_.cfm or any other smaller machine you have compared it with. As the discussion on this thread makes clear, the 72-drive box is actually 4 x 18-drive boxes sharing power and cooling. Regarding performance, I think the network might be the bottleneck (maybe the CPU too), as it is 40 Gbit for the whole box, so you get 10 Gbit per node (18 drives each), which can be saturated.
>
> Gurvinder

I think the 72-disk box is one unit; it's the fat twins that have 12x 3.5" + 2x 2.5" per sled, or 56 drives per 4U. That other server you linked is pretty similar to the fat twin sleds; the only disadvantages I can see are that it has a single PSU and one fewer 2.5" drive. Unless you can spread your CRUSH map over a sufficient number of different PDUs/feeds, I would be wary about running single-PSU nodes, as you could quite easily end up with a 50% cluster failure.

> > And as far as filling a unit goes, I'm not sure how many folks have filled big production clusters, but you really don't want them even running into the 70+% range, due to some inevitable uneven filling and the need for room to handle failures.
> >
> > Also, I'm betting that Ceph will continue to optimize things like the messenger and reduce some of the massive CPU and TCP overhead, so we can claw back performance. I would love to see a thread count reduction. These can see over 130K threads per box.
> >
> > Warren
> >
> > -----Original Message-----
> > From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Mark Nelson
> > Sent: Thursday, September 03, 2015 3:58 PM
> > To: Gurvinder Singh <gurvindersinghdahiya@xxxxxxxxx>; ceph-users@xxxxxxxxxxxxxx
> > Subject: Re: high density machines
> >
> > On 09/03/2015 02:49 PM, Gurvinder Singh wrote:
> >> Thanks everybody for the feedback.
> >> On 09/03/2015 05:09 PM, Mark Nelson wrote:
> >>> My take is that you really only want to do these kinds of systems if you have massive deployments. At least 10 of them, but probably more like 20-30+. You do get massive density with them, but I think if you are considering 5 of these, you'd be better off with 10 of the 36-drive units. An even better solution might be ~30-40 of these:
> >>>
> >>> http://www.supermicro.com/products/system/1U/6017/SYS-6017R-73THDP_.cfm
> >>>
> >> This one does look interesting.
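To put rough numbers on the trade-off being discussed here (a shared 40 GbE uplink across 72 drives, and how much of a cluster one chassis failure takes out), here is a quick back-of-the-envelope sketch in Python. All inputs are illustrative assumptions, not measurements from any of the systems mentioned:

    # Back-of-the-envelope numbers for the dense-vs-small-node trade-off.
    # All inputs below are illustrative assumptions.

    def per_drive_bandwidth_gbit(nic_gbit, drives):
        """Network bandwidth available per OSD drive in one chassis/sled."""
        return nic_gbit / drives

    def failure_impact(total_drives, drives_per_failure_domain):
        """Fraction of the cluster that must be re-replicated when one
        failure domain (chassis, PSU feed, PDU, ...) goes down."""
        return drives_per_failure_domain / total_drives

    # 72-drive 4U box with a shared 40 GbE uplink,
    # i.e. roughly 10 Gbit per 18-drive node
    print(per_drive_bandwidth_gbit(40, 72))   # ~0.56 Gbit/s per drive
    print(per_drive_bandwidth_gbit(10, 18))   # same ratio per 18-drive node

    # Losing one 72-drive chassis in a 5-box cluster vs. a 30-box cluster
    print(failure_impact(5 * 72, 72))         # 0.20 -> 20% of the cluster at once
    print(failure_impact(30 * 72, 72))        # ~0.033 -> roughly 3%

    # Compare with 10 of the 36-drive units, losing one
    print(failure_impact(10 * 36, 36))        # 0.10 -> 10%

The per-drive bandwidth is the same whether you treat the 72-drive box as one node or as 4 x 18-drive nodes; the bigger lever is how large a slice of the cluster a single failure domain represents, which is the argument for buying these only at 20-30+ node scale.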
> >>> An extremely compelling solution would be if they took this system:
> >>>
> >>> http://www.supermicro.com/products/system/1U/5018/SSG-5018A-AR12L.cfm?parts=SHOW
> >>>
> >> This one could be a really good solution for archiving purposes, with the CPU replaced to get more juice into it.
> >>>
> >>> and replaced the C2750 with a Xeon-D 1540 (but kept the same number of SATA ports).
> >>>
> >>> Potentially you could have:
> >>>
> >>> - 8x 2.0GHz Xeon Broadwell-DE cores, 45W TDP
> >>> - Up to 128GB RAM (32GB probably the sweet spot)
> >>> - 2x 10GbE
> >>> - 12x 3.5" spinning disks
> >>> - single PCIe slot for PCIe SSD/NVMe
> >>
> >> I am wondering whether a single PCIe SSD/NVMe device can support 12 OSD journals and still perform the same as 4 OSDs per SSD?
> >
> > Basically the limiting factor is how fast the device can do O_DSYNC writes. We've seen that some PCIe SSD and NVMe devices can do 1-2GB/s depending on the capacity, which is enough to reasonably support 12-24 OSDs. Whether or not it's good to have a single PCIe card be a point of failure is a worthwhile topic (probably only high write-endurance cards should be considered). There are plenty of other things that can bring the node down too (motherboard, RAM, CPU, etc.), though. A single node failure will also have less impact if there are lots of small nodes vs. a couple of big ones.
> >
> >>> The density would be higher than the 36-drive units but lower than the 72-drive units (though with shorter rack depth, afaik).
> >>
> >> You mean the 1U solution with 12 disks is longer than the 72-disk 4U version?
> >
> > Sorry, the other way around, I believe.
> >
> >> - Gurvinder
> >>
> >>> Probably more CPU per OSD and far better distribution of OSDs across servers. Given that the 10GbE and processor are embedded on the motherboard, there's a decent chance these systems could be priced reasonably and wouldn't have excessive power/cooling requirements.
> >>>
> >>> Mark
> >>>
> >>> On 09/03/2015 09:13 AM, Jan Schermer wrote:
> >>>> It's not exactly a single system:
> >>>>
> >>>> SSG-F618H-OSD288P*
> >>>> 4U FatTwin, 4x 1U, 72TB per node, Ceph OSD Storage Node
> >>>>
> >>>> This could actually be pretty good; it even has decent CPU power.
> >>>>
> >>>> I'm not a big fan of blades and blade-like systems - sooner or later a backplane will die and you'll need to power off everything, which is a huge PITA. But assuming you get 3 of these, it could be pretty cool!
> >>>> It would be interesting to have a price comparison to an SC216 chassis or similar; I'm afraid it won't be much cheaper.
> >>>>
> >>>> Jan
> >>>>
> >>>>> On 03 Sep 2015, at 16:09, Kris Gillespie <kgillespie@xxxxxxx> wrote:
> >>>>>
> >>>>> It's funny, because in my mind such dense servers seem like a bad idea for exactly the reason you mention: what if one fails? Losing 400+TB of storage is going to have quite some impact, 40G interfaces or not, and no matter what options you tweak. Sure it'll be cost effective per TB, but that isn't the only aspect to look at (for production use anyway).
> >>>>>
> >>>>> But I'd also be curious about real world feedback.
> >>>>>
> >>>>> Cheers
> >>>>>
> >>>>> Kris
> >>>>>
> >>>>> On 09/03/2015 16:01, Gurvinder Singh wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> I am wondering if anybody in the community is running a Ceph cluster with high density machines, e.g. the Supermicro SYS-F618H-OSD288P (288 TB), the Supermicro SSG-6048R-OSD432 (432 TB), or some other high density machine. I am assuming that the installation would be of petabyte scale, as you would want to have at least 3 of these boxes.
> >>>>>>
> >>>>>> It would be good to hear their experiences in terms of reliability and performance (especially during node failures). As these machines have a 40 Gbit network connection it can be OK, but experience from real users would be great to hear, as these machines are mentioned in the reference architecture published by Red Hat and Supermicro.
> >>>>>>
> >>>>>> Thanks for your time.
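Coming back to Mark's point above that the limiting factor for a shared journal device is how fast it can do O_DSYNC writes: below is a minimal sketch of how one might get a rough feel for that on Linux before putting 12+ journals on a single card. The target path and sizes are assumptions to adapt to your setup, and serious measurement is normally done with a dedicated tool such as fio using sync/direct I/O rather than a Python loop.

    # Rough O_DSYNC write-throughput check for a would-be journal device.
    # Assumptions: Linux (os.O_DSYNC available), Python 3, and a TARGET you
    # can safely overwrite. /tmp is often tmpfs, so point TARGET at the real
    # device or a file on the filesystem that sits on it.
    import os
    import time

    TARGET = "/tmp/dsync-test.bin"       # hypothetical test target - change this
    WRITE_SIZE = 4 * 1024 * 1024         # 4 MiB per write
    NUM_WRITES = 256                     # 1 GiB written in total

    buf = os.urandom(WRITE_SIZE)
    fd = os.open(TARGET, os.O_WRONLY | os.O_CREAT | os.O_DSYNC, 0o600)
    try:
        start = time.time()
        for _ in range(NUM_WRITES):
            os.write(fd, buf)            # each write must reach stable storage
        elapsed = time.time() - start
    finally:
        os.close(fd)
        if os.path.isfile(TARGET):       # don't try to unlink a block device
            os.unlink(TARGET)

    mib_per_s = (WRITE_SIZE * NUM_WRITES) / elapsed / (1024 * 1024)
    print(f"O_DSYNC write throughput: {mib_per_s:.0f} MiB/s")

If a single PCIe/NVMe card sustains on the order of 1-2 GB/s of this kind of write, the 12-24 journals per card that Mark mentions look plausible from a pure throughput standpoint; whether you want that many OSDs behind one point of failure is, as he says, a separate question.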