Re: high density machines

On 09/04/2015 02:31 AM, Wang, Warren wrote:
> In the minority on this one. We have a number of the big SM 72-drive units with 40 GbE. Definitely not as fast as even the 36-drive units, but it isn't awful for our average mixed workload. Some workloads can exceed the available performance, though.
> 
> So while we can't extract all of the performance out of the box, as long as we don't max it out, the cost is very appealing,
I am wondering how big a cost difference you have seen between the SM
72-drive chassis and a smaller machine such as
http://www.supermicro.com/products/system/1U/6017/SYS-6017R-73THDP_.cfm
or whatever else you compared it against. As the discussion on this
thread has made clear, the 72-drive box is really 4 x 18-drive nodes
sharing power and cooling. Performance-wise I think the network (and
maybe the CPU) is the bottleneck: it is 40 Gbit for the whole box, so
each node (18 drives) gets about 10 Gbit, which those drives can easily
saturate; a rough back-of-the-envelope sketch follows below.
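
To put rough numbers on that, here is a quick calculation in Python; the
150 MB/s per-drive figure is an assumed sequential throughput for 7200
rpm SATA disks, not a measured one:

    # Aggregate spindle throughput of one 18-drive node vs. its ~10 Gbit/s
    # share of the chassis' 40 GbE uplink.  150 MB/s per drive is an
    # assumption for large sequential I/O on 7200 rpm SATA.
    drives_per_node = 18
    per_drive_mb_s = 150.0
    node_link_gbit = 40.0 / 4          # 40 GbE shared by 4 nodes

    disk_gbit = drives_per_node * per_drive_mb_s * 8 / 1000.0
    print('disks: %.1f Gbit/s vs network: %.1f Gbit/s'
          % (disk_gbit, node_link_gbit))
    # -> ~21.6 Gbit/s of raw disk against a 10 Gbit/s link, so streaming
    #    workloads hit the network well before the spindles.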

Gurvinder
> and as far as filling a unit, I'm not sure how many folks have filled
> big prod clusters, but you really don't want them even running into the
> 70+% range, due to inevitable uneven filling and to leave room for failures.
> 
> Also, I'm betting that Ceph will continue to optimize things like the messenger and reduce some of the massive CPU and TCP overhead, so we can claw back performance. I would love to see a thread-count reduction; these boxes can see over 130K threads each.
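
Out of curiosity about that 130K figure, here is a small sketch that
totals the thread counts of all ceph-osd processes on a node by reading
/proc (assumes a Linux box and the standard ceph-osd process name):

    # Sum the "Threads:" field from /proc/<pid>/status for every process
    # named ceph-osd, to see how per-OSD messenger threads add up per box.
    import os

    total = 0
    for pid in os.listdir('/proc'):
        if not pid.isdigit():
            continue
        try:
            with open('/proc/%s/comm' % pid) as f:
                if f.read().strip() != 'ceph-osd':
                    continue
            with open('/proc/%s/status' % pid) as f:
                for line in f:
                    if line.startswith('Threads:'):
                        total += int(line.split()[1])
        except (IOError, OSError):
            continue                     # process went away mid-read

    print('total ceph-osd threads: %d' % total)
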
> 
> Warren
> 
> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Mark Nelson
> Sent: Thursday, September 03, 2015 3:58 PM
> To: Gurvinder Singh <gurvindersinghdahiya@xxxxxxxxx>; ceph-users@xxxxxxxxxxxxxx
> Subject: Re:  high density machines
> 
> 
> 
> On 09/03/2015 02:49 PM, Gurvinder Singh wrote:
>> Thanks everybody for the feedback.
>> On 09/03/2015 05:09 PM, Mark Nelson wrote:
>>> My take is that you really only want to do these kinds of systems if 
>>> you have massive deployments.  At least 10 of them, but probably more 
>>> like
>>> 20-30+.  You do get massive density with them, but I think if you are
>>> considering 5 of these, you'd be better off with 10 of the 36 drive 
>>> units.  An even better solution might be ~30-40 of these:
>>>
>>> http://www.supermicro.com/products/system/1U/6017/SYS-6017R-73THDP_.cfm
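
To make that failure-domain argument concrete, a quick comparison of how
much of the cluster one dead chassis represents for each layout; the 6 TB
drive size and the exact node counts are illustrative assumptions only:

    # Share of the raw cluster lost (and needing re-replication) when a
    # single chassis dies, for the layouts discussed above.
    drive_tb = 6.0
    layouts = [
        ('5 x 72-drive',  5, 72),
        ('10 x 36-drive', 10, 36),
        ('36 x 12-drive', 36, 12),
    ]
    for name, nodes, drives in layouts:
        cluster_tb = nodes * drives * drive_tb
        lost_tb = drives * drive_tb
        print('%-14s one node down = %4.0f TB (%4.1f%% of %5.0f TB raw)'
              % (name, lost_tb, 100.0 * lost_tb / cluster_tb, cluster_tb))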
>>>
>> This one does look interesting.
>>> An extremely compelling solution would be if they took this system:
>>>
>>> http://www.supermicro.com/products/system/1U/5018/SSG-5018A-AR12L.cfm?parts=SHOW
>>>
>> This one could be a really good solution for archiving, with the CPU
>> replaced to get more juice into it.
>>>
>>> and replaced the C2750 with a Xeon D-1540 (but kept the same number
>>> of SATA ports).
>>>
>>> Potentially you could have:
>>>
>>> - 8x 2.0GHz Xeon Broadwell-DE Cores, 45W TDP
>>> - Up to 128GB RAM (32GB probably the sweet spot)
>>> - 2x 10GbE
>>> - 12x 3.5" spinning disks
>>> - single PCIe slot for PCIe SSD/NVMe
>> I am wondering whether a single PCIe SSD/NVMe device can support the
>> journals for 12 OSDs and still perform the same as 4 OSDs per SSD?
> 
> Basically the limiting factor is how fast the device can do O_DSYNC writes.  We've seen that some PCIe SSD and NVMe devices can do 1-2 GB/s depending on the capacity, which is enough to reasonably support 12-24 OSDs.  Whether it's wise to make a single PCIe card a point of failure is a worthwhile topic (probably only high-write-endurance cards should be considered), though there are plenty of other things that can bring the node down too (motherboard, RAM, CPU, etc).  A single node failure will also have less impact with lots of small nodes than with a couple of big ones.
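
For anyone who wants to sanity-check a journal device along those lines,
here is a minimal Python sketch of that kind of O_DSYNC write test. The
device path is a placeholder, the test overwrites whatever is there, and
the O_DIRECT flag that the Ceph journal also typically uses is omitted
to keep the sketch simple:

    # Minimal O_DSYNC write-throughput check, as a rough stand-in for the
    # journal-style test described above.  Point `path` at the SSD/NVMe
    # device (or a file on it) you want to test -- writing to a raw
    # device destroys its contents.
    import os, time

    path = '/dev/nvme0n1'                 # placeholder device, change me
    block = b'\0' * (4 * 1024 * 1024)     # 4 MiB per write
    seconds = 10

    fd = os.open(path, os.O_WRONLY | os.O_DSYNC)
    written = 0
    start = time.time()
    while time.time() - start < seconds:
        written += os.write(fd, block)
    os.close(fd)

    elapsed = time.time() - start
    print('%.0f MB/s O_DSYNC writes' % (written / elapsed / 1e6))
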
> 
>>>
>>> The density would be higher than the 36 drive units but lower than 
>>> the
>>> 72 drive units (though with shorter rack depth afaik).
>> You mean the 1U solution with 12 disks is longer than the 72-disk
>> 4U version?
> 
> Sorry, the other way around I believe.
> 
>>
>> - Gurvinder
>>>    Probably more
>>> CPU per OSD and far better distribution of OSDs across servers.  
>>> Given that the 10GbE and processor are embedded on the motherboard, 
>>> there's a decent chance these systems could be priced reasonably and 
>>> wouldn't have excessive power/cooling requirements.
>>>
>>> Mark
>>>
>>> On 09/03/2015 09:13 AM, Jan Schermer wrote:
>>>> It's not exactly a single system:
>>>>
>>>> SSG-F618H-OSD288P*
>>>> 4U-FatTwin, 4x 1U 72TB per node, Ceph-OSD-Storage Node
>>>>
>>>> This could actually be pretty good; it even has decent CPU power.
>>>>
>>>> I'm not a big fan of blades and blade-like systems - sooner or later 
>>>> a backplane will die and you'll need to power off everything, which 
>>>> is a huge PITA.
>>>> But assuming you get 3 of these it could be pretty cool!
>>>> It would be interesting to see a price comparison with an SC216
>>>> chassis or similar; I'm afraid it won't be much cheaper.
>>>>
>>>> Jan
>>>>
>>>>> On 03 Sep 2015, at 16:09, Kris Gillespie <kgillespie@xxxxxxx> wrote:
>>>>>
>>>>> It's funny, because in my mind such dense servers seem like a bad
>>>>> idea for exactly the reason you mention: what if one fails?
>>>>> Losing 400+ TB of storage is going to have quite some impact, 40G
>>>>> interfaces or not and no matter what options you tweak.
>>>>> Sure, it'll be cost-effective per TB, but that isn't the only aspect
>>>>> to look at (for production use anyway).
>>>>>
>>>>> But I'd also be curious about real world feedback.
>>>>>
>>>>> Cheers
>>>>>
>>>>> Kris
>>>>>
>>>>> On 09/03/2015 16:01, Gurvinder Singh wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I am wondering if anybody in the community is running a Ceph cluster
>>>>>> with high-density machines, e.g. the Supermicro SYS-F618H-OSD288P
>>>>>> (288 TB), the Supermicro SSG-6048R-OSD432 (432 TB), or some other
>>>>>> high-density machine. I am assuming the installation will be at
>>>>>> petabyte scale, as you would want to have at least 3 of these boxes.
>>>>>>
>>>>>> It would be good to hear about your experiences in terms of
>>>>>> reliability and performance (especially during node failures). As
>>>>>> these machines have a 40 Gbit network connection it may be fine, but
>>>>>> experience from real users would be great to hear, since these
>>>>>> machines are mentioned in the reference architecture published by
>>>>>> Red Hat and Supermicro.
>>>>>>
>>>>>> Thanks for your time.
>>>>>
>>>>
>>>>
>>
> 

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


