Re: Best layout for SSD & SAS OSDs

Dense SSD nodes are not really an issue for the network (unless you really use all the throughput); the issue is with CPU and memory throughput (and possibly a crappy kernel scheduler, depending on how up-to-date your distro is).
Also, if you want consistent performance even when a failure occurs, you need to either have 100% reliable SSDs or put them in RAID for the journals. You don't want to rebuild all those HDD OSDs. Losing a journal SSD is more likely than losing an HDD these days.
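
To put a number on that last point, here is a rough sketch (purely
illustrative figures, not this thread's hardware) of what one journal-SSD
failure costs with filestore:

    # Losing one journal SSD takes down every filestore OSD journaling on
    # it, and all of that data then has to be re-replicated.
    hdds_per_journal_ssd = 5   # assumption: 5 HDD OSDs share one journal SSD
    hdd_size_tb = 3.0          # 3TB OSDs
    fill_ratio = 0.6           # assumption: OSDs roughly 60% full
    rebuild_tb = hdds_per_journal_ssd * hdd_size_tb * fill_ratio
    print(f"data to re-replicate after one journal SSD failure: "
          f"{rebuild_tb:.0f} TB")
    # -> 9 TB of backfill from losing a single small SSD, hence the point
    #    about reliable (or mirrored) journal SSDs.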

Jan


> On 07 Sep 2015, at 05:53, Christian Balzer <chibi@xxxxxxx> wrote:
> 
> On Sat, 5 Sep 2015 07:13:29 -0300 German Anders wrote:
> 
>> Hi Christian,
>> 
>>   OK, so you would say that it's better to rearrange the nodes so I
>> don't mix the HDD and SSD disks, right? And create high-performance
>> nodes with SSDs and others with HDDs; that's fine since it's a new
>> deployment.
>> 
> It is what I would do, yes. 
> However, if you're limited to 7 nodes initially, specialized/optimized
> nodes might result in pretty small "subclusters" and thus relatively
> large failure domains.
> 
> If, for example, this cluster consisted of 2 SSD and 5 HDD nodes,
> losing 1 of the SSD nodes would roughly halve your read speed from that
> pool (while, amusingly enough, improving your write speed ^o^).
> This is assuming a replication of 2 for SSD pools, which with DC SSDs is a
> pretty safe choice.
> 
> Also, dense SSD nodes will be able to saturate your network easily; for
> example, 3-4 of the DC S3xxx SSDs will exceed the bandwidth of your links.
> This is of course only an issue if you're actually expecting huge amounts
> of reads/writes, as opposed to having lots of small transactions that
> depend on low latency.
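> 
> A quick sanity check of that, assuming a ~500MB/s sequential read per
> DC S3xxx SSD (rough datasheet figure) against the ~1.5GByte/s I see over
> IPoIB (more on that below):
> 
>    ssd_read_mb_s = 500    # assumed seq. read per DC S3xxx SSD
>    ipoib_mb_s = 1500      # rough effective IPoIB throughput per link
>    for n in (3, 4, 6):
>        print(f"{n} SSDs: {n * ssd_read_mb_s} MB/s vs ~{ipoib_mb_s} MB/s link")
>    # At 3-4 SSDs the aggregate read bandwidth already matches or exceeds
>    # what a single link realistically delivers.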
> 
>>   Also, the nodes have different types of CPU and RAM: 4 have more CPU
>> and more memory (384GB), and the other 3 have less CPU and 128GB of RAM,
>> so maybe I can put the SSDs in the nodes with much more CPU and leave
>> the HDDs for the other nodes.
> 
> I take it from this that you already have those machines?
> Which number and model of CPUs exactly?
> 
> What you want is as MUCH CPU power for any SSD node as possible, while the
> HDD nodes will benefit mostly from more RAM (page cache).
> 
>> The network is going to be InfiniBand FDR at 56Gb/s on all the nodes,
>> for both the public network and the cluster network.
>> 
> Is this 1 interface for the public and 1 for the cluster network?
> Note that with IPoIB (with Accelio not being ready yet) I'm seeing at most
> 1.5GByte/s with QDR (40Gb/s).
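> 
> For reference, the wire-level math (signalling rates and encodings per
> the InfiniBand specs; the 1.5GByte/s above is simply what I observe):
> 
>    qdr_data_gbs = 40 * 8 / 10     # QDR: 40Gb/s signalling, 8b/10b encoding
>    fdr_data_gbs = 56 * 64 / 66    # FDR: ~56Gb/s signalling, 64b/66b encoding
>    print(f"QDR usable: {qdr_data_gbs / 8:.1f} GB/s, "
>          f"FDR usable: {fdr_data_gbs / 8:.1f} GB/s")
>    # QDR ~4.0 GB/s and FDR ~6.8 GB/s on the wire, yet IPoIB tops out far
>    # below that, so plan around the measured IPoIB figure.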
> 
> If you were to start with a clean slate, I'd go with something like this
> to achieve the storage capacity you outlined:
> 
> * 1-2 quad-node chassis like this with 4-6 SSD OSDs per node and a 2nd IB
> HCA, or a similar product w/o onboard IB and a 2-port IB HCA:
> http://www.supermicro.com.tw/products/system/2U/2028/SYS-2028TP-HTFR.cfm
> That will give you 4-8 high-performance SSD nodes in 2-4U.
> 
> * 5 HDD storage nodes, with 8-10 HDDs and 2-4 journal SSDs like this:
> http://www.supermicro.com.tw/products/system/2U/5028/SSG-5028R-E1CR12L.cfm
> (4x 100GB DC S3700s will perform better than 2x 200GB ones and give you
> smaller failure domains at about the same price).
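> 
> Roughly, assuming datasheet sequential write speeds (about 200MB/s for
> the 100GB S3700, 365MB/s for the 200GB) and 10 HDDs per node:
> 
>    hdds = 10
>    for n_ssds, mb_s, label in [(4, 200, "4x 100GB S3700"),
>                                (2, 365, "2x 200GB S3700")]:
>        print(f"{label}: {n_ssds * mb_s} MB/s journal bandwidth, "
>              f"{hdds / n_ssds:.1f} OSDs lost if one journal SSD dies")
>    # 4x 100GB: 800 MB/s total, 2.5 OSDs per failure
>    # 2x 200GB: 730 MB/s total, 5.0 OSDs per failure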
> 
> Christian
> 
>>   Any other suggestion/comment?
>> 
>> Thanks a lot!
>> 
>> Best regards
>> 
>> German
>> 
>> 
>> On Saturday, September 5, 2015, Christian Balzer <chibi@xxxxxxx> wrote:
>> 
>>> 
>>> Hello,
>>> 
>>> On Fri, 4 Sep 2015 12:30:12 -0300 German Anders wrote:
>>> 
>>>> Hi cephers,
>>>> 
>>>>   I've the following scheme:
>>>> 
>>>> 7x OSD servers with:
>>>> 
>>> Is this a new cluster, total initial deployment?
>>> 
>>> What else are these nodes made of, CPU/RAM/network?
>>> While uniform nodes have some appeal (interchangeability, one node down
>>> impacts the cluster uniformly), they tend to be compromise solutions.
>>> I personally would go with optimized HDD and SSD nodes.
>>> 
>>>>    4x 800GB SSD Intel DC S3510 (OSD-SSD)
>>> Only 0.3 DWPD, 450TB total in 5 years.
>>> If you can correctly predict your write volume and it is below that per
>>> SSD, fine. I'd use 3610s, with internal journals.
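>>> 
>>> The endurance math behind that figure (0.3 DWPD is the S3510 rating,
>>> the rest is plain arithmetic):
>>> 
>>>    capacity_gb = 800
>>>    dwpd = 0.3              # drive writes per day, S3510 rating
>>>    years = 5
>>>    tbw = capacity_gb * dwpd * 365 * years / 1000
>>>    print(f"rated endurance: ~{tbw:.0f} TB written over {years} years")
>>>    # ~438 TB, i.e. only about 240GB of writes per day per SSD before
>>>    # you exceed the rating.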
>>> 
>>>>    3x 120GB SSD Intel DC S3500 (Journals)
>>> In this case, even more so, the S3500 is a bad choice: 3x 135MB/s is
>>> nowhere near your likely network speed of 10Gb/s.
>>> 
>>> You will get vastly superior performance and endurance with two 200GB
>>> S3610s (2x 230MB/s) or S3700s (2x 365MB/s).
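>>> In numbers (135/230/365 MB/s are the sequential write specs mentioned
>>> above, 10Gb/s the assumed network):
>>> 
>>>    network_mb_s = 10 * 1000 / 8        # 10Gb/s ~ 1250 MB/s
>>>    options = {"3x 120GB S3500": 3 * 135,
>>>               "2x 200GB S3610": 2 * 230,
>>>               "2x 200GB S3700": 2 * 365}
>>>    for name, mb_s in options.items():
>>>        print(f"{name}: {mb_s} MB/s vs {network_mb_s:.0f} MB/s network")
>>>    # S3500: 405 MB/s, S3610: 460 MB/s, S3700: 730 MB/s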
>>> 
>>> Why the uneven number of journal SSDs?
>>> You want uniform utilization and wear. 2 journal SSDs for 6 HDDs would
>>> be a good ratio.
>>> 
>>>>    5x 3TB SAS disks (OSD-SAS)
>>>> 
>>> See above; even numbers make a lot more sense.
>>> 
>>>> 
>>>> The OSD servers are located on two separate racks with two power
>>>> circuits each.
>>>> 
>>>>   I would like to know the best way to implement this: use the
>>>> 4x 800GB SSDs as an SSD pool, or use them as a cache pool? Or any
>>>> other suggestion? Also, any advice for the CRUSH design?
>>>> 
>>> Nick touched on that already; for right now, SSD pools would
>>> definitely be better.
>>> 
>>> Christian
>>> --
>>> Christian Balzer        Network/Systems Engineer
>>> chibi@xxxxxxx        Global OnLine Japan/Fusion
>>> Communications
>>> http://www.gol.com/
>>> 
>> 
>> 
> 
> 
> -- 
> Christian Balzer        Network/Systems Engineer                
> chibi@xxxxxxx   	Global OnLine Japan/Fusion Communications
> http://www.gol.com/
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


