On Mon, 7 Sep 2015 12:11:27 +0200 Jan Schermer wrote:

> Dense SSD nodes are not really an issue for network (unless you really
> use all the throughput),

That's exactly what I wrote...
And dense in the sense of saturating his network would be 4 SSDs, so:

> the issue is with CPU and memory throughput
> (and possibly crappy kernel scheduler depending on how up-to-date distro
> you use).

That's what I wrote as well, which makes smaller nodes with more CPU
resources attractive.

> Also if you want consistent performance even when failure
> occurs, you need to either have 100% reliable SSDs, or put them in RAID
> for the journals. You don't want to rebuild all those HDD OSDs. Losing a
> journal SSD is more likely than losing a HDD these days.
>
Say what?
My "Enterprise" HDDs are failing quite nicely, while I have yet to lose a
single Intel SSD, DC or otherwise.

Christian

> Jan
>
> > On 07 Sep 2015, at 05:53, Christian Balzer <chibi@xxxxxxx> wrote:
> >
> > On Sat, 5 Sep 2015 07:13:29 -0300 German Anders wrote:
> >
> >> Hi Christian,
> >>
> >> Ok, so you would say that it's better to rearrange the nodes so I
> >> don't mix the HDD and SSD disks, right? And create high-perf nodes
> >> with SSDs and others with HDDs; that's fine since it's a new
> >> deployment.
> >>
> > It is what I would do, yes.
> > However, if you're limited to 7 nodes initially, specialized/optimized
> > nodes might result in pretty small "subclusters" and thus relatively
> > large failure domains.
> >
> > If for example this cluster consisted of 2 SSD and 5 HDD nodes,
> > losing 1 of the SSD nodes would roughly halve your read speed from
> > that pool (while amusingly enough improving your write speed ^o^).
> > This is assuming a replication of 2 for SSD pools, which with DC SSDs
> > is a pretty safe choice.
> >
> > Also, dense SSD nodes will be able to saturate your network easily;
> > for example 3-4 of the DC S3xxx SSDs will exceed the bandwidth of your
> > links. This is of course only an issue if you're actually expecting
> > huge amounts of reads/writes, as opposed to lots of small
> > transactions that depend on low latency.
> >
> >> Also the nodes have different types of CPU and RAM: 4 have more CPU
> >> and more memory (384GB) and the other 3 have less CPU and 128GB of
> >> RAM, so maybe I can put the SSDs in the nodes with much more CPU and
> >> leave the HDDs for the other nodes.
> >
> > I take it from this that you already have those machines?
> > Which number and models of CPUs exactly?
> >
> > What you want is as MUCH CPU power for any SSD node as possible, while
> > the HDD nodes will benefit mostly from more RAM (page cache).
> >
> >> The network is going to be InfiniBand FDR at 56Gb/s on all the
> >> nodes, for the public network and for the cluster network.
> >>
> > Is this 1 interface for the public and 1 for the cluster network?
> > Note that with IPoIB (with Accelio not being ready yet) I'm seeing at
> > most 1.5GByte/s with QDR (40Gb/s).
> >
> > If you were to start with a clean slate, I'd go with something like
> > this to achieve the storage capacity you outlined:
> >
> > * 1-2 quad-node chassis like this with 4-6 SSD OSDs per node and a 2nd
> > IB HCA, or a similar product w/o onboard IB and a 2-port IB HCA:
> > http://www.supermicro.com.tw/products/system/2U/2028/SYS-2028TP-HTFR.cfm
> > That will give you 4-8 high-performance SSD nodes in 2-4U.
> >
> > * 5 HDD storage nodes, with 8-10 HDDs and 2-4 journal SSDs, like this:
> > http://www.supermicro.com.tw/products/system/2U/5028/SSG-5028R-E1CR12L.cfm
> > (4 100GB DC S3700s will perform better than 2 200GB ones and give you
> > smaller failure domains at about the same price).
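Purely as a rough sketch of one way to lay out those journals (device
names below are made up, not from this thread): each HDD OSD gets its
journal on a partition of one of the SSDs, e.g. via ceph-disk, with the
journal size set in ceph.conf (10240 MB = 10GB here).

  # in /etc/ceph/ceph.conf on the OSD nodes
  [osd]
  osd journal size = 10240

  # /dev/sdc is one of the HDDs, /dev/sdb one of the journal SSDs;
  # ceph-disk creates a fresh journal partition on /dev/sdb
  ceph-disk prepare /dev/sdc /dev/sdb
  ceph-disk activate /dev/sdc1

With 4 journal SSDs in front of 8-10 HDDs that works out to 2-3 HDD OSDs
per journal SSD, which keeps utilization and wear even.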
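And for the SSD/HDD node split itself, a minimal sketch (bucket, host and
pool names are only placeholders) of keeping the two node types under
separate CRUSH roots and pinning a replica-2 pool to the SSD nodes:

  # separate CRUSH roots for the two node types
  ceph osd crush add-bucket ssd root
  ceph osd crush add-bucket hdd root
  ceph osd crush move ssd-node1 root=ssd    # repeat for each SSD node
  ceph osd crush move hdd-node1 root=hdd    # repeat for each HDD node

  # one rule per root, replicating across hosts
  ceph osd crush rule create-simple ssd-rule ssd host
  ceph osd crush rule create-simple hdd-rule hdd host

  # a size-2 pool that only uses the SSD nodes
  ceph osd pool create rbd-ssd 1024 1024      # PG counts just an example
  ceph osd pool set rbd-ssd crush_ruleset 1   # rule id from 'ceph osd crush rule dump'
  ceph osd pool set rbd-ssd size 2
  ceph osd pool set rbd-ssd min_size 1

min_size 1 keeps the pool writable with one of the two SSD replicas down,
which is the trade-off you accept with a replication of 2.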
> >
> > Christian
> >
> >> Any other suggestion/comment?
> >>
> >> Thanks a lot!
> >>
> >> Best regards
> >>
> >> German
> >>
> >> On Saturday, September 5, 2015, Christian Balzer <chibi@xxxxxxx>
> >> wrote:
> >>
> >>> Hello,
> >>>
> >>> On Fri, 4 Sep 2015 12:30:12 -0300 German Anders wrote:
> >>>
> >>>> Hi cephers,
> >>>>
> >>>> I've the following scheme:
> >>>>
> >>>> 7x OSD servers with:
> >>>>
> >>> Is this a new cluster, a total initial deployment?
> >>>
> >>> What else are these nodes made of, CPU/RAM/network?
> >>> While uniform nodes have some appeal (interchangeability, one node
> >>> down impacts the cluster uniformly) they tend to be compromise
> >>> solutions. I personally would go with optimized HDD and SSD nodes.
> >>>
> >>>> 4x 800GB SSD Intel DC S3510 (OSD-SSD)
> >>> Only 0.3 DWPD, 450TB total in 5 years.
> >>> If you can correctly predict your write volume and it is below that
> >>> per SSD, fine. I'd use 3610s, with internal journals.
> >>>
> >>>> 3x 120GB SSD Intel DC S3500 (Journals)
> >>> In this case even more so, the S3500 is a bad choice. 3x 135MB/s is
> >>> nowhere near your likely network speed of 10Gb/s.
> >>>
> >>> You will get vastly superior performance and endurance with two
> >>> 200GB S3610s (2x 230MB/s) or S3700s (2x 365MB/s).
> >>>
> >>> Why the uneven number of journal SSDs?
> >>> You want uniform utilization and wear. 2 journal SSDs for 6 HDDs
> >>> would be a good ratio.
> >>>
> >>>> 5x 3TB SAS disks (OSD-SAS)
> >>>>
> >>> See above, even numbers make a lot more sense.
> >>>
> >>>> The OSD servers are located on two separate racks with two power
> >>>> circuits each.
> >>>>
> >>>> I would like to know what is the best way to implement this: use
> >>>> the 4x 800GB SSDs as an SSD pool, or use them as a cache pool? Or
> >>>> any other suggestion? Also, any advice for the CRUSH design?
> >>>>
> >>> Nick touched on that already; for right now, SSD pools would
> >>> definitely be better.
> >>>
> >>> Christian
> >>> --
> >>> Christian Balzer        Network/Systems Engineer
> >>> chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
> >>> http://www.gol.com/
> >>
> >
> > --
> > Christian Balzer        Network/Systems Engineer
> > chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
> > http://www.gol.com/
>

--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com