Re: New cluster - configuration tips and recommendation - NVMe

> On 6 July 2017 at 18:27, Massimiliano Cuttini <max@xxxxxxxxxxxxx> wrote:
> 
> 
> WOW!
> 
> Thanks to everybody!
> Tons of suggestions and good tips!
> 
> At the moment we are already using 100Gb/s cards and we have already 
> adopted a 100Gb/s switch, so we can go with 40Gb/s cards that are fully 
> compatible with our switch.
> About the CPU I was wrong: the model we are looking at is not the 2603 but 
> the 2630, which is quite different.
> Bad mistake!
> 
> This processor has 10 cores at 2.20GHz.
> I think it's the best price/quality ratio from Intel.
> 
> About that, it seems that most of your recommendations go in the 
> direction of fewer cores but much higher clock speed.
> Is this right? So having 10 cores is not as good as having a faster CPU?

Partially. Make sure you have at least one physical CPU core per OSD.

And then the GHz starts to count if you really want to push IOps, especially over NVMe. You will need very fast CPUs to fully utilize those cards.
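
As a rough back-of-the-envelope check against the 6-OSD node proposed further down in this thread (the one-core-per-OSD floor is the rule of thumb above; the headroom figure for the OS and networking is just an assumed placeholder, not a Ceph requirement):

    # Rough OSD/CPU sizing sketch -- illustrative numbers only.
    osds_per_node = 6      # 6x NVMe U.2 OSDs in the proposed node
    cores_per_osd = 1      # at least one physical core per OSD
    headroom_cores = 4     # assumed: OS, network IRQs, other daemons

    min_cores = osds_per_node * cores_per_osd + headroom_cores
    print(f"Minimum physical cores per node: {min_cores}")

    # 2x E5-2630 v4 gives 2 x 10 = 20 physical cores, well above that
    # floor, so per-core clock speed (2.20GHz) becomes the limit on IOps.
    print(f"Cores available with 2x E5-2630 v4: {2 * 10}")

Once the core count clears that floor, extra money is better spent on higher clocks than on more cores if IOps are the goal.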

Wido

> 
> 
> 
> On 05/07/2017 12:51, Wido den Hollander wrote:
> >> On 5 July 2017 at 12:39, ceph@xxxxxxxxxxxxxx wrote:
> >>
> >>
> >> Beware, a single 10G NIC is easily saturated by a single NVMe device
> >>
> > Yes, it is. But that is what I was pointing at. Bandwidth is usually not the problem, latency is.
> >
> > Take a look at any Ceph cluster running out there: it is probably doing a lot of IOps, but not that much bandwidth.
> >
> > A production cluster I took a look at:
> >
> > "client io 405 MB/s rd, 116 MB/s wr, 12211 op/s rd, 13272 op/s wr"
> >
> > This cluster is 15 machines with 10 OSDs (SSD, PM863a) each.
> >
> > So 405/15 = 27MB/sec
> >
> > It's doing 13k IOps now, which increases to 25k during higher load, but the bandwidth stays below 500MB/sec in TOTAL.
> >
> > So yes, you are right, an NVMe device can saturate a single NIC, but most of the time latency and IOps are what count, not bandwidth.
> >
> > Wido
> >
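
A quick sketch of the arithmetic behind the numbers quoted above (client traffic only; replication and recovery traffic would add to it, and the 10Gbit comparison point is an assumption for illustration):

    # Per-node client traffic from the figures quoted above.
    nodes = 15
    rd_mb_s, wr_mb_s = 405, 116

    per_node_mb_s = (rd_mb_s + wr_mb_s) / nodes
    print(f"Average client traffic per node: {per_node_mb_s:.0f} MB/s")

    # One 10Gbit/s NIC is roughly 1250 MB/s raw, so a single 10G link is
    # only a few percent utilised by this workload; the 13k-25k IOps are
    # what actually stress the hardware, not the throughput.
    nic_10g_mb_s = 10_000 / 8
    print(f"Share of one 10G link: {per_node_mb_s / nic_10g_mb_s:.1%}")
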
> >> On 05/07/2017 11:54, Wido den Hollander wrote:
> >>>> On 5 July 2017 at 11:41, "Van Leeuwen, Robert" <rovanleeuwen@xxxxxxxx> wrote:
> >>>>
> >>>>
> >>>> Hi Max,
> >>>>
> >>>> You might also want to look at the PCIe lanes.
> >>>> I am not an expert on the matter, but my guess would be that 8 NVMe drives + 2x 100Gbit would be too much for
> >>>> the current Xeon generation (40 PCIe lanes) to fully utilize.
> >>>>
> >>> Fair enough, but you might want to think about whether you really, really need 100Gbit. Those cards are expensive, and the same goes for the GBICs and switches.
> >>>
> >>> Storage is usually latency-bound, not so much bandwidth-bound. IMHO a lot of people focus on raw TBs and bandwidth, but in the end IOps and latency are what usually matter.
> >>>
> >>> I'd probably stick with 2x10Gbit for now and use the money I saved on more memory and faster CPUs.
> >>>
> >>> Wido
> >>>
> >>>> I think the upcoming AMD/Intel offerings will improve that quite a bit, so you may want to wait for them.
> >>>> As mentioned earlier, single-core CPU speed matters for latency, so you probably want to up that.
> >>>>
> >>>> You can also look at the DIMM configuration.
> >>>> TBH I am not sure how much it impacts Ceph performance, but having just 2 DIMM slots populated will not give you maximum memory bandwidth.
> >>>> Having some extra memory for read cache probably won’t hurt either (unless you know your workload won’t include any cacheable reads).
> >>>>
> >>>> Cheers,
> >>>> Robert van Leeuwen
> >>>>
> >>>> From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> on behalf of Massimiliano Cuttini <max@xxxxxxxxxxxxx>
> >>>> Organization: PhoenixWeb Srl
> >>>> Date: Wednesday, July 5, 2017 at 10:54 AM
> >>>> To: "ceph-users@xxxxxxxxxxxxxx" <ceph-users@xxxxxxxxxxxxxx>
> >>>> Subject:  New cluster - configuration tips and recommendation - NVMe
> >>>>
> >>>>
> >>>> Dear all,
> >>>>
> >>>> Luminous is coming, and soon we should be able to avoid double writes.
> >>>> This means using 100% of the speed of SSDs and NVMe.
> >>>> Clusters made entirely of SSDs and NVMe will no longer be penalized and will start to make sense.
> >>>>
> >>>> Looking forward, I'm building the next storage pool, which we'll set up next term.
> >>>> We are considering a pool of 4 nodes with the following single-node configuration:
> >>>>
> >>>>    *   2x E5-2603 v4 - 6 cores - 1.70GHz
> >>>>    *   2x 32GB of RAM
> >>>>    *   2x NVMe M.2 for the OS
> >>>>    *   6x NVMe U.2 for OSDs
> >>>>    *   2x 100Gbit Ethernet cards
> >>>>
> >>>> We are not yet sure which Intel CPU and how much RAM we should put in it to avoid a CPU bottleneck.
> >>>> Can you help me choose the right pair of CPUs?
> >>>> Do you see any issues with the proposed configuration?
> >>>>
> >>>> Thanks,
> >>>> Max
> 
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



