Re: Number of SSD for OSD journal

Hello,

On Mon, 15 Dec 2014 22:43:14 +0100 Florent MONTHEL wrote:

> Thanks all
> 
> I will probably have 2x10Gb: one 10Gb link for the client network and
> one for the cluster network, but I will take your recommendation into
> account, Sebastien.
> 
> The 200GB SSD will probably give me around 500MB/s sequential bandwidth.
The Intel DC S3700 200GB is rated at 365MB/s sequential write.

> So with only 2 SSDs I can overload a 1x10Gb network.
> 
Unless you have an unlimited budget (which you obviously don't), you have
to balance cost and performance. 
Performance, however, comes in two main flavors here: IOPS and
throughput/bandwidth.

In normal operation, your storage nodes will run out of IOPS long before
they hit the bandwidth limits of your network or storage (both journal SSDs
and HDDs).
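
For a quick back-of-envelope in Python (all per-drive figures below are
rough assumptions, not measurements; think ~120 random IOPS and ~50MB/s
of sustained client throughput per 7.2K SATA drive):

# Why IOPS run out long before network bandwidth in normal operation.
# All per-drive figures are rough assumptions, not measurements.
HDD_IOPS = 120     # random IOPS per 7.2K SATA drive
HDD_MBPS = 50      # sustained MB/s per HDD under mixed client load
NET_MBPS = 1250    # rough usable ceiling of one 10GbE link, in MB/s
IO_SIZE_KB = 4     # small random writes

hdds = 10          # CONF 1
print("HDD-bound IOPS:", hdds * HDD_IOPS)                         # ~1200
print("10GbE-bound IOPS at 4KB:", NET_MBPS * 1024 // IO_SIZE_KB)  # ~320000
print("HDD-bound MB/s:", hdds * HDD_MBPS, "vs network", NET_MBPS)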

During a recovery or other data migration (e.g. adding a new OSD),
bandwidth becomes a lot more relevant, especially if the cluster isn't
otherwise busy at that time.
Your configuration 1 won't be able to continuously write data to the HDDs
faster than the 730MB/s (2x 365MB/s) the two Intel SSDs can sustain, so
it's fine, as long as you keep in mind that one dead (or otherwise
unavailable) SSD will take out 5 OSDs.

Your configurations 2 and 3 will likely benefit from 4 SSDs, not just to
keep the failure domain at a sane level, but also because at least #3
should be able to write to the HDDs faster than 730MB/s.
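
If you want to plug in your own numbers, here is a minimal sketch; the
per-drive "effective" write rates during backfill are pure assumptions
(sustained writes to a seeking filestore OSD land well below the drive's
raw sequential rating):

# Journal SSD write ceiling vs. aggregate HDD write rate, per config.
# Per-drive effective write rates are assumptions, adjust to taste.
SSD_MBPS = 365                                 # Intel DC S3700 200GB write
configs = {
    "CONF 1 (10x 3.5in 7.2K SATA)": (10, 70),  # assumed ~70MB/s effective
    "CONF 2 (22x 2.5in 7.2K SATA)": (22, 40),  # assumed ~40MB/s effective
    "CONF 3 (22x 2.5in 10K SAS)":   (22, 60),  # assumed ~60MB/s effective
}
for name, (drives, mbps) in configs.items():
    hdd_total = drives * mbps
    for ssds in (2, 4):
        journal_total = ssds * SSD_MBPS
        limit = "journal SSDs" if journal_total < hdd_total else "HDDs"
        print(f"{name}, {ssds} SSDs: journals {journal_total} MB/s, "
              f"HDDs ~{hdd_total} MB/s -> limited by {limit}")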

And let me chime in here with the "Intel DC S3700 SSDs for journals" crowd.
For a cluster here I wound up using 4 100GB ones (200MB/s write) and 8
HDDs, as that was still very affordable while reducing the failure domain
of one SSD to 2 OSDs.
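
The failure-domain side of that trade-off is trivial to script when
weighing SSD count against cost (drive counts taken from your configs and
from the box above):

# How many OSDs go down when a single journal SSD dies.
layouts = [
    ("CONF 1, 2 SSDs",   10, 2),
    ("CONF 2/3, 2 SSDs", 22, 2),
    ("CONF 2/3, 4 SSDs", 22, 4),
    ("4 SSDs / 8 HDDs",   8, 4),
]
for name, osds, ssds in layouts:
    per_ssd = -(-osds // ssds)   # ceiling division: journals per SSD
    print(f"{name}: losing one SSD takes out {per_ssd} OSDs")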

Research the ML archives, but for your #2 and #3 you will also want PLENTY
of CPU power and RAM (page cache avoids a lot of disk seeks and speeds up
things massively on the read side).

Lastly, I have an SSD-less test cluster and can just nod emphatically to
what Craig wrote.

Christian

> Hmm, I will keep an eye on OSD density.
> 
> Sent from my iPhone
> 
> > On 15 Dec 2014, at 21:45, Sebastien Han <sebastien.han@xxxxxxxxxxxx>
> > wrote:
> > 
> > Hi,
> > 
> > The general recommended ratio (for me at least) is 3 journals per SSD.
> > Using 200GB Intel DC S3700s is great. If you’re going with the low-perf
> > scenario, I don’t think you should bother buying SSDs; just remove them
> > from the picture and go with 12x SATA 7.2K 4TB.
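
For sizing the journal partitions themselves, the usual rule of thumb is
journal size >= 2 * expected throughput * filestore max sync interval; a
quick sketch with assumed inputs (5s is the usual sync interval default,
the rest just follows the 3:1 ratio above):

# Rough journal partition sizing:
#   journal_size >= 2 * expected_throughput * filestore_max_sync_interval
# The inputs are assumptions, adjust to your drives and ceph.conf.
ssd_write_mbps = 365     # S3700 200GB sequential write ceiling
sync_interval_s = 5      # filestore max sync interval (default-ish)
journals_per_ssd = 3     # recommended ratio above

per_journal_mbps = ssd_write_mbps / journals_per_ssd
journal_gb = 2 * per_journal_mbps * sync_interval_s / 1024
print(f"~{journal_gb:.1f} GB minimum per journal partition")  # ~1.2 GB

So even carved into 3 (or 5) journals, a 200GB SSD has plenty of headroom;
you are buying it for write endurance and sustained throughput, not
capacity.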
> > 
> > For the medium and medium++ perf configurations, a 1:11 ratio is way
> > too high; the SSDs will definitely be the bottleneck there. Please also
> > note that (bandwidth wise) with 22 drives you’re already hitting the
> > theoretical limit of a 10Gbps network (~50MB/s * 22 ≈ 1.1GB/s, which is
> > roughly 8.8Gbps). You can theoretically raise that ceiling with LACP
> > (depending on the xmit_hash_policy you’re using, of course).
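
That back-of-envelope is easy to redo for other drive counts or bonded
links; the per-drive throughput and the usable share of a 10GbE link are
assumptions here:

# Aggregate client throughput of the spinners vs. usable network bandwidth.
drives = 22
per_drive_mbps = 50          # assumed sustained MB/s per 7.2K SATA drive
links = 1                    # try 2 for LACP, if the hash policy spreads flows
usable_mbps_per_link = 1150  # ~10Gbps minus protocol overhead

hdd_mbps = drives * per_drive_mbps
net_mbps = links * usable_mbps_per_link
print(f"spinners ~{hdd_mbps} MB/s vs network ~{net_mbps} MB/s "
      f"({100 * hdd_mbps / net_mbps:.0f}% of the link)")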
> > 
> > Btw what’s the network? (since I’m only assuming here).
> > 
> > 
> >> On 15 Dec 2014, at 20:44, Florent MONTHEL <fmonthel@xxxxxxxxxxxxx>
> >> wrote:
> >> 
> >> Hi,
> >> 
> >> I’m buying several servers to test CEPH and I would like to put the
> >> journals on SSD drives (maybe it’s not necessary for all use cases).
> >> Could you help me figure out how many SSDs I need (SSDs are very
> >> expensive and a killer for the per-GB price of the business case…)?
> >> I don’t want to run into an SSD bottleneck (is there a rule of
> >> thumb?). I think I will go with CONF 2 or 3 below.
> >> 
> >> 
> >> CONF 1 DELL 730XC "Low Perf":
> >> 10x SATA 7.2K 3.5" 4TB + 2x SSD 2.5" 200GB "write intensive"
> >> 
> >> CONF 2 DELL 730XC "Medium Perf":
> >> 22x SATA 7.2K 2.5" 1TB + 2x SSD 2.5" 200GB "write intensive"
> >> 
> >> CONF 3 DELL 730XC "Medium Perf ++":
> >> 22x SAS 10K 2.5" 1TB + 2x SSD 2.5" 200GB "write intensive"
> >> 
> >> Thanks
> >> 
> >> Florent Monthel
> >> 
> >> 
> >> 
> >> 
> >> 
> > 
> > 
> > Cheers.
> > ––––
> > Sébastien Han
> > Cloud Architect
> > 
> > "Always give 100%. Unless you're giving blood."
> > 
> > Phone: +33 (0)1 49 70 99 72
> > Mail: sebastien.han@xxxxxxxxxxxx
> > Address : 11 bis, rue Roquépine - 75008 Paris
> > Web : www.enovance.com - Twitter : @enovance
> > 


-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Fusion Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




