Re: EC backend benchmark

Nick Fisk <nick@xxxxxxxxxx> · Tue, 12 May 2015 21:01:50 +0100

Hi Somnath,

Firstly, thank you for sharing these results.

I suspect you are struggling to saturate anything due to the effects of
serial latency. Have you tried scaling the clients above 8?

I noticed a similar ceiling, albeit at a much lower performance
threshold, when using 1Gb networking. As soon as I started approaching
~30-40MB/s with small IOs (64k), the utilisation of the network and
disks started increasing the average latency, so there were diminishing
returns from increasing the queue depth. By 50MB/s I couldn't push any
further no matter what the queue depth was, although larger object
sizes did allow me to max out the 1Gb network. Upgrading to 10Gb removed
this bottleneck for smaller IOs due to the lower latency. Is it possible
you are starting to hit the same ceiling at 40Gb with 4MB IOs? Have you
tried larger objects?
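
To put rough numbers on the network part of that latency, here is a
minimal sketch of the time needed just to put one IO on the wire. It
assumes binary sizes (64k = 64 KiB, 4MB = 4 MiB), ignores protocol
overhead, replication/EC fan-out and disk time, and the function name
is only for illustration:

```python
KIB = 1024
MIB = 1024 * KIB


def wire_time_ms(io_bytes: int, link_gbps: float) -> float:
    """Time in ms to serialise one IO onto a link of the given speed.

    Ignores protocol overhead and any queueing; link speed is in Gbit/s.
    """
    return io_bytes * 8 / (link_gbps * 1e9) * 1e3


cases = [
    ("64k IO on 1Gb", 64 * KIB, 1),
    ("64k IO on 10Gb", 64 * KIB, 10),
    ("4MB IO on 40Gb", 4 * MIB, 40),
]

for label, size, gbps in cases:
    print(f"{label}: {wire_time_ms(size, gbps):.3f} ms")
```

On these assumptions a 64k IO takes about 0.52ms on 1Gb and about
0.05ms on 10Gb, while a 4MB IO on 40Gb takes about 0.84ms, so the
per-IO wire time at 40Gb with 4MB objects is in the same range as small
IOs on 1Gb.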

Also, Christian raises a very good point: the filestore performs a lot
more writes than just the main data write. When you are doing <100 IOPS
these are insignificant, but they become increasingly apparent as the
number of IOPS goes up.

Looking at the first couple of lines, I can see that your latency nearly
doubles as you double the number of clients, which probably reduces the
benefit that you would normally get from the extra clients. Also, the
fact that increasing the number of K chunks seems to help is probably
because each OSD has less data to write and so has a slightly lower
latency, which means it can scale slightly better. I would imagine that
at 16 clients, the 9 and 15 K-chunk configurations would probably scale
a bit more, whereas the 4 and 6 K-chunk configurations would still be
stuck at the same speed. EC is probably quite similar to RAID, where
latency is lowest when "queue depth <= number of disks".
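
A rough way to see why doubling latency cancels out the extra clients
is Little's law (throughput = operations in flight / average latency).
The sketch below uses made-up numbers (50ms base latency, 16 operations
in flight per client), not values from your results, and the function
name is hypothetical:

```python
def aggregate_throughput_mb_s(clients, ops_in_flight, latency_s, io_size_mb):
    """Little's law: ops completed per second = ops in flight / latency."""
    total_in_flight = clients * ops_in_flight
    return total_in_flight / latency_s * io_size_mb


BASE_LATENCY_S = 0.05  # hypothetical single-client latency
OPS_IN_FLIGHT = 16     # hypothetical per-client concurrency
IO_SIZE_MB = 4

for doublings, clients in enumerate([1, 2, 4, 8]):
    # Case 1: latency stays constant as clients are added (ideal scaling).
    ideal = aggregate_throughput_mb_s(
        clients, OPS_IN_FLIGHT, BASE_LATENCY_S, IO_SIZE_MB)
    # Case 2: latency doubles every time the client count doubles.
    degraded = aggregate_throughput_mb_s(
        clients, OPS_IN_FLIGHT, BASE_LATENCY_S * 2 ** doublings, IO_SIZE_MB)
    print(f"{clients} clients: ideal {ideal:.0f} MB/s, "
          f"latency doubling {degraded:.0f} MB/s")
```

In the second case the aggregate throughput stays at the single-client
figure however many clients are added, which is the flat scaling
described above.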

It might be interesting to look at the average latency of the SSDs via
iostat to see whether it increases as the number of clients goes up.

One thing I did notice which confused me slightly: your latency figures
don't seem to match the bandwidth you are getting. Taking the first line
as an example:

Latency = 0.5ms, which should give you about 2000 IOPS.

2000 IOPS at 4M block size should give you 8000MB/s.
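
The arithmetic can be sketched as follows. It assumes one operation in
flight at a time (more concurrency would only raise the expected
figure), and the helper names and the 1000MB/s example value are
hypothetical, not taken from the results:

```python
def iops_from_latency(latency_ms, ops_in_flight=1):
    """Operations per second implied by an average per-op latency."""
    return ops_in_flight * 1000.0 / latency_ms


def bandwidth_mb_s(iops, io_size_mb):
    """Bandwidth implied by an IOPS figure at a fixed IO size."""
    return iops * io_size_mb


def implied_latency_ms(measured_mb_s, io_size_mb, ops_in_flight=1):
    """Inverse check: latency that a measured bandwidth would imply."""
    iops = measured_mb_s / io_size_mb
    return ops_in_flight * 1000.0 / iops


iops = iops_from_latency(0.5)        # 2000.0
print(iops, bandwidth_mb_s(iops, 4)) # 2000.0 8000.0

# Hypothetical measured figure, only to show the inverse calculation.
print(implied_latency_ms(1000, 4))   # 4.0
```

Running the inverse calculation against the bandwidth actually reported
on each line should show how far the latency column is from what the
throughput implies.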

Nick

> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
> Somnath Roy
> Sent: 12 May 2015 16:28
> To: Christian Balzer; ceph-users@xxxxxxxxxxxxxx
> Subject: Re:  EC backend benchmark
> 
> Hi Christian,
> Wonder why are you saying EC will write more data than replication ?
> Anyways, as you suggested, I will see how can I measure WA for EC vs
> replication.
> 
> Thanks & Regards
> Somnath
> 
> -----Original Message-----
> From: Christian Balzer [mailto:chibi@xxxxxxx]
> Sent: Monday, May 11, 2015 11:28 PM
> To: ceph-users@xxxxxxxxxxxxxx
> Cc: Somnath Roy; Loic Dachary (loic@xxxxxxxxxxx)
> Subject: Re:  EC backend benchmark
> 
> 
> Hello,
> 
> Could you have another EC run with differing block sizes like described
> here:
> http://lists.opennebula.org/pipermail/ceph-users-ceph.com/2014-October/043949.html
> and look for write amplification?
> 
> I'd suspect that by the very nature of EC and the addition local
> checksums it (potentially) writes it to be worse than replication.
> 
> Which is something very much to consider with SSDs.
> 
> Christian
> 
> On Mon, 11 May 2015 21:23:40 +0000 Somnath Roy wrote:
> 
> > Hi Loic and community,
> >
> > I have gathered the following data on EC backend (all flash). I have
> > decided to use Jerasure since space saving is the utmost priority.
> >
> > Setup:
> > --------
> > 41 OSDs (each on 8 TB flash), 5 node Ceph cluster. 48 core HT enabled
> > cpu/64 GB RAM. Tested with Rados Bench clients.
> >
> > Result:
> > ---------
> >
> > It is attached in the doc.
> >
> > Summary :
> > -------------
> >
> > 1. It is doing pretty good in Reads and 4 Rados Bench clients are
> > saturating 40 GB network. With more physical server, it is scaling
> > almost linearly and saturating 40 GbE on both the host.
> >
> > 2. As suspected with Ceph, problem is again with writes. Throughput
> > wise it is beating replicated pools in significant numbers. But, it is
> > not scaling with multiple clients and not saturating anything.
> >
> > So, my question is the following.
> >
> > 1. Probably, nothing to do with EC backend, we are suffering because
> > of filestore inefficiencies. Do you think any tunable like EC stipe
> > size (or anything else) will help here ?
> >
> > 2. I couldn't make fault domain as 'host', because of HW limitation.
> > Do you think will that play a role in performance for bigger k values ?
> >
> > 3. Even though it is not saturating 40 GbE for writes, do you think
> > separating out public/private network will help in terms of
> > performance ?
> >
> > Any feedback on this is much appreciated.
> >
> > Thanks & Regards
> > Somnath
> >
> >
> >
> > ________________________________
> >
> > PLEASE NOTE: The information contained in this electronic mail message
> > is intended only for the use of the designated recipient(s) named above.
> > If the reader of this message is not the intended recipient, you are
> > hereby notified that you have received this message in error and that
> > any review, dissemination, distribution, or copying of this message is
> > strictly prohibited. If you have received this communication in error,
> > please notify the sender by telephone or e-mail (as shown above)
> > immediately and destroy any and all copies of this message in your
> > possession (whether hard copies or electronically stored copies).
> >
> 
> 
> --
> Christian Balzer        Network/Systems Engineer
> chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
> http://www.gol.com/
> 
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
