Hi Somnath,

Firstly, thank you for sharing these results.

I suspect you are struggling to saturate anything because of serial latency. Have you tried scaling the clients above 8? I noticed a similar ceiling, albeit at a much lower performance threshold, when using 1Gb networking. As soon as I started approaching ~30-40MB/s with small IOs (64k), the utilisation on the network and disks started increasing the average latency, so there were diminishing returns from increasing the queue depth. By 50MB/s I couldn't push any more no matter what the queue depth was. Of course, larger object sizes allowed me to max out the 1Gb network. Upgrading to 10Gb removed this bottleneck for smaller IOs thanks to the lower latency. Is it possible you are starting to hit the same ceiling at 40Gb with 4MB IOs? Have you tried larger objects?

Also, Christian raises a very good point: the filestore performs a lot of extra writes beyond the main data write. When you are doing <100 IOPS these are insignificant, but they become increasingly apparent as the number of IOPS goes up.

Looking at the first couple of lines of your results, I can see that your latency nearly doubles as you double the number of clients. This probably reduces the benefit you would normally get from the extra clients. The fact that increasing the number of k chunks seems to help is probably because each OSD has less data to write and so has a slightly lower latency, which means it can scale slightly better. I would imagine that at 16 clients the k=9 and k=15 configurations would scale a bit more, whereas k=4 and k=6 would still be stuck at the same speed. EC is probably quite similar to RAID, where latency is lowest when "queue depth <= number of disks".

It might be interesting to look at the average latency of the SSDs via iostat, to see whether it increases as the number of clients goes up.
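To make the serial-latency argument concrete, here is a rough sketch in Python. It only assumes Little's law (completed IOs per second = outstanding IOs / average latency). The function names and the 10ms base latency are made up for illustration and are not taken from the attached results.

```python
def iops(latency_s: float, outstanding_ios: int = 1) -> float:
    """Little's law: completions per second = outstanding IOs / average latency."""
    return outstanding_ios / latency_s


def bandwidth_mb_s(latency_s: float, io_size_mb: float, outstanding_ios: int = 1) -> float:
    """Bandwidth implied by a given latency, IO size and number of IOs in flight."""
    return iops(latency_s, outstanding_ios) * io_size_mb


# Hypothetical numbers, purely illustrative.
BASE_LATENCY_S = 0.010  # assumed latency with a single client
IO_SIZE_MB = 4

for clients in (1, 2, 4, 8):
    # Assumed worst case: latency doubles every time the client count doubles.
    latency_s = BASE_LATENCY_S * clients
    mb_s = bandwidth_mb_s(latency_s, IO_SIZE_MB, outstanding_ios=clients)
    print(f"{clients} clients, {latency_s * 1000:.0f}ms latency: {mb_s:.0f} MB/s")
```

Under that assumption the aggregate bandwidth stays flat however many clients are added, which is the kind of ceiling described above. If latency grows more slowly than the client count, the same functions show how much scaling is left.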
One thing I did notice, which confused me slightly: your latency figures don't seem to match the bandwidth you are getting. Taking the first line as an example:

Latency = 0.5ms, which should give you about 2000 IOPS (1 / 0.0005s, with one IO in flight at a time).
2000 IOPS at a 4MB block size should give you 8000MB/s.

Nick

> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
> Somnath Roy
> Sent: 12 May 2015 16:28
> To: Christian Balzer; ceph-users@xxxxxxxxxxxxxx
> Subject: Re: EC backend benchmark
>
> Hi Christian,
> Wonder why are you saying EC will write more data than replication ?
> Anyways, as you suggested, I will see how can I measure WA for EC vs
> replication.
>
> Thanks & Regards
> Somnath
>
> -----Original Message-----
> From: Christian Balzer [mailto:chibi@xxxxxxx]
> Sent: Monday, May 11, 2015 11:28 PM
> To: ceph-users@xxxxxxxxxxxxxx
> Cc: Somnath Roy; Loic Dachary (loic@xxxxxxxxxxx)
> Subject: Re: EC backend benchmark
>
>
> Hello,
>
> Could you have another EC run with differing block sizes like described
> here:
> http://lists.opennebula.org/pipermail/ceph-users-ceph.com/2014-October/043949.html
> and look for write amplification?
>
> I'd suspect that by the very nature of EC and the addition local checksums it
> (potentially) writes it to be worse than replication.
>
> Which is something very much to consider with SSDs.
>
> Christian
>
> On Mon, 11 May 2015 21:23:40 +0000 Somnath Roy wrote:
>
> > Hi Loic and community,
> >
> > I have gathered the following data on EC backend (all flash). I have
> > decided to use Jerasure since space saving is the utmost priority.
> >
> > Setup:
> > --------
> > 41 OSDs (each on 8 TB flash), 5 node Ceph cluster. 48 core HT enabled
> > cpu/64 GB RAM. Tested with Rados Bench clients.
> >
> > Result:
> > ---------
> >
> > It is attached in the doc.
> >
> > Summary :
> > -------------
> >
> > 1. It is doing pretty good in Reads and 4 Rados Bench clients are
> > saturating 40 GB network.
> > With more physical server, it is scaling
> > almost linearly and saturating 40 GbE on both the host.
> >
> > 2. As suspected with Ceph, problem is again with writes. Throughput
> > wise it is beating replicated pools in significant numbers. But, it is
> > not scaling with multiple clients and not saturating anything.
> >
> > So, my question is the following.
> >
> > 1. Probably, nothing to do with EC backend, we are suffering because
> > of filestore inefficiencies. Do you think any tunable like EC stipe
> > size (or anything else) will help here ?
> >
> > 2. I couldn't make fault domain as 'host', because of HW limitation.
> > Do you think will that play a role in performance for bigger k values ?
> >
> > 3. Even though it is not saturating 40 GbE for writes, do you think
> > separating out public/private network will help in terms of performance ?
> >
> > Any feedback on this is much appreciated.
> >
> > Thanks & Regards
> > Somnath
> >
> >
> >
> > ________________________________
> >
> > PLEASE NOTE: The information contained in this electronic mail message
> > is intended only for the use of the designated recipient(s) named above.
> > If the reader of this message is not the intended recipient, you are
> > hereby notified that you have received this message in error and that
> > any review, dissemination, distribution, or copying of this message is
> > strictly prohibited. If you have received this communication in error,
> > please notify the sender by telephone or e-mail (as shown above)
> > immediately and destroy any and all copies of this message in your
> > possession (whether hard copies or electronically stored copies).
> >
> >
> --
> Christian Balzer        Network/Systems Engineer
> chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
> http://www.gol.com/
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com