Hello,

On Tue, 12 May 2015 15:28:28 +0000 Somnath Roy wrote:

> Hi Christian,
> Wonder why you are saying EC will write more data than replication?

There are 2 distinct things to look at here.

1. One is the overhead (increasing with smaller blocks) created by Ceph
(and the filesystem), as per my link in the previous mail below.
What I'm interested in is whether that ratio is about the same on EC as
with replication, or whether it is higher due to things like Locally
Repairable Codes (LRC).

2. Secondly, as you wrote in your reply to Nick, EC will result in more
throughput/bandwidth as it writes to more OSDs in parallel, just like good
old RAID5/6. So for your test with rados bench it indeed writes less data
to the OSDs and thus gets more speed.
However, I posit that with another test, like fio (which overwrites/updates
an existing file), the nature of EC will result in many more writes (as it
has to update the whole stripe) than a replica-based pool.
And that write amplification (WA), on top of everything caused by 1), is
what would scare me with SSD-backed OSDs.

> Anyways, as you suggested, I will see how I can measure WA for EC vs
> replication.
>
Thanks, take the above parts into consideration for that.

Christian

> Thanks & Regards
> Somnath
>
> -----Original Message-----
> From: Christian Balzer [mailto:chibi@xxxxxxx]
> Sent: Monday, May 11, 2015 11:28 PM
> To: ceph-users@xxxxxxxxxxxxxx
> Cc: Somnath Roy; Loic Dachary (loic@xxxxxxxxxxx)
> Subject: Re: EC backend benchmark
>
>
> Hello,
>
> Could you do another EC run with differing block sizes, as described
> here:
> http://lists.opennebula.org/pipermail/ceph-users-ceph.com/2014-October/043949.html
> and look for write amplification?
>
> I'd suspect that by the very nature of EC and the additional local
> checksums it (potentially) writes, it would be worse than replication.
>
> Which is something very much to consider with SSDs.
>
> Christian
>
> On Mon, 11 May 2015 21:23:40 +0000 Somnath Roy wrote:
>
> > Hi Loic and community,
> >
> > I have gathered the following data on EC backend (all flash). I have
> > decided to use Jerasure since space saving is the utmost priority.
> >
> > Setup:
> > --------
> > 41 OSDs (each on 8 TB flash), 5 node Ceph cluster. 48 core HT enabled
> > cpu/64 GB RAM. Tested with Rados Bench clients.
> >
> > Result:
> > ---------
> >
> > It is attached in the doc.
> >
> > Summary :
> > -------------
> >
> > 1. It is doing pretty well in reads, and 4 Rados Bench clients are
> > saturating the 40 GbE network. With more physical servers, it is
> > scaling almost linearly and saturating 40 GbE on both hosts.
> >
> > 2. As suspected with Ceph, the problem is again with writes.
> > Throughput-wise it is beating replicated pools by a significant
> > margin. But it is not scaling with multiple clients and not
> > saturating anything.
> >
> > So, my questions are the following.
> >
> > 1. Probably this has nothing to do with the EC backend and we are
> > suffering because of filestore inefficiencies. Do you think any
> > tunable like EC stripe size (or anything else) will help here?
> >
> > 2. I couldn't make the fault domain 'host', because of a HW
> > limitation. Do you think that will play a role in performance for
> > bigger k values?
> >
> > 3. Even though it is not saturating 40 GbE for writes, do you think
> > separating out the public/private networks will help in terms of
> > performance?
> >
> > Any feedback on this is much appreciated.
> >
> > Thanks & Regards
> > Somnath
> >
> >
> >
> > ________________________________
> >
> > PLEASE NOTE: The information contained in this electronic mail message
> > is intended only for the use of the designated recipient(s) named
> > above.
> > If the reader of this message is not the intended recipient,
> > you are hereby notified that you have received this message in error
> > and that any review, dissemination, distribution, or copying of this
> > message is strictly prohibited. If you have received this
> > communication in error, please notify the sender by telephone or
> > e-mail (as shown above) immediately and destroy any and all copies of
> > this message in your possession (whether hard copies or electronically
> > stored copies).
> >
>
>
> --
> Christian Balzer        Network/Systems Engineer
> chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
> http://www.gol.com/
>

--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
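
As a rough illustration of point 2 in the reply at the top of this thread, the Python sketch below compares bytes written per client write under two simplified models: plain replication, and an EC pool that rewrites every stripe it touches. The function names, the k=4/m=2 profile, the 4 KiB chunk size and the replica count of 3 are all assumptions made for illustration; none of them come from the benchmark. Journal and filesystem overhead (point 1) are ignored. Whether the EC backend really rewrites whole stripes on an update is exactly what a fio-style measurement would have to confirm.

```python
import math


def replicated_bytes_written(io_size: int, replicas: int) -> int:
    """Bytes hitting the OSDs when every replica stores the full write."""
    return io_size * replicas


def ec_full_stripe_bytes_written(io_size: int, k: int, m: int, chunk_size: int) -> int:
    """Bytes hitting the OSDs if every touched stripe is rewritten in full.

    Assumes the write is stripe-aligned; a stripe holds k data chunks and
    m coding chunks of chunk_size bytes each (hypothetical model).
    """
    stripe_data = k * chunk_size
    stripes_touched = math.ceil(io_size / stripe_data)
    return stripes_touched * (k + m) * chunk_size


def write_amplification(bytes_written: int, io_size: int) -> float:
    """Ratio of bytes written to the OSDs to bytes sent by the client."""
    return bytes_written / io_size


if __name__ == "__main__":
    K, M, CHUNK, REPLICAS = 4, 2, 4096, 3  # illustrative values only

    for label, io_size in (("4 MiB full-stripe write", 4 * 1024 * 1024),
                           ("4 KiB overwrite", 4096)):
        ec = write_amplification(
            ec_full_stripe_bytes_written(io_size, K, M, CHUNK), io_size)
        rep = write_amplification(
            replicated_bytes_written(io_size, REPLICAS), io_size)
        print(f"{label}: EC WA = {ec:.2f}, replication WA = {rep:.2f}")
```

Under these assumptions the large write favours EC (1.5x versus 3x), which matches the rados bench observation, while the small overwrite favours replication (6x versus 3x), which is the case the reply is worried about for SSD-backed OSDs.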