Re: EC backend benchmark

Hello,

On Wed, 13 May 2015 06:11:25 +0000 Somnath Roy wrote:

> Christian,
> EC pools do not support overwrites/partial writes and are thus not
> usable (directly) with block/file interfaces. Did you put a cache tier
> in front for your test with fio?
> 
No, I have never used EC and/or cache tiers.
The latter I hope to rectify when the funding for a test/staging lab is
approved. ^o^
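
For reference, the kind of EC plus cache-tier setup I'd want to test once
that lab exists would look roughly like the untested sketch below; the pool
names, k/m values, PG counts and cache sizing knobs are just placeholders,
not anything from an existing cluster:

# Sketch: writeback cache tier in front of a jerasure EC pool.
# All names and values below are placeholders.
import subprocess

def ceph(*args):
    # Run a single ceph CLI command and raise if it fails.
    subprocess.check_call(("ceph",) + args)

# EC profile and backing pool (k/m and failure domain are placeholders).
ceph("osd", "erasure-code-profile", "set", "ecprofile",
     "k=4", "m=2", "plugin=jerasure", "ruleset-failure-domain=osd")
ceph("osd", "pool", "create", "ecpool", "256", "256", "erasure", "ecprofile")

# Replicated cache pool layered on top in writeback mode.
ceph("osd", "pool", "create", "cachepool", "256", "256", "replicated")
ceph("osd", "tier", "add", "ecpool", "cachepool")
ceph("osd", "tier", "cache-mode", "cachepool", "writeback")
ceph("osd", "tier", "set-overlay", "ecpool", "cachepool")

# Cache sizing/eviction knobs -- undersizing these is what would drive the
# extra flushes to the EC pool I mention below.
ceph("osd", "pool", "set", "cachepool", "hit_set_type", "bloom")
ceph("osd", "pool", "set", "cachepool", "target_max_bytes", str(200 * 2**30))
ceph("osd", "pool", "set", "cachepool", "cache_target_dirty_ratio", "0.4")
ceph("osd", "pool", "set", "cachepool", "cache_target_full_ratio", "0.8")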

I assume, though, that a sufficiently full/busy cache tier will see enough
evictions/flush activity to cause some additional write amplification down
at the EC pool level.

As in, ideally all those tiny writes happen on the cache tier and later on
one big consolidated write goes out. Perfection.
But if the cache is too busy/small, it will have to write the backing
objects (stripes) multiple times, causing WA.
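
Back-of-envelope, with numbers that are purely made up for illustration,
the effect I mean would look like this:

# Made-up numbers to illustrate the flush amplification, nothing measured.
obj_size  = 4 * 2**20     # 4 MiB backing object in the EC pool
io_size   = 16 * 2**10    # 16 KiB client writes absorbed by the cache tier
n_writes  = 64            # small writes that eventually dirty this object
k, m      = 4, 2          # EC profile: each flush writes obj_size * (k+m)/k

client_bytes = n_writes * io_size

def ec_pool_bytes(flushes):
    # The cache tier flushes whole objects, so every flush rewrites the
    # full object plus its parity chunks on the EC pool.
    return flushes * obj_size * (k + m) // k

for flushes in (1, 4, 16):   # 1 = ideal consolidation, >1 = busy/small cache
    wa = ec_pool_bytes(flushes) / float(client_bytes)
    print("flushes=%2d  EC pool writes=%4d MiB  WA vs. client=%5.1fx"
          % (flushes, ec_pool_bytes(flushes) // 2**20, wa))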

Regards,

Christian

> Thanks & Regards
> Somnath
> 
> -----Original Message-----
> From: Christian Balzer [mailto:chibi@xxxxxxx]
> Sent: Tuesday, May 12, 2015 6:55 PM
> To: ceph-users@xxxxxxxxxxxxxx
> Cc: Somnath Roy; Loic Dachary (loic@xxxxxxxxxxx)
> Subject: Re:  EC backend benchmark
> 
> 
> Hello,
> 
> On Tue, 12 May 2015 15:28:28 +0000 Somnath Roy wrote:
> 
> > Hi Christian,
> > Wonder why you are saying EC will write more data than replication?
> 
> There are 2 distinct things here to look at.
> 
> 1. One is the overhead (increasing with smaller blocks) created by Ceph
> (and the filesystem) as per my link in the previous mail below. What I'm
> interested in is whether that ratio is about the same with EC as with
> replication, or whether it is higher due to things like Local Recovery
> Codes.
> 
> 2. Secondly, as you wrote in your reply to Nick, EC will result in more
> throughput/bandwidth as it writes to more OSDs in parallel, just like
> good old RAID5/6. So for your test with rados bench it indeed writes
> less data to the OSDs and thus gets more speed. However, I posit that
> with another test, like fio (which overwrites/updates an existing file),
> the nature of EC will result in many more writes (as it has to update
> the whole stripe) than a replica-based pool would. And that WA, on top
> of everything caused by 1), is what would scare me with SSD-backed OSDs.
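
To put rough numbers on what I mean above (the profile and sizes below are
pure guesses for illustration, nothing measured):

# Illustrative only: a small overwrite on a replicated pool vs. an EC pool
# that has to read-modify-write the whole stripe.
io_size      = 4 * 2**10     # 4 KiB random overwrite from fio
replicas     = 3
k, m         = 4, 2          # jerasure profile (assumed)
stripe_unit  = 64 * 2**10    # assumed per-chunk stripe unit

replicated_bytes = replicas * io_size       # 3 small writes
ec_bytes         = (k + m) * stripe_unit    # whole stripe + parity rewritten

print("replicated pool:   %3d KiB written" % (replicated_bytes // 2**10))
print("EC pool:           %3d KiB written" % (ec_bytes // 2**10))
print("EC vs. replicated: %.0fx" % (ec_bytes / float(replicated_bytes)))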
> 
> > Anyway, as you suggested, I will see how I can measure WA for EC vs.
> > replication.
> >
> Thanks, take the above parts into consideration for that.
> 
> Christian
> > Thanks & Regards
> > Somnath
> >
> > -----Original Message-----
> > From: Christian Balzer [mailto:chibi@xxxxxxx]
> > Sent: Monday, May 11, 2015 11:28 PM
> > To: ceph-users@xxxxxxxxxxxxxx
> > Cc: Somnath Roy; Loic Dachary (loic@xxxxxxxxxxx)
> > Subject: Re:  EC backend benchmark
> >
> >
> > Hello,
> >
> > Could you do another EC run with differing block sizes, as described
> > here:
> > http://lists.opennebula.org/pipermail/ceph-users-ceph.com/2014-October
> > /043949.html
> > and look for write amplification?
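
FWIW, one rough and untested way to put a number on that: run rados bench
at a few block sizes and compare what the client wrote with what actually
hit the OSD data devices, along these lines on each OSD host. The device
names, pool name and bench parameters are placeholders, and you would have
to sum the device numbers over all OSD hosts (and all clients) for a real
figure:

# Rough sketch: compare client bytes from rados bench with bytes written
# to the (assumed) local OSD data devices, read from /proc/diskstats.
import re
import subprocess

OSD_DEVICES = ["sdb", "sdc", "sdd"]   # placeholder OSD data devices
POOL = "ecbench"                      # placeholder pool name

def sectors_written(dev):
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            if fields[2] == dev:
                return int(fields[9])   # field 10: sectors written
    raise ValueError("device %s not in /proc/diskstats" % dev)

def device_bytes():
    return sum(sectors_written(d) for d in OSD_DEVICES) * 512

for bs in (4096, 65536, 4 * 2**20):
    before = device_bytes()
    out = subprocess.check_output(
        ["rados", "bench", "-p", POOL, "60", "write",
         "-b", str(bs), "-t", "16", "--no-cleanup"]).decode()
    after = device_bytes()

    # rados bench prints "Total writes made: N"; N * block size is roughly
    # the client bytes (from this one client only).
    made = int(re.search(r"Total writes made:\s+(\d+)", out).group(1))
    client_bytes = made * bs
    print("bs=%-7d client=%7.1f MiB  local OSD devices=%7.1f MiB  ratio=%.2f"
          % (bs, client_bytes / 2.0**20, (after - before) / 2.0**20,
             (after - before) / float(client_bytes)))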
> >
> > I'd suspect that, by the very nature of EC and the additional local
> > checksums it (potentially) writes, the write amplification will be
> > worse than with replication.
> >
> > Which is something very much to consider with SSDs.
> >
> > Christian
> >
> > On Mon, 11 May 2015 21:23:40 +0000 Somnath Roy wrote:
> >
> > > Hi Loic and community,
> > >
> > > I have gathered the following data on the EC backend (all flash). I
> > > have decided to use Jerasure since space saving is the utmost priority.
> > >
> > > Setup:
> > > --------
> > > 41 OSDs (each on 8 TB flash), 5-node Ceph cluster, 48-core HT-enabled
> > > CPU / 64 GB RAM. Tested with rados bench clients.
> > >
> > > Result:
> > > ---------
> > >
> > > It is attached in the doc.
> > >
> > > Summary :
> > > -------------
> > >
> > > 1. It is doing pretty well on reads, and 4 rados bench clients are
> > > saturating the 40 GbE network. With more physical servers, it is
> > > scaling almost linearly and saturating 40 GbE on both hosts.
> > >
> > > 2. As suspected with Ceph, the problem is again with writes.
> > > Throughput-wise it is beating replicated pools by a significant
> > > margin, but it is not scaling with multiple clients and is not
> > > saturating anything.
> > >
> > > So, my questions are the following.
> > >
> > > 1. This probably has nothing to do with the EC backend; we are
> > > suffering from filestore inefficiencies. Do you think any tunable
> > > like the EC stripe size (or anything else) will help here?
> > >
> > > 2. I couldn't set the fault domain to 'host' because of a HW
> > > limitation. Do you think that will play a role in performance for
> > > bigger k values?
> > >
> > > 3. Even though it is not saturating 40 GbE for writes, do you think
> > > separating out the public/private networks will help in terms of
> > > performance?
> > >
> > > Any feedback on this is much appreciated.
> > >
> > > Thanks & Regards
> > > Somnath
> > >
> > >
> > >
> >
> >
> > --
> > Christian Balzer        Network/Systems Engineer
> > chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
> > http://www.gol.com/
> >
> >
> 
> 
> --
> Christian Balzer        Network/Systems Engineer
> chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
> http://www.gol.com/
> 
> 


-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Fusion Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



