Re: Upcoming Erasure coding

Wido den Hollander <wido@xxxxxxxx> · Wed, 25 Dec 2013 11:24:12 +0100

On 12/24/2013 11:16 PM, Mark Kirkwood wrote:
On 25/12/13 04:33, Loic Dachary wrote:

On 24/12/2013 10:22, Wido den Hollander wrote:

IIRC Erasure Encoding doesn't work well with RBD, if it even works at
all, because you can't update an object in place: you have to
completely rewrite the whole object.

So Erasure encoding works great with the RADOS Gateway, but it
doesn't with RBD or CephFS.

When using Erasure you should also be aware that recovery traffic can
be 10x the traffic you would see with a replicated pool.
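
As a rough illustration of where a figure like 10x can come from, here is a minimal sketch. It assumes a simple model that is not stated in this thread: a k+m erasure code in which any k surviving chunks must be read to rebuild a lost chunk, compared with a replicated pool that copies the lost data from a single surviving replica. The function name and the choice of k=10 are hypothetical.

```python
def recovery_read_bytes(lost_bytes: int, k: int = 1) -> int:
    """Bytes read from surviving OSDs to rebuild `lost_bytes` of data.

    k == 1 models a replicated pool: copy from one surviving replica.
    k > 1 models a k+m erasure code (assumption): k chunks of the same
    size as the lost one are read to reconstruct it.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return lost_bytes * k


lost = 4 * 1024 * 1024  # 4 MiB of lost data
replicated = recovery_read_bytes(lost, k=1)
erasure = recovery_read_bytes(lost, k=10)  # hypothetical k=10 profile
print(erasure // replicated)
```

Under this model the read amplification during recovery is simply k, so the actual factor depends on the erasure code parameters chosen for the pool.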

Wido

P.S.: Loic, please correct me if I'm wrong :)

You are correct: erasure code pools will not support all operations
at first. They will be suitable for use with the tiering scenario I
described, and most probably with the majority of operations done by
radosgw. But the lack of support for partial writes makes it
impossible to use it as an RBD pool.

That raises an interesting question: what would be the benefit of
having an erasure coded RBD pool instead of a replica RBD pool with an
erasure coded second tier? In other words, is there a compelling
reason to want:

RBD => erasure coded pool

instead of

RBD => replica pool => erasure code pool

where the objects are automatically moved to the erasure code pool if
they are not used for more than X days.
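
The "not used for more than X days" rule can be sketched as a simple selection over last-access times. This is only an illustration of the policy described above, not how Ceph implements tiering; the function name, the object names and the 30-day threshold are all hypothetical.

```python
from datetime import datetime, timedelta


def objects_to_demote(last_access: dict[str, datetime],
                      now: datetime,
                      max_idle_days: int) -> list[str]:
    """Return names of objects idle for more than `max_idle_days`.

    These are the candidates that would move from the replica pool
    to the erasure coded pool under the rule described above.
    """
    cutoff = now - timedelta(days=max_idle_days)
    return sorted(name for name, ts in last_access.items() if ts < cutoff)


now = datetime(2013, 12, 25)
last_access = {  # hypothetical object names and access times
    "rbd_data.1": datetime(2013, 12, 24),
    "rbd_data.2": datetime(2013, 11, 1),
    "rbd_data.3": datetime(2013, 12, 1),
}
cold = objects_to_demote(last_access, now, max_idle_days=30)
print(cold)
```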

I may have misunderstood this, but the rewrite of the entire object is
at the RADOS level, right? So it would be a rewrite of (say) an entire
4M chunk of an RBD image if any part of that chunk needs a change.

Yes, it would. So if you want to update 1 byte of that object you would
have to rewrite the entire 4MB, which also involves reading all the
chunks since you need to compute the parity again.
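
To make the cost concrete, here is a minimal sketch of a one-byte update done as a full rewrite. It assumes a toy code with k data chunks plus a single XOR parity chunk, which is much simpler than the codes Ceph would actually use; all function names are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length chunks."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(obj: bytes, k: int) -> list[bytes]:
    """Split obj into k equal data chunks and append one XOR parity chunk."""
    if len(obj) % k:
        raise ValueError("object length must be a multiple of k")
    size = len(obj) // k
    chunks = [obj[i * size:(i + 1) * size] for i in range(k)]
    return chunks + [reduce(xor_bytes, chunks)]


def full_rewrite(stored: list[bytes], offset: int, value: int):
    """Change one byte by reading every data chunk and re-encoding.

    Returns the new chunk list and the number of bytes that were read.
    """
    data_chunks = stored[:-1]  # last chunk is the parity
    bytes_read = sum(len(c) for c in data_chunks)
    obj = bytearray(b"".join(data_chunks))
    obj[offset] = value
    return encode(bytes(obj), len(data_chunks)), bytes_read


OBJECT_SIZE = 4 * 1024 * 1024  # one 4MB RADOS object
stored = encode(bytes(OBJECT_SIZE), k=4)
stored, bytes_read = full_rewrite(stored, offset=123, value=0xFF)
print(bytes_read)
```

In this sketch a single changed byte causes all 4MB of data to be read and every chunk, including the parity, to be written again.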

If so, it seems to me that such a design could still be workable for
write once, read lots (and maybe delete) workloads, e.g. data
loading/analysis etc.

Write once, read many is a perfect use case for Erasure encoding. So for
the RGW it's great. You might want to set some pools, like those holding
the user information and bucket indexes, as replicated pools, but the
ones holding the actual objects might be Erasure encoded.

Wido

Regards

Mark
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on