Re: Upcoming Erasure coding

Christian Balzer <chibi@xxxxxxx> · Tue, 24 Dec 2013 17:34:31 +0900

Hello Loic,

On Tue, 24 Dec 2013 08:29:38 +0100 Loic Dachary wrote:

> 
> 
> On 24/12/2013 05:42, Christian Balzer wrote:
> > 
> > Hello,
> > 
> > from what has been written on the roadmap page and here, I assume that
> > the erasure coding option with Firefly will be (unsurprisingly) a pool
> > option.
> 
> Hi Christian,
> 
> You are correct. It is set when a pool of type "erasure" is created, for
> instance:
> 
>    ceph osd pool create poolname 12 12 erasure
> 
> creates an erasure pool "poolname" with 12 pg and uses the default
> erasure code plugin ( jerasure ) with parameters K=6, M=2 meaning each
> object is spread over 6+2 OSDs and you can sustain the loss of two OSDs.
Thanks for that info. 
I'm sure it will not use OSDs on the same server(s) if possible. 
Will it attempt to distribute those 8 OSDs amongst failure domains, as
in, put them on 8 servers if those are available, use different racks if
they are available, etc.?

> It can be changed with
> 
>    ceph osd pool create poolname 12 12 erasure erasure-code-k=2 erasure-code-m=1
> 
> which is the equivalent of having 2 replicas using 1.5 times the space
> instead of 2 times.
> 
Neat. ^.^
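
For reference, the space figures quoted above follow from the ratio
(K+M)/K. A minimal sketch of that arithmetic in Python; the function names
are made up for illustration and are not part of Ceph:

```python
def raw_space_factor(k: int, m: int) -> float:
    """Raw bytes stored per byte of user data for a K+M erasure code."""
    return (k + m) / k


def replica_space_factor(copies: int) -> float:
    """Raw bytes stored per byte of user data with plain replication."""
    return float(copies)


# Compare the two erasure settings from this thread with 2 replicas.
for label, factor in [
    ("erasure K=6 M=2", raw_space_factor(6, 2)),
    ("erasure K=2 M=1", raw_space_factor(2, 1)),
    ("2 replicas", replica_space_factor(2)),
]:
    print(f"{label}: {factor:.2f}x raw space")
```

With K=2, M=1 each object costs 1.5 times its size and survives the loss
of one OSD, which is the comparison with 2 replicas made above.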

> > 
> > Given the nature of this beast I doubt that it can just be switched on
> > with a live pool, right?
> 
> Yes. 
> > 
> > If so, what are the thoughts/plans to allow for a seamless and
> > transparent migration, other than a "deploy more hardware, create a
> > new pool, migrate everything by hand (with potential service
> > interruptions)" approach?
> > 
> 
> One possibility is to use tiering. An erasure code pool is created and
> set to receive objects demoted from the replica pool when they have not
> been used in a long time. If the object is accessed from the replica
> pool, it is first promoted back to it and this is transparent to the
> user ( modulo the delay of promoting it when accessed again ).
>
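
The demote/promote behaviour described above can be pictured with a toy
model. This is only a sketch of the idea as I read it, not how Ceph
implements tiering; the class, its methods and the idle_limit knob are all
hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class TieredStore:
    """Toy model: a replica pool in front of an erasure coded pool."""
    idle_limit: float                            # idle time before demotion (hypothetical)
    replica: dict = field(default_factory=dict)  # name -> (data, last_access)
    erasure: dict = field(default_factory=dict)  # name -> data

    def write(self, name, data, now):
        # New or rewritten objects always land in the replica pool.
        self.erasure.pop(name, None)
        self.replica[name] = (data, now)

    def demote_idle(self, now):
        """Move objects idle for at least idle_limit to the erasure pool."""
        idle = [n for n, (_, t) in self.replica.items()
                if now - t >= self.idle_limit]
        for name in idle:
            data, _ = self.replica.pop(name)
            self.erasure[name] = data
        return idle

    def read(self, name, now):
        """Read via the replica pool, promoting the object first if needed."""
        if name not in self.replica:
            data = self.erasure.pop(name)  # KeyError if the object is unknown
            self.replica[name] = (data, now)
        data, _ = self.replica[name]
        self.replica[name] = (data, now)   # refresh the access time
        return data
```

The caller only ever talks to read() and write(); the promotion cost shows
up as extra latency on the first read after a demotion.
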
Ah, but that sounds a lot like my proposal, w/o the benefit of being able to
recycle (reconfigure) your old pool/hardware in the end.

Let's assume a Ceph cluster with OSDs that are already quite full and maybe
hundreds of VMs using RBD images on it.
Migrating your way wouldn't really improve things (storage density) much,
whereas my way you get to fiddle with the pool name for each VM as you
migrate them, which won't be a live migration either.

I guess people will use this feature only with new pools/deployments in
many cases then.

Regards,

Christian
-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Fusion Communications
http://www.gol.com/