Hello,

On Tue, 24 Dec 2013 16:33:49 +0100 Loic Dachary wrote:

> On 24/12/2013 10:22, Wido den Hollander wrote:
> > On 12/24/2013 09:34 AM, Christian Balzer wrote:
> >>
> >> Hello Loic,
> >>
> >> On Tue, 24 Dec 2013 08:29:38 +0100 Loic Dachary wrote:
> >>
> >>> On 24/12/2013 05:42, Christian Balzer wrote:
> >>>>
> >>>> Hello,
> >>>>
> >>>> From what has been written on the roadmap page and here, I assume
> >>>> that the erasure coding option with Firefly will be
> >>>> (unsurprisingly) a pool option.
> >>>
> >>> Hi Christian,
> >>>
> >>> You are correct. It is set when a pool of type "erasure" is created,
> >>> for instance:
> >>>
> >>> ceph osd pool create poolname 12 12 erasure
> >>>
> >>> creates an erasure pool "poolname" with 12 PGs and uses the default
> >>> erasure code plugin (jerasure) with parameters K=6, M=2, meaning
> >>> each object is spread over 6+2 OSDs and you can sustain the loss of
> >>> two OSDs.
> >>
> >> Thanks for that info.
> >> I'm sure it will avoid using OSDs on the same server(s) if possible.
> >> Will it attempt to distribute those 8 OSDs amongst failure domains,
> >> as in, put them on 8 servers if those are available, use different
> >> racks if they are available, etc.?
> >>
> >>> It can be changed with
> >>>
> >>> ceph osd pool create poolname 12 12 erasure erasure-code-k=2
> >>> erasure-code-m=1
> >>>
> >>> which is the equivalent of having 2 replicas while using 1.5 times
> >>> the space instead of 2 times.
> >>
> >> Neat. ^.^
> >
> > Don't get your hopes up, let me explain that below.
> >
> >>>> Given the nature of this beast I doubt that it can just be switched
> >>>> on with a live pool, right?
> >>>
> >>> Yes.
> >>>
> >>>> If so, what are the thoughts/plans to allow for a seamless and
> >>>> transparent migration, other than a "deploy more hardware, create a
> >>>> new pool, migrate everything by hand (with potential service
> >>>> interruptions)" approach?
> >>>
> >>> One possibility is to use tiering. An erasure coded pool is created
> >>> and set to receive objects demoted from the replica pool when they
> >>> have not been used in a long time. If an object is accessed again,
> >>> it is first promoted back to the replica pool, and this is
> >>> transparent to the user (modulo the delay of promoting it when it is
> >>> accessed again).
> >>
> >> Ah, but that sounds a lot like my proposal, w/o the benefit of being
> >> able to recycle (reconfigure) your old pool/hardware in the end.
> >>
> >> Let's assume a Ceph cluster with OSDs that are already quite full and
> >> maybe hundreds of VMs using RBD images on it.
> >> Migrating your way wouldn't really improve things (storage density)
> >> much, while with my way you get to adjust the pool name for each VM
> >> as you migrate it, which won't be a live migration either.
> >
> > IIRC erasure encoding doesn't work well with RBD, if it even works at
> > all, due to the fact that you can't update an object but have to
> > completely rewrite the whole object.
>
> Ah yes, of course...
>
> > So erasure encoding works great with the RADOS Gateway, but it doesn't
> > with RBD or CephFS.
> >
> > When using erasure coding you should also be aware that recovery
> > traffic can be 10x the traffic you would see with a replicated pool.
> >
> > Wido
> >
> > P.S.: Loic, please correct me if I'm wrong :)
>
> You are correct: erasure coded pools will not support all operations at
> first. They will be suitable for use with the tiering scenario I
> described.
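
(A sketch of what that tiering scenario might look like on the command
line, assuming the cache tiering interface proposed for Firefly; the pool
names are made up and the exact syntax may still change before release:)

  # erasure coded pool as the cold/backing tier (default jerasure K=6, M=2)
  ceph osd pool create ecpool 12 12 erasure
  # replicated pool as the hot tier sitting in front of it
  ceph osd pool create hotpool 12 12
  # attach the replicated pool as a writeback cache tier over the erasure pool
  ceph osd tier add ecpool hotpool
  ceph osd tier cache-mode hotpool writeback
  ceph osd tier set-overlay ecpool hotpool
  # clients address "ecpool"; I/O is serviced by the hot tier and cold
  # objects are flushed/evicted down to the erasure coded pool
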
> And most probably with the majority of operations done by radosgw. But
> the lack of support for partial writes makes it impossible to use it as
> an RBD pool.

Nods, w/o partial writes that would be very ugly indeed.

> That raises an interesting question: what would be the benefit of
> having an erasure coded RBD pool instead of a replica RBD pool with an
> erasure coded second tier? In other words, is there a compelling reason
> to want:
>
> RBD => erasure coded pool
>
> instead of
>
> RBD => replica pool => erasure coded pool
>
> where the objects are automatically moved to the erasure coded pool if
> they are not used for more than X days.

Now that I know about this limitation, your suggestion of a tiered
erasure coded pool of course makes all the sense in the world.

I would assume that enough demoting and promoting would be going on to
have a measurable performance impact, but of course that depends on the
block allocation strategies of the VM (filesystem) in question. One
guesses BTRFS would be the worst offender here with CoW.

Thanks a lot for that info, however deflating of my hopes it was. ^o^

Christian

> Cheers
>
> >> I guess people will use this feature only with new pools/deployments
> >> in many cases then.
> >>
> >> Regards,
> >>
> >> Christian

-- 
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
http://www.gol.com/

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
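
(For reference, the "not used for more than X days" demotion policy
discussed above would presumably map onto cache tier pool settings along
these lines; the option names follow the Firefly cache tiering proposal,
the "hotpool" name carries over from the earlier sketch, and the values
are purely illustrative:)

  # let the tiering agent track object accesses so it can tell hot from cold
  ceph osd pool set hotpool hit_set_type bloom
  ceph osd pool set hotpool hit_set_count 24
  ceph osd pool set hotpool hit_set_period 3600      # one hit set per hour
  # do not flush or evict objects younger than 7 days (values in seconds)
  ceph osd pool set hotpool cache_min_flush_age 604800
  ceph osd pool set hotpool cache_min_evict_age 604800
  # start flushing/evicting once the hot tier holds more than ~200 GB
  ceph osd pool set hotpool target_max_bytes 200000000000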