On Fri, 7 Dec 2018, Eric Goirand wrote:
> Hi Matthew,
>
> There is a possibility related to your point #4 if your cluster has
> some extra space (there's probably a way to count what would be
> needed depending on the EC profile you use).
>
> I would:
>
> 1/ create the new EC pool => it corresponds to the RGW data pool only
>
> 2/ create the EC ruleset associated with the pool
>
> 3/ change the current RGW data pool from using the former replica
> ruleset to the new EC ruleset.

Unfortunately Ceph doesn't know how to move data from a replicated
pool to an EC pool. Switching the CRUSH rule will just move the
existing replicas to different OSDs.

> That way there is no downtime and Ceph will migrate all the data
> itself from one pool to the other by rebalancing.
>
> You can, however, tweak the backfill parameters to go faster or
> slower depending on the impact on the clients.
>
> You can always come back to the previous configuration by
> re-associating the old ruleset with the RGW data pool.
>
> With Regards,
>
> Eric.
>
> On 12/4/18 6:33 PM, Matthew Vernon wrote:
> > Hi,
> >
> > We're planning the move of our production Ceph cluster to Luminous
> > (from Jewel), and thinking about migrating the main S3/radosgw
> > pool to erasure coding.
> >
> > It's quite large (1.4 PB / 440 M objects).
> >
> > I've seen
> > http://cephnotes.ksperis.com/blog/2015/04/15/ceph-pool-migration/
> > which seems to be the standard set of options:
> >
> > 1) rados cppool
> >
> > This would require a lot of downtime (since it just copies objects
> > one by one without checking whether the object is already in the
> > target pool), and I don't know if the warning means this would
> > break things anyway:
> >
> > WARNING: pool copy does not preserve user_version, which some apps
> > may rely on.
> >
> > 2) making a new pool as a cache tier
> >
> > This looks pretty risky to me (and easy to get wrong), and I
> > imagine the cache-flush-evict-all stage is very IO intensive. Has
> > anyone tried it with a large pool?
> >
> > 3) rados export/import
> >
> > I think this has the same problem as 1) above - having to do it as
> > a big bang, and needing a lot of temporary storage.
> >
> > 4) make a new pool, make it the default for rgw [not in above link]
> >
> > This would, I think, mean that new uploads would go into the new
> > pool, and would be non-disruptive. But, AFAICS, there is no way to:
> > i) look at what % of a user's buckets/objects are in which pool
> > ii) migrate objects from the old pool to the new one
> >
> > The second is the kicker, really - it'd be very useful to be able
> > to move objects without having to download, remove, and re-upload,
> > but I don't think there's an API to do this? Presumably you could
> > hack something up based on the internals of radosgw itself, but
> > that seems ... brave?
> >
> > Surely we're not the only people to have wanted to do this; what
> > have others done?
> >
> > Regards,
> >
> > Matthew

_______________________________________________
Ceph-large mailing list
Ceph-large@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-large-ceph.com
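For reference, a rough sketch of the commands behind Eric's steps 1-3
on Luminous, plus the backfill throttle he mentions. The profile, rule,
and pool names (rgw-ec-profile, rgw-ec-rule, default.rgw.buckets.data)
and the k/m and PG values are illustrative placeholders, not taken
from the thread:

    # 1/ Define an EC profile (k/m and failure domain are examples
    #    only; choose values that fit your cluster).
    ceph osd erasure-code-profile set rgw-ec-profile \
        k=4 m=2 crush-failure-domain=host

    # Create the new EC pool from that profile (PG counts are examples).
    ceph osd pool create default.rgw.buckets.data.ec 1024 1024 \
        erasure rgw-ec-profile

    # 2/ Create a CRUSH rule from the same profile.
    ceph osd crush rule create-erasure rgw-ec-rule rgw-ec-profile

    # 3/ Eric's proposal: point the existing RGW data pool at the new
    #    rule. Per the reply above, this only re-places the existing
    #    replicas on different OSDs; it does not convert replicated
    #    data to EC chunks (and the monitors may refuse a rule whose
    #    type doesn't match the pool's).
    ceph osd pool set default.rgw.buckets.data crush_rule rgw-ec-rule

    # Throttle (or speed up) the resulting backfill, as Eric suggests.
    ceph tell osd.* injectargs '--osd-max-backfills 1'

Reverting, as Eric notes, is the same "ceph osd pool set ... crush_rule"
command with the old rule's name.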