I'd probably do #4 myself.
Generate a list of all buckets/keys before switching over to the new storage, and you have a todo list you can work against; just keep track of what you've done. It also gives you a straightforward way to throttle the transfer, by scaling the number of
workers doing the down/rm/up cycles. I'd think the most difficult parts would be managing the s3 credentials and making sure clients can tolerate per-key blips in availability. Something along the lines of the sketch below.
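A minimal sketch of that flow, using boto3 against two RGW S3 endpoints. The endpoint URLs, profile names, file paths, and worker count are placeholders, it assumes the buckets already exist on the new cluster, and it reads each object fully into memory (large objects would want multipart handling, which is part of why I'd reach for the tool mentioned further down):

#!/usr/bin/env python3
"""Option #4 sketch: build a key inventory, then move keys with a small
worker pool doing the down/rm/up cycle. Endpoints/profiles are examples."""
import concurrent.futures
import boto3

OLD = boto3.session.Session(profile_name="old-cluster").client(
    "s3", endpoint_url="http://old-rgw.example.com:7480")
NEW = boto3.session.Session(profile_name="new-cluster").client(
    "s3", endpoint_url="http://new-rgw.example.com:7480")

TODO_FILE = "todo.txt"   # one "bucket<TAB>key" per line, generated up front
DONE_FILE = "done.txt"   # appended as keys are moved, so runs are resumable
WORKERS = 8              # throttle the transfer by raising/lowering this


def build_inventory():
    """List every bucket/key on the old cluster before cutting over."""
    with open(TODO_FILE, "w") as out:
        for bucket in OLD.list_buckets()["Buckets"]:
            paginator = OLD.get_paginator("list_objects_v2")
            for page in paginator.paginate(Bucket=bucket["Name"]):
                for obj in page.get("Contents", []):
                    out.write(f"{bucket['Name']}\t{obj['Key']}\n")


def move_key(bucket, key):
    """One down/rm/up cycle; the key is briefly unavailable in between."""
    body = OLD.get_object(Bucket=bucket, Key=key)["Body"].read()
    OLD.delete_object(Bucket=bucket, Key=key)
    NEW.put_object(Bucket=bucket, Key=key, Body=body)
    return bucket, key


def run_transfer():
    """Work through the todo list, skipping keys already logged as done."""
    done = set()
    try:
        with open(DONE_FILE) as f:
            done = {tuple(line.rstrip("\n").split("\t")) for line in f}
    except FileNotFoundError:
        pass
    with open(TODO_FILE) as f:
        todo = [tuple(line.rstrip("\n").split("\t")) for line in f]
    todo = [item for item in todo if item not in done]

    with open(DONE_FILE, "a") as log, \
            concurrent.futures.ThreadPoolExecutor(max_workers=WORKERS) as pool:
        for bucket, key in pool.map(lambda bk: move_key(*bk), todo):
            log.write(f"{bucket}\t{key}\n")


if __name__ == "__main__":
    build_inventory()
    run_transfer()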
That said, I'd also be pretty tempted to shoot for bluestore at the same time if I were shuffling that much data anyway. It would complicate things, though, since you'd need to swap entire osds, not just pgs.
FWIW my group has released a tool that's a nice cli wrapper around boto; it makes things like multipart uploads and downloads, as well as verifying transfers, pretty straightforward. It also extends the default boto profile management
a bit, which makes it easy if you have a bunch of different credentials to juggle. I think scripting up option #4 with it would be pretty simple. It's available on github or in pypi: https://github.com/bibby/radula https://pypi.org/project/radula/
Aaron