Re: Migrating large RGW buckets to erasure coding

Tiering lifecycle would provide that, in development.

Yehuda

On Tue, Dec 4, 2018, 7:01 PM Sage Weil <sage@xxxxxxxxxxxx> wrote:
On Tue, 4 Dec 2018, Aaron Bassett wrote:
> I'd probably do #4 myself.

I agree... 1-3 are all kludgey and I wouldn't trust the result.

I'm guessing we need some new RGW functionality to do the object migration
between tiers (unsure if this is already coming or not; copying Yehuda),
but even without that, as a worst case, I think you can simply GET and then
PUT each object to make it migrate.
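
Roughly something like this, per object (an untested sketch with boto3; the
endpoint, credentials, and metadata handling below are placeholders, and
large objects would want a multipart copy rather than a single read):

    import boto3

    # Plain S3 client pointed at the RGW endpoint (placeholder values).
    s3 = boto3.client(
        's3',
        endpoint_url='http://rgw.example.com:7480',
        aws_access_key_id='ACCESS_KEY',
        aws_secret_access_key='SECRET_KEY',
    )

    def rewrite_object(bucket, key):
        """Rewrite one object with a plain GET followed by a PUT."""
        obj = s3.get_object(Bucket=bucket, Key=key)
        body = obj['Body'].read()   # whole object in memory; fine for small keys
        s3.put_object(Bucket=bucket, Key=key, Body=body,
                      ContentType=obj.get('ContentType',
                                          'binary/octet-stream'),
                      Metadata=obj.get('Metadata', {}))

(ACLs and other per-object attributes wouldn't come across with a plain
GET/PUT, so those would presumably need copying separately.)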

sage


>
> Generate a list of all buckets/keys before switching over to the new storage, and you have a todo list to work against; just keep track of what you've done. This also gives you a straightforward way to throttle the transfer, by scaling the number of workers doing the down/rm/up cycles. I'd think the most difficult part here would be managing the S3 credentials and making sure clients can tolerate per-key blips in availability.
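>
> To make that concrete, here's a rough, untested sketch of the worker loop
> (boto3, the endpoint/credentials, and the "bucket key" todo-file format are
> all placeholders; throttling is just the max_workers count):
>
>     import concurrent.futures
>
>     import boto3
>
>     # S3 client for the RGW endpoint (placeholder values).
>     s3 = boto3.client('s3',
>                       endpoint_url='http://rgw.example.com:7480',
>                       aws_access_key_id='ACCESS_KEY',
>                       aws_secret_access_key='SECRET_KEY')
>
>     def migrate(entry):
>         """One down/rm/up cycle for a single bucket/key pair."""
>         bucket, key = entry
>         body = s3.get_object(Bucket=bucket, Key=key)['Body'].read()
>         s3.delete_object(Bucket=bucket, Key=key)   # the per-key blip
>         s3.put_object(Bucket=bucket, Key=key, Body=body)
>         return entry
>
>     # todo.txt: one "bucket key" line per object, generated up front.
>     with open('todo.txt') as f:
>         todo = [line.split(None, 1)
>                 for line in f.read().splitlines() if line.strip()]
>
>     # Scale workers up or down to throttle; log what's finished so a
>     # rerun can exclude it from todo.txt.
>     with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool, \
>             open('done.txt', 'a') as log:
>         for bucket, key in pool.map(migrate, todo):
>             log.write('%s %s\n' % (bucket, key))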
>
> That said, I'd also be pretty tempted to shoot for BlueStore at the same time if I were shuffling that much data anyway. It would complicate things, as you'd need to swap entire OSDs, not just PGs.
>
> FWIW, my group has released a tool that's a nice CLI wrapper around boto; it makes things like multipart uploads and downloads, as well as verifying uploads and downloads, pretty straightforward. It also extends the default boto profile management a bit, which makes life easier if you have a bunch of keys that need different credentials. I think using it to script up option #4 could make things pretty straightforward. It's available on GitHub or PyPI:  https://github.com/bibby/radula  https://pypi.org/project/radula/
>
>
> Aaron
>
> On Dec 4, 2018, at 12:33 PM, Matthew Vernon <mv3@xxxxxxxxxxxx> wrote:
>
> Hi,
>
> We're planning the move of our production Ceph cluster to Luminous (from Jewel), and thinking about migrating the main S3/radosgw pool to erasure-coding.
>
> It's quite large (1.4 PB / 440 million objects).
>
> I've seen http://cephnotes.ksperis.com/blog/2015/04/15/ceph-pool-migration/ which seems to be the standard set of options:
>
> 1) rados cppool
>
> This would require a lot of downtime (it copies objects one-by-one and doesn't skip objects already present in the target pool, so it can't be re-run incrementally to catch up), and I don't know whether the warning means it would break things anyway:
>
> WARNING: pool copy does not preserve user_version, which some apps may rely on.
>
> 2) making a new pool as a cache tier
>
> This looks pretty risky to me (and easy to get wrong), and I imagine the cache-flush-evict-all stage is very IO intensive. Has anyone tried it with a large pool?
>
> 3) Rados export/import
>
> I think this has the same problem as 1) above: it has to be done as a big bang, and needs a lot of temporary storage.
>
> 4) make a new pool, make it the default for rgw [not in above link]
>
> This would, I think, mean that new uploads would go into the new pool, and would be non-disruptive. But, AFAICS, there is no way to:
> i) look at what % of a user's buckets/objects are in which pool
> ii) migrate objects from old to new pool
>
> The second is the kicker, really - it'd be very useful to be able to move objects without having to download, remove, and re-upload them, but I don't think there's an API to do this? Presumably you could hack something up based on the internals of radosgw itself, but that seems ... brave?
>
> Surely we're not the only people to have wanted to do this; what have others done?
>
> Regards,
>
> Matthew
>
_______________________________________________
Ceph-large mailing list
Ceph-large@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-large-ceph.com
