Re: RGW: Migrating a long-lived cluster to multi-site, fixing an EC pool mistake

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



True, good luck with that, its kind of a tedious process that takes just
too long time :(

Nino


On Sat, Jun 17, 2023 at 7:48 AM Christian Theune <ct@xxxxxxxxxxxxxxx> wrote:

> What got lost is that I need to change the pool’s m/k parameters, which is
> only possible by creating a new pool and moving all data from the old pool.
> Changing the crush rule doesn’t allow you to do that.
>
> > On 16. Jun 2023, at 23:32, Nino Kotur <ninokotur@xxxxxxxxx> wrote:
> >
> > If you create new crush rule for ssd/nvme/hdd and attach it to existing
> pool you should be able to do the migration seamlessly while everything is
> online... However impact to user will depend on storage devices load and
> network utilization as it will create chaos on cluster network.
> >
> > Or did i get something wrong?
> >
> >
> >
> >
> > Kind regards,
> > Nino
> >
> >
> > On Wed, Jun 14, 2023 at 5:44 PM Christian Theune <ct@xxxxxxxxxxxxxxx>
> wrote:
> > Hi,
> >
> > further note to self and for posterity … ;)
> >
> > This turned out to be a no-go as well, because you can’t silently switch
> the pools to a different storage class: the objects will be found, but the
> index still refers to the old storage class and lifecycle migrations won’t
> work.
> >
> > I’ve brainstormed for further options and it appears that the last
> resort is to use placement targets and copy the buckets explicitly - twice,
> because on Nautilus I don’t have renames available, yet. :(
> >
> > This will require temporary downtimes prohibiting users to access their
> bucket. Fortunately we only have a few very large buckets (200T+) that will
> take a while to copy. We can pre-sync them of course, so the downtime will
> only be during the second copy.
> >
> > Christian
> >
> > > On 13. Jun 2023, at 14:52, Christian Theune <ct@xxxxxxxxxxxxxxx>
> wrote:
> > >
> > > Following up to myself and for posterity:
> > >
> > > I’m going to try to perform a switch here using (temporary) storage
> classes and renaming of the pools to ensure that I can quickly change the
> STANDARD class to a better EC pool and have new objects located there.
> After that we’ll add (temporary) lifecycle rules to all buckets to ensure
> their objects will be migrated to the STANDARD class.
> > >
> > > Once that is finished we should be able to delete the old pool and the
> temporary storage class.
> > >
> > > First tests appear successfull, but I’m a bit struggling to get the
> bucket rules working (apparently 0 days isn’t a real rule … and the debug
> interval setting causes high frequent LC runs but doesn’t seem move objects
> just yet. I’ll play around with that setting a bit more, though, I think I
> might have tripped something that only wants to process objects every so
> often and on an interval of 10 a day is still 2.4 hours …
> > >
> > > Cheers,
> > > Christian
> > >
> > >> On 9. Jun 2023, at 11:16, Christian Theune <ct@xxxxxxxxxxxxxxx>
> wrote:
> > >>
> > >> Hi,
> > >>
> > >> we are running a cluster that has been alive for a long time and we
> tread carefully regarding updates. We are still a bit lagging and our
> cluster (that started around Firefly) is currently at Nautilus. We’re
> updating and we know we’re still behind, but we do keep running into
> challenges along the way that typically are still unfixed on main and - as
> I started with - have to tread carefully.
> > >>
> > >> Nevertheless, mistakes happen, and we found ourselves in this
> situation: we converted our RGW data pool from replicated (n=3) to erasure
> coded (k=10, m=3, with 17 hosts) but when doing the EC profile selection we
> missed that our hosts are not evenly balanced (this is a growing cluster
> and some machines have around 20TiB capacity for the RGW data pool, wheres
> newer machines have around 160TiB and we rather should have gone with k=4,
> m=3.  In any case, having 13 chunks causes too many hosts to participate in
> each object. Going for k+m=7 will allow distribution to be more effective
> as we have 7 hosts that have the 160TiB sizing.
> > >>
> > >> Our original migration used the “cache tiering” approach, but that
> only works once when moving from replicated to EC and can not be used for
> further migrations.
> > >>
> > >> The amount of data is at 215TiB somewhat significant, so using an
> approach that scales when copying data[1] to avoid ending up with months of
> migration.
> > >>
> > >> I’ve run out of ideas doing this on a low-level (i.e. trying to fix
> it on a rados/pool level) and I guess we can only fix this on an
> application level using multi-zone replication.
> > >>
> > >> I have the setup nailed in general, but I’m running into issues with
> buckets in our staging and production environment that have
> `explicit_placement` pools attached, AFAICT is this an outdated mechanisms
> but there are no migration tools around. I’ve seen some people talk about
> patched versions of the `radosgw-admin metadata put` variant that (still)
> prohibits removing explicit placements.
> > >>
> > >> AFAICT those explicit placements will be synced to the secondary zone
> and the effect that I’m seeing underpins that theory: the sync runs for a
> while and only a few hundred objects show up in the new zone, as the
> buckets/objects are already found in the old pool that the new zone uses
> due to the explicit placement rule.
> > >>
> > >> I’m currently running out of ideas, but open for any other options.
> > >>
> > >> Looking at
> https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/ULKK5RU2VXLFXNUJMZBMUG7CQ5UCWJCB/#R6CPZ2TEWRFL2JJWP7TT5GX7DPSV5S7Z
> I’m wondering whether the relevant patch is available somewhere, or whether
> I’ll have to try building that patch again on my own.
> > >>
> > >> Going through the docs and the code I’m actually wondering whether
> `explicit_placement` is actually a really crufty residual piece that won’t
> get used in newer clusters but older clusters don’t really have an option
> to get away from?
> > >>
> > >> In my specific case, the placement rules are identical to the
> explicit placements that are stored on (apparently older) buckets and the
> only thing I need to do is to remove them. I can accept a bit of downtime
> to avoid any race conditions if needed, so maybe having a small tool to
> just remove those entries while all RGWs are down would be fine. A call to
> `radosgw-admin bucket stat` takes about 18s for all buckets in production
> and I guess that would be a good comparison for what timing to expect when
> running an update on the metadata.
> > >>
> > >> I’ll also be in touch with colleagues from Heinlein and 42on but I’m
> open to other suggestions.
> > >>
> > >> Hugs,
> > >> Christian
> > >>
> > >> [1] We currently have 215TiB data in 230M objects. Using the
> “official” “cache-flush-evict-all” approach was unfeasible here as it only
> yielded around 50MiB/s. Using cache limits and targetting the cache sizes
> to 0 caused proper parallelization and was able to flush/evict at almost
> constant 1GiB/s in the cluster.
> > >>
> > >>
> > >> --
> > >> Christian Theune · ct@xxxxxxxxxxxxxxx · +49 345 219401 0
> > >> Flying Circus Internet Operations GmbH · https://flyingcircus.io
> > >> Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
> > >> HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian
> Zagrodnick
> > >> _______________________________________________
> > >> ceph-users mailing list -- ceph-users@xxxxxxx
> > >> To unsubscribe send an email to ceph-users-leave@xxxxxxx
> > >
> > > Liebe Grüße,
> > > Christian Theune
> > >
> > > --
> > > Christian Theune · ct@xxxxxxxxxxxxxxx · +49 345 219401 0
> > > Flying Circus Internet Operations GmbH · https://flyingcircus.io
> > > Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
> > > HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian
> Zagrodnick
> > > _______________________________________________
> > > ceph-users mailing list -- ceph-users@xxxxxxx
> > > To unsubscribe send an email to ceph-users-leave@xxxxxxx
> >
> > Liebe Grüße,
> > Christian Theune
> >
> > --
> > Christian Theune · ct@xxxxxxxxxxxxxxx · +49 345 219401 0
> > Flying Circus Internet Operations GmbH · https://flyingcircus.io
> > Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
> > HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian
> Zagrodnick
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
> Liebe Grüße,
> Christian Theune
>
> --
> Christian Theune · ct@xxxxxxxxxxxxxxx · +49 345 219401 0
> Flying Circus Internet Operations GmbH · https://flyingcircus.io
> Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
> HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian
> Zagrodnick
>
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx




[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Ceph Dev]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux