True, good luck with that, it's kind of a tedious process that just takes too long :(

Nino

On Sat, Jun 17, 2023 at 7:48 AM Christian Theune <ct@xxxxxxxxxxxxxxx> wrote:

> What got lost is that I need to change the pool’s m/k parameters, which is only possible by creating a new pool and moving all data from the old pool. Changing the CRUSH rule doesn’t allow you to do that.
>
> > On 16. Jun 2023, at 23:32, Nino Kotur <ninokotur@xxxxxxxxx> wrote:
> >
> > If you create a new CRUSH rule for ssd/nvme/hdd and attach it to an existing pool, you should be able to do the migration seamlessly while everything is online... However, the impact to users will depend on storage device load and network utilization, as it will create chaos on the cluster network.
> >
> > Or did I get something wrong?
> >
> > Kind regards,
> > Nino
> >
> > On Wed, Jun 14, 2023 at 5:44 PM Christian Theune <ct@xxxxxxxxxxxxxxx> wrote:
> > Hi,
> >
> > further note to self and for posterity … ;)
> >
> > This turned out to be a no-go as well, because you can’t silently switch the pools to a different storage class: the objects will be found, but the index still refers to the old storage class and lifecycle migrations won’t work.
> >
> > I’ve brainstormed for further options and it appears that the last resort is to use placement targets and copy the buckets explicitly - twice, because on Nautilus I don’t have renames available yet. :(
> >
> > This will require temporary downtimes prohibiting users from accessing their buckets. Fortunately we only have a few very large buckets (200T+) that will take a while to copy. We can pre-sync them of course, so the downtime will only be during the second copy.
> >
> > Christian
> >
> > > On 13. Jun 2023, at 14:52, Christian Theune <ct@xxxxxxxxxxxxxxx> wrote:
> > >
> > > Following up to myself and for posterity:
> > >
> > > I’m going to try to perform a switch here using (temporary) storage classes and renaming of the pools to ensure that I can quickly change the STANDARD class to a better EC pool and have new objects located there. After that we’ll add (temporary) lifecycle rules to all buckets to ensure their objects will be migrated to the STANDARD class.
> > >
> > > Once that is finished we should be able to delete the old pool and the temporary storage class.
> > >
> > > First tests appear successful, but I’m a bit struggling to get the bucket rules working (apparently 0 days isn’t a real rule … and the debug interval setting causes frequent LC runs but doesn’t seem to move objects just yet). I’ll play around with that setting a bit more, though; I think I might have tripped something that only wants to process objects every so often, and on an interval of 10 a day is still 2.4 hours …
> > >
> > > Cheers,
> > > Christian
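A rough sketch of the temporary-storage-class route described in the two messages above, for a Nautilus-era radosgw-admin: the zonegroup/zone/placement names, the class name TEMPSTANDARD, the target pool and the endpoint are illustrative placeholders, not values from this cluster, and the lifecycle rule follows the observation in the thread that a 0-day transition is not accepted.

  # Register a temporary storage class and point it at the new data pool
  radosgw-admin zonegroup placement add --rgw-zonegroup default \
      --placement-id default-placement --storage-class TEMPSTANDARD
  radosgw-admin zone placement add --rgw-zone default \
      --placement-id default-placement --storage-class TEMPSTANDARD \
      --data-pool default.rgw.buckets.data.new
  radosgw-admin period update --commit   # only if a realm/period is configured

  # lc.json: per-bucket lifecycle rule transitioning all objects to the new class
  {
    "Rules": [
      {
        "ID": "migrate-to-tempstandard",
        "Status": "Enabled",
        "Filter": { "Prefix": "" },
        "Transitions": [ { "Days": 1, "StorageClass": "TEMPSTANDARD" } ]
      }
    ]
  }

  # Applied per bucket, e.g. with the AWS CLI against the RGW endpoint
  aws --endpoint-url http://rgw.example.invalid s3api \
      put-bucket-lifecycle-configuration --bucket mybucket \
      --lifecycle-configuration file://lc.json

  # ceph.conf on the RGWs while testing: shortens lifecycle processing for debugging
  # (the exact semantics of this knob are the sticking point discussed above)
  rgw_lc_debug_interval = 10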
> > >> On 9. Jun 2023, at 11:16, Christian Theune <ct@xxxxxxxxxxxxxxx> wrote:
> > >>
> > >> Hi,
> > >>
> > >> we are running a cluster that has been alive for a long time and we tread carefully regarding updates. We are still a bit lagging and our cluster (that started around Firefly) is currently at Nautilus. We’re updating and we know we’re still behind, but we do keep running into challenges along the way that typically are still unfixed on main and - as I started with - have to tread carefully.
> > >>
> > >> Nevertheless, mistakes happen, and we found ourselves in this situation: we converted our RGW data pool from replicated (n=3) to erasure coded (k=10, m=3, with 17 hosts), but when doing the EC profile selection we missed that our hosts are not evenly balanced (this is a growing cluster and some machines have around 20TiB capacity for the RGW data pool, whereas newer machines have around 160TiB), and we should rather have gone with k=4, m=3. In any case, having 13 chunks causes too many hosts to participate in each object. Going for k+m=7 will allow distribution to be more effective, as we have 7 hosts that have the 160TiB sizing.
> > >>
> > >> Our original migration used the “cache tiering” approach, but that only works once when moving from replicated to EC and cannot be used for further migrations.
> > >>
> > >> The amount of data, at 215TiB, is somewhat significant, so we need an approach that scales when copying data[1] to avoid ending up with months of migration.
> > >>
> > >> I’ve run out of ideas doing this on a low level (i.e. trying to fix it on a rados/pool level) and I guess we can only fix this on an application level using multi-zone replication.
> > >>
> > >> I have the setup nailed in general, but I’m running into issues with buckets in our staging and production environments that have `explicit_placement` pools attached. AFAICT this is an outdated mechanism, but there are no migration tools around. I’ve seen some people talk about patched versions of radosgw-admin, since the stock `metadata put` variant (still) prohibits removing explicit placements.
> > >>
> > >> AFAICT those explicit placements will be synced to the secondary zone, and the effect that I’m seeing underpins that theory: the sync runs for a while and only a few hundred objects show up in the new zone, as the buckets/objects are already found in the old pool that the new zone uses due to the explicit placement rule.
> > >>
> > >> I’m currently running out of ideas, but am open to any other options.
> > >>
> > >> Looking at https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/ULKK5RU2VXLFXNUJMZBMUG7CQ5UCWJCB/#R6CPZ2TEWRFL2JJWP7TT5GX7DPSV5S7Z I’m wondering whether the relevant patch is available somewhere, or whether I’ll have to try building that patch again on my own.
> > >>
> > >> Going through the docs and the code I’m actually wondering whether `explicit_placement` is actually a really crufty residual piece that won’t get used in newer clusters, but that older clusters don’t really have an option to get away from?
> > >>
> > >> In my specific case, the placement rules are identical to the explicit placements that are stored on (apparently older) buckets, and the only thing I need to do is to remove them. I can accept a bit of downtime to avoid any race conditions if needed, so maybe having a small tool to just remove those entries while all RGWs are down would be fine. A call to `radosgw-admin bucket stats` takes about 18s for all buckets in production and I guess that would be a good comparison for what timing to expect when running an update on the metadata.
> > >>
> > >> I’ll also be in touch with colleagues from Heinlein and 42on, but I’m open to other suggestions.
> > >>
> > >> Hugs,
> > >> Christian
> > >>
> > >> [1] We currently have 215TiB of data in 230M objects. Using the “official” “cache-flush-evict-all” approach was unfeasible here as it only yielded around 50MiB/s. Using cache limits and targeting the cache sizes to 0 caused proper parallelization and was able to flush/evict at an almost constant 1GiB/s in the cluster.
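To make the k+m=7 target above concrete, a minimal sketch of creating the replacement pool; the profile name, pool name, device class and PG count are placeholders and would have to be chosen to fit the actual cluster.

  # Hypothetical EC profile with 7 chunks, so each of the 7 large hosts holds one
  ceph osd erasure-code-profile set rgw-ec-4-3 \
      k=4 m=3 crush-failure-domain=host crush-device-class=hdd
  ceph osd erasure-code-profile get rgw-ec-4-3

  # New RGW data pool using that profile (PG count is only a placeholder)
  ceph osd pool create default.rgw.buckets.data.new 256 256 erasure rgw-ec-4-3
  ceph osd pool application enable default.rgw.buckets.data.new rgw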
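On the explicit_placement entries: the workflow hinted at in the linked thread boils down to editing the bucket metadata directly. A sketch of the shape of that metadata is below; bucket name and marker are placeholders, and, as noted above, an unpatched radosgw-admin (certainly on Nautilus) may simply refuse to persist the removal, so treat this as an illustration rather than a recipe.

  # Dump the bucket instance metadata (the bucket entrypoint record is similar)
  radosgw-admin metadata get bucket.instance:mybucket:<marker> > bucket-instance.json

  # The offending section in the JSON looks roughly like this and would need to be
  # emptied so that the zone placement rule applies again:
  #   "explicit_placement": {
  #       "data_pool": "default.rgw.buckets.data",
  #       "data_extra_pool": "default.rgw.buckets.non-ec",
  #       "index_pool": "default.rgw.buckets.index"
  #   }

  # Write it back while all RGWs are stopped
  radosgw-admin metadata put bucket.instance:mybucket:<marker> < bucket-instance.json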
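On footnote [1]: the parallel flush/evict presumably comes from letting the per-OSD tiering agent do the work instead of the single-threaded `rados -p <cache-pool> cache-flush-evict-all`. A guess at the knobs involved, with hot-pool as a placeholder for the cache tier pool; the exact values that reproduced the 1GiB/s result are not stated in the thread.

  # Drive the tiering agent to flush and evict aggressively, in parallel across OSDs
  ceph osd pool set hot-pool cache_target_dirty_ratio 0.0
  ceph osd pool set hot-pool cache_target_dirty_high_ratio 0.0
  ceph osd pool set hot-pool cache_target_full_ratio 0.0
  ceph osd pool set hot-pool target_max_objects 1
  ceph osd pool set hot-pool target_max_bytes 1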
> Kind regards,
> Christian Theune
>
> --
> Christian Theune · ct@xxxxxxxxxxxxxxx · +49 345 219401 0
> Flying Circus Internet Operations GmbH · https://flyingcircus.io
> Leipziger Str. 70/71 · 06108 Halle (Saale) · Germany
> HR Stendal HRB 21169 · Managing directors: Christian Theune, Christian Zagrodnick

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx