Agree with the pgremapper / upmap-remapped approach. One thing to be aware of, though, is that the MONs will reject any upmap entry that breaks the data placement rules. For instance, if you are moving from a host-based failure domain to a rack-based failure domain, attempting to upmap the data back to its current location (to shed the remapped state) will mostly be useless: those upmap entries would violate the new placement rule (the current locations are not rack-separated), so the MONs will reject them.

If you think the above will affect you, the approach I would take instead is to increase the backfill rate to be as fast as possible. How you do that depends on whether you are using mClock or WPQ. Which are you using? Also, which version of Ceph are you running?

Respectfully,

*Wes Dillingham*
LinkedIn <http://www.linkedin.com/in/wesleydillingham>
wes@xxxxxxxxxxxxxxxxx


On Tue, Dec 17, 2024 at 8:52 AM Burkhard Linke <Burkhard.Linke@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:

> Hi,
>
> On 17.12.24 14:40, Janek Bevendorff wrote:
> > Hi all,
> >
> > We moved our Ceph cluster to a new data centre about three months ago, which completely changed its physical topology. I changed the CRUSH map accordingly so that the CRUSH location matches the physical location again, and the cluster has been rebalancing ever since. Due to capacity limits, the rebalancing requires rather frequent manual reweights so that individual OSDs don't run full. We started at around 65% remapped PGs and are now down to 12% (it goes a little faster now that the bulk is done).
> >
> > Unfortunately, the MONs refuse to trim their stores while there are remapped PGs in the cluster, so their disk usage has increased gradually from 600 MB to 17 GB. I assume the rebalancing will finish before we run out of disk space, but the MON restart times have become unsustainably long and compactions now take up to 20 minutes or so. I'm a bit worried that at some point the MONs will become unstable or, worse, that the stores may get corrupted.
> >
> > Is there anything I can do to safely (!) get the MON stores back to a more manageable size even without all PGs being active+clean? I would like to avoid any potential disaster, especially before the holidays.
>
> Just my 0.02 euro...
>
> You can use pgremapper (https://github.com/digitalocean/pgremapper) or similar tools to cancel the remapping; upmap entries will be created that reflect the current state of the cluster. After all currently running backfills have finished, your MONs should no longer be blocked. I would also disable the balancer temporarily, since it will trigger new backfills for those PGs that are not at their optimal locations. Once the MONs are fine again, you can simply re-enable the balancer. This requires a Ceph release and Ceph clients with upmap support.
>
> Not tested in real life, but this approach might work.
>
> Best regards,
>
> Burkhard
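
For reference, here is a minimal sketch (untested) of the backfill-speed tuning described above, written in Python around the ceph CLI. It assumes the ceph CLI is on PATH with admin access; the option names are the standard recovery/backfill knobs, but the specific values are illustrative starting points only, not recommendations for this particular cluster.

#!/usr/bin/env python3
# Sketch only: raise backfill/recovery priority depending on which OSD op
# queue scheduler is active. Values below are examples, not recommendations.
import subprocess

def ceph(*args):
    # Run a ceph CLI command and return its stdout as stripped text.
    return subprocess.run(
        ["ceph", *args], check=True, capture_output=True, text=True
    ).stdout.strip()

# Which scheduler are the OSDs using? Typically "wpq" or "mclock_scheduler".
scheduler = ceph("config", "get", "osd", "osd_op_queue")

if scheduler == "mclock_scheduler":
    # mClock (Quincy and later): switch the built-in profile so
    # recovery/backfill ops get the larger share of IOPS.
    ceph("config", "set", "osd", "osd_mclock_profile", "high_recovery_ops")
else:
    # WPQ: raise the classic backfill/recovery throttles and drop the sleeps.
    ceph("config", "set", "osd", "osd_max_backfills", "3")
    ceph("config", "set", "osd", "osd_recovery_max_active", "5")
    ceph("config", "set", "osd", "osd_recovery_sleep_hdd", "0")
    ceph("config", "set", "osd", "osd_recovery_sleep_ssd", "0")

print("osd_op_queue=%s: backfill settings updated" % scheduler)

If you instead go the pgremapper route Burkhard suggests, disabling the balancer first (ceph balancer off) and re-enabling it once the MONs have trimmed (ceph balancer on) keeps it from creating fresh backfills while the remapped PGs are being cancelled.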