Re: MONs not trimming

Agree with the pgremapper / upmap-remapped approach. One thing to be aware
of, though, is that the mons will invalidate any upmap entry that breaks the
data placement rules. So, for instance, if you are moving from a host-based
failure domain to a rack-based failure domain, attempting to upmap the data
back to its current location (to shed the remapped state) will mostly be
useless: those upmap entries would violate the new placement rule (not
rack-based), and the mons would therefore reject them.
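For what it's worth, the basic workflow with pgremapper looks roughly like
this (a sketch only, not a recipe; check the tool's own documentation, and
note that it prints a dry-run plan unless you pass --yes):

```shell
# Stop the balancer so it does not queue new backfills mid-operation.
ceph balancer off

# Dry run: print the upmap commands that would map remapped PGs back
# to the OSDs currently holding their data, cancelling pending backfill.
pgremapper cancel-backfill

# Apply the plan once it looks sane.
pgremapper cancel-backfill --yes

# Re-enable the balancer after the mons have trimmed.
ceph balancer on
```

Any of those upmaps that violate your new CRUSH rule will, as noted above,
simply be rejected by the mons.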

I think the approach I would use, if you think the above will affect you, is
to increase the backfill rate as much as possible. The right knobs depend on
whether you are using mclock or wpq as the OSD scheduler. Which are you
using? Also, which version of Ceph are you on?
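For reference, the knobs differ per scheduler. A minimal sketch (the values
here are illustrative, not recommendations; revert them once backfill has
caught up):

```shell
# mclock scheduler (the default since Quincy): switch the QoS profile
# so recovery/backfill gets priority over client I/O.
ceph config set osd osd_mclock_profile high_recovery_ops

# wpq scheduler: raise the classic backfill/recovery throttles instead.
ceph config set osd osd_max_backfills 4
ceph config set osd osd_recovery_max_active 8
ceph config set osd osd_recovery_sleep_hdd 0
```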

Respectfully,

*Wes Dillingham*
LinkedIn <http://www.linkedin.com/in/wesleydillingham>
wes@xxxxxxxxxxxxxxxxx




On Tue, Dec 17, 2024 at 8:52 AM Burkhard Linke <
Burkhard.Linke@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:

> Hi,
>
> On 17.12.24 14:40, Janek Bevendorff wrote:
> > Hi all,
> >
> > We moved our Ceph cluster to a new data centre about three months ago,
> > which completely changed its physical topology. I changed the CRUSH
> > map accordingly so that the CRUSH location matches the physical
> > location again and the cluster has been rebalancing ever since. Due to
> > capacity limits, the rebalancing requires rather frequent manual
> > reweights so that individual OSDs don't run full. We started at around
> > 65% of remapped PGs and are now down to 12% (it goes a little faster
> > now that the bulk is done).
> >
> > Unfortunately, the MONs refuse to trim their stores while there are
> > remapped PGs in the cluster, so their disk usage has increased
> > gradually from 600MB to 17GB. I assume the rebalancing will finish
> > before we run out of disk space, but the MON restart times have become
> > unsustainably long, and compactions now take up to 20 minutes or so. I'm a
> > bit worried that at some point the MONs will start becoming unstable
> > or worse: the stores may get corrupted.
> >
> > Is there anything that I can do to safely (!) get the MON stores back
> > to a more manageable size even without all PGs being active+clean? I
> > would like to forgo any potential disaster, especially before the
> > holidays.
>
>
> Just my 0.02 euro...
>
>
> You can use pgremapper (https://github.com/digitalocean/pgremapper) or
> a similar tool to cancel the remapping; upmap entries will be created
> that reflect the current state of the cluster. After all currently
> running backfills have finished, your mons should no longer be blocked.
> I would also disable the balancer temporarily, since it will trigger new
> backfills for those PGs that are not at their optimal locations. Once the
> mons are fine again, you can re-enable the balancer. This requires a
> Ceph release and Ceph clients with upmap support.
>
> Not tested in real life, but this approach might work.
>
>
> Best regards,
>
> Burkhard
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



