MONs not trimming

Janek Bevendorff <janek.bevendorff@xxxxxxxxxxxxx> · Tue, 17 Dec 2024 14:40:02 +0100

Hi all,

We moved our Ceph cluster to a new data centre about three months ago, 
which completely changed its physical topology. I changed the CRUSH map 
accordingly so that the CRUSH location matches the physical location 
again and the cluster has been rebalancing ever since. Due to capacity 
limits, the rebalancing requires rather frequent manual reweights so 
that individual OSDs don't run full. We started at around 65% remapped 
PGs and are now down to 12% (it goes a little faster now that the bulk 
is done).

Unfortunately, the MONs refuse to trim their stores while there are 
remapped PGs in the cluster, so their disk usage has increased gradually 
from 600 MB to 17 GB. I assume the rebalancing will finish before we run 
out of disk space, but the MON restart times have become unsustainably 
long and compactions now take up to 20 minutes. I'm a bit worried that 
at some point the MONs will become unstable or, worse, that the stores 
may get corrupted.
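
For anyone wanting to quantify the backlog, here is a minimal Python sketch, not a tested tool. It assumes that `ceph report` prints JSON on stdout containing the fields `osdmap_first_committed` and `osdmap_last_committed`, and that the MON store lives in a directory you pass in yourself. The helper names are made up for illustration.

```python
import json
import os
import subprocess


def untrimmed_osdmap_epochs(report: dict) -> int:
    """Return how many OSD map epochs the MONs still hold.

    Assumes the report dict carries 'osdmap_first_committed' and
    'osdmap_last_committed' (field names are an assumption here).
    """
    first = int(report["osdmap_first_committed"])
    last = int(report["osdmap_last_committed"])
    return last - first


def dir_size_bytes(path: str) -> int:
    """Sum the sizes of all regular files below path (e.g. a MON store.db)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if os.path.isfile(file_path):
                total += os.path.getsize(file_path)
    return total


def load_ceph_report() -> dict:
    """Run 'ceph report' and parse its stdout as JSON (needs a working ceph CLI)."""
    result = subprocess.run(
        ["ceph", "report"], check=True, capture_output=True, text=True
    )
    return json.loads(result.stdout)
```

On a MON host this could be used as `untrimmed_osdmap_epochs(load_ceph_report())`, together with `dir_size_bytes()` pointed at the store directory, to watch whether the epoch range and the on-disk size shrink once trimming resumes.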

Is there anything I can do to safely (!) get the MON stores back to a 
more manageable size even without all PGs being active+clean? I would 
like to avoid any potential disaster, especially before the holidays.

Thanks
Janek

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx