Hi all,

We moved our Ceph cluster to a new data centre about three months ago, which completely changed its physical topology. I changed the CRUSH map accordingly so that the CRUSH location matches the physical location again, and the cluster has been rebalancing ever since. Due to capacity limits, the rebalancing requires rather frequent manual reweights so that individual OSDs don't fill up. We started with around 65% of PGs remapped and are now down to 12% (progress is a little faster now that the bulk is done).
Unfortunately, the MONs refuse to trim their stores while there are remapped PGs in the cluster, so their disk usage has gradually increased from 600MB to 17GB. I assume the rebalancing will finish before we run out of disk space, but the MON restart times have become unsustainably long and compactions now take up to 20 minutes or so. I'm a bit worried that at some point the MONs will become unstable or, worse, that the stores may get corrupted.
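For a rough sense of the headroom, here is a back-of-the-envelope sketch. It assumes the store grew linearly over roughly 90 days, which is only an approximation, and it uses a hypothetical 50 GB MON partition; neither figure is measured, and the function names are just for illustration.

```python
# Rough linear extrapolation of MON store growth.
# ASSUMPTIONS (not measured): ~90 days elapsed, linear growth,
# and a hypothetical 50 GB partition for the MON store.

def growth_per_day(start_gb: float, current_gb: float, days: float) -> float:
    """Average store growth in GB per day over the observed period."""
    return (current_gb - start_gb) / days


def days_until_full(current_gb: float, capacity_gb: float, rate_gb_per_day: float) -> float:
    """Days until the store reaches capacity at a constant growth rate."""
    if rate_gb_per_day <= 0:
        return float("inf")  # store is not growing
    return max(capacity_gb - current_gb, 0.0) / rate_gb_per_day


START_GB = 0.6       # 600MB before the move (from the text above)
CURRENT_GB = 17.0    # current store size (from the text above)
ELAPSED_DAYS = 90    # assumed: "about three months"
CAPACITY_GB = 50.0   # hypothetical partition size

rate = growth_per_day(START_GB, CURRENT_GB, ELAPSED_DAYS)
remaining = days_until_full(CURRENT_GB, CAPACITY_GB, rate)
print(f"average growth: {rate:.2f} GB/day, about {remaining:.0f} days until full")
```

Since the remapped fraction is shrinking, real growth is unlikely to stay linear, so this is at best an order-of-magnitude estimate.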
Is there anything I can do to safely (!) get the MON stores back to a more manageable size even without all PGs being active+clean? I would like to avoid any potential disaster, especially before the holidays.
Thanks,
Janek