Thanks for your replies!
You can use pgremapper (https://github.com/digitalocean/pgremapper) or a similar tool to cancel the remapping; upmap entries will be created that reflect the current state of the cluster. Once all currently running backfills have finished, your mons should not be blocked anymore. I would also disable the balancer temporarily, since it will trigger new backfills for those PGs that are not at their optimal locations. Once the mons are fine again, you can simply re-enable the balancer. Note that this requires a Ceph release and Ceph clients with upmap support.
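A rough sketch of that sequence, untested here and assuming a recent pgremapper build (check pgremapper --help for the exact flags):

    # stop the balancer so it does not schedule new backfills
    ceph balancer off

    # verify that all connected clients support upmap before creating entries
    ceph features

    # pgremapper prints the planned upmap changes first; pass --yes to apply them,
    # which pins remapped PGs to their current OSDs and cancels the pending backfill
    pgremapper cancel-backfill
    pgremapper cancel-backfill --yes

    # once the mons are healthy again
    ceph balancer on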
Thanks, I'll try that. The balancer is already disabled, since I figured it only made the problem of individual OSDs overfilling worse while rebalancing was in progress.
I did change the failure domain for one pool, but apart from that, the failure domain remains unchanged. However, the position of hosts within racks has changed. The balancer is configured to use upmaps, but as said before, I disabled it after we moved, so the upmaps would be out of date anyway.

So, for instance, if you are moving from a host-based to a rack-based failure domain, attempting to upmap the data back to its current location (to shed the remapped state) will mostly be useless, because those upmap entries would violate the new placement rule (not rack-based) and the mons would therefore reject them.
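One way to check which placement rule (and thus which failure domain) each pool is actually subject to before trying this, just as a sketch:

    # shows the crush_rule assigned to each pool
    ceph osd pool ls detail

    # dump the rules; the "type" in the chooseleaf/choose step is the failure domain
    ceph osd crush rule dump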
If you think the above will affect you, the approach I would take is to increase the backfill rate as much as possible. How to do that depends on whether you are using mclock or wpq; which one is it? Also, which version of Ceph are you running?
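For reference, a rough sketch of how to check which scheduler is active and the knobs that usually matter for each (the values below are illustrative, not recommendations):

    # which op queue scheduler the OSDs use (mclock_scheduler or wpq)
    ceph config get osd osd_op_queue

    # mclock: switch to the recovery-oriented profile
    ceph config set osd osd_mclock_profile high_recovery_ops
    # (the classic limits below are ignored under mclock unless
    #  osd_mclock_override_recovery_settings is enabled)

    # wpq: raise the classic backfill/recovery limits
    ceph config set osd osd_max_backfills 4
    ceph config set osd osd_recovery_max_active 8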
I'm using mclock on Ceph 18.2.4. Anything else I should keep in mind? Will the manual OSD override weights be a problem when I apply pgremapper?