Dear ceph users,
On one of our clusters I am having some difficulty with the upmap
balancer. We started with a reasonably well-balanced cluster (using
the balancer in upmap mode). After a node failure, we crush
reweighted all of the node's OSDs to take it out of the cluster - and
waited for the cluster to rebalance. Obviously, this significantly
changed the crush map - hence the nice balance created by the
balancer was gone. Recovery mostly completed, but some of the OSDs
became too full, so we ended up with a few PGs stuck in
backfill_toofull. The cluster has plenty of space overall (perhaps
65% full); only a few OSDs are >90% full (we have backfillfull_ratio
set to 92%). The balancer refuses to change anything since the
cluster is not clean, yet the cluster can't become clean without a
few upmaps to relieve the top 3 or 4 most full OSDs.
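
For reference, this is roughly how I pick out the most full OSDs - a
quick sketch that assumes the JSON from "ceph osd df" has a "nodes"
list with per-OSD "utilization" percentages (which is what I see on
14.2.x):

#!/usr/bin/env python3
# Quick sketch: list the OSDs above a utilization threshold.
# Assumes "ceph osd df --format json" returns a "nodes" list whose
# entries carry "id" and "utilization" (percent used) on 14.2.x.
import json
import subprocess

THRESHOLD = 90.0  # percent used; our backfillfull_ratio is 92%

nodes = json.loads(subprocess.check_output(
    ["ceph", "osd", "df", "--format", "json"]))["nodes"]

for n in sorted(nodes, key=lambda n: n["utilization"], reverse=True):
    if n["utilization"] >= THRESHOLD:
        print("osd.%d  %.1f%% used" % (n["id"], n["utilization"]))
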
I would think this is a fairly common situation - trying to recover
after some failure. Are there any recommendations on how to proceed?
Obviously I can manually find and insert upmaps (see the sketch
below), but for a large cluster with tens of thousands of PGs that
isn't very practical. Is there a way to tell the balancer to still do
something even though some PGs are undersized? From a quick look at
the python module, I didn't see one.
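
For what it's worth, the kind of script I would otherwise end up
writing by hand looks roughly like the sketch below. It assumes the
14.2.x JSON shapes of "ceph pg ls" and "ceph osd df" (the "pg_stats"
key and per-OSD "utilization"), takes no notice of crush rules or
failure domains when picking a target OSD, and only prints candidate
"ceph osd pg-upmap-items" commands instead of applying anything:

#!/usr/bin/env python3
# Rough sketch of the manual approach: for each backfill_toofull PG,
# print a candidate "ceph osd pg-upmap-items" command that maps it off
# the fullest OSD in its up set onto the least-full OSD overall. This
# is illustrative only - it ignores crush rules and failure domains.
import json
import subprocess

def ceph_json(*args):
    out = subprocess.check_output(["ceph", *args, "--format", "json"])
    return json.loads(out)

# Percent used per OSD id, from "ceph osd df".
util = {n["id"]: n["utilization"] for n in ceph_json("osd", "df")["nodes"]}

# PGs currently stuck backfill_toofull ("pg_stats" is the key on Nautilus).
pgs = ceph_json("pg", "ls", "backfill_toofull").get("pg_stats", [])

for pg in pgs:
    up = pg["up"]
    # Fullest OSD in this PG's up set - the one to map away from.
    src = max(up, key=lambda o: util.get(o, 0.0))
    # Least-full OSD not already in the up set - a naive target that
    # takes no notice of the failure domain, so treat it as a hint only.
    candidates = [o for o in util if o not in up]
    if not candidates:
        continue
    dst = min(candidates, key=lambda o: util[o])
    print("ceph osd pg-upmap-items %s %d %d   # %.1f%% -> %.1f%%"
          % (pg["pgid"], src, dst, util[src], util[dst]))

Doing this properly for tens of thousands of PGs, while still
respecting the crush rules, is exactly what I was hoping the balancer
could be coaxed into doing.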
The cluster is on Nautilus 14.2.15.
Thanks,
Andras