Upmap balancer after node failure

Dear ceph users,

On one of our clusters I have some difficulties with the upmap balancer.  We started with a reasonably well balanced cluster (using the balancer in upmap mode).  After a node failure, we crush reweighted all the OSDs of that node to take it out of the cluster and waited for the cluster to rebalance.  This significantly changes the crush map, so the nice balance created by the balancer was gone.  The recovery mostly completed, but some OSDs became too full and we ended up with a few PGs stuck in backfill_toofull.  The cluster has plenty of space overall (roughly 65% full); only a few OSDs are above 90% (we have backfillfull_ratio at 92%).  The balancer refuses to change anything while the cluster is not clean - yet the cluster can't become clean without a few upmaps to relieve the three or four most full OSDs.
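For reference, this is roughly how I'm checking which OSDs are the problem - just a quick sketch that parses "ceph osd df -f json" (it assumes the "nodes" list with "id", "utilization" and "pgs" fields that I see in that output; the 92% threshold is only there to match our backfillfull_ratio):

#!/usr/bin/env python3
# Quick look at the fullest OSDs after the rebalance.
# Assumes the JSON layout of "ceph osd df -f json": a "nodes" list
# whose entries carry "id", "utilization" (percent) and "pgs".
import json
import subprocess

out = subprocess.check_output(["ceph", "osd", "df", "-f", "json"])
nodes = json.loads(out)["nodes"]

# Show the ten fullest OSDs and flag the ones past backfillfull (92% here).
for n in sorted(nodes, key=lambda n: n["utilization"], reverse=True)[:10]:
    flag = "  <-- over backfillfull" if n["utilization"] >= 92.0 else ""
    print(f"osd.{n['id']:<4} {n['utilization']:6.2f}%  {n['pgs']} PGs{flag}")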

I would think this is a fairly common situation - trying to recover after some failure.  Are there any recommendations on how to proceed?  Obviously I can manually find and insert upmaps (a rough sketch of what that would look like is below), but for a large cluster with tens of thousands of PGs that isn't really practical.  Is there a way to tell the balancer to still do something even though some PGs are undersized?  From a quick look at the python module, I didn't see one.
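In case it helps to see what I mean by "manually find and insert upmaps", this is the kind of thing I'd have to script.  It is only a sketch: it assumes the JSON field names of "ceph pg ls" ("pg_stats" with "pgid" and "up") and "ceph osd df" on Nautilus, and it ignores CRUSH rules and failure domains entirely, so the printed pg-upmap-items commands would have to be reviewed by hand rather than piped straight into a shell:

#!/usr/bin/env python3
# Rough sketch of the manual upmap approach, limited to the stuck PGs
# so the tens of thousands of healthy ones are left alone.
# Assumes Nautilus JSON output; does NOT respect CRUSH rules or
# failure domains - treat the output as candidates to review.
import json
import subprocess

def ceph_json(*args):
    return json.loads(subprocess.check_output(["ceph", *args, "-f", "json"]))

osds = ceph_json("osd", "df")["nodes"]
util = {n["id"]: n["utilization"] for n in osds}
# Least-full OSDs first: these are the candidate destinations.
dests = sorted(util, key=util.get)

stuck = ceph_json("pg", "ls", "backfill_toofull")["pg_stats"]
for pg in stuck:
    pgid, up = pg["pgid"], pg["up"]
    # Move the replica sitting on the fullest OSD of the up set...
    src = max(up, key=lambda o: util.get(o, 0.0))
    # ...to the least-full OSD that does not already hold this PG.
    dst = next(d for d in dests if d not in up)
    print(f"ceph osd pg-upmap-items {pgid} {src} {dst}")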

The cluster is on Nautilus 14.2.15.

Thanks,

Andras