Re: Stop Rebalancing

It sounds like this is from a PG merge, so I'm going to _guess_ that you don't want to straight up cancel the current backfill and instead pause it to catch your breath.

You can set the `nobackfill` and/or `norebalance` flags, which should pause the backfill. Alternatively, use `ceph config set osd osd_max_backfills 0` to stop all OSDs from allowing backfill to continue. You could also use this to throttle backfill OSD by OSD, though that's a bit messy. Consider the recovery sleep options for that, too.
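
A rough sketch of that pause-and-resume flow, in case it helps. The function names and the `CEPH` override are only illustrative, and the `osd_recovery_sleep_hdd` line assumes HDD-backed OSDs (there are `_ssd` and `_hybrid` variants too); the sleep value is whatever you pass in, not a recommendation.

```shell
# Illustrative helpers; names are hypothetical, not part of Ceph.
# Override CEPH (e.g. CEPH="echo ceph") to dry-run without touching a cluster.
CEPH="${CEPH:-ceph}"

pause_backfill() {
    # Cluster-wide flags: pause backfill and stop new rebalancing
    $CEPH osd set nobackfill
    $CEPH osd set norebalance
}

resume_backfill() {
    # Clear the flags again once the cluster has caught its breath
    $CEPH osd unset nobackfill
    $CEPH osd unset norebalance
}

throttle_recovery() {
    # $1 = sleep in seconds between recovery ops (assumes HDD OSDs)
    $CEPH config set osd osd_recovery_sleep_hdd "$1"
}
```

Run `pause_backfill` before the scrub/compaction work and `resume_backfill` afterwards; `ceph -s` should list the flags while they are set.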

However, if you want to fully cancel the rebalance, you might want to set the PG count back to where you were (if that's what you want), and unless you had a bunch of upmaps already, your cluster should be mostly balanced, minus the data that has already PG-merged.
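
If you go that route, something along these lines might work. It is a sketch only: `pin_pg_num` is a made-up helper name, and the pool name and PG count are placeholders you would fill in from your own cluster.

```shell
# Illustrative helper; the name is hypothetical, not part of Ceph.
# Override CEPH (e.g. CEPH="echo ceph") to dry-run without touching a cluster.
CEPH="${CEPH:-ceph}"

pin_pg_num() {
    # $1 = pool name, $2 = PG count to hold the pool at
    # Turn the autoscaler off for this pool so it stops changing the target
    $CEPH osd pool set "$1" pg_autoscale_mode off
    # Then set pg_num explicitly
    $CEPH osd pool set "$1" pg_num "$2"
}
```

`ceph osd pool get <pool> pg_num` shows the current value before you change anything.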

I don't think you can use something like `pgremapper cancel-backfill --yes` (see GitHub) for this because of the PG merge, though maybe you can; I haven't tried it. That tool adds upmaps for ongoing remapped PGs to stop them from moving.

Others can chime in with other options; I'm always interested in new ways to rein in lots of backfill.


On 2022-04-12 16:03, Ray Cunningham wrote:
Hi Everyone,

We just upgraded our 640 OSD cluster to Ceph 16.2.7 and the resulting
rebalancing of misplaced objects is overwhelming the cluster and
impacting MON DB compaction, deep scrub repairs and us upgrading
legacy bluestore OSDs. We have to pause the rebalancing of misplaced
objects or we're going to fall over.

Autoscaler-status tells us that we are reducing our PGs by 700'ish
which will take us over 100 days to complete at our current recovery
speed. We disabled autoscaler on our biggest pool, but I'm concerned
that it's already on the path to the lower PG count and won't stop
adding to our misplaced count after we drop below 5%. What can we do to
stop the cluster from finding more misplaced objects to rebalance?
Should we set the PG num manually to what our current count is? Or
will that cause even more havoc?

Any other thoughts or ideas? My goals are to stop the rebalancing
temporarily so we can deep scrub and repair inconsistencies, upgrade
legacy bluestore OSDs and compact our MON DBs (supposedly MON DBs
don't compact when you aren't 100% active+clean).

Thank you,
Ray

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx