Questions about tweaking ceph rebalancing activities

Hello all,

I am in the process of adding and removing a number of OSDs in my cluster, and I'm running into some issues where it would be good to be able to control the system a bit better. I've tried the documentation and my google-fu but have come up short.

This is the background/scenario: I have a cluster that is (or at least was) working fine and had HEALTH_OK. I've added a number of new OSDs to the cluster, which started a lot of rebalancing. I also want to remove a number of OSDs from the cluster, and some of these have already been marked out. The cluster has now been rebalancing for more than two weeks and is in state HEALTH_WARN.
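
For reference, the removals so far have just been the standard mark-out, and I have been watching progress with the usual status commands (osd id 12 below is only an example):

  # mark an OSD out so its PGs start migrating off it
  ceph osd out 12

  # watch overall progress and per-OSD fill levels
  ceph -s
  ceph osd df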

Inter-related issue 1
While the cluster is rebalancing, I would like to prioritize migrating PGs off the OSDs that have been marked out. Even though they are marked out, I can't stop them (down) and remove them (destroy/purge), since they still have remaining PGs. For instance, I've had about eight OSDs with between 3 and 7 PGs remaining (according to ceph osd safe-to-destroy <osd-id>) for over a week. As long as that handful of PGs is there, I can't remove those OSDs. I have set osd_max_backfills, osd_recovery_max_active, osd_recovery_max_single_start and osd_recovery_sleep on the particular OSDs with no apparent effect, i.e. the PGs are still there.
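
In case it matters, this is roughly how I applied those settings (osd.12 and the specific values are just examples, not necessarily what each OSD got):

  # per-OSD overrides to try to push backfill/recovery along
  ceph config set osd.12 osd_max_backfills 4
  ceph config set osd.12 osd_recovery_max_active 8
  ceph config set osd.12 osd_recovery_max_single_start 4
  ceph config set osd.12 osd_recovery_sleep 0

  # checking whether the OSD can be removed yet
  ceph osd safe-to-destroy osd.12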

Is there a way to prioritize particular OSDs/PGs for rebalancing?

Inter-related issue 2
An alternative would be to just destroy the almost-empty OSDs anyway, turning the remaining rebalancing into recovery. However, it doesn't seem like recovery activity is prioritized over rebalancing activity.
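
Concretely, I'm thinking of something like the following for each nearly-empty OSD (osd id 12 again just as an example, and assuming systemd-managed OSDs), accepting that it leaves a few PGs degraded until recovery catches up:

  # stop the daemon so the OSD goes down
  systemctl stop ceph-osd@12

  # remove it even though safe-to-destroy still objects
  ceph osd destroy 12 --yes-i-really-mean-it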

Is there a way to ensure recovery activities are prioritized over rebalancing activities?

Inter-related issue 3
I spun up another OSD and marked it out (so it was up + out). This caused many additional PGs to become misplaced. Stopping and destroying the new, empty OSD again changed the number of misplaced PGs back (returning to the previous amount/percentage).
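
For clarity, the sequence was roughly this (the new OSD got id 99 in this example; it was deployed the normal way, so I've left that step out):

  # the freshly created OSD was marked out straight away
  ceph osd out 99

  # tearing it down again: stop the daemon, then remove it completely
  systemctl stop ceph-osd@99
  ceph osd purge 99 --yes-i-really-mean-it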

Can I prevent this by reweighting the OSDs to 0 in addition to marking them out, or is there any other way of preventing an OSD that is marked out from impacting the balancing?
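
To be explicit about what I mean by reweighting to 0, I'm thinking of either of these (not sure which, if any, is the right tool here):

  # set the CRUSH weight to 0, removing the OSD's capacity from the CRUSH map
  ceph osd crush reweight osd.12 0

  # set the override reweight to 0, which as I understand it is effectively the same as marking the OSD out
  ceph osd reweight 12 0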


Inter-related issue 4
During rebalancing, several smaller OSDs have become near full, and then one became full (>95%). This changed the cluster from HEALTH_WARN to HEALTH_ERR, stopping client activity. Reweighting the full OSD and the near-full OSDs did not change the cluster status.

As far as I understand it, all the data is there and available, the cluster is in the middle of a massive rebalance, and the PGs on the full OSD were misplaced and due to be moved elsewhere anyway (certainly after the manual reweighting), so there should be no reason for the cluster to go to ERR. Also, because the cluster has been rebalancing for so long, the balancer module is prevented from reweighting OSDs, which might otherwise have prevented the ERR state (assuming the reweighting would have had an effect). My solution, which required manual intervention, was to mark the full OSD as out. The cluster then went back to HEALTH_WARN, client operations resumed, and the rebalancing could continue in the background.
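
For the record, the manual intervention looked roughly like this (osd.7 standing in for the full OSD, and 0.8 just an example value):

  # what I tried first: lowering the reweight of the full and near-full OSDs
  ceph osd reweight 7 0.8

  # what actually cleared the ERR state: marking the full OSD out
  ceph osd out 7

  # checking fill levels afterwards
  ceph osd df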

Is there another way to handle a situation like this (an OSD becomes full, while having misplaced PGs on it, blocking the cluster)?

Apologies for so many questions in the same email! They are all part of the same management activity for me.

Many thanks!

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


