On 18/12/2024 15:37, Robert Sander wrote:
> Hi Florian,
>
> On 17.12.24 20:10, Florian Haas wrote:
>> 1. Disable orchestrator scheduling for the affected node: "ceph orch host label add <host> _no_schedule".
>> [...]
>> 14. Re-enable orchestrator scheduling with "ceph orch host label rm <host> _no_schedule".
>
> Wouldn't it be easier to run "ceph orch host maintenance enter HOST" before and "ceph orch host maintenance exit HOST" after the upgrade?
I'm not a big fan of maintenance mode, to be honest. To illustrate why, assume you've got 3 Mons in your cluster.

Now, on one of your physical hosts that runs a Mon, you enter maintenance mode. This will just shut down the Mon. Then you proceed with the system upgrade, which will vary in length; during that time you're running on only two Mons.
Now, something unexpected happens on another node that runs another Mon. Boom: with two of your three Mons down you've lost quorum, your cluster is now offline, and you need to scramble to fix things.
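For illustration, this is how you'd watch quorum while such a window is open (standard Mon status commands, nothing cephadm-specific):

  # list the Mons and show which of them are currently in quorum
  ceph mon stat

  # more detail, including election epoch and quorum membership
  ceph quorum_status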
If, conversely, you set _no_schedule and you still have other hosts to migrate your Mon to (per your placement policy), then you'll keep running on 3 Mons throughout.
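To sketch that flow (with <host> as a placeholder, and relying on cephadm's documented behaviour of moving non-OSD daemons off a _no_schedule host):

  # stop scheduling daemons on the host; cephadm redeploys its
  # non-OSD daemons elsewhere, placement policy permitting
  ceph orch host label add <host> _no_schedule

  # verify the Mon (and any other daemons) have moved away
  ceph orch ps <host>

  # ... perform the OS upgrade, reboot, etc. ...

  # allow the orchestrator to place daemons on the host again
  ceph orch host label rm <host> _no_schedule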
And I prefer to have a policy that works on all nodes in a cluster. That's why I'd rather not have separate procedures for Mons and non-Mons. Hence the steps as described, which should work on all node types, Mon or not.
Also, while maintenance mode does stop and disable the systemd ceph.target, meaning the services won't come back up even if the host is rebooted, "systemctl status ceph.target" will still report "active" and "enabled", which may break assumptions made by monitoring systems, orchestration frameworks, etc.
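For example, a naive monitoring check might probe the unit state like this (standard systemctl subcommands) and conclude that everything is fine:

  # machine-readable unit state, as a monitoring check would query it
  systemctl is-active ceph.target
  systemctl is-enabled ceph.target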
That's why I currently prefer working with the _no_schedule label over maintenance mode.
Am I making sense?

Cheers,
Florian