Hi Florian,
On 12/18/24 at 16:18, Florian Haas wrote:
> To illustrate why, assume you've got 3 Mons in your cluster.
> Now, on one of your physical hosts that runs a Mon, you enter
> maintenance mode. This will just shut down the Mon. Now you proceed with
> the system upgrade, which will vary in length. During that time you're
> running on two Mons.
> Now, something unexpected happens on another node that runs another Mon.
> Boom, your cluster is now offline, and you need to scramble to fix things.
Yes, this is a risk. But shouldn't you run 5 MONs today? At least
this seems to be the recommended number from the default service spec.
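For reference, a minimal orchestrator service spec along those lines could look like the fragment below (placement by count is one option among several; compare it against your own cluster with "ceph orch ls mon --export"):

```yaml
# Illustrative mon service spec requesting 5 MONs,
# applied with: ceph orch apply -i mon.yaml
service_type: mon
placement:
  count: 5
```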
> If conversely you set _no_schedule and you still have other hosts to
> migrate your Mon to (per your placement policy), then you'll run on 3
> Mons throughout.
Then maybe the maintenance mode should also set this label.
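Done by hand, that would be something like the following sketch (the hostname "ceph-node1" is a placeholder; the label and maintenance commands are standard cephadm):

```shell
# Mark the host unschedulable so the orchestrator redeploys its
# daemons (including the MON) elsewhere, per the placement policy.
ceph orch host label add ceph-node1 _no_schedule

# Verify the MON has actually moved off the host before proceeding:
ceph orch ps ceph-node1

# Only then enter maintenance mode and do the system upgrade:
ceph orch host maintenance enter ceph-node1
# ... upgrade, reboot, etc. ...
ceph orch host maintenance exit ceph-node1

# Finally make the host schedulable again:
ceph orch host label rm ceph-node1 _no_schedule
```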
> Also, while maintenance does stop and disable the systemd ceph.target,
> meaning the services won't come up even if the host is rebooted,
> "systemctl status ceph.target" will still return "active" and "enabled"
> which may break assumptions by monitoring systems, orchestration
> frameworks, etc.
Your step 8 also just stops ceph.target. What is the difference?
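If that mismatch is the concern, it should be visible on the host itself; a minimal check, assuming the host is still reachable, might be:

```shell
# On a host in maintenance mode, compare what systemd reports
# with what maintenance mode claims to have done:
systemctl is-active ceph.target   # "inactive" would be expected after a stop
systemctl is-enabled ceph.target  # "disabled" would be expected after a disable
```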
Kindest Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin
https://www.heinlein-support.de
Tel: 030 / 405051-43
Fax: 030 / 405051-19
Amtsgericht Berlin-Charlottenburg - HRB 220009 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx