Thanks for the replies. It feels to me that cephadm should handle this case, since it offers the maintenance function. Right now I have a simple version of a playbook that just sets noout, patches the OS, reboots and unsets noout (similar to https://github.com/ceph/ceph-ansible/blob/main/infrastructure-playbooks/untested-by-ci/cluster-maintenance.yml), and a different version that attempts the host maintenance but fails on the instance that is running the mgr. If I get anywhere with detecting that the instance is the active manager and handling that in Ansible, I will reply back here.
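For reference, the simple version looks roughly like this. It's a trimmed-down sketch rather than exactly what I run: the "ceph_hosts" group name, the apt module and the timeout/retry values are assumptions about my inventory and distro, so adjust to taste.

---
- hosts: ceph_hosts
  serial: 1                   # one instance at a time
  become: true
  tasks:
    - name: Set noout so OSDs are not marked out while the host is down
      ansible.builtin.command: cephadm shell -- ceph osd set noout

    - name: Patch the OS
      ansible.builtin.apt:
        upgrade: dist
        update_cache: true

    - name: Reboot and wait for the host to come back
      ansible.builtin.reboot:
        reboot_timeout: 1800

    - name: Wait until every OSD is back up before moving on
      ansible.builtin.command: cephadm shell -- ceph osd stat --format json
      register: osd_stat
      changed_when: false
      until: (osd_stat.stdout | from_json).num_up_osds == (osd_stat.stdout | from_json).num_osds
      retries: 30
      delay: 10

    - name: Unset noout again
      ansible.builtin.command: cephadm shell -- ceph osd unset noout

The until loop is what keeps the play from moving on to the next host before the OSDs have rejoined.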
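For the maintenance-mode version, the missing piece is failing the mgr off the host first, as Adam describes below. Something like the following is what I have in mind for the detection part (untested; it assumes the active_name reported by "ceph mgr stat" starts with the short hostname, which is what cephadm appears to use for daemon names):

    - name: Ask the cluster which mgr is currently active
      ansible.builtin.command: cephadm shell -- ceph mgr stat --format json
      register: mgr_stat
      changed_when: false

    - name: Fail the mgr away from this host before entering maintenance
      ansible.builtin.command: >-
        cephadm shell -- ceph mgr fail {{ (mgr_stat.stdout | from_json).active_name }}
      when: (mgr_stat.stdout | from_json).active_name.split('.') | first == ansible_hostname

The idea is to run these right before the "ceph orch host maintenance enter" task; a short pause afterwards, to give a standby time to take over, would probably also be sensible.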
Cheers,
Steven Goodliff

________________________________
From: Robert Gallop <robert.gallop@xxxxxxxxx>
Sent: 13 July 2022 16:55
To: Adam King
Cc: Steven Goodliff; ceph-users@xxxxxxx
Subject: Re: Re: cephadm host maintenance

This brings up a good follow-on: rebooting in general for OS patching.

I have not been leveraging the maintenance mode function, as I found it was really no different from just setting noout and doing the reboot. I find that if the box is the active manager, the failover happens quickly, painlessly and automatically. All the OSDs just show as missing and come back once the box is back from the reboot. Am I causing issues I may not be aware of? How is everyone handling patching reboots?

The only place I'm careful is the active MDS nodes, since that failover does cause a period of no I/O for the mounted clients. I generally fail that one manually, so I can ensure I don't have to wait for the MDS to figure out an instance is gone and spin up a standby.

Any tips or techniques until there is a more holistic approach?

Thanks!

On Wed, Jul 13, 2022 at 9:49 AM Adam King <adking@xxxxxxxxxx> wrote:

Hello Steven,

Arguably, it should, but right now nothing is implemented to do so, and you'd have to manually run "ceph mgr fail node2-cobj2-atdev1-nvan.ghxlvw" before it would allow you to put the host in maintenance. It's non-trivial from a technical point of view to have it do the switch automatically, since the cephadm instance is running on that active mgr: it would have to store somewhere that we wanted this host in maintenance, fail over the mgr itself, and then have the new cephadm instance pick up that we wanted the host in maintenance and do so. Possible, but not something anyone has had a chance to implement.

FWIW, I do believe there are also plans to eventually have a playbook for a rolling reboot or something of the sort added to https://github.com/ceph/cephadm-ansible. But for now, I think some sort of intervention to cause the failover to happen before running the maintenance enter command is necessary.

Regards,
- Adam King

On Wed, Jul 13, 2022 at 11:02 AM Steven Goodliff <Steven.Goodliff@xxxxxxxxxxxxxxx> wrote:

> Hi,
>
> I'm trying to reboot a ceph cluster one instance at a time by running an
> Ansible playbook which basically runs
>
>     cephadm shell ceph orch host maintenance enter <hostname>
>
> and then reboots the instance and exits the maintenance, but I get
>
>     ALERT: Cannot stop active Mgr daemon, Please switch active Mgrs with
>     'ceph mgr fail node2-cobj2-atdev1-nvan.ghxlvw'
>
> on one instance. Should cephadm handle the switch?
>
> Thanks,
> Steven Goodliff
> Global Relay

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx