Re: cephadm host maintenance

This brings up a good follow-on: rebooting in general for OS patching.

I have not been leveraging the maintenance mode function, as I found it was
really no different from just setting noout and doing the reboot.  I find
that if the box is the active manager, the failover happens quickly,
painlessly, and automatically.  All the OSDs just show as missing and come
back once the box returns from the reboot.
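
For concreteness, the flow I'm describing is roughly this (a sketch per
host, with health checks omitted):

    ceph osd set noout      # keep down OSDs from being marked out / rebalanced
    # reboot the box; its OSDs show as down/missing in the meantime
    # wait for the host to come back and its OSDs to rejoin
    ceph osd unset noout    # restore normal down -> out behaviour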

Am I causing issues I may not be aware of?  How is everyone handling
patching reboots?

The only place I'm careful is with the active MDS nodes. Since that
failover does cause a period of no I/O for the mounted clients, I generally
fail the active MDS manually, so I don't have to wait for the cluster to
figure out an instance is gone and spin up a standby.
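
Concretely, before rebooting a box hosting the active MDS, something like
this (the daemon name is a placeholder for your own):

    ceph fs status                    # confirm a standby MDS is available
    ceph mds fail <active-mds-name>   # promote a standby now, on my schedule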

Any tips or techniques until there is a more holistic approach?

Thanks!


On Wed, Jul 13, 2022 at 9:49 AM Adam King <adking@xxxxxxxxxx> wrote:

> Hello Steven,
>
> Arguably, it should, but right now nothing is implemented to do so, and
> you'd have to manually run "ceph mgr fail
> node2-cobj2-atdev1-nvan.ghxlvw" before it would allow you to put the host
> in maintenance. It's non-trivial from a technical point of view to have it
> do the switch automatically, because the cephadm instance is running on
> that active mgr: cephadm would have to store somewhere that we wanted this
> host in maintenance, fail over the mgr itself, and then have the new
> cephadm instance pick up that we wanted the host in maintenance and do so.
> Possible, but not something anyone has had a chance to implement. FWIW, I
> do believe there are also plans to eventually add a playbook for a rolling
> reboot or something of the sort to https://github.com/ceph/cephadm-ansible.
> But for now, I think some sort of intervention to cause the failover to
> happen before running the maintenance enter command is necessary.
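>
> In other words, roughly this sequence (mgr name taken from your error
> message):
>
>     ceph mgr fail node2-cobj2-atdev1-nvan.ghxlvw   # force a mgr failover first
>     ceph orch host maintenance enter <hostname>    # should now be accepted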
>
> Regards,
>  - Adam King
>
> On Wed, Jul 13, 2022 at 11:02 AM Steven Goodliff <
> Steven.Goodliff@xxxxxxxxxxxxxxx> wrote:
>
> >
> > Hi,
> >
> >
> > I'm trying to reboot a ceph cluster one instance at a time by running an
> > Ansible playbook which basically runs
> >
> > cephadm shell ceph orch host maintenance enter <hostname>
> >
> > and then reboots the instance and exits maintenance,
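> >
> > i.e. per host, roughly:
> >
> >     cephadm shell ceph orch host maintenance enter <hostname>
> >     reboot
> >     cephadm shell ceph orch host maintenance exit <hostname>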
> >
> >
> > but on one instance I get
> >
> >     ALERT: Cannot stop active Mgr daemon, Please switch active Mgrs
> >     with 'ceph mgr fail node2-cobj2-atdev1-nvan.ghxlvw'
> >
> > Should cephadm handle the switch?
> >
> >
> > thanks
> >
> > Steven Goodliff
> > Global Relay
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



