Hi Jeremy,

I think there is a maintenance mode for Ceph. Maybe check this
<https://docs.ceph.com/en/latest/dev/cephadm/host-maintenance/> out, or
this <https://docs.mirantis.com/mcp/q4-18/mcp-operations-guide/openstack-operations/ceph-operations/shut-down-ceph-cluster.html>
could help too.

Thanks,
Dhairya

On Mon, May 30, 2022 at 9:41 AM Jeremy Hansen <jeremy@xxxxxxxxxx> wrote:
> So in my experience so far, if I take out a switch after a firmware update
> and a reboot of the switch, meaning all Ceph nodes lose network
> connectivity and communication with each other, Ceph becomes unresponsive,
> and my only fix up to this point has been to reboot the compute nodes one
> by one. Are you saying I just need to wait? I don’t know how long I’ve
> waited in the past, but if you’re saying at least 10 minutes, I probably
> haven’t waited that long.
>
> Thanks
> -jeremy
>
>
> On Sunday, May 29, 2022 at 3:40 PM, Tyler Stachecki
> <stachecki.tyler@xxxxxxxxx> wrote:
> > Ceph always aims to provide high availability, so if you do not set
> > cluster flags that prevent Ceph from trying to self-heal, it will
> > self-heal.
> >
> > Based on your description, it sounds like you want to consider the
> > 'noout' flag. By default, after 10(?) minutes of an OSD being down,
> > Ceph will begin the process of marking the affected OSD out to ensure
> > high availability.
> >
> > But be careful as far as latency goes -- you likely still want to
> > pre-emptively mark OSDs down ahead of the planned maintenance for
> > latency purposes, and you must be cognisant of whether or not your
> > replication policy puts you in a position where an unrelated failure
> > during the maintenance can result in inactive PGs.
> >
> > Cheers,
> > Tyler
> >
> >
> > On Sun, May 29, 2022, 5:30 PM Jeremy Hansen <jeremy@xxxxxxxxxx> wrote:
> > > Is there a maintenance mode for Ceph that would allow me to do work
> > > on underlying network equipment without causing Ceph to panic? In
> > > our test lab, we don’t have redundant networking, and when doing
> > > switch upgrades and such, Ceph has a panic attack and we end up
> > > having to reboot Ceph nodes anyway. Like an HDFS-style read-only
> > > mode or something?
> > >
> > > Thanks!
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
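[Editor's note: for reference, the two approaches discussed in this thread can be sketched roughly as follows. The hostname `host1` is a placeholder, and the exact `ceph orch` behavior depends on your release; the "10(?) minutes" Tyler mentions corresponds to the `mon_osd_down_out_interval` option, which defaults to 600 seconds.]

```shell
# Option 1: cephadm host maintenance mode (per-host), as in the
# docs.ceph.com link above. "host1" is a placeholder hostname.
ceph orch host maintenance enter host1
# ... perform the switch/firmware work ...
ceph orch host maintenance exit host1

# Option 2: cluster-wide flags, as Tyler suggests. These stop Ceph
# from marking down OSDs "out" and from starting recovery/rebalance
# while connectivity is lost.
ceph osd set noout
ceph osd set norebalance
# ... perform the switch/firmware work, then clear the flags ...
ceph osd unset norebalance
ceph osd unset noout
```

Note that with `noout` set, down OSDs are never marked out, so degraded PGs will not re-replicate until the flag is cleared; clear it promptly after the maintenance window.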