Thanks for the tips!!!

> I would still set noout on relevant parts of the cluster in case something
> goes south and it does take longer than 2 minutes. Otherwise OSDs will
> start getting marked out after 10 minutes or so by default, and then you
> have a lot of churn going on.
>
> The monitors will be fine unless you lose quorum, and even then they'll
> just recover once the switch comes back. You just won't be able to make
> changes to the cluster while mon quorum is lost, nor will the OSDs start
> recovering etc. until quorum is restored.
>
> Depending on which version of Ceph/libvirt/etc. you are running, I have
> seen issues with older releases where a handful of VMs got indefinitely
> stuck with really high I/O wait afterwards and occasionally needed to be
> rebooted manually after doing something like this.
>
> As another user mentioned, the kernel's hung-task detector kicks in after
> 120 seconds by default, so you'll see lots of stack traces in the VMs from
> processes blocked on I/O if the reboot and re-peering don't all complete
> within those two minutes.
>
> If you can afford to shut down all the VMs in the cluster, it might be for
> the best, as they'll be losing I/O anyway...
>
> On Tue, Jan 25, 2022, 4:27 AM Marc <Marc@xxxxxxxxxxxxxxxxx> wrote:
>
> > If the switch needs an update and needs to be restarted (expected 2
> > minutes): can I just leave the cluster as it is, because Ceph will handle
> > this correctly? Or should I e.g. put some VMs I am running in pause mode,
> > or even stop them? What happens to the monitors? Can they handle this, or
> > would it be better to switch from 3 to 1?

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
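
For anyone digging this thread out of the archives later, a minimal sketch of
the flags and settings discussed above. This assumes a reasonably recent Ceph
release (the per-host "set-group" form and "ceph config get" only exist on
newer versions), and "myhost" is just a placeholder for the CRUSH host bucket
behind the switch:

    # Keep OSDs from being marked out during the switch reboot, either
    # cluster-wide or only for the hosts behind that switch:
    ceph osd set noout
    ceph osd set-group noout myhost

    # The "10 minutes or so" before down OSDs get marked out is this
    # option (default 600 seconds):
    ceph config get mon mon_osd_down_out_interval

    # Confirm the monitors regained quorum once the switch is back:
    ceph quorum_status

    # Inside the VMs, the "blocked for more than 120 seconds" traces come
    # from the kernel's hung-task timeout:
    sysctl kernel.hung_task_timeout_secs

    # Clear the flags again afterwards:
    ceph osd unset noout
    ceph osd unset-group noout myhost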