Thanks for the tips!!!

> I would still set noout on relevant parts of the cluster in case something
> goes south and it does take longer than 2 minutes. Otherwise OSDs will
> start getting marked out after 10 minutes or so by default, and then you
> have a lot of churn going on.
>
> The monitors will be fine unless you lose quorum, and even then they'll
> just recover once the switch comes back. You just won't be able to make
> changes to the cluster while mon quorum is lost, nor will the OSDs start
> recovering etc. until quorum is restored.
>
> Depending on which version of Ceph/libvirt/etc. you are running, I have
> seen issues with older releases where a handful of VMs got indefinitely
> stuck with really high I/O wait afterwards and occasionally needed to be
> rebooted manually after doing something like this.
>
> As another user mentioned, the kernel's hung-task detector kicks in after
> 120 seconds by default, so you'll see lots of stack traces in the VMs from
> processes blocked on I/O if the reboot and re-peering don't all complete
> within those two minutes.
>
> If you can afford to shut down all the VMs in the cluster, it might be for
> the best, as they'll be losing I/O anyway...
>
> On Tue, Jan 25, 2022, 4:27 AM Marc <Marc@xxxxxxxxxxxxxxxxx> wrote:
>
> > If the switch needs an update and needs to be restarted (expected 2
> > minutes): can I just leave the cluster as it is, because Ceph will handle
> > this correctly? Or should I e.g. put some VMs I am running in pause mode,
> > or even stop them? What happens to the monitors? Can they handle this, or
> > would it be better to switch from 3 to 1?

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
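
For anyone digging this thread out of the archives later, a minimal sketch of
the flags and settings discussed above. This assumes a reasonably recent Ceph
release (the per-host "set-group" form and "ceph config get" only exist on
newer versions), and "myhost" is just a placeholder for the CRUSH host bucket
behind the switch:

    # Keep OSDs from being marked out during the switch reboot, either
    # cluster-wide or only for the hosts behind that switch:
    ceph osd set noout
    ceph osd set-group noout myhost

    # The "10 minutes or so" before down OSDs get marked out is this
    # option (default 600 seconds):
    ceph config get mon mon_osd_down_out_interval

    # Confirm the monitors regained quorum once the switch is back:
    ceph quorum_status

    # Inside the VMs, the "blocked for more than 120 seconds" traces come
    # from the kernel's hung-task timeout:
    sysctl kernel.hung_task_timeout_secs

    # Clear the flags again afterwards:
    ceph osd unset noout
    ceph osd unset-group noout myhost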