Hi Alexander,

No, we are not using separate networks; they are on the same physical interfaces.

On Sat, Jan 6, 2024 at 7:27 PM Alexander E. Patrakov <patrakov@xxxxxxxxx> wrote:

> Hello Mahnoosh,
>
> Just to double check, can you confirm that you are NOT using a
> physically separate cluster network and private network? A
> configuration with such physically separate networks is inherently
> vulnerable and therefore cannot be recommended. VLANs on the same
> physical interface are probably acceptable, but I have never seen a
> cluster configured like this.
>
> https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-osd/#flapping-osds
>
> On Sat, Jan 6, 2024 at 9:28 PM mahnoosh shahidi <mahnooosh.shd@xxxxxxxxx> wrote:
> >
> > Hi all,
> >
> > I hope this message finds you well. We recently encountered an issue on one
> > of our OSD servers, leading to network flapping and subsequently causing
> > significant performance degradation across our entire cluster. Although the
> > OSDs were correctly marked as down in the monitor, slow ops persisted until
> > we resolved the network issue. This incident resulted in a major
> > disruption, especially affecting VMs with mapped RBD images, leading to
> > their freezing.
> >
> > In light of this, I have two key questions for the community:
> >
> > 1. Why did slow ops persist even after marking the affected server as down
> > in the monitor?
> >
> > 2. Are there any recommended configurations for OSD suicide or OSD down
> > reports that could help us better handle similar network-related issues in
> > the future?
> >
> > Best Regards,
> > Mahnoosh
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
> --
> Alexander E. Patrakov