Hi all,

I hope this message finds you well.

We recently encountered an issue on one of our OSD servers that led to network flapping and, in turn, to significant performance degradation across our entire cluster. Although the OSDs were correctly marked as down in the monitor, slow ops persisted until we resolved the network issue. The incident caused a major disruption, especially for VMs with mapped RBD images, which froze.

In light of this, I have two key questions for the community:

1. Why did slow ops persist even after the affected server was marked as down in the monitor?
2. Are there any recommended configurations for OSD suicide or OSD down reports that could help us better handle similar network-related issues in the future?

Best Regards,
Mahnoosh