Re: Network Flapping Causing Slow Ops and Freezing VMs

Eugen Block <eblock@xxxxxx> · Mon, 08 Jan 2024 08:16:38 +0000

Hi,

just to get a better understanding, when you write

> Although the OSDs were correctly marked as down in the monitor, slow
> ops persisted until we resolved the network issue.

do you mean that the MONs marked the OSDs as down (temporarily), or did
you do that yourself? If the OSDs "flap", they would also keep marking
themselves "up" again, and this should be reflected in the OSD logs as
something like "wrongly marked me down". Can you confirm that the
daemons were still up and logged the "wrongly marked me down"
messages?
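
To make that check concrete, here is a minimal sketch that counts the
"wrongly marked me down" lines per OSD log file. The function name is
made up for illustration, and the log path in the example is an
assumption (package-based installs usually log to /var/log/ceph/;
containerized deployments may log elsewhere or to journald).

```shell
# Count "wrongly marked me down" lines in each given OSD log file.
# Prints "<file>: <count>" for every file passed as an argument.
count_wrongly_marked_down() {
    for log in "$@"; do
        # grep -c exits non-zero when nothing matches, hence "|| true"
        n=$(grep -c "wrongly marked me down" "$log" || true)
        printf '%s: %s\n' "$log" "$n"
    done
}

# Example call (path is an assumption, adjust to your deployment):
# count_wrongly_marked_down /var/log/ceph/ceph-osd.*.log
```

A non-zero count on the affected node would suggest that the daemons
were still running while the MONs marked them down.
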
In some cases the "nodown" flag can prevent flapping OSDs, but since
you actually had a network issue it wouldn't really help here. I would
probably have set the noout flag and stopped the OSD daemons on the
affected node until the issue was resolved.
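
Roughly, that procedure could look like the sketch below. The helper
names (run, pause_node_osds, resume_node_osds) are hypothetical, and
the systemd target ceph-osd.target is an assumption that fits
package-based installs; cephadm/containerized clusters use different
unit names. The helper only prints the commands unless DRY_RUN=0 is
set.

```shell
# Dry-run by default: commands are only printed.
# Set DRY_RUN=0 to actually execute them.
run() {
    echo "+ $*"
    [ "${DRY_RUN:-1}" = "1" ] || "$@"
}

# Run on the affected node before working on the network.
pause_node_osds() {
    # noout keeps the stopped OSDs "in", so no rebalancing starts
    run ceph osd set noout &&
    # assumption: package-based install with this systemd target
    run systemctl stop ceph-osd.target
}

# Run once the network is stable again.
resume_node_osds() {
    run systemctl start ceph-osd.target &&
    run ceph osd unset noout
}
```

Calling pause_node_osds as-is just shows what would be executed;
with DRY_RUN=0 it sets the flag and stops all OSDs on that node.
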

Regards,
Eugen

Quoting mahnoosh shahidi <mahnooosh.shd@xxxxxxxxx>:

Hi all,

I hope this message finds you well. We recently encountered an issue on one
of our OSD servers, leading to network flapping and subsequently causing
significant performance degradation across our entire cluster. Although the
OSDs were correctly marked as down in the monitor, slow ops persisted until
we resolved the network issue. This incident resulted in a major
disruption, especially affecting VMs with mapped RBD images, leading to
their freezing.

In light of this, I have two key questions for the community:

1. Why did slow ops persist even after marking the affected server as down
in the monitor?

2. Are there any recommended configurations for OSD suicide or OSD down
reports that could help us better handle similar network-related issues in
the future?

Best Regards,
Mahnoosh
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
