Hi all,

Each of our production Ceph nodes has two NICs: one used for heartbeats and the other for the cluster_network. By accident, the heartbeat NIC on one node broke while its cluster_network NIC stayed healthy. The other OSDs reported the node with the broken NIC as unavailable, so the monitor decided to kick the node out.

I'm not sure whether what I describe matches the code logic. If it does, would it be more reasonable for the ceph-osd process to detect that the cluster_network is still healthy, so that the node with the broken NIC is not kicked out?

--
Best Regards,
Wheat