osd node heartbeat NIC broken and kick out

The heartbeat code is very careful to use the same physical interfaces as
1) the cluster network
2) the public network

If the first breaks, the OSD can't talk with its peers. If the second
breaks, it can't talk with the monitors or clients. Either way, the
OSD can't do its job so it gets marked down.
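
To make the rule above concrete, here is a minimal sketch in Python. It is only an illustrative model, not Ceph's actual heartbeat code: the names HeartbeatStatus, osd_is_reachable and describe are hypothetical and were chosen for this example. It assumes only what is stated above, namely that heartbeats run over both the cluster network and the public network, and that losing either one gets the OSD marked down.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatStatus:
    """Hypothetical record of whether heartbeats get through on each network."""
    cluster_ok: bool  # peers answer over the cluster network
    public_ok: bool   # peers answer over the public network


def osd_is_reachable(status: HeartbeatStatus) -> bool:
    # The OSD is only useful if BOTH networks work, so either failure
    # is enough for it to be reported and marked down.
    return status.cluster_ok and status.public_ok


def describe(status: HeartbeatStatus) -> str:
    """Return a short human-readable explanation of the outcome."""
    if osd_is_reachable(status):
        return "up"
    if not status.cluster_ok and not status.public_ok:
        return "down: cannot reach peers, monitors or clients"
    if not status.cluster_ok:
        return "down: cannot talk with its peers"
    return "down: cannot talk with the monitors or clients"


if __name__ == "__main__":
    # One healthy NIC is not enough to keep the OSD up.
    print(describe(HeartbeatStatus(cluster_ok=True, public_ok=False)))
```

In this model a node with one healthy NIC and one broken NIC is still treated as down, which matches the behaviour described in the original report.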
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Sat, Jul 19, 2014 at 3:08 AM, Haomai Wang <haomaiwang at gmail.com> wrote:
> Hi all,
>
> Each of our production Ceph nodes has two NICs: one used for heartbeats
> and the other for the cluster_network.
>
> By accident, the heartbeat NIC broke while the cluster_network NIC
> stayed healthy. The other OSDs reported the node with the broken NIC as
> unavailable, so the monitor decided to kick the node out.
>
> I'm not sure whether what I describe matches the code logic. If it does,
> would it be more reasonable for the ceph-osd process to detect that the
> cluster_network is healthy, so that the node is not kicked out?
>
> --
> Best Regards,
>
> Wheat
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

