osd node heartbeat NIC broken and kick out

On Sat, Jul 19, 2014 at 11:08 AM, Wang Haomai <haomaiwang at gmail.com> wrote:
> Oh, it's our fault.
>
> Public_addr and cluster_addr use the same NIC (eth1). But we found that heartbeats may time out during recovery because of heavy traffic. I *misunderstood* the meaning of the heartbeat and used another NIC's (eth0) address for heartbeats to avoid the timeouts.

Hmm, the only times we've seen heartbeats time out from sharing the NIC
are when there are other issues on the server (e.g., NIC interrupt
handling going to a single core and saturating it). If you've seen
this under normal recovery conditions, we'd like to gather more
information and figure out what happened!
-Greg
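
One way to check for the single-core interrupt situation mentioned above is to look at how a NIC's interrupts are spread across CPUs in /proc/interrupts on Linux. The following is only a rough sketch: the function names are made up for illustration, the device name (eth1) comes from this thread, and matching is a plain substring test on the last column, so "eth1" would also match "eth10".

```python
from typing import List


def irq_counts_per_cpu(interrupts_text: str, device: str) -> List[int]:
    """Sum per-CPU interrupt counts over all IRQ lines whose name mentions `device`.

    `interrupts_text` is expected to be the contents of /proc/interrupts.
    """
    lines = interrupts_text.splitlines()
    if not lines:
        return []
    # The header line lists one column per CPU: CPU0 CPU1 ...
    n_cpus = len(lines[0].split())
    totals = [0] * n_cpus
    for line in lines[1:]:
        fields = line.split()
        # Need the "IRQ:" label, n_cpus counters, and at least one name field.
        if len(fields) <= n_cpus + 1 or device not in fields[-1]:
            continue
        counts = fields[1:1 + n_cpus]
        if not all(c.isdigit() for c in counts):
            continue
        for cpu, count in enumerate(counts):
            totals[cpu] += int(count)
    return totals


def busiest_cpu_share(totals: List[int]) -> float:
    """Fraction of the device's interrupts handled by the busiest CPU."""
    total = sum(totals)
    return max(totals) / total if total else 0.0
```

To use it on a live host, pass in the text read from /proc/interrupts together with the NIC name. A share close to 1.0 on a multi-core machine suggests that one core handles nearly all of that NIC's interrupts. The counters are cumulative since boot, so comparing two samples taken during recovery is more telling than a single reading.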

>
> Your explanation makes it easy to understand. I also see that the code comments (src/ceph-osd.cc) describe this usage.
>
> Best Wishes!
>
>> On Jul 20, 2014, at 1:14, Gregory Farnum <greg at inktank.com> wrote:
>>
>> The heartbeat code is very careful to use the same physical interfaces as
>> 1) the cluster network
>> 2) the public network
>>
>> If the first breaks, the OSD can't talk with its peers. If the second
>> breaks, it can't talk with the monitors or clients. Either way, the
>> OSD can't do its job so it gets marked down.
>> -Greg
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>
>>
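
The rule described in the message above can be written down as a toy model. This is not Ceph code; the class and function names are invented for illustration, and the two flags stand in for "heartbeats over this interface are getting replies".

```python
from dataclasses import dataclass


@dataclass
class HeartbeatStatus:
    """Hypothetical summary of one OSD's heartbeat results."""
    cluster_ok: bool  # heartbeats over the cluster network interface succeed
    public_ok: bool   # heartbeats over the public network interface succeed


def should_be_marked_down(status: HeartbeatStatus) -> bool:
    """An OSD needs both paths: peers via cluster, monitors/clients via public."""
    return not (status.cluster_ok and status.public_ok)
```

In this model an OSD whose cluster network path works but whose other heartbeat path is broken is still treated as down, which is the outcome reported in the original message below.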
>>> On Sat, Jul 19, 2014 at 3:08 AM, Haomai Wang <haomaiwang at gmail.com> wrote:
>>> Hi all,
>>>
>>> Each of our production Ceph nodes has two NICs: one used for heartbeats,
>>> the other used by cluster_network.
>>>
>>> By accident, the heartbeat NIC broke while the cluster_network NIC
>>> stayed healthy. The other OSDs reported the node with the broken NIC as
>>> unavailable, so the monitor decided to kick the node out.
>>>
>>> I'm not sure whether what I describe matches the code logic. If it does,
>>> would it be more reasonable for the ceph-osd process to detect that
>>> cluster_network is healthy, so that the node is not kicked out?
>>>
>>> --
>>> Best Regards,
>>>
>>> Wheat
>>> _______________________________________________
>>> ceph-users mailing list
>>> ceph-users at lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
