Re: HeartbeatMap FAILED assert(0 == "hit suicide timeout")

潘东元 <dongyuanpan0@xxxxxxxxx> · Fri, 11 Oct 2019 14:12:49 +0800

I'm pretty sure the problem here is a communication issue between the
OSDs. The logs report "initiating reconnect" over and over again.

I looked at my network and found dropped packets. This is probably
because the TCP queue is full on the OSD daemon's listen port.
My cluster has 21 nodes, with 5 OSDs on each node.

[root@node-4 ~]# netstat -nat | grep -w "6801" | grep -w tcp | grep ESTABLISHED | wc -l
271

[root@node-4 ~]# sysctl -a | grep somax
net.core.somaxconn = 128

This will cause TCP connections to be reset.

So I will increase this parameter!
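
One way to check the queue-full theory before changing anything is to look at the kernel's listen-queue counters. The sketch below assumes a Linux host that exposes /proc/net/netstat and /proc/sys/net/core/somaxconn; the function name listen_counters is only an illustrative helper, not part of Ceph or any standard tool. Note that net.core.somaxconn caps the accept backlog of a listening socket (connections that have finished the handshake but have not yet been accepted), not the number of ESTABLISHED connections, so the overflow counters are a more direct indicator than the connection count.

```shell
#!/bin/sh
# Hypothetical helper: read /proc/net/netstat-style text on stdin and print
# the ListenOverflows and ListenDrops counters as NAME=VALUE lines.
listen_counters() {
  awk '
    # The first TcpExt line holds the column names.
    $1 == "TcpExt:" && !seen { for (i = 2; i <= NF; i++) name[i] = $i; seen = 1; next }
    # The second TcpExt line holds the matching values.
    $1 == "TcpExt:" && seen {
      for (i = 2; i <= NF; i++)
        if (name[i] == "ListenOverflows" || name[i] == "ListenDrops")
          print name[i] "=" $i
    }
  '
}

# Only touch /proc when it is actually there (Linux).
if [ -r /proc/net/netstat ]; then
  listen_counters < /proc/net/netstat
fi
if [ -r /proc/sys/net/core/somaxconn ]; then
  echo "somaxconn=$(cat /proc/sys/net/core/somaxconn)"
fi
```

If the two counters keep growing while the OSDs log reconnects, the accept queue is probably overflowing. The limit can be raised at runtime as root with `sysctl -w net.core.somaxconn=<value>` and made persistent in /etc/sysctl.conf. Since the backlog is fixed when a daemon calls listen(), the OSDs would most likely need a restart to pick up the new value, and the backlog the daemon itself passes to listen() also caps the queue.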

-------------
Regards!

On Thu, 10 Oct 2019 at 21:16, Janne Johansson <icepic.dz@xxxxxxxxx> wrote:
>
>
>
> On Thu, 10 Oct 2019 at 15:12, 潘东元 <dongyuanpan0@xxxxxxxxx> wrote:
>>
>> Hi all,
>>     my OSD hit the suicide timeout.
>>
>> common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout")
>>
>>  ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
>
>
>>
>> can you give some advice on troubleshooting?
>
>
> It is a very old release; chances are good that whatever issue you are hitting here has been fixed in the last 5 years.
>
> --
> May the most significant bit of your life be positive.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx