I'm pretty sure the issue here is a communication problem between the OSDs. The logs report "initiating reconnect" over and over again. I looked at my network and found dropped packets; this is probably because the TCP listen queue is full on the OSD daemon's listen port. My cluster has 21 nodes, with 5 OSDs on each node.

[root@node-4 ~]# netstat -nat | grep -w "6801" | grep -w tcp | grep ESTABLISHED | wc -l
271
[root@node-4 ~]# sysctl -a | grep somax
net.core.somaxconn = 128

This will cause TCP connections to be reset, so I will increase this parameter.

-------------
Regards!

On Thu, Oct 10, 2019 at 9:16 PM, Janne Johansson <icepic.dz@xxxxxxxxx> wrote:
>
> On Thu, Oct 10, 2019 at 15:12, 潘东元 <dongyuanpan0@xxxxxxxxx> wrote:
>>
>> hi all,
>> my osd hit suicide timeout.
>>
>> common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout")
>>
>> ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
>>
>> can you give some advice on troubleshooting?
>
> It is a very old release, chances are large whatever issue you get here might have been fixed in the last 5 years.
>
> --
> May the most significant bit of your life be positive.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
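
One way to check the full-listen-queue theory before changing anything is to look at the kernel's listen-queue counters. The following is only a rough sketch and not part of the original post: it assumes a Linux node where /proc/net/netstat exposes the TcpExt counters ListenOverflows and ListenDrops, the helper name tcpext_counter is made up for illustration, and 1024 is just an example value for the new limit.

```shell
#!/bin/sh
# Print one TcpExt counter from a file in /proc/net/netstat format.
#   $1 = counter name (e.g. ListenOverflows)
#   $2 = file to read (defaults to /proc/net/netstat)
# tcpext_counter is a hypothetical helper, not an existing tool.
tcpext_counter() {
    awk -v name="$1" '
        $1 == "TcpExt:" && !header_seen {
            # The first TcpExt line holds the column names.
            for (i = 2; i <= NF; i++) if ($i == name) col = i
            header_seen = 1
            next
        }
        $1 == "TcpExt:" {
            # The second TcpExt line holds the values in the same order.
            if (col) print $col
            exit
        }
    ' "${2:-/proc/net/netstat}"
}

# Show the counters only where the proc file is available.
if [ -r /proc/net/netstat ]; then
    echo "ListenOverflows: $(tcpext_counter ListenOverflows)"
    echo "ListenDrops:     $(tcpext_counter ListenDrops)"
fi

# Listening sockets on the OSD port from the post. For LISTEN sockets,
# Recv-Q is the current accept-queue length and Send-Q is the backlog limit.
# ss -lnt "sport = :6801"

# Raise the limit at runtime (as root); 1024 is only an example value.
# sysctl -w net.core.somaxconn=1024
```

If these counters keep growing while the OSDs log reconnects, that would support the full-queue explanation; if they stay flat, the dropped packets probably come from somewhere else. As far as I know, the backlog limit is applied when a daemon calls listen(), so running OSDs would likely need a restart to pick up a new value.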