Hi all,

My OSD hit the suicide timeout. Here is some of the log:

2019-10-10 03:53:13.017760 7f1ab886e700 0 -- 192.168.1.5:6810/1028846 >> 192.168.1.25:6802/24020795 pipe(0x257eb80 sd=69 :47977 s=2 pgs=287284 cs=41 l=0 c=0x21431760).fault, initiating reconnect
2019-10-10 03:53:13.017799 7f1ab967c700 0 -- 192.168.1.5:6810/1028846 >> 192.168.1.25:6802/24020795 pipe(0x257eb80 sd=69 :47977 s=1 pgs=287284 cs=42 l=0 c=0x21431760).fault
2019-10-10 03:53:15.890773 7f1acdec3700 0 -- 192.168.1.5:6810/1028846 >> 192.168.1.19:6804/53020865 pipe(0x37537580 sd=59 :60121 s=2 pgs=423672 cs=85 l=0 c=0x21447900).fault, initiating reconnect
2019-10-10 03:53:15.890890 7f1aba288700 0 -- 192.168.1.5:6810/1028846 >> 192.168.1.19:6804/53020865 pipe(0x37537580 sd=59 :60121 s=1 pgs=423672 cs=86 l=0 c=0x21447900).fault
2019-10-10 03:53:16.209368 7f1addc3e700 1 heartbeat_map is_healthy 'OSD::op_tp thread 0x7f1ac29a3700' had timed out after 15
2019-10-10 03:53:16.209382 7f1addc3e700 1 heartbeat_map is_healthy 'OSD::op_tp thread 0x7f1ac29a3700' had suicide timed out after 150
2019-10-10 03:53:16.210765 7f1addc3e700 -1 common/HeartbeatMap.cc: In function 'bool ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, const char*, time_t)' thread 7f1addc3e700 time 2019-10-10 03:53:16.209415
common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout")

 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
 1: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x12b) [0xaf2b6b]
 2: (ceph::HeartbeatMap::is_healthy()+0xa7) [0xaf3497]
 3: (ceph::HeartbeatMap::check_touch_file()+0x28) [0xaf3988]
 4: (CephContextServiceThread::entry()+0x13f) [0xb0353f]
 5: (()+0x79d1) [0x7f1ae0b3c9d1]
 6: (clone()+0x6d) [0x7f1adfaccb5d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Can you give some advice on troubleshooting?
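
One possible first step when reading a log like this is to tally the heartbeat_map messages per thread, to see which thread pool stalls and how often. The following is only a minimal Python sketch: the function name summarize_heartbeat_timeouts is hypothetical, and the regular expression assumes the message format seen in the two heartbeat_map lines of this log (other Ceph versions may word it differently).

```python
import re
from collections import Counter

# Assumed message format, taken from the heartbeat_map lines in this post:
#   heartbeat_map is_healthy 'OSD::op_tp thread 0x7f1ac29a3700' had timed out after 15
HEARTBEAT_RE = re.compile(
    r"heartbeat_map is_healthy '(?P<pool>\S+) thread (?P<thread>0x[0-9a-f]+)' "
    r"had (?P<kind>suicide timed out|timed out) after (?P<secs>\d+)"
)


def summarize_heartbeat_timeouts(lines):
    """Count heartbeat timeout messages per (pool, thread, kind, seconds).

    Hypothetical helper for illustration; it is not part of Ceph.
    """
    counts = Counter()
    for line in lines:
        match = HEARTBEAT_RE.search(line)
        if match:
            key = (
                match.group("pool"),
                match.group("thread"),
                match.group("kind"),
                int(match.group("secs")),
            )
            counts[key] += 1
    return counts


# Two lines copied from the log in this post.
sample_lines = [
    "2019-10-10 03:53:16.209368 7f1addc3e700 1 heartbeat_map is_healthy "
    "'OSD::op_tp thread 0x7f1ac29a3700' had timed out after 15",
    "2019-10-10 03:53:16.209382 7f1addc3e700 1 heartbeat_map is_healthy "
    "'OSD::op_tp thread 0x7f1ac29a3700' had suicide timed out after 150",
]

summary = summarize_heartbeat_timeouts(sample_lines)
for (pool, thread, kind, secs), count in sorted(summary.items()):
    print(f"{pool} {thread}: {kind} after {secs}s x{count}")
```

To run it against a real log, pass an open file object instead of sample_lines; the path to the OSD log depends on the installation.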
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx