Re: Fwd: HeartbeatMap FAILED assert(0 == "hit suicide timeout")

If you have a core dump, check why that thread takes so long to finish
its job.
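For context, the log quoted below shows two thresholds for the same op thread: a warning once it has been busy for more than 15 s ("had timed out after 15") and an abort once it passes 150 s ("had suicide timed out after 150"). In this release those values are, as far as I know, the defaults of `osd_op_thread_timeout` and `osd_op_thread_suicide_timeout`. The Python below is only a simplified mental model, loosely based on the C++ `ceph::HeartbeatMap` named in the backtrace and not Ceph's actual code: each worker re-arms two deadlines when it starts a job, and a service thread compares them with the current time. `HeartbeatHandle`, `check` and `SuicideTimeout` are illustrative names.

```python
from dataclasses import dataclass
from typing import List, Optional


class SuicideTimeout(Exception):
    """Stands in for Ceph's assert(0 == "hit suicide timeout") in this model."""


@dataclass
class HeartbeatHandle:
    """Hypothetical per-thread handle, loosely modeled on ceph::heartbeat_handle_d."""
    name: str
    grace: float = 0.0
    suicide_grace: float = 0.0
    timeout: Optional[float] = None          # warn (mark unhealthy) after this time
    suicide_timeout: Optional[float] = None  # abort the daemon after this time

    def reset_timeout(self, now: float, grace: float, suicide_grace: float) -> None:
        # The worker re-arms both deadlines whenever it starts a new job.
        self.grace = grace
        self.suicide_grace = suicide_grace
        self.timeout = now + grace
        # In this model a suicide grace of 0 disables the abort.
        self.suicide_timeout = now + suicide_grace if suicide_grace > 0 else None


def check(handle: HeartbeatHandle, now: float, log: List[str]) -> bool:
    """Service-thread check: True if healthy, False if late, raises if far too late."""
    healthy = True
    if handle.timeout is not None and now > handle.timeout:
        log.append(f"'{handle.name}' had timed out after {handle.grace:g}")
        healthy = False
    if handle.suicide_timeout is not None and now > handle.suicide_timeout:
        log.append(f"'{handle.name}' had suicide timed out after {handle.suicide_grace:g}")
        raise SuicideTimeout("hit suicide timeout")
    return healthy
```

With `reset_timeout(0, 15, 150)`, a check at 20 s only logs the warning, while a check at 151 s logs both lines (as in the quoted log) and raises. In other words, one job that stays stuck for over 150 s takes down the whole OSD, so the backtrace of that thread in the core dump is what shows what it was blocked on.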

潘东元 <dongyuanpan0@xxxxxxxxx> wrote on Thursday, October 10, 2019 at 10:51 AM:
>
> Hi all,
>     my OSD hit the suicide timeout.
>     Here is some of the log:
> 2019-10-10 03:53:13.017760 7f1ab886e700  0 -- 192.168.1.5:6810/1028846 >> 192.168.1.25:6802/24020795 pipe(0x257eb80 sd=69 :47977 s=2 pgs=287284 cs=41 l=0 c=0x21431760).fault, initiating reconnect
> 2019-10-10 03:53:13.017799 7f1ab967c700  0 -- 192.168.1.5:6810/1028846 >> 192.168.1.25:6802/24020795 pipe(0x257eb80 sd=69 :47977 s=1 pgs=287284 cs=42 l=0 c=0x21431760).fault
> 2019-10-10 03:53:15.890773 7f1acdec3700  0 -- 192.168.1.5:6810/1028846 >> 192.168.1.19:6804/53020865 pipe(0x37537580 sd=59 :60121 s=2 pgs=423672 cs=85 l=0 c=0x21447900).fault, initiating reconnect
> 2019-10-10 03:53:15.890890 7f1aba288700  0 -- 192.168.1.5:6810/1028846 >> 192.168.1.19:6804/53020865 pipe(0x37537580 sd=59 :60121 s=1 pgs=423672 cs=86 l=0 c=0x21447900).fault
> 2019-10-10 03:53:16.209368 7f1addc3e700  1 heartbeat_map is_healthy 'OSD::op_tp thread 0x7f1ac29a3700' had timed out after 15
> 2019-10-10 03:53:16.209382 7f1addc3e700  1 heartbeat_map is_healthy 'OSD::op_tp thread 0x7f1ac29a3700' had suicide timed out after 150
> 2019-10-10 03:53:16.210765 7f1addc3e700 -1 common/HeartbeatMap.cc: In function 'bool ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, const char*, time_t)' thread 7f1addc3e700 time 2019-10-10 03:53:16.209415
> common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout")
>
>  ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
>  1: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x12b) [0xaf2b6b]
>  2: (ceph::HeartbeatMap::is_healthy()+0xa7) [0xaf3497]
>  3: (ceph::HeartbeatMap::check_touch_file()+0x28) [0xaf3988]
>  4: (CephContextServiceThread::entry()+0x13f) [0xb0353f]
>  5: (()+0x79d1) [0x7f1ae0b3c9d1]
>  6: (clone()+0x6d) [0x7f1adfaccb5d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> Can you give me some advice on troubleshooting this?
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
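To know which thread to inspect in the core dump, note that the heartbeat_map lines in the log above name it. A small sketch for pulling those lines out of a full OSD log, assuming the log file has one entry per line (the quoted lines were wrapped by the mail client) and that its format matches the 0.80.x lines shown; `find_stuck_threads` and `HeartbeatEvent` are hypothetical helpers, not part of Ceph.

```python
import re
from typing import Iterable, List, NamedTuple


class HeartbeatEvent(NamedTuple):
    timestamp: str
    thread_name: str   # e.g. "OSD::op_tp"
    thread_addr: str   # e.g. "0x7f1ac29a3700"
    kind: str          # "timeout" or "suicide"
    seconds: int


# Assumed format, taken from the lines above:
# <date> <time> <tid>  <level> heartbeat_map is_healthy '<name> thread <addr>' had [suicide ]timed out after <N>
_HB_RE = re.compile(
    r"^(?P<ts>\S+ \S+) \S+\s+-?\d+ heartbeat_map is_healthy "
    r"'(?P<name>.+?) thread (?P<addr>0x[0-9a-f]+)' "
    r"had (?P<suicide>suicide )?timed out after (?P<secs>\d+)"
)


def find_stuck_threads(lines: Iterable[str]) -> List[HeartbeatEvent]:
    """Return every heartbeat timeout/suicide event found in the given log lines."""
    events = []
    for line in lines:
        m = _HB_RE.search(line)
        if m:
            events.append(HeartbeatEvent(
                timestamp=m.group("ts"),
                thread_name=m.group("name"),
                thread_addr=m.group("addr"),
                kind="suicide" if m.group("suicide") else "timeout",
                seconds=int(m.group("secs")),
            ))
    return events
```

With the address in hand (here `0x7f1ac29a3700`), gdb's `info threads` on the core should list a matching `Thread 0x7f1ac29a3700` entry; switching to it with `thread <n>` and running `bt` should show what the op thread was doing when the service thread gave up on it.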