Is this a deadlock?


 



Hi, everyone.

 

Recently, in one of our production Ceph clusters, an OSD committed suicide after experiencing some network connectivity problems. The OSD log is as follows:

 

  -173> 2017-01-03 23:42:19.145490 7f5021bbc700  0 -- 10.205.49.55:6802/1778451 >> 10.205.49.174:6803/1499671 pipe(0x7f50ec2ce000 sd=369 :52875 s=2 pgs=13750 cs=25 l=0 c=0x7f50ec232b80).fault with nothing to send, going to standby

  -172> 2017-01-03 23:42:19.153910 7f4f85db6700  0 -- 10.205.49.55:6802/1778451 >> 10.205.50.238:6800/1678052 pipe(0x7f507129f000 sd=572 :42936 s=2 pgs=18955 cs=1399 l=0 c=0x7f50ec048c80).fault with nothing to send, going to standby

  -171> 2017-01-03 23:42:19.193380 7f506d8d8700  0 -- 10.205.49.55:6802/1778451 >> 10.205.51.43:6816/1674773 pipe(0x7f4fe15b1000 sd=9514 :6802 s=2 pgs=14830 cs=33 l=0 c=0x7f50ec231980).fault with nothing to send, going to standby

  -170> 2017-01-03 23:42:19.194879 7f50c1dfc700  0 -- 10.205.49.55:6802/1778451 >> 10.205.49.176:6808/1465666 pipe(0x7f501a156000 sd=4867 :6802 s=2 pgs=15410 cs=189 l=0 c=0x7f50ec232880).fault with nothing to send, going to standby

  -169> 2017-01-03 23:42:19.913463 7f50f733b700  5 osd.619 76989 tick

  -168> 2017-01-03 23:42:20.913561 7f50f733b700  5 osd.619 76989 tick

  -167> 2017-01-03 23:42:21.472495 7f5096bff700  5 osd.619 76989 heartbeat: osd_stat(75349 MB used, 156 GB avail, 243 GB total, peers [5,46,49,59,69,83,88,100,102,103,104,106,117,120,121,126,131,132,133,137,143,150,163,174,185,207,211,215,220,230,232,263,270,271,283,287,297,299,306,314,315,320,327,348,352,361,435,440,444,446,453,477,478,489,618,620,634,676,678,687]/[] op hist [])

  -166> 2017-01-03 23:42:21.477995 7f50e78e4700  3 osd.619 76989 handle_osd_map epochs [76990,76991], i have 76989, src has [76281,76991]

  -165> 2017-01-03 23:42:21.494528 7f50e78e4700  0 log_channel(cluster) log [WRN] : map e76991 wrongly marked me down

  -164> 2017-01-03 23:42:21.494545 7f50e78e4700  1 osd.619 76991 start_waiting_for_healthy

  -163> 2017-01-03 23:42:21.495256 7f50e60e1700  1 osd.619 76991 ms_handle_reset con 0x7f50ec047c00 session 0x7f50b0c6d080

  -162> 2017-01-03 23:42:21.495278 7f50e60e1700  1 osd.619 76991 ms_handle_reset con 0x7f50e9ca7480 session 0x7f50d606a080

  -161> 2017-01-03 23:42:21.495296 7f50e60e1700  1 osd.619 76991 ms_handle_reset con 0x7f50ec047000 session 0x7f50cb083080

  -160> 2017-01-03 23:42:21.495300 7f50e60e1700  1 osd.619 76991 ms_handle_reset con 0x7f50ec047780 session 0x7f50c1846080

  -159> 2017-01-03 23:42:21.495315 7f50e60e1700  1 osd.619 76991 ms_handle_reset con 0x7f50ec048200 session 0x7f50a6c3b080

  …….

  -52> 2017-01-03 23:42:21.500216 7f50e60e1700  1 osd.619 76991 ms_handle_reset con 0x7f50e9ca6880 session 0x7f50a2434080

  -51> 2017-01-03 23:42:21.500219 7f50e60e1700  1 osd.619 76991 ms_handle_reset con 0x7f50ec230480 session 0x7f50b8c6c080

  -50> 2017-01-03 23:42:21.500235 7f50e60e1700  1 osd.619 76991 ms_handle_reset con 0x7f50ec230780 session 0x7f50b3c92080

  -49> 2017-01-03 23:42:27.374774 7f5096bff700  5 osd.619 76991 heartbeat: osd_stat(75351 MB used, 156 GB avail, 243 GB total, peers []/[] op hist [])

  -48> 2017-01-03 23:42:28.474921 7f5096bff700  5 osd.619 76991 heartbeat: osd_stat(75351 MB used, 156 GB avail, 243 GB total, peers []/[] op hist [])

  -47> 2017-01-03 23:42:29.575059 7f5096bff700  5 osd.619 76991 heartbeat: osd_stat(75351 MB used, 156 GB avail, 243 GB total, peers []/[] op hist [])

  -46> 2017-01-03 23:42:33.075190 7f5096bff700  5 osd.619 76991 heartbeat: osd_stat(75351 MB used, 156 GB avail, 243 GB total, peers []/[] op hist [])

 -45> 2017-01-03 23:42:33.575338 7f5096bff700  5 osd.619 76991 heartbeat: osd_stat(75351 MB used, 156 GB avail, 243 GB total, peers []/[] op hist [])

  -44> 2017-01-03 23:42:39.475486 7f5096bff700  5 osd.619 76991 heartbeat: osd_stat(75351 MB used, 156 GB avail, 243 GB total, peers []/[] op hist [])

  -43> 2017-01-03 23:42:41.775637 7f5096bff700  5 osd.619 76991 heartbeat: osd_stat(75351 MB used, 156 GB avail, 243 GB total, peers []/[] op hist [])

  -42> 2017-01-03 23:42:45.689707 7f5052faf700  0 -- 10.205.49.55:6809/2778451 >> 10.205.49.173:6808/1489836 pipe(0x7f4f995d8000 sd=17 :34924 s=2 pgs=15614 cs=1 l=0 c=0x7f502f484600).fault, initiating reconnect

  -41> 2017-01-03 23:42:45.689752 7f5053bbb700  0 -- 10.205.49.55:6809/2778451 >> 10.205.51.43:6806/1666935 pipe(0x7f4fb89a3000 sd=22 :59560 s=2 pgs=25257 cs=1 l=0 c=0x7f502f484780).fault, initiating reconnect

  -40> 2017-01-03 23:42:45.690133 7f5064343700  0 -- 10.205.49.55:6809/2778451 >> 10.205.49.173:6808/1489836 pipe(0x7f4f995d8000 sd=17 :34926 s=1 pgs=15614 cs=2 l=0 c=0x7f502f484600).connect got RESETSESSION

  -39> 2017-01-03 23:42:45.690159 7f50523a3700  0 -- 10.205.49.55:6809/2778451 >> 10.205.51.43:6806/1666935 pipe(0x7f4fb89a3000 sd=22 :59562 s=1 pgs=25257 cs=2 l=0 c=0x7f502f484780).connect got RESETSESSION

  -38> 2017-01-03 23:42:47.463427 7f50531b1700  0 -- 10.205.49.55:6809/2778451 >> 10.205.49.175:6813/1151278 pipe(0x7f4fb8ac7000 sd=39 :36648 s=2 pgs=18008 cs=1 l=0 c=0x7f502f484c00).fault, initiating reconnect

  -37> 2017-01-03 23:42:47.463513 7f50540c0700  0 -- 10.205.49.55:6809/2778451 >> 10.205.49.109:6806/1814550 pipe(0x7f502f465000 sd=40 :39670 s=2 pgs=24898 cs=1 l=0 c=0x7f502f484d80).fault, initiating reconnect

  -36> 2017-01-03 23:42:47.463857 7f5063939700  0 -- 10.205.49.55:6809/2778451 >> 10.205.49.175:6813/1151278 pipe(0x7f4fb8ac7000 sd=39 :36650 s=1 pgs=18008 cs=2 l=0 c=0x7f502f484c00).connect got RESETSESSION

  -35> 2017-01-03 23:42:47.463906 7f5051797700  0 -- 10.205.49.55:6809/2778451 >> 10.205.49.109:6806/1814550 pipe(0x7f502f465000 sd=40 :39672 s=1 pgs=24898 cs=2 l=0 c=0x7f502f484d80).connect got RESETSESSION

  -34> 2017-01-03 23:42:47.675768 7f5096bff700  5 osd.619 76991 heartbeat: osd_stat(75351 MB used, 156 GB avail, 243 GB total, peers []/[] op hist [])

  -33> 2017-01-03 23:42:51.175906 7f5096bff700  5 osd.619 76991 heartbeat: osd_stat(75351 MB used, 156 GB avail, 243 GB total, peers []/[] op hist [])

  -32> 2017-01-03 23:42:54.076042 7f5096bff700  5 osd.619 76991 heartbeat: osd_stat(75351 MB used, 156 GB avail, 243 GB total, peers []/[] op hist [])

  ………

      -5> 2017-01-03 23:43:58.579820 7f5096bff700  5 osd.619 76991 heartbeat: osd_stat(75351 MB used, 156 GB avail, 243 GB total, peers []/[] op hist [])

    -4> 2017-01-03 23:44:02.079974 7f5096bff700  5 osd.619 76991 heartbeat: osd_stat(75351 MB used, 156 GB avail, 243 GB total, peers []/[] op hist [])

    -3> 2017-01-03 23:44:04.380112 7f5096bff700  5 osd.619 76991 heartbeat: osd_stat(75351 MB used, 156 GB avail, 243 GB total, peers []/[] op hist [])

    -2> 2017-01-03 23:44:10.280259 7f5096bff700  5 osd.619 76991 heartbeat: osd_stat(75351 MB used, 156 GB avail, 243 GB total, peers []/[] op hist [])

    -1> 2017-01-03 23:44:16.180403 7f5096bff700  5 osd.619 76991 heartbeat: osd_stat(75351 MB used, 156 GB avail, 243 GB total, peers []/[] op hist [])

     0> 2017-01-03 23:44:19.532616 7f50fdbff700 -1 error_msg common/HeartbeatMap.cc: In function 'bool ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, const char*, time_t)' thread 7f50fdbff700 time 2017-01-03 23:44:19.523823

common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout")

 

ceph version 0.94.5-3-1-g81d4ad4 (81d4ad40d0c2a4b73529ff0db3c8f22acd15c398)

1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0xb266f5]

2: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x2d9) [0xa5bf69]

3: (ceph::HeartbeatMap::is_healthy()+0xde) [0xa5c85e]

4: (ceph::HeartbeatMap::check_touch_file()+0x2c) [0xa5cf7c]

5: (CephContextServiceThread::entry()+0x15b) [0xb3670b]

6: (()+0x7dc5) [0x7f5100a7fdc5]

7: (clone()+0x6d) [0x7f50ff561ced]
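
For context, this is how we currently understand the suicide-timeout mechanism after reading the code. The snippet below is only our own simplified sketch of the pattern (hypothetical names), not the actual ceph::HeartbeatMap implementation: worker threads keep "touching" a per-thread handle, and a service thread periodically checks whether any handle has been idle longer than the suicide grace; if one has, it asserts, which is the "hit suicide timeout" failure above.

// Simplified sketch of a suicide-timeout watchdog (hypothetical, NOT the
// real ceph::HeartbeatMap code): workers touch a timestamp, and a checker
// thread asserts if a worker has been idle longer than the suicide grace.
#include <atomic>
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;

struct HeartbeatHandle {
  std::atomic<Clock::time_point> last_touch{Clock::now()};
};

// Called by a worker thread whenever it makes progress.
void touch(HeartbeatHandle &h) {
  h.last_touch.store(Clock::now());
}

// Called periodically by a service thread (cf. CephContextServiceThread
// in the backtrace above).
void check(const HeartbeatHandle &h, std::chrono::seconds suicide_grace) {
  auto idle = Clock::now() - h.last_touch.load();
  // If the worker stopped making progress for too long, give up and abort,
  // which is what the FAILED assert("hit suicide timeout") reflects.
  assert(idle < suicide_grace && "hit suicide timeout");
}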

 

From the logs above, we can see that the OSD went down at about 2017-01-03 23:44:19, that OSD::tick was blocked for some reason after the last tick at about 2017-01-03 23:42:20, and that, after the last invocation of handle_osd_map, OSD::heartbeat_peers stayed empty until the OSD committed suicide.

After doing some source code reading, we think that heartbeat_peers was reset during the execution of handle_osd_map and should have been refilled afterwards, either by handle_osd_map itself or by OSD::tick, but that refill never happened.

It seems that there is some kind of "deadlock" between OSD::tick and the peering process. Is this right?
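
To make what we suspect more concrete, here is a minimal, self-contained model of the situation (hypothetical names, not the real OSD code; in our reading the big lock would be osd_lock and the refill would be maybe_update_heartbeat_peers(), but the sketch does not depend on that). The point is just that a periodic tick which needs a lock held by a stuck thread can never run, so the peer list is never refilled:

// Minimal model of the suspected starvation (hypothetical names, not the
// real OSD code): the tick that refills the heartbeat peer list needs a
// lock held by a stuck thread, so the list stays empty.
#include <chrono>
#include <iostream>
#include <mutex>
#include <set>
#include <thread>

std::mutex big_lock;            // stands in for the big OSD lock
std::set<int> heartbeat_peers;  // stands in for OSD::heartbeat_peers

// Roughly what we expect the periodic tick to do: take the lock and
// repopulate the peer list from the current map.
void tick() {
  std::lock_guard<std::mutex> g(big_lock);
  heartbeat_peers = {5, 46, 49};
}

// Stands in for whatever holds the lock while it is blocked, e.g. peering
// work waiting on the network after the connectivity problem.
void stuck_worker() {
  std::lock_guard<std::mutex> g(big_lock);
  // Pretend this never returns; a few seconds is enough for the demo.
  std::this_thread::sleep_for(std::chrono::seconds(5));
}

int main() {
  heartbeat_peers.clear();  // what the reset during handle_osd_map did

  std::thread w(stuck_worker);
  std::this_thread::sleep_for(std::chrono::milliseconds(100));
  std::thread t(tick);      // the tick now blocks on big_lock...

  std::this_thread::sleep_for(std::chrono::seconds(2));
  // ...so two seconds later the peer list is still empty, which matches
  // the "peers []/[]" heartbeat lines in the log until the suicide.
  std::cout << "heartbeat_peers size: " << heartbeat_peers.size() << "\n";

  w.join();  // in the real OSD nothing ever unblocks; the watchdog aborts
  t.join();
  return 0;
}

If that is what happened, the empty "peers []/[]" heartbeat lines and the missing tick messages after 23:42:20 would both be explained by the same blocked lock.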

 

Please help me, thank you :)

 

And by the way, when we first tried to restart the OSD that committed suicide with "/etc/init.d/ceph start osd.619", an error was reported that said something like "OSD.619 is not found", as if osd.619 had never been created in this cluster. We are really confused; please help us.

 

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
