On Thu, Oct 4, 2018 at 3:58 PM Stefan Kooman <stefan@xxxxxx> wrote:
> A couple of hours later we hit the same issue. We restarted with
> debug_mds=20 and debug_journaler=20 on the standby-replay node. Eight
> hours later (an hour ago) we hit the same issue. We captured ~ 4.7 GB of
> logging.... I skipped to the end of the log file just before the
> "heartbeat_map" messages start:
>
> 2018-10-04 23:23:53.144644 7f415ebf4700 20 mds.0.locker client.17079146 pending pAsLsXsFscr allowed pAsLsXsFscr wanted pFscr
> 2018-10-04 23:23:53.144645 7f415ebf4700 10 mds.0.locker eval done
> 2018-10-04 23:23:55.088542 7f415bbee700 10 mds.beacon.mds2 _send up:active seq 5021
> 2018-10-04 23:23:59.088602 7f415bbee700 10 mds.beacon.mds2 _send up:active seq 5022
> 2018-10-04 23:24:03.088688 7f415bbee700 10 mds.beacon.mds2 _send up:active seq 5023
> 2018-10-04 23:24:07.088775 7f415bbee700 10 mds.beacon.mds2 _send up:active seq 5024
> 2018-10-04 23:24:11.088867 7f415bbee700  1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
> 2018-10-04 23:24:11.088871 7f415bbee700  1 mds.beacon.mds2 _send skipping beacon, heartbeat map not healthy
>
> As far as I can see just normal behaviour.
>
> The big question is: what is happening when the mds starts logging the heartbeat_map messages?
> Why does it log "heartbeat_map is_healthy", just to log .000004 seconds later it's not healthy?
>
> Ceph version: 12.2.8 on all nodes (mon, osd, mds)
> mds: one active / one standby-replay
>
> The system was not under any kind of resource pressure: plenty of CPU, RAM
> available. Metrics all look normal up to the moment things go into a deadlock
> (so it seems).

Thanks for the detailed notes. It looks like the MDS is stuck somewhere
where it isn't even outputting any log messages. If possible, it'd be
helpful to get a coredump (e.g. by sending SIGQUIT to the MDS) or, if
you're comfortable with gdb, a backtrace of any threads that look
suspicious (e.g. not waiting on a futex), along with the output of
`info threads`.
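
If it helps with sifting through a large thread dump, here is a rough Python sketch that keeps only the backtraces showing no futex-style wait. It assumes the dump was saved as text from gdb's `thread apply all bt`, with each thread introduced by a `Thread N (` header. The function name `suspicious_threads` and the list of "idle" frame names are my own assumptions, not anything from Ceph or gdb; the symbols vary with the glibc version, so adjust the list to what you actually see.

```python
import re

# Frame names that usually mean a thread is merely parked on a futex.
# Assumption: adjust this tuple to the symbols your glibc actually shows.
IDLE_FRAMES = (
    "futex_wait",
    "futex_abstimed_wait",
    "pthread_cond_wait",
    "pthread_cond_timedwait",
)

# Header that gdb prints before each backtrace in "thread apply all bt".
THREAD_HEADER = re.compile(r"^Thread \d+ \(", re.MULTILINE)


def suspicious_threads(gdb_output):
    """Return the backtrace blocks that contain no futex-style wait frame."""
    starts = [m.start() for m in THREAD_HEADER.finditer(gdb_output)]
    ends = starts[1:] + [len(gdb_output)]
    blocks = [gdb_output[a:b].strip() for a, b in zip(starts, ends)]
    return [b for b in blocks if not any(f in b for f in IDLE_FRAMES)]
```

One way to produce the input would be something like `gdb -p <pid> -batch -ex "info threads" -ex "thread apply all bt" > mds.bt`, and then `print("\n\n".join(suspicious_threads(open("mds.bt").read())))`. This is only a filter for eyeballing; a thread blocked on a mutex also sits in a futex call, so the full dump is still worth keeping.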

--
Patrick Donnelly