On Tue, Jan 21, 2020 at 12:09 AM Janek Bevendorff
<janek.bevendorff@xxxxxxxxxxxxx> wrote:
>
> Hi, I did as you asked and created a thread dump with GDB on the
> blocking MDS. Here's the result: https://pastebin.com/pPbNvfdb
>

I can't find any clue in the backtrace. Please run 'ceph daemon
mds.xxxx dump_historic_ops' and 'ceph daemon mds.xxx perf reset; ceph
daemon mds.xxx perf dump' and send the outputs to us.

> On 17/01/2020 13:07, Yan, Zheng wrote:
> > On Fri, Jan 17, 2020 at 4:47 PM Janek Bevendorff
> > <janek.bevendorff@xxxxxxxxxxxxx> wrote:
> >> Hi,
> >>
> >> We have a CephFS in our cluster with 3 MDSs to which > 300 clients
> >> connect at any given time. The FS contains about 80 TB of data and many
> >> millions of files, so it is important that metadata operations work
> >> smoothly even when listing large directories.
> >>
> >> Previously, we had massive stability problems that caused the MDS nodes
> >> to crash or time out regularly as a result of failing to recall caps
> >> fast enough, and they weren't able to rejoin afterwards without
> >> resetting the mds*_openfiles objects (see
> >> https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/AOYWQSONTFROPB4DXVYADWW7V25C3G6Z/
> >> for details).
> >>
> >> We have managed to adjust our configuration to avoid this problem. This
> >> comes down mostly to adjusting the recall decay rate (which still isn't
> >> documented), massively reducing any scrubbing activity, allowing no
> >> more than 10G for mds_cache_memory_limit (the default of 1G is way too
> >> low, but more than 10G seems to cause trouble during replay), and
> >> increasing osd_map_message_max to 100 and osd_map_cache_size to 150. We
> >> haven't seen crashes since. But what we do see is that one of the MDS
> >> nodes will randomly lock up, and the ceph_mds_reply_latency metric goes
> >> up and then stays at a higher level than on any other MDS. The FS is
> >> not completely down as a result, but everything lags massively, to the
> >> point where it's not usable.
> >>
> >> Unfortunately, all the hung MDS is reporting is:
> >>
> >> -77> 2020-01-17 09:29:17.891 7f34c967b700  0 mds.beacon.XXX Skipping
> >> beacon heartbeat to monitors (last acked 320.587s ago); MDS internal
> >> heartbeat is not healthy!
> >> -76> 2020-01-17 09:29:18.391 7f34c967b700  1 heartbeat_map
> >> is_healthy 'MDSRank' had timed out after 15
> >>
> >> and ceph fs status reports only single-digit ops/s for all three MDSs
> >> (mostly a flat 0). I ran ceph mds fail 1 to fail the MDS and force a
> >> standby to take over, which went without problems. Almost immediately
> >> after, all three now-active MDSs started reporting > 900 ops/s and the
> >> FS started working properly again. For some strange reason, the failed
> >> MDS didn't restart, though. It kept reporting the log message above
> >> until I manually restarted the daemon process.
> >>
> > Looks like the MDS entered some long (or infinite) loop. If this happens
> > again, could you attach gdb to it and run the command 'thread apply
> > all bt' inside gdb?
> >
> >> Is anybody else experiencing such issues, or are there any configuration
> >> parameters that I can tweak to avoid this behaviour?
> >>
> >> Thanks
> >> Janek
> >>
> >> _______________________________________________
> >> ceph-users mailing list
> >> ceph-users@xxxxxxxxxxxxxx
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
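
For anyone hitting the same symptoms, the diagnostics and tuning discussed in this
thread boil down to roughly the commands below. This is only a sketch and has not
been verified on the cluster in question: mds.mds1 is a placeholder daemon name,
the pidof call assumes a single ceph-mds process on the host, and the 'ceph config
set' lines are just one possible way to apply the tuning Janek described (on a
release with the centralized config database). The recall decay rate is left out
because no concrete option value was given in the thread.

    # Run on the host of the suspect MDS (admin socket commands):
    # dump recent slow requests, then reset and re-dump the perf counters
    ceph daemon mds.mds1 dump_historic_ops > historic_ops.json
    ceph daemon mds.mds1 perf reset all
    ceph daemon mds.mds1 perf dump > perf_dump.json

    # If the rank hangs again, capture a backtrace of every thread
    # before failing it over (assumes one ceph-mds process on this host)
    gdb -p "$(pidof ceph-mds)" --batch -ex 'thread apply all bt' > mds_threads.txt

    # Force a standby to take over the stuck rank (rank 1 in the report above)
    ceph mds fail 1

    # Tuning reported above to stop the crashes; 10 GiB = 10737418240 bytes
    ceph config set mds mds_cache_memory_limit 10737418240
    ceph config set global osd_map_message_max 100
    ceph config set osd osd_map_cache_size 150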