Quoting Patrick Donnelly (pdonnell@xxxxxxxxxx):
> Thanks for the detailed notes. It looks like the MDS is stuck
> somewhere it's not even outputting any log messages. If possible, it'd
> be helpful to get a coredump (e.g. by sending SIGQUIT to the MDS) or,
> if you're comfortable with gdb, a backtrace of any threads that look
> suspicious (e.g. not waiting on a futex) including `info threads`.

It took a while before the same issue reappeared, but we managed to
capture gdb backtraces and strace output. See the pastebin links below.

Note: we had difficulty getting the MDSs working again, so we had to
restart them a couple of times, capturing as much debug output as we
could. Hopefully you can squeeze some useful information out of this
data.

MDS1:
https://8n1.org/13869/bc3b - A few minutes after it first started acting up
https://8n1.org/13870/caf4 - Probably made when I tried to stop the process and it took too long (process had already received SIGKILL)
https://8n1.org/13871/2f22 - After restarting, the same issue returned
https://8n1.org/13872/2246 - After restarting, the same issue returned

MDS2:
https://8n1.org/13873/f861 - After it went craycray when it became active
https://8n1.org/13874/c567 - After restarting, the same issue returned
https://8n1.org/13875/133a - After restarting, the same issue returned

STRACES:
MDS1: https://8n1.org/mds1-strace.zip
MDS2: https://8n1.org/mds2-strace.zip

Gr. Stefan

--
| BIT BV  http://www.bit.nl/  Kamer van Koophandel 09090351
| GPG: 0xD14839C6  +31 318 648 688 / info@xxxxxx