Restarting the nodes causes the hang to return, so this is workload dependent and not a transient state.

I believe I've tracked down what is happening. One user was running 1500-2000 jobs in a single directory with 92000+ files in it. I am wondering if, as the cluster was getting ready to fragment that directory, something freaked out, perhaps because it was not able to get all the caps back from the nodes (if that is even required). I've stopped that user's jobs for the time being, and will probably address it with them on Monday.

If that is the issue, can I tell the mds to pre-fragment the directory before I re-enable their jobs? (A rough sketch of the commands I have in mind is below the quoted thread.)

--
Adam

On Sat, Jan 12, 2019 at 7:53 PM Adam Tygart <mozes@xxxxxxx> wrote:
>
> On a hunch, I shut down the compute nodes for our HPC cluster and, 10
> minutes after that, restarted the mds daemon. It replayed the journal,
> evicted the dead compute nodes, and is working again.
>
> This leads me to believe there was a broken transaction of some kind
> coming from the compute nodes (all also running CentOS 7.6 and using
> the kernel cephfs mount). I hope there is enough logging from before
> to track this issue down.
>
> We are back up and running for the moment.
> --
> Adam
>
> On Sat, Jan 12, 2019 at 11:23 AM Adam Tygart <mozes@xxxxxxx> wrote:
> >
> > Hello all,
> >
> > I've got a 31-machine Ceph cluster running Ceph 12.2.10 and CentOS 7.6.
> > We're using cephfs and rbd.
> >
> > Last night, one of our two active/active mds servers went laggy, and
> > upon restart, once it goes active it immediately goes laggy again.
> >
> > I've got a log available here (debug_mds 20, debug_objecter 20):
> > https://people.cs.ksu.edu/~mozes/ceph-mds-laggy-20190112.log.gz
> >
> > It looks like I might not have the right log levels. Thoughts on
> > debugging this?
> >
> > --
> > Adam
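
In case it helps anyone following along, this is roughly what I was planning to try for the pre-fragmenting. It assumes the MDS admin socket exposes the dirfrag commands and that splitting is still governed by mds_bal_split_size / mds_bal_fragment_size_max; the mds name and directory path below are placeholders, and I'd verify the exact syntax against "ceph daemon mds.<name> help" before running anything:

    # check the thresholds the mds is using (defaults, if I recall
    # correctly: split at 10000 entries, hard cap of 100000 entries
    # per fragment)
    ceph daemon mds.<name> config get mds_bal_split_size
    ceph daemon mds.<name> config get mds_bal_fragment_size_max

    # list the existing fragments of the big directory, then split its
    # root fragment (0/0) by 3 bits, i.e. into 8 fragments, before
    # re-enabling the jobs (path and mds name are placeholders)
    ceph daemon mds.<name> dirfrag ls /path/to/that/directory
    ceph daemon mds.<name> dirfrag split /path/to/that/directory 0/0 3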