Re: Ceph MDS laggy

Restarting the compute nodes brings the hang back, so this is
workload-dependent rather than a transient state.

I believe I've tracked down what is happening. One user was running
1500-2000 jobs in a single directory with 92000+ files in it. I am
wondering if the cluster was getting ready to fragment the directory
and something freaked out, perhaps because it was not able to get all
the caps back from the nodes (if that is even required).

I've stopped that user's jobs for the time being, and will probably
address it with them on Monday. If that directory is the issue, can I
tell the MDS to pre-fragment it before I re-enable their jobs?
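
In the meantime, to check whether other directories on the filesystem
are in the same state, something like the following could be run
against a mounted path. This is only a rough sketch: the function name
and the threshold value are my own placeholders, not Ceph settings, and
walking a large tree will itself put load on the MDS.

```python
import os


def find_large_dirs(root, threshold=10000):
    """Yield (path, entry_count) for every directory under root whose
    direct entry count (files plus subdirectories) exceeds threshold.

    threshold is an arbitrary illustrative number, not a Ceph default.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        count = len(dirnames) + len(filenames)
        if count > threshold:
            yield dirpath, count


if __name__ == "__main__":
    import sys

    # Usage: python find_large_dirs.py /path/to/mount [threshold]
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    limit = int(sys.argv[2]) if len(sys.argv) > 2 else 10000
    for path, count in find_large_dirs(root, limit):
        print(f"{count}\t{path}")
```

Running it with a lower threshold first would show which users are
heading toward the same pattern before they get there.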

--
Adam

On Sat, Jan 12, 2019 at 7:53 PM Adam Tygart <mozes@xxxxxxx> wrote:
>
> On a hunch, I shut down the compute nodes for our HPC cluster, and 10
> minutes after that restarted the mds daemon. It replayed the journal,
> evicted the dead compute nodes, and is working again.
>
> This leads me to believe there was a broken transaction of some kind
> coming from the compute nodes (also all running CentOS 7.6 and using
> the kernel cephfs mount). I hope there is enough logging from before
> to try to track this issue down.
>
> We are back up and running for the moment.
> --
> Adam
>
>
>
> On Sat, Jan 12, 2019 at 11:23 AM Adam Tygart <mozes@xxxxxxx> wrote:
> >
> > Hello all,
> >
> > I've got a 31 machine Ceph cluster running ceph 12.2.10 and CentOS 7.6.
> >
> > We're using cephfs and rbd.
> >
> > Last night, one of our two active/active mds servers went laggy.
> > After a restart, it goes laggy again as soon as it becomes active.
> >
> > I've got a log available here (debug_mds 20, debug_objecter 20):
> > https://people.cs.ksu.edu/~mozes/ceph-mds-laggy-20190112.log.gz
> >
> > It looks like I might not have the right log levels. Thoughts on debugging this?
> >
> > --
> > Adam
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@xxxxxxxxxxxxxx
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com