Re: Ceph MDS laggy

On a hunch, I shut down the compute nodes for our HPC cluster and, 10
minutes later, restarted the MDS daemon. It replayed the journal,
evicted the dead compute nodes, and is working again.

This leads me to believe there was a broken transaction of some kind
coming from the compute nodes (also all running CentOS 7.6 and using
the kernel cephfs mount). I hope there is enough logging from before
to try to track this issue down.
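
For anyone trying to narrow down which clients are involved in a
similar case, here is a minimal sketch (not something that was run on
this cluster) that summarizes a saved MDS session list. It assumes the
list was dumped as JSON, for example with
"ceph daemon mds.<name> session ls" on the MDS host, and that each
entry carries "id", "state" and "client_metadata.hostname" fields. The
field names and the "open" healthy state are assumptions to check
against your own output, and both function names are hypothetical.

```python
import json
from collections import Counter


def summarize_sessions(session_ls_json):
    """Count sessions per (hostname, state) from session-list JSON text."""
    counts = Counter()
    for session in json.loads(session_ls_json):
        # Field names are assumptions about the dump format; anything
        # missing falls back to "unknown" instead of raising.
        host = session.get("client_metadata", {}).get("hostname", "unknown")
        counts[(host, session.get("state", "unknown"))] += 1
    return counts


def suspect_sessions(session_ls_json, healthy_state="open"):
    """Return (id, hostname, state) for each session not in healthy_state."""
    suspects = []
    for session in json.loads(session_ls_json):
        state = session.get("state", "unknown")
        if state != healthy_state:
            host = session.get("client_metadata", {}).get("hostname", "unknown")
            suspects.append((session.get("id"), host, state))
    return suspects
```

Feeding the saved dump to suspect_sessions() gives a short list of
sessions to compare against the MDS log before deciding which clients
to shut down or evict.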

We are back up and running for the moment.
--
Adam



On Sat, Jan 12, 2019 at 11:23 AM Adam Tygart <mozes@xxxxxxx> wrote:
>
> Hello all,
>
> I've got a 31-machine Ceph cluster running ceph 12.2.10 and CentOS 7.6.
>
> We're using cephfs and rbd.
>
> Last night, one of our two active/active mds servers went laggy. After
> a restart, it goes laggy again as soon as it becomes active.
>
> I've got a log available here (debug_mds 20, debug_objecter 20):
> https://people.cs.ksu.edu/~mozes/ceph-mds-laggy-20190112.log.gz
>
> It looks like I might not have the right log levels. Thoughts on debugging this?
>
> --
> Adam
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com