On Wed, Feb 3, 2016 at 1:21 AM, Nikola Ciprich <nikola.ciprich@xxxxxxxxxxx> wrote:
> Hello fellow ceph users and developers,
>
> A few days ago I updated one of our small clusters (three nodes) to
> kernel 4.1.15. Today I got cephfs stuck on one of the nodes.
>
> ceph -s reports:
> mds0: Behind on trimming (155/30)
>
> Restarting all MDS servers didn't help.
>
> All three cluster nodes are running hammer 0.94.5 on CentOS 6,
> kernel 4.1.15.
>
> Each node runs 7 OSD daemons, a monitor, and an MDS server. (I know
> it's better to run those daemons separately, but we were tight on
> budget here and the hardware should be sufficient.)

What's the full output of "ceph -s"? Have you looked at the MDS admin
socket at all -- what state does it say it's in?
-Greg

> My questions here are:
>
> 1) Is there some known issue with hammer 0.94.5 or kernel 4.1.15
> which could lead to cephfs hangs?
>
> 2) What can I do to debug the cause of this hang?
>
> 3) Is there a way to recover from this without hard-resetting the
> node with the hung cephfs mount?
>
> If I can provide more information, please let me know.
>
> I'd really appreciate any help.
>
> With best regards,
>
> nik
>
> --
> -------------------------------------
> Ing. Nikola CIPRICH
> LinuxBox.cz, s.r.o.
> 28.rijna 168, 709 00 Ostrava
>
> tel.: +420 591 166 214
> fax: +420 596 621 273
> mobil: +420 777 093 799
> www.linuxbox.cz
>
> mobil servis: +420 737 238 656
> email servis: servis@xxxxxxxxxxx
> -------------------------------------
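
For reference, the checks Greg suggests can be run roughly as follows.
This is only a sketch: the daemon name "mds.node1" is a placeholder for
your actual MDS (check /var/run/ceph/ for the real socket name), and the
exact set of admin socket commands differs between releases, so run
"help" first to see what your MDS supports.

  # Cluster-wide view and MDS map state (up:active, up:replay, ...)
  ceph -s
  ceph mds stat
  ceph mds dump

  # On the node running the MDS: list the commands its admin socket
  # supports, then look at in-flight requests, client sessions and counters
  ceph daemon mds.node1 help
  ceph daemon mds.node1 dump_ops_in_flight
  ceph daemon mds.node1 session ls
  ceph daemon mds.node1 perf dump

  # On the node with the hung cephfs mount: check kernel messages and,
  # if debugfs is mounted, see which MDS/OSD requests the kernel client
  # is still waiting on
  dmesg | tail -n 50
  cat /sys/kernel/debug/ceph/*/mdsc
  cat /sys/kernel/debug/ceph/*/osdc

The "Behind on trimming (155/30)" warning means the MDS journal has 155
log segments against an expected maximum of 30, i.e. trimming is not
keeping up; requests that never complete in dump_ops_in_flight, or
entries that stay in the client's mdsc file, usually point at what is
blocking it.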