On Tue, Dec 4, 2018 at 6:55 PM <ceph@xxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> I am seeing occasional hard freezes when using CephFS with the kernel driver.
> For instance:
> [Tue Dec 4 10:57:48 2018] libceph: mon1 10.5.0.88:6789 session lost, hunting for new mon
> [Tue Dec 4 10:57:48 2018] libceph: mon2 10.5.0.89:6789 session established
> [Tue Dec 4 10:58:20 2018] ceph: mds0 caps stale
> [..] the server is now frozen, filesystem accesses are stuck
> [Tue Dec 4 11:13:02 2018] libceph: mds0 10.5.0.88:6804 socket closed (con state OPEN)
> [Tue Dec 4 11:13:03 2018] libceph: mds0 10.5.0.88:6804 connection reset
> [Tue Dec 4 11:13:03 2018] libceph: reset on mds0
> [Tue Dec 4 11:13:03 2018] ceph: mds0 closed our session
> [Tue Dec 4 11:13:03 2018] ceph: mds0 reconnect start
> [Tue Dec 4 11:13:04 2018] ceph: mds0 reconnect denied
> [Tue Dec 4 11:13:04 2018] ceph: dropping dirty+flushing Fw state for 000000003f1ae609 1099692263746
> [Tue Dec 4 11:13:04 2018] ceph: dropping dirty+flushing Fw state for 00000000ccd58b71 1099692263749
> [Tue Dec 4 11:13:04 2018] ceph: dropping dirty+flushing Fw state for 00000000da5acf8f 1099692263750
> [Tue Dec 4 11:13:04 2018] ceph: dropping dirty+flushing Fw state for 000000005ddc2fcf 1099692263751
> [Tue Dec 4 11:13:04 2018] ceph: dropping dirty+flushing Fw state for 00000000469a70f4 1099692263754
> [Tue Dec 4 11:13:04 2018] ceph: dropping dirty+flushing Fw state for 000000005c0038f9 1099692263757
> [Tue Dec 4 11:13:04 2018] ceph: dropping dirty+flushing Fw state for 00000000e7288aa2 1099692263758
> [Tue Dec 4 11:13:04 2018] ceph: dropping dirty+flushing Fw state for 00000000b431209a 1099692263759
> [Tue Dec 4 11:13:04 2018] libceph: mds0 10.5.0.88:6804 socket closed (con state NEGOTIATING)
> [Tue Dec 4 11:13:31 2018] libceph: osd12 10.5.0.89:6805 socket closed (con state OPEN)
> [Tue Dec 4 11:13:35 2018] libceph: osd17 10.5.0.89:6800 socket closed (con state OPEN)
> [Tue Dec 4 11:13:35 2018] libceph: osd9 10.5.0.88:6813 socket closed (con state OPEN)
> [Tue Dec 4 11:13:41 2018] libceph: osd0 10.5.0.87:6800 socket closed (con state OPEN)
>
> Kernel 4.17 is used; we got the same issue with 4.18.
> Ceph 13.2.1 is used.
> From what I understand, the kernel client hangs itself for some reason (perhaps it simply cannot handle some unexpected event).
>
> Is there a fix for that?
>
> Secondly, it seems that the kernel client reconnects by itself after 15 minutes every time.
> Where is that tunable? Could I lower it so that the hangs have less impact?
>

This looks more like a network issue. Check whether there is a firewall between the MDS and the client; see the connectivity checks sketched at the end of this mail.

> In ceph.log I see "Health check failed: 1 MDSs report slow requests (MDS_SLOW_REQUEST)", but this is probably a consequence, not the cause.
>
> Any tips?
>
> Best regards,
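To expand on the firewall suggestion, here are a couple of checks you could run. This is only a sketch: the addresses and ports are taken from your dmesg output, and the command and option names are from memory for Mimic (13.2.x), so please verify them against your cluster before relying on them.

From the client, confirm that the mon and MDS ports are reachable, and look for firewall or connection-tracking rules on both ends (a stateful firewall that drops idle connections is a common cause of "session lost" followed by a denied reconnect):

    # 10.5.0.88 is mon1/mds0 in your log: 6789 = mon, 6804 = mds0
    nc -zv 10.5.0.88 6789
    nc -zv 10.5.0.88 6804

    # Inspect firewall rules and the sockets the kernel client holds open
    iptables -L -n -v
    ss -tnp | grep 10.5.0.88

"mds0 reconnect denied" usually means the MDS evicted the stale session, and an evicted client is normally blacklisted on the OSDs as well, which keeps the mount stuck until the blacklist entry expires. From a mon node you can check and, if you accept the risk, clear it:

    # List blacklisted client addresses
    ceph osd blacklist ls

    # Remove an entry so the client can reconnect sooner
    # (hypothetical address, taken from the "blacklist ls" output)
    # ceph osd blacklist rm 10.5.0.x:0/123456789

    # MDS session state and the eviction-related settings; in Mimic,
    # session_timeout and session_autoclose are file system settings,
    # so they show up in "ceph fs get" rather than the daemon config
    ceph daemon mds.<id> session ls
    ceph fs get <fsname> | egrep 'session_timeout|session_autoclose'

Lowering session_autoclose (or turning off mds_session_blacklist_on_timeout, if your version has it) may shorten how long such a hang lasts, but it only softens the symptom; if a firewall or flaky network is dropping the MDS connection, fixing that is the real answer.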