Why is the client frozen in the first place? Is this because it somehow
lost its connection to the mon (I have not found anything about this yet)?
How can I prevent this? Can I make the client reconnect in less than
15 minutes, to lessen the impact?

Best regards,

On 12/04/2018 07:41 PM, Gregory Farnum wrote:
> Yes, this is exactly it with the "reconnect denied".
> -Greg
>
> On Tue, Dec 4, 2018 at 3:00 AM NingLi <lining916740672@xxxxxxxxxx> wrote:
>
>> Hi, maybe this reference can help you:
>>
>> http://docs.ceph.com/docs/master/cephfs/troubleshooting/#disconnected-remounted-fs
>>
>>> On Dec 4, 2018, at 18:55, ceph@xxxxxxxxxxxxxx wrote:
>>>
>>> Hi,
>>>
>>> I get some wild freezes using cephfs with the kernel driver.
>>> For instance:
>>> [Tue Dec 4 10:57:48 2018] libceph: mon1 10.5.0.88:6789 session lost,
>>> hunting for new mon
>>> [Tue Dec 4 10:57:48 2018] libceph: mon2 10.5.0.89:6789 session established
>>> [Tue Dec 4 10:58:20 2018] ceph: mds0 caps stale
>>> [..] server is now frozen, filesystem accesses are stuck
>>> [Tue Dec 4 11:13:02 2018] libceph: mds0 10.5.0.88:6804 socket closed
>>> (con state OPEN)
>>> [Tue Dec 4 11:13:03 2018] libceph: mds0 10.5.0.88:6804 connection reset
>>> [Tue Dec 4 11:13:03 2018] libceph: reset on mds0
>>> [Tue Dec 4 11:13:03 2018] ceph: mds0 closed our session
>>> [Tue Dec 4 11:13:03 2018] ceph: mds0 reconnect start
>>> [Tue Dec 4 11:13:04 2018] ceph: mds0 reconnect denied
>>> [Tue Dec 4 11:13:04 2018] ceph: dropping dirty+flushing Fw state for
>>> 000000003f1ae609 1099692263746
>>> [Tue Dec 4 11:13:04 2018] ceph: dropping dirty+flushing Fw state for
>>> 00000000ccd58b71 1099692263749
>>> [Tue Dec 4 11:13:04 2018] ceph: dropping dirty+flushing Fw state for
>>> 00000000da5acf8f 1099692263750
>>> [Tue Dec 4 11:13:04 2018] ceph: dropping dirty+flushing Fw state for
>>> 000000005ddc2fcf 1099692263751
>>> [Tue Dec 4 11:13:04 2018] ceph: dropping dirty+flushing Fw state for
>>> 00000000469a70f4 1099692263754
>>> [Tue Dec 4 11:13:04 2018] ceph: dropping dirty+flushing Fw state for
>>> 000000005c0038f9 1099692263757
>>> [Tue Dec 4 11:13:04 2018] ceph: dropping dirty+flushing Fw state for
>>> 00000000e7288aa2 1099692263758
>>> [Tue Dec 4 11:13:04 2018] ceph: dropping dirty+flushing Fw state for
>>> 00000000b431209a 1099692263759
>>> [Tue Dec 4 11:13:04 2018] libceph: mds0 10.5.0.88:6804 socket closed
>>> (con state NEGOTIATING)
>>> [Tue Dec 4 11:13:31 2018] libceph: osd12 10.5.0.89:6805 socket closed
>>> (con state OPEN)
>>> [Tue Dec 4 11:13:35 2018] libceph: osd17 10.5.0.89:6800 socket closed
>>> (con state OPEN)
>>> [Tue Dec 4 11:13:35 2018] libceph: osd9 10.5.0.88:6813 socket closed
>>> (con state OPEN)
>>> [Tue Dec 4 11:13:41 2018] libceph: osd0 10.5.0.87:6800 socket closed
>>> (con state OPEN)
>>>
>>> Kernel 4.17 is used; we got the same issue with 4.18.
>>> Ceph 13.2.1 is used.
>>> From what I understand, the kernel hangs itself for some reason (perhaps
>>> it simply cannot handle some wild event).
>>>
>>> Is there a fix for that?
>>>
>>> Secondly, it seems that the kernel client reconnects by itself after
>>> 15 minutes every time.
>>> Where is that tunable? Could I lower that variable, so that the hang
>>> has less impact?
>>>
>>> In ceph.log, I get "Health check failed: 1 MDSs report slow requests
>>> (MDS_SLOW_REQUEST)", but this is probably the consequence, not the cause.
>>>
>>> Any tip?
>>>
>>> Best regards,
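
PS, for anyone finding this in the archives: below is roughly what I am
using to inspect the MDS session state and to recover a stuck mount,
following the troubleshooting page NingLi linked. The daemon name
(mds.a), the mount point (/mnt/cephfs) and the mount credentials are just
examples from my own setup, and I am not at all certain that
session_autoclose is the tunable behind the 15-minute delay, so take the
last part as an untested guess rather than an answer.

# Inspect the session-related settings the MDS is actually running with
# (run on the MDS host; "mds.a" is just my daemon's name)
ceph daemon mds.a config show | grep -E 'session_(timeout|autoclose)'

# List the client sessions the MDS knows about and their state
ceph daemon mds.a session ls

# After a "reconnect denied" the client has been evicted and its dirty
# data is dropped; per the troubleshooting page above, the clean way back
# is a forced unmount + remount on the client (mount point and options
# are from my setup):
umount -f /mnt/cephfs
mount -t ceph 10.5.0.87:6789,10.5.0.88:6789,10.5.0.89:6789:/ /mnt/cephfs \
      -o name=admin,secretfile=/etc/ceph/admin.secret

# Untested guess at shortening the window before the MDS gives up on a
# stale client session -- on some releases this is a filesystem setting
# ("ceph fs set <fsname> session_autoclose <seconds>") rather than a
# plain MDS config option:
ceph tell mds.* injectargs '--mds_session_autoclose=300'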