Hi Simon,
If you have a recent enough kernel, you can try the "recover_session"
mount option [1]. Read the doc and be aware of what will happen if the
client tries to recover after being blacklisted.
[1]: https://docs.ceph.com/en/latest/man/8/mount.ceph/#basic
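For example, adding it to the fstab line quoted below would look roughly like
this (untested sketch; "clean" is the recovery mode described in that doc, and
if I remember correctly it needs kernel 5.4 or newer):

  10.99.10.1:/somefolder /cephfs ceph _netdev,nofail,name=cephcluster,secret=IsSecret,recover_session=clean 0 0

With "clean", the client drops its dirty data and caps and reconnects after
being blacklisted, instead of returning errors until you remount.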
Weiwen Hu
On 2021/6/14 at 11:07 PM, Simon Sutter wrote:
Hello everyone!
We had a switch outage and the ceph kernel mount did not work anymore.
This is the fstab entry:
10.99.10.1:/somefolder /cephfs ceph _netdev,nofail,name=cephcluster,secret=IsSecret 0 0
I reproduced it by disabling the VLAN on the switch through which Ceph is reachable, which results in ICMP "host unreachable".
I did this for five minutes; after that, "ls /cephfs" just gives a "permission denied".
In dmesg I can see this:
[ 1412.994921] libceph: mon1 10.99.10.4:6789 session lost, hunting for new mon
[ 1413.009325] libceph: mon0 10.99.10.1:6789 session established
[ 1452.998646] libceph: mon2 10.99.15.3:6789 session lost, hunting for new mon
[ 1452.998679] libceph: mon0 10.99.10.1:6789 session lost, hunting for new mon
[ 1461.989549] libceph: mon4 10.99.15.5:6789 socket closed (con state CONNECTING)
---
[ 1787.045148] libceph: mon3 10.99.15.4:6789 socket closed (con state CONNECTING)
[ 1787.062587] libceph: mon0 10.99.10.1:6789 session established
[ 1787.086103] libceph: mon4 10.99.15.5:6789 session established
[ 1814.028761] libceph: mds0 10.99.10.4:6801 socket closed (con state OPEN)
[ 1815.029811] libceph: mds0 10.99.10.4:6801 connection reset
[ 1815.029829] libceph: reset on mds0
[ 1815.029831] ceph: mds0 closed our session
[ 1815.029833] ceph: mds0 reconnect start
[ 1815.052219] ceph: mds0 reconnect denied
[ 1815.052229] ceph: dropping dirty Fw state for ffff9d9085da1340 1099512175611
[ 1815.052231] ceph: dropping dirty+flushing Fw state for ffff9d9085da1340 1099512175611
[ 1815.273008] libceph: mds0 10.99.10.4:6801 socket closed (con state NEGOTIATING)
[ 1816.033241] ceph: mds0 rejected session
[ 1829.018643] ceph: mds0 hung
[ 1880.088504] ceph: mds0 came back
[ 1880.088662] ceph: mds0 caps renewed
[ 1880.094018] ceph: get_quota_realm: ino (10000000afe.fffffffffffffffe) null i_snap_realm
[ 1881.100367] ceph: get_quota_realm: ino (10000000afe.fffffffffffffffe) null i_snap_realm
[ 2046.768969] conntrack: generic helper won't handle protocol 47. Please consider loading the specific helper module.
[ 2061.731126] ceph: get_quota_realm: ino (10000000afe.fffffffffffffffe) null i_snap_realm
Is this a bug to report, or a misconfiguration?
Has anyone else had this before?
To solve the problem, a simple remount does the trick.
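For reference, something like:
  umount /cephfs && mount /cephfs
(the second mount just re-reads the options from fstab).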
Thanks in advance
Simon
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx