Hi All,
We have a CephFS filesystem where we are running Reef on the servers
(OSD/MDS/MGR/MON) and Quincy on the clients.
Every once in a while, one of the clients stops allowing access to the
CephFS filesystem: any attempt to access it on that node fails with
"permission denied". The fix is to force-unmount the filesystem and
remount it, after which it's fine again. Any idea how I can prevent this?
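For reference, the workaround I'm applying each time is roughly the
following (the mount point, client name, and secretfile path are
placeholders from my setup; the mon address is the one from the logs
below):

```shell
# Force-unmount the hung CephFS mount; fall back to a lazy unmount
# if the forced one is blocked by open file handles
umount -f /mnt/cephfs || umount -l /mnt/cephfs

# Remount via the kernel client
mount -t ceph 10.50.1.74:6789:/ /mnt/cephfs \
    -o name=phoenix,secretfile=/etc/ceph/client.secret
```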
I see this in the client node logs:
Mar 25 11:34:46 phoenix-07 kernel: [50508.354036] ?
__touch_cap+0x24/0xd0 [ceph]
Mar 25 11:34:46 phoenix-07 kernel: [50508.359650] ?
__touch_cap+0x24/0xd0 [ceph]
Mar 25 11:34:46 phoenix-07 kernel: [50508.367657] ?
__touch_cap+0x24/0xd0 [ceph]
Mar 25 11:36:46 phoenix-07 kernel: [50629.189000] ?
__touch_cap+0x24/0xd0 [ceph]
Mar 25 11:36:46 phoenix-07 kernel: [50629.192579] ?
__touch_cap+0x24/0xd0 [ceph]
Mar 25 11:36:46 phoenix-07 kernel: [50629.196103] ?
__touch_cap+0x24/0xd0 [ceph]
Mar 25 11:38:47 phoenix-07 kernel: [50750.024268] ?
__touch_cap+0x24/0xd0 [ceph]
Mar 25 11:38:47 phoenix-07 kernel: [50750.031520] ?
__touch_cap+0x24/0xd0 [ceph]
Mar 25 11:38:47 phoenix-07 kernel: [50750.038594] ?
__touch_cap+0x24/0xd0 [ceph]
Mar 25 11:40:48 phoenix-07 kernel: [50870.853281] ?
__touch_cap+0x24/0xd0 [ceph]
Mar 25 22:55:38 phoenix-07 kernel: [91360.583032] libceph: mds0
(1)10.50.1.75:6801 socket closed (con state OPEN)
Mar 25 22:55:38 phoenix-07 kernel: [91360.667914] libceph: mds0
(1)10.50.1.75:6801 session reset
Mar 25 22:55:38 phoenix-07 kernel: [91360.667923] ceph: mds0 closed our
session
Mar 25 22:55:38 phoenix-07 kernel: [91360.667925] ceph: mds0 reconnect start
Mar 25 22:55:52 phoenix-07 kernel: [91374.541614] ceph: mds0 reconnect
denied
Mar 25 22:55:52 phoenix-07 kernel: [91374.541726] ceph: dropping
dirty+flushing Fw state for 00000000ea96c18f 1099683115069
Mar 25 22:55:52 phoenix-07 kernel: [91374.541732] ceph: dropping
dirty+flushing Fw state for 00000000ce495f00 1099687100635
Mar 25 22:55:52 phoenix-07 kernel: [91374.541737] ceph: dropping
dirty+flushing Fw state for 0000000073ebb190 1099687100636
Mar 25 22:55:52 phoenix-07 kernel: [91374.541744] ceph: dropping
dirty+flushing Fw state for 0000000091337e6a 1099687100637
Mar 25 22:55:52 phoenix-07 kernel: [91374.541746] ceph: dropping
dirty+flushing Fw state for 000000009075ecd8 1099687100634
Mar 25 22:55:52 phoenix-07 kernel: [91374.541751] ceph: dropping
dirty+flushing Fw state for 00000000d1d4c51f 1099687100633
Mar 25 22:55:52 phoenix-07 kernel: [91374.541781] ceph: dropping
dirty+flushing Fw state for 0000000063dec1e4 1099687100632
Mar 25 22:55:52 phoenix-07 kernel: [91374.541793] ceph: dropping
dirty+flushing Fw state for 000000008b3124db 1099687100638
Mar 25 22:55:52 phoenix-07 kernel: [91374.541796] ceph: dropping
dirty+flushing Fw state for 00000000d9e76d8b 1099687100471
Mar 25 22:55:52 phoenix-07 kernel: [91374.541798] ceph: dropping
dirty+flushing Fw state for 00000000b57da610 1099685041085
Mar 25 22:55:52 phoenix-07 kernel: [91374.542235] libceph: mds0
(1)10.50.1.75:6801 socket closed (con state V1_CONNECT_MSG)
Mar 25 22:55:52 phoenix-07 kernel: [91374.791652] ceph: mds0 rejected
session
Mar 25 23:01:51 phoenix-07 kernel: [91733.308806] ceph: get_quota_realm:
ino (1.fffffffffffffffe) null i_snap_realm
Mar 25 23:01:56 phoenix-07 kernel: [91738.182127] ceph:
check_quota_exceeded: ino (1000a1cb4a8.fffffffffffffffe) null i_snap_realm
Mar 25 23:01:56 phoenix-07 kernel: [91738.188225] ceph:
check_quota_exceeded: ino (1000a1cb4a8.fffffffffffffffe) null i_snap_realm
Mar 25 23:01:56 phoenix-07 kernel: [91738.233658] ceph:
check_quota_exceeded: ino (1000a1cb4aa.fffffffffffffffe) null i_snap_realm
Mar 25 23:25:52 phoenix-07 kernel: [93174.787630] libceph: mds0
(1)10.50.1.75:6801 socket closed (con state OPEN)
Mar 25 23:39:45 phoenix-07 kernel: [94007.751879] ceph: get_quota_realm:
ino (1.fffffffffffffffe) null i_snap_realm
Mar 26 00:03:28 phoenix-07 kernel: [95430.158646] ceph: get_quota_realm:
ino (1.fffffffffffffffe) null i_snap_realm
Mar 26 00:39:45 phoenix-07 kernel: [97607.685421] ceph: get_quota_realm:
ino (1.fffffffffffffffe) null i_snap_realm
Mar 26 00:43:34 phoenix-07 kernel: [97836.681145] ceph:
check_quota_exceeded: ino (1000a306503.fffffffffffffffe) null i_snap_realm
Mar 26 00:43:34 phoenix-07 kernel: [97836.686797] ceph:
check_quota_exceeded: ino (1000a306503.fffffffffffffffe) null i_snap_realm
Mar 26 00:43:34 phoenix-07 kernel: [97836.729046] ceph:
check_quota_exceeded: ino (1000a306505.fffffffffffffffe) null i_snap_realm
Mar 26 00:49:39 phoenix-07 kernel: [98201.302564] ceph:
check_quota_exceeded: ino (1000a75677d.fffffffffffffffe) null i_snap_realm
Mar 26 00:49:39 phoenix-07 kernel: [98201.305676] ceph:
check_quota_exceeded: ino (1000a75677d.fffffffffffffffe) null i_snap_realm
Mar 26 00:49:39 phoenix-07 kernel: [98201.347267] ceph:
check_quota_exceeded: ino (1000a755fe3.fffffffffffffffe) null i_snap_realm
Mar 26 01:04:49 phoenix-07 kernel: [99111.892854] ceph: get_quota_realm:
ino (1.fffffffffffffffe) null i_snap_realm
Mar 26 01:39:45 phoenix-07 kernel: [101207.645602] ceph:
get_quota_realm: ino (1.fffffffffffffffe) null i_snap_realm
Mar 26 02:05:35 phoenix-07 kernel: [102757.494073] ceph:
get_quota_realm: ino (1.fffffffffffffffe) null i_snap_realm
Mar 26 02:39:45 phoenix-07 kernel: [104807.617467] ceph:
get_quota_realm: ino (1.fffffffffffffffe) null i_snap_realm
Mar 26 03:03:14 phoenix-07 kernel: [106216.519979] ceph:
get_quota_realm: ino (1.fffffffffffffffe) null i_snap_realm
Mar 26 03:39:45 phoenix-07 kernel: [108407.731139] ceph:
get_quota_realm: ino (1.fffffffffffffffe) null i_snap_realm
Mar 26 04:04:05 phoenix-07 kernel: [109867.406047] ceph:
get_quota_realm: ino (1.fffffffffffffffe) null i_snap_realm
Mar 26 04:39:45 phoenix-07 kernel: [112007.672100] ceph:
get_quota_realm: ino (1.fffffffffffffffe) null i_snap_realm
Mar 26 05:04:39 phoenix-07 kernel: [113501.101302] ceph:
get_quota_realm: ino (1.fffffffffffffffe) null i_snap_realm
Mar 26 05:39:45 phoenix-07 kernel: [115607.696806] ceph:
get_quota_realm: ino (1.fffffffffffffffe) null i_snap_realm
Mar 26 06:05:08 phoenix-07 kernel: [117130.484942] ceph:
get_quota_realm: ino (1.fffffffffffffffe) null i_snap_realm
Mar 26 06:39:45 phoenix-07 kernel: [119207.706740] ceph:
get_quota_realm: ino (1.fffffffffffffffe) null i_snap_realm
Mar 26 07:03:33 phoenix-07 kernel: [120635.419910] ceph:
get_quota_realm: ino (1.fffffffffffffffe) null i_snap_realm
Mar 26 07:39:45 phoenix-07 kernel: [122807.700416] ceph:
get_quota_realm: ino (1.fffffffffffffffe) null i_snap_realm
Mar 26 08:02:44 phoenix-07 kernel: [124186.804150] ceph:
get_quota_realm: ino (1.fffffffffffffffe) null i_snap_realm
Mar 26 08:39:45 phoenix-07 kernel: [126407.696256] ceph:
get_quota_realm: ino (1.fffffffffffffffe) null i_snap_realm
Mar 26 09:02:15 phoenix-07 kernel: [127757.573231] ceph:
get_quota_realm: ino (1.fffffffffffffffe) null i_snap_realm
Mar 26 09:39:45 phoenix-07 kernel: [130007.718852] ceph:
get_quota_realm: ino (1.fffffffffffffffe) null i_snap_realm
And after I unmount/remount the filesystem I see this:
Mar 26 10:02:57 phoenix-07 kernel: [131399.351230] ceph:
get_quota_realm: ino (1.fffffffffffffffe) null i_snap_realm
Mar 26 10:04:28 phoenix-07 kernel: [131490.722855] ceph: No path or :
separator in source
Mar 26 10:04:28 phoenix-07 kernel: [131490.727023] libceph: mon0
(1)10.50.1.74:6789 session established
Mar 26 10:04:28 phoenix-07 kernel: [131490.730211] libceph: client102330
fsid 58bde08a-d7ed-11ee-9098-506b4b4da440
Is it possibly losing connection to the MDS servers or something? These
lines specifically concern me:
Mar 25 22:55:38 phoenix-07 kernel: [91360.667923] ceph: mds0 closed our
session
Mar 25 22:55:38 phoenix-07 kernel: [91360.667925] ceph: mds0 reconnect start
Mar 25 22:55:52 phoenix-07 kernel: [91374.541614] ceph: mds0 reconnect
denied
It looks like it lost its connection, tried to reconnect, and was denied?
Does that ring a bell for anyone?
The filesystem *is* extremely busy...
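If I understand correctly, a "reconnect denied" can mean the MDS evicted
the session (for example after missed cap renewals under heavy load) and
possibly blocklisted the client, which would explain why only a fresh
mount recovers. This is roughly what I plan to check next; the MDS daemon
id and mount point below are placeholders for my setup:

```shell
# See whether this client's address was blocklisted after eviction
ceph osd blocklist ls

# Inspect current client sessions on the active MDS
# (the daemon id may differ, e.g. in a cephadm deployment)
ceph tell mds.0 session ls

# MDS settings controlling whether timed-out/evicted sessions
# get blocklisted
ceph config get mds mds_session_blocklist_on_timeout
ceph config get mds mds_session_blocklist_on_evict

# Kernel-client mount option to auto-recover after blocklisting
# instead of requiring a manual force unmount/remount
mount -t ceph 10.50.1.74:6789:/ /mnt/cephfs -o recover_session=clean
```

I'd be glad to hear if anyone has used recover_session=clean this way in
practice.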
Thanks again!
-erich
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx