CephFS filesystem mount tanks on some nodes?

Hi All,

We have a CephFS filesystem where we are running Reef on the servers (OSD/MDS/MGR/MON) and Quincy on the clients.

Every once in a while, one of the clients stops allowing access to the CephFS filesystem; any attempt to access it on that node fails with "permission denied". The fix is to force-unmount the filesystem and remount it, after which it's fine again. Any idea how I can prevent this?
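
For reference, the recovery on an affected node is roughly the following (the mount point and cephx user name are just placeholders for ours; the mon addresses come from ceph.conf):

# force-unmount the hung CephFS mount, then mount it again
umount -f /mnt/cephfs
mount -t ceph :/ /mnt/cephfs -o name=phoenix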

I see this in the client node logs:

Mar 25 11:34:46 phoenix-07 kernel: [50508.354036] ? __touch_cap+0x24/0xd0 [ceph]
Mar 25 11:34:46 phoenix-07 kernel: [50508.359650] ? __touch_cap+0x24/0xd0 [ceph]
Mar 25 11:34:46 phoenix-07 kernel: [50508.367657] ? __touch_cap+0x24/0xd0 [ceph]
Mar 25 11:36:46 phoenix-07 kernel: [50629.189000] ? __touch_cap+0x24/0xd0 [ceph]
Mar 25 11:36:46 phoenix-07 kernel: [50629.192579] ? __touch_cap+0x24/0xd0 [ceph]
Mar 25 11:36:46 phoenix-07 kernel: [50629.196103] ? __touch_cap+0x24/0xd0 [ceph]
Mar 25 11:38:47 phoenix-07 kernel: [50750.024268] ? __touch_cap+0x24/0xd0 [ceph]
Mar 25 11:38:47 phoenix-07 kernel: [50750.031520] ? __touch_cap+0x24/0xd0 [ceph]
Mar 25 11:38:47 phoenix-07 kernel: [50750.038594] ? __touch_cap+0x24/0xd0 [ceph]
Mar 25 11:40:48 phoenix-07 kernel: [50870.853281] ? __touch_cap+0x24/0xd0 [ceph]
Mar 25 22:55:38 phoenix-07 kernel: [91360.583032] libceph: mds0 (1)10.50.1.75:6801 socket closed (con state OPEN)
Mar 25 22:55:38 phoenix-07 kernel: [91360.667914] libceph: mds0 (1)10.50.1.75:6801 session reset
Mar 25 22:55:38 phoenix-07 kernel: [91360.667923] ceph: mds0 closed our session
Mar 25 22:55:38 phoenix-07 kernel: [91360.667925] ceph: mds0 reconnect start
Mar 25 22:55:52 phoenix-07 kernel: [91374.541614] ceph: mds0 reconnect denied
Mar 25 22:55:52 phoenix-07 kernel: [91374.541726] ceph: dropping dirty+flushing Fw state for 00000000ea96c18f 1099683115069
Mar 25 22:55:52 phoenix-07 kernel: [91374.541732] ceph: dropping dirty+flushing Fw state for 00000000ce495f00 1099687100635
Mar 25 22:55:52 phoenix-07 kernel: [91374.541737] ceph: dropping dirty+flushing Fw state for 0000000073ebb190 1099687100636
Mar 25 22:55:52 phoenix-07 kernel: [91374.541744] ceph: dropping dirty+flushing Fw state for 0000000091337e6a 1099687100637
Mar 25 22:55:52 phoenix-07 kernel: [91374.541746] ceph: dropping dirty+flushing Fw state for 000000009075ecd8 1099687100634
Mar 25 22:55:52 phoenix-07 kernel: [91374.541751] ceph: dropping dirty+flushing Fw state for 00000000d1d4c51f 1099687100633
Mar 25 22:55:52 phoenix-07 kernel: [91374.541781] ceph: dropping dirty+flushing Fw state for 0000000063dec1e4 1099687100632
Mar 25 22:55:52 phoenix-07 kernel: [91374.541793] ceph: dropping dirty+flushing Fw state for 000000008b3124db 1099687100638
Mar 25 22:55:52 phoenix-07 kernel: [91374.541796] ceph: dropping dirty+flushing Fw state for 00000000d9e76d8b 1099687100471
Mar 25 22:55:52 phoenix-07 kernel: [91374.541798] ceph: dropping dirty+flushing Fw state for 00000000b57da610 1099685041085
Mar 25 22:55:52 phoenix-07 kernel: [91374.542235] libceph: mds0 (1)10.50.1.75:6801 socket closed (con state V1_CONNECT_MSG)
Mar 25 22:55:52 phoenix-07 kernel: [91374.791652] ceph: mds0 rejected session
Mar 25 23:01:51 phoenix-07 kernel: [91733.308806] ceph: get_quota_realm: ino (1.fffffffffffffffe) null i_snap_realm
Mar 25 23:01:56 phoenix-07 kernel: [91738.182127] ceph: check_quota_exceeded: ino (1000a1cb4a8.fffffffffffffffe) null i_snap_realm
Mar 25 23:01:56 phoenix-07 kernel: [91738.188225] ceph: check_quota_exceeded: ino (1000a1cb4a8.fffffffffffffffe) null i_snap_realm
Mar 25 23:01:56 phoenix-07 kernel: [91738.233658] ceph: check_quota_exceeded: ino (1000a1cb4aa.fffffffffffffffe) null i_snap_realm
Mar 25 23:25:52 phoenix-07 kernel: [93174.787630] libceph: mds0 (1)10.50.1.75:6801 socket closed (con state OPEN)
Mar 25 23:39:45 phoenix-07 kernel: [94007.751879] ceph: get_quota_realm: ino (1.fffffffffffffffe) null i_snap_realm
Mar 26 00:03:28 phoenix-07 kernel: [95430.158646] ceph: get_quota_realm: ino (1.fffffffffffffffe) null i_snap_realm
Mar 26 00:39:45 phoenix-07 kernel: [97607.685421] ceph: get_quota_realm: ino (1.fffffffffffffffe) null i_snap_realm
Mar 26 00:43:34 phoenix-07 kernel: [97836.681145] ceph: check_quota_exceeded: ino (1000a306503.fffffffffffffffe) null i_snap_realm
Mar 26 00:43:34 phoenix-07 kernel: [97836.686797] ceph: check_quota_exceeded: ino (1000a306503.fffffffffffffffe) null i_snap_realm
Mar 26 00:43:34 phoenix-07 kernel: [97836.729046] ceph: check_quota_exceeded: ino (1000a306505.fffffffffffffffe) null i_snap_realm
Mar 26 00:49:39 phoenix-07 kernel: [98201.302564] ceph: check_quota_exceeded: ino (1000a75677d.fffffffffffffffe) null i_snap_realm
Mar 26 00:49:39 phoenix-07 kernel: [98201.305676] ceph: check_quota_exceeded: ino (1000a75677d.fffffffffffffffe) null i_snap_realm
Mar 26 00:49:39 phoenix-07 kernel: [98201.347267] ceph: check_quota_exceeded: ino (1000a755fe3.fffffffffffffffe) null i_snap_realm
Mar 26 01:04:49 phoenix-07 kernel: [99111.892854] ceph: get_quota_realm: ino (1.fffffffffffffffe) null i_snap_realm
Mar 26 01:39:45 phoenix-07 kernel: [101207.645602] ceph: get_quota_realm: ino (1.fffffffffffffffe) null i_snap_realm
Mar 26 02:05:35 phoenix-07 kernel: [102757.494073] ceph: get_quota_realm: ino (1.fffffffffffffffe) null i_snap_realm
Mar 26 02:39:45 phoenix-07 kernel: [104807.617467] ceph: get_quota_realm: ino (1.fffffffffffffffe) null i_snap_realm
Mar 26 03:03:14 phoenix-07 kernel: [106216.519979] ceph: get_quota_realm: ino (1.fffffffffffffffe) null i_snap_realm
Mar 26 03:39:45 phoenix-07 kernel: [108407.731139] ceph: get_quota_realm: ino (1.fffffffffffffffe) null i_snap_realm
Mar 26 04:04:05 phoenix-07 kernel: [109867.406047] ceph: get_quota_realm: ino (1.fffffffffffffffe) null i_snap_realm
Mar 26 04:39:45 phoenix-07 kernel: [112007.672100] ceph: get_quota_realm: ino (1.fffffffffffffffe) null i_snap_realm
Mar 26 05:04:39 phoenix-07 kernel: [113501.101302] ceph: get_quota_realm: ino (1.fffffffffffffffe) null i_snap_realm
Mar 26 05:39:45 phoenix-07 kernel: [115607.696806] ceph: get_quota_realm: ino (1.fffffffffffffffe) null i_snap_realm
Mar 26 06:05:08 phoenix-07 kernel: [117130.484942] ceph: get_quota_realm: ino (1.fffffffffffffffe) null i_snap_realm
Mar 26 06:39:45 phoenix-07 kernel: [119207.706740] ceph: get_quota_realm: ino (1.fffffffffffffffe) null i_snap_realm
Mar 26 07:03:33 phoenix-07 kernel: [120635.419910] ceph: get_quota_realm: ino (1.fffffffffffffffe) null i_snap_realm
Mar 26 07:39:45 phoenix-07 kernel: [122807.700416] ceph: get_quota_realm: ino (1.fffffffffffffffe) null i_snap_realm
Mar 26 08:02:44 phoenix-07 kernel: [124186.804150] ceph: get_quota_realm: ino (1.fffffffffffffffe) null i_snap_realm
Mar 26 08:39:45 phoenix-07 kernel: [126407.696256] ceph: get_quota_realm: ino (1.fffffffffffffffe) null i_snap_realm
Mar 26 09:02:15 phoenix-07 kernel: [127757.573231] ceph: get_quota_realm: ino (1.fffffffffffffffe) null i_snap_realm
Mar 26 09:39:45 phoenix-07 kernel: [130007.718852] ceph: get_quota_realm: ino (1.fffffffffffffffe) null i_snap_realm

And after I unmount/remount the filesystem I see this:

Mar 26 10:02:57 phoenix-07 kernel: [131399.351230] ceph: get_quota_realm: ino (1.fffffffffffffffe) null i_snap_realm
Mar 26 10:04:28 phoenix-07 kernel: [131490.722855] ceph: No path or : separator in source
Mar 26 10:04:28 phoenix-07 kernel: [131490.727023] libceph: mon0 (1)10.50.1.74:6789 session established
Mar 26 10:04:28 phoenix-07 kernel: [131490.730211] libceph: client102330 fsid 58bde08a-d7ed-11ee-9098-506b4b4da440

Is it possibly losing its connection to the MDS servers, or something like that? These lines in particular concern me:

Mar 25 22:55:38 phoenix-07 kernel: [91360.667923] ceph: mds0 closed our session
Mar 25 22:55:38 phoenix-07 kernel: [91360.667925] ceph: mds0 reconnect start
Mar 25 22:55:52 phoenix-07 kernel: [91374.541614] ceph: mds0 reconnect denied

It looks like it lost its connection, tried to reconnect, and was denied. Does that ring a bell for anyone?
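
Next time it happens, before remounting I'm planning to check from an admin node whether the MDS actually evicted/blocklisted that client. If I have the commands right, it's roughly this (the mds daemon name below is a placeholder for whichever one holds rank 0):

ceph tell mds.<active-mds> session ls   # is this client's session still listed, and in what state?
ceph osd blocklist ls                   # was the client's address blocklisted when the session was closed?
ceph health detail                      # any cap-recall / "failing to respond" warnings around that time?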

The filesystem *is* extremely busy...
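
Since load might be the trigger, I was also going to double-check the MDS session timeout/eviction settings on our end; if I have it right, that's something like:

ceph config get mds mds_session_timeout               # how long before the MDS considers a client session stale
ceph config get mds mds_session_autoclose             # how long before a stale session is closed/evicted
ceph config get mds mds_session_blocklist_on_timeout  # whether timed-out clients get blocklisted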

Thanks again!

-erich
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


