Re: CephFS hangs with access denied

Hi Dietmar

+1

We've been experiencing exactly the same variations of hang / Permission denied / Oops as you, with the cephfs 14.2.6 kernel client on Scientific Linux 7.[67] (kernels 3.10.0-1062.7.1.el7 and 3.10.0-957.21.3.el7).

The mds.log shows the same sequence of
  denied reconnect attempt
  Evicting (and blacklisting) client

The client log shows the same sequence of
  ceph: mds0 caps went stale, renewing
  ceph: mds0 caps stale
  libceph: mds0 10.1.3.29:6801 socket closed (con state OPEN)
  libceph: mds0 10.1.3.29:6801 connection reset
  ...
  ceph: mds0 reconnect start
  ceph: mds0 reconnect denied
  libceph: mds0 10.1.3.29:6801 socket closed (con state NEGOTIATING)
  ceph: mds0 rejected session

and dmesg similarly:
  BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
  IP: [<ffffffffc0ec9461>] ceph_put_snap_realm+0x21/0xe0 [ceph]
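
In case it helps anyone comparing notes, this is roughly what we look at on the mon side when a client gets into this state (the mds name and client address below are only examples, not our real values):

  # list the current client sessions on the active mds
  ceph tell mds.<name> session ls
  # check whether the evicted client's address ended up on the blacklist
  ceph osd blacklist ls
  # only if you really want to let it straight back in (example address)
  ceph osd blacklist rm 10.1.3.50:0/123456789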

Occasionally we can get the client reconnected (without rebooting) by restarting the mds, but a couple of times this has resulted in another client ending up in the same situation. 'umount -f' on the client is not working for us; it fails with "target is busy", possibly because we have a number of users on the clients?
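
For reference, this is roughly the sequence we try on a stuck client before giving up and rebooting (the mount point is a placeholder, and fuser/umount can themselves block on a dead mount):

  # see what is still holding the mount (usually jobs stuck in D state)
  fuser -vm /mnt/cephfs
  # force unmount - this is the step that fails for us with "target is busy"
  umount -f /mnt/cephfs
  # lazy unmount detaches the mount point even while it is busy, so it can
  # be remounted for new jobs, but already-blocked processes stay blocked
  umount -l /mnt/cephfs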

I did find a client patch mentioned at https://tracker.ceph.com/issues/40862. Unfortunately, it's only "In 4.19.69 and 5.2.11". The LongTerm branch for EL (https://elrepo.org/linux/kernel/el7/x86_64/RPMS/) is only tracking kernel 4.4, which does not include the fix; MainLine is currently 5.5 and does have the fix, but we can't use it because of build issues with the Nvidia and beegfs kernel modules.
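
(For anyone else stuck on EL7: assuming the elrepo-kernel repository is already configured, something like this shows which kernel-lt / kernel-ml versions are currently on offer.)

  yum --disablerepo='*' --enablerepo=elrepo-kernel list available kernel-lt kernel-ml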

Cheers
Toby

On 13/02/2020 04:00, Dietmar Rieder wrote:
Hi,

now we have hit a kernel crash (Oops), probably related to my issue, since it all seems to start with a hung mds (see the attached dmesg from the crashed client and the mds log from the mds server):

[281202.923064] Oops: 0002 [#1] SMP
[281202.924952] Modules linked in: fuse xt_multiport squashfs loop overlay(T) xt_CHECKSUM iptable_mangle tun bridge devlink ebtable_filter ebtables rpcsec_gss_krb5 nfsv4 nfs fscache ceph libceph dns_resolver 8021q garp mrp stp llc bonding rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ip6_tables ipt_REJECT nf_reject_ipv4 ib_srp xt_conntrack scsi_transport_srp scsi_tgt iptable_filter ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 ib_ipoib nf_nat_ipv4 nf_nat nf_conntrack rdma_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core dm_mirror dm_region_hash dm_log dm_mod iTCO_wdt iTCO_vendor_support vfat fat sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel
[281202.937437]  lrw gf128mul glue_helper ablk_helper cryptd pcspkr joydev lpc_ich hpilo hpwdt sg ioatdma wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm crct10dif_pclmul crct10dif_common crc32c_intel ixgbe drm tg3 hpsa mdio dca ptp drm_panel_orientation_quirks scsi_transport_sas pps_core
[281202.949214] CPU: 41 PID: 17638 Comm: sh Kdump: loaded Tainted: G      W      ------------ T 3.10.0-1062.12.1.el7.x86_64 #1
[281202.951583] Hardware name: HP ProLiant DL580 Gen9/ProLiant DL580 Gen9, BIOS U17 11/08/2017
[281202.953972] task: ffff8c0d71afb150 ti: ffff8b0e63404000 task.ti: ffff8b0e63404000
[281202.956360] RIP: 0010:[<ffffffffc0cf65b1>]  [<ffffffffc0cf65b1>] ceph_put_snap_realm+0x21/0xe0 [ceph]
[281202.958870] RSP: 0018:ffff8b0e63407be8  EFLAGS: 00010246
[281202.961256] RAX: 0000000000000050 RBX: 0000000000000000 RCX: 0000000000000000
[281202.963694] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff89b59b37bc00
[281202.966102] RBP: ffff8b0e63407c00 R08: 000000000000000a R09: 0000000000000000
[281202.968460] R10: 0000000000001e00 R11: ffff8b0e6340790e R12: ffff89b59b37bc00
[281202.970831] R13: 0000000000000001 R14: 00000000000000c6 R15: 0000000000000000
[281202.973168] FS:  00007f074d5e8740(0000) GS:ffff8a9e7fc40000(0000) knlGS:0000000000000000
[281202.975502] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[281202.977814] CR2: 0000000000000010 CR3: 0000016a50f3a000 CR4: 00000000003607e0
[281202.980144] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[281202.982474] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[281202.984773] Call Trace:
[281202.987156]  [<ffffffffc0cfa40b>] check_quota_exceeded+0x1bb/0x270 [ceph]
[281202.989508]  [<ffffffffc0cfa7d4>] ceph_quota_is_max_bytes_exceeded+0x44/0x60 [ceph]
[281202.991883]  [<ffffffffc0ce2ef2>] ceph_aio_write+0x1e2/0xde0 [ceph]
[281202.994258]  [<ffffffff95c56b13>] ? lookup_fast+0xb3/0x230
[281202.996607]  [<ffffffff95b5938d>] ? call_rcu_sched+0x1d/0x20
[281202.998947]  [<ffffffff95c4d166>] ? put_filp+0x46/0x50
[281203.001236]  [<ffffffff95c49d83>] do_sync_write+0x93/0xe0
[281203.003566]  [<ffffffff95c4a870>] vfs_write+0xc0/0x1f0
[281203.005884]  [<ffffffff95c4b68f>] SyS_write+0x7f/0xf0
[281203.008152]  [<ffffffff9618dede>] system_call_fastpath+0x25/0x2a
[281203.010368] Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 49 89 fc 53 f6 05 3a 8b 02 00 04 48 89 f3 0f 85 89 00 00 00 <f0> ff 4b 10 0f 94 c0 84 c0 75 0c 5b 41 5c 41 5d 5d c3 0f 1f 44
[281203.015129] RIP  [<ffffffffc0cf65b1>] ceph_put_snap_realm+0x21/0xe0 [ceph]
[281203.017510]  RSP <ffff8b0e63407be8>
[281203.019743] CR2: 0000000000000010


# uname -a
Linux zeus.icbi.local 3.10.0-1062.12.1.el7.x86_64 #1 SMP Tue Feb 4 23:02:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

A longer dmesg extract is attached.

Should I file a bug report?

Dietmar

On 2020-02-12 13:32, Dietmar Rieder wrote:
Hi,

we sometimes lose access to our cephfs mount and get permission denied when we try to cd into it. This apparently happens only on some of our HPC cephfs-client nodes (fs mounted via the kernel client) when they are busy with computation and I/O.

When we then manually force-unmount the fs and remount it, everything works again.
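
For the record, the recovery is roughly the following (mount point, monitor address and credentials are placeholders):

  # force, or failing that lazily, unmount the dead mount
  umount -f /ceph/fs || umount -l /ceph/fs
  # remount via the kernel client
  mount -t ceph mon-host:6789:/ /ceph/fs -o name=cephfs-user,secretfile=/etc/ceph/cephfs-user.secret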

This is the dmesg output of the affected client node:
<https://pastebin.com/z5wxUgYS>

All HPC clients and ceph servers are running CentOS 7.7 with the same
kernel:

$ uname -a
Linux apollo-08.local 3.10.0-1062.12.1.el7.x86_64 #1 SMP Tue Feb 4 23:02:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

and all are running ceph version 14.2.7

$ ceph -v
ceph version 14.2.7 (3d58626ebeec02d8385a4cefb92c6cbc3a45bfe8) nautilus (stable)

Maybe someone has an idea of what is going wrong and how we can fix or avoid this.

Thanks
   Dietmar




Cheers
Toby
--
Toby Darling, Scientific Computing (2N249)
MRC Laboratory of Molecular Biology
Francis Crick Avenue
Cambridge Biomedical Campus
Cambridge CB2 0QH
Phone 01223 267070
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


