Just got this during a bonnie++ test, while trying to do an ls -l on the cephfs mount. I also have a kworker process constantly at 40% CPU while this bonnie++ test is running.

[35281.101763] INFO: task bash:1169 blocked for more than 120 seconds.
[35281.102064] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[35281.102175] bash            D ffffa03fbfc9acc0     0  1169   1167 0x00000004
[35281.102181] Call Trace:
[35281.102275]  [<ffffffff84b86d4f>] ? __schedule+0x3af/0x860
[35281.102285]  [<ffffffff84b87229>] schedule+0x29/0x70
[35281.102296]  [<ffffffff84b84d11>] schedule_timeout+0x221/0x2d0
[35281.102332]  [<ffffffff844c6966>] ? finish_wait+0x56/0x70
[35281.102342]  [<ffffffff84b85482>] ? mutex_lock+0x12/0x2f
[35281.102381]  [<ffffffff846e7ed8>] ? autofs4_wait+0x428/0x920
[35281.102386]  [<ffffffff84b875dd>] wait_for_completion+0xfd/0x140
[35281.102407]  [<ffffffff844daf40>] ? wake_up_state+0x20/0x20
[35281.102422]  [<ffffffff846e902b>] autofs4_expire_wait+0xab/0x160
[35281.102425]  [<ffffffff846e6060>] do_expire_wait+0x1e0/0x210
[35281.102429]  [<ffffffff846e62b3>] autofs4_d_manage+0x73/0x1c0
[35281.102455]  [<ffffffff84658e8a>] follow_managed+0xba/0x310
[35281.102459]  [<ffffffff84659e5d>] lookup_fast+0x12d/0x230
[35281.102464]  [<ffffffff8465c90d>] path_lookupat+0x16d/0x8d0
[35281.102467]  [<ffffffff8465deed>] ? do_last+0x66d/0x1340
[35281.102488]  [<ffffffff8464a73a>] ? __check_object_size+0x1ca/0x250
[35281.102499]  [<ffffffff84628675>] ? kmem_cache_alloc+0x35/0x1f0
[35281.102503]  [<ffffffff8465fc0f>] ? getname_flags+0x4f/0x1a0
[35281.102507]  [<ffffffff8465d09b>] filename_lookup+0x2b/0xc0
[35281.102510]  [<ffffffff84660da7>] user_path_at_empty+0x67/0xc0
[35281.102513]  [<ffffffff84660e11>] user_path_at+0x11/0x20
[35281.102516]  [<ffffffff84653603>] vfs_fstatat+0x63/0xc0
[35281.102519]  [<ffffffff846539be>] SYSC_newstat+0x2e/0x60
[35281.102529]  [<ffffffff84b94ed5>] ? system_call_after_swapgs+0xa2/0x13a
[35281.102533]  [<ffffffff84b94ec9>] ? system_call_after_swapgs+0x96/0x13a
[35281.102536]  [<ffffffff84b94ed5>] ? system_call_after_swapgs+0xa2/0x13a
[35281.102539]  [<ffffffff84b94ec9>] ? system_call_after_swapgs+0x96/0x13a
[35281.102543]  [<ffffffff84b94ed5>] ? system_call_after_swapgs+0xa2/0x13a
[35281.102546]  [<ffffffff84b94ec9>] ? system_call_after_swapgs+0x96/0x13a
[35281.102549]  [<ffffffff84b94ed5>] ? system_call_after_swapgs+0xa2/0x13a
[35281.102552]  [<ffffffff84b94ec9>] ? system_call_after_swapgs+0x96/0x13a
[35281.102555]  [<ffffffff84b94ed5>] ? system_call_after_swapgs+0xa2/0x13a
[35281.102558]  [<ffffffff84b94ec9>] ? system_call_after_swapgs+0x96/0x13a
[35281.102561]  [<ffffffff84b94ed5>] ? system_call_after_swapgs+0xa2/0x13a
[35281.102565]  [<ffffffff84653e7e>] SyS_newstat+0xe/0x10
[35281.102568]  [<ffffffff84b94f92>] system_call_fastpath+0x25/0x2a
[35281.102572]  [<ffffffff84b94ed5>] ? system_call_after_swapgs+0xa2/0x13a

-----Original Message-----
To: ceph-users
Subject: kvm vm cephfs mount hangs on osd node (something like umount -l available?) (help wanted going to production)

I have a vm on an osd node (the vm can reach the host and the other nodes via the macvtap interface used by both host and guest). I just did a simple bonnie++ test and everything seems to be fine. Yesterday, however, the dovecot process apparently caused problems (I am only using cephfs for an archive namespace; the inbox is on rbd ssd, and the fs metadata is also on ssd).

How can I recover from such a lock-up? If I have a similar situation with an nfs-ganesha mount, I have the option to do a umount -l, and clients recover quickly without any issues. Having to reset the vm is not really an option. What is the best way to resolve this?
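So far the only recovery path I have found in the docs is evicting the hung client from the MDS side (the eviction link in my ceph.conf below). A rough sketch of what I would try; the daemon name mds.a, the client id 4305, the blacklist address, and the mount point /mnt/cephfs are all placeholders for whatever your cluster reports:

    # On a cluster node: list cephfs sessions and find the hung client.
    ceph tell mds.a client ls

    # Evict that session; by default this also blacklists the client address.
    ceph tell mds.a client evict id=4305

    # If the client should be allowed to reconnect, drop its blacklist entry.
    ceph osd blacklist ls
    ceph osd blacklist rm 192.168.1.104:0/3418317099

    # Inside the vm: detach the dead mount and mount it again
    # (assuming an fstab entry for /mnt/cephfs).
    umount -f /mnt/cephfs || umount -l /mnt/cephfs
    mount /mnt/cephfs

Whether umount -f actually succeeds seems to depend on what the kernel client is blocked on; umount -l at least detaches the mount point so new lookups stop hanging, similar to the nfs-ganesha case.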
Ceph cluster: 14.2.11 (the vm has 14.2.16)

I have nothing special in my ceph.conf, just these two settings in the mds section (plus some commented-out ones):

mds bal fragment size max = 120000
# maybe for nfs-ganesha problems?
# http://docs.ceph.com/docs/master/cephfs/eviction/
#mds_session_blacklist_on_timeout = false
#mds_session_blacklist_on_evict = false
mds_cache_memory_limit = 17179860387

All running: CentOS Linux release 7.9.2009 (Core)
Linux mail04 3.10.0-1160.6.1.el7.x86_64 #1 SMP Tue Nov 17 13:59:11 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
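For completeness, this is how one can check which values the running MDS actually uses (mds.a again being a placeholder for the daemon name):

    # On the MDS host, via the admin socket:
    ceph daemon mds.a config get mds_cache_memory_limit

    # Or from any node with a client keyring:
    ceph config show mds.a | grep -e mds_cache_memory_limit -e mds_bal_fragment_size_max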