Re: kvm vm cephfs mount hangs on osd node (something like umount -l available?) (help wanted going to production)

Eugen Block <eblock@xxxxxx> · Tue, 22 Dec 2020 12:49:24 +0000

Hi,

there have been several threads about hanging cephfs mounts. One quite
long thread [1] describes a couple of debugging options, but it also
advises against mounting cephfs on OSD nodes in a production
environment.

Do you see blacklisted clients with 'ceph osd blacklist ls'? If so,
try to unblock that client [2].
The same option ('umount -l') is available on a cephfs client, so you can
try that, too. Other options described in [1] include an MDS failover,
but sometimes a reboot of that VM is the only solution left.
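
As a rough sketch, the order of steps above could look like the
following. The mount point, client address and MDS rank are placeholder
values I made up for illustration, not taken from this thread, and the
'blacklist' command naming is the one used by Nautilus (14.2.x). By
default the script only prints the commands; set DRY_RUN=0 to actually
run them.

```shell
#!/bin/sh
# Sketch only: prints each command unless DRY_RUN=0 is set.
# MOUNTPOINT, CLIENT_ADDR and MDS_RANK are hypothetical placeholders.
DRY_RUN="${DRY_RUN:-1}"
MOUNTPOINT="${MOUNTPOINT:-/mnt/cephfs}"
CLIENT_ADDR="${CLIENT_ADDR:-192.168.0.1:0/3710147553}"
MDS_RANK="${MDS_RANK:-0}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# 1. On an admin node: list blacklisted clients.
run ceph osd blacklist ls

# 2. If the hanging client's address is listed, remove that entry.
run ceph osd blacklist rm "$CLIENT_ADDR"

# 3. On the client: lazy unmount (detach now, clean up once no longer busy).
run umount -l "$MOUNTPOINT"

# 4. If the mount still hangs: fail the active MDS so a standby takes over.
run ceph mds fail "$MDS_RANK"
```

Steps 2 and 4 affect the cluster, so I would check the output of step 1
first and run the remaining steps one at a time rather than all at once.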

Regards,
Eugen

[1] http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-August/028719.html
[2] https://docs.ceph.com/en/latest/cephfs/eviction/#advanced-un-blocklisting-a-client

Quoting Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx>:

Is there not some genius out there who can shed some light on this? ;)
Currently I am not able to reproduce this. Thus it would be nice to have
some procedure at hand that resolves stale cephfs mounts nicely.

-----Original Message-----
To: ceph-users
Subject:  kvm vm cephfs mount hangs on osd node (something
like umount -l available?) (help wanted going to production)

I have a VM on an OSD node (it can reach the host and the other nodes via
the macvtap interface, which is used by both host and guest). I just ran a
simple bonnie++ test and everything seems to be fine. Yesterday, however,
the dovecot process apparently caused problems (cephfs is only used for an
archive namespace; inbox is on rbd ssd, fs meta also on ssd).

How can I recover from such a lock-up? If I have a similar situation with
an nfs-ganesha mount, I have the option to do a umount -l, and clients
recover quickly without any issues.

Having to reset the VM is not really an option. What is the best way to
resolve this?

Ceph cluster: 14.2.11 (the vm has 14.2.16)

I have nothing special in my ceph.conf, only these two settings in the mds section:

mds bal fragment size max = 120000
# maybe for nfs-ganesha problems?
# http://docs.ceph.com/docs/master/cephfs/eviction/
#mds_session_blacklist_on_timeout = false
#mds_session_blacklist_on_evict = false
mds_cache_memory_limit = 17179860387

All running:
CentOS Linux release 7.9.2009 (Core)
Linux mail04 3.10.0-1160.6.1.el7.x86_64 #1 SMP Tue Nov 17 13:59:11 UTC
2020 x86_64 x86_64 x86_64 GNU/Linux
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
