Re: kvm vm cephfs mount hangs on osd node (something like umount -l available?) (help wanted going to production)

Hi Marc,

I really would like to use the hosts because they each have
16c/32t and an average load of just 2-3.

Keep in mind that the load is only low during normal operation on a
healthy cluster. But when OSDs fail (or maybe even an entire host),
the load increases and can have a huge impact on the running VMs. Such
a setup undermines the high availability for your clients. I'm sure
you know that, but I wanted to emphasize it.
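
If you do keep VMs on the OSD hosts anyway, you can at least soften
the impact of recovery a bit. Just as a rough sketch with example
values (not a recommendation, adjust to your hardware):

   # throttle backfill/recovery per OSD so client (VM) traffic keeps headroom
   ceph config set osd osd_max_backfills 1
   ceph config set osd osd_recovery_max_active 1
   # optionally add a small pause between recovery ops on HDD OSDs
   ceph config set osd osd_recovery_sleep_hdd 0.1

The trade-off is a longer recovery window, during which another
failure hurts more.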

Regards,
Eugen


Quoting Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx>:

Hi Eugen,

Indeed, some really useful tips explaining what goes wrong, but that
thread [1] is about CephFS mounted directly on the OSD node. I also
ran that setup for quite some time without any problems until I
suddenly hit the same issue they had. I don't think I had any issues
with the kernel-client CephFS mount on Luminous until I enabled CephFS
snapshots; then I had to switch to the FUSE client.

In my case I am running a VM on the OSD node, which I thought would
be different. I have been able to reproduce this stale mount only
twice so far, and I have been testing with 10x more clients and it
still works. Anyway, I decided to move everything to RBD; I have been
running VMs with RBD images colocated on OSD nodes without problems
for quite some time. I really would like to use the hosts because they
each have 16c/32t and an average load of just 2-3.

Unfortunately I did not document precisely how I recovered from the
stale mount, and I would like to see if I can reduce the number of
steps to take. Things started happening for me after I did the MDS
failover: then I got blocked clients, which I could unblock, and I was
able to fix the mount with 'umount -l'.
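
From memory it was roughly the following sequence; the client address
is only a placeholder for whatever 'ceph osd blacklist ls' printed,
/mnt/cephfs stands for my mount point, and the remount assumes an
fstab entry:

   # fail the active MDS so a standby takes over
   ceph mds fail 0
   # look for the evicted/blocked client and remove it from the blacklist
   ceph osd blacklist ls
   ceph osd blacklist rm 192.168.10.43:0/1234567890
   # on the client: lazily detach the stale mount, then mount it again
   umount -l /mnt/cephfs
   mount /mnt/cephfs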


Thanks for the pointers; I have linked them in my docs ;)

-----Original Message-----
To: ceph-users@xxxxxxx
Subject:  Re: kvm vm cephfs mount hangs on osd node
(something like umount -l available?) (help wanted going to production)

Hi,

There have been several threads about hanging CephFS mounts; one quite
long thread [1] describes a couple of debugging options but also
recommends avoiding CephFS mounts on OSD nodes in a production
environment.

Do you see blacklisted clients with 'ceph osd blacklist ls'? If the
answer is yes, try to unblock that client [2].
The same option ('umount -l') is available on a CephFS client, so you
can try that, too. Another option described in [1] is to execute an
MDS failover, but sometimes a reboot of that VM is the only solution left.
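
For completeness, identifying the affected session before removing the
blacklist entry looks roughly like this (mds rank 0 and the client
address are only examples):

   # show blacklisted client addresses
   ceph osd blacklist ls
   # list the clients the MDS knows about and match them by address
   ceph tell mds.0 client ls
   # then remove the corresponding blacklist entry as described in [2]
   ceph osd blacklist rm 192.168.0.101:0/3224172280

Note that a kernel client usually still needs the lazy unmount and a
remount afterwards before it becomes usable again.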

Regards,
Eugen


[1]
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-August/028719.html
[2]
https://docs.ceph.com/en/latest/cephfs/eviction/#advanced-un-blocklisting-a-client


Quoting Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx>:

Is there not some genius out there who can shed some light on this? ;)
Currently I am not able to reproduce this, so it would be nice to
have some procedure at hand that resolves stale CephFS mounts nicely.


-----Original Message-----
To: ceph-users
Subject: kvm vm cephfs mount hangs on osd node (something like
umount -l available?) (help wanted going to production)



I have a VM on an OSD node (which can reach the host and other nodes
via the macvtap interface used by both host and guest). I just did a
simple bonnie++ test and everything seems to be fine. Yesterday,
however, the dovecot process apparently caused problems (I am only
using CephFS for an archive namespace; the inbox is on RBD on SSD, and
the fs metadata is also on SSD).

How can I recover from such a lock-up? In a similar situation with an
nfs-ganesha mount, I have the option of doing a 'umount -l', and
clients recover quickly without any issues.

Having to reset the VM is not really an option. What is the best way
to resolve this?
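
For reference, on the hung client the stuck requests should be visible
via debugfs, assuming the kernel client is used and debugfs is mounted
(the exact path differs per mount):

   # outstanding requests towards the OSDs and the MDS
   cat /sys/kernel/debug/ceph/*/osdc
   cat /sys/kernel/debug/ceph/*/mdsc
   # kernel messages about lost or reset sessions
   dmesg | grep -i ceph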



Ceph cluster: 14.2.11 (the VM has 14.2.16)

I have nothing special in my ceph.conf, just these lines in the [mds] section:

mds bal fragment size max = 120000
# maybe for nfs-ganesha problems?
# http://docs.ceph.com/docs/master/cephfs/eviction/
#mds_session_blacklist_on_timeout = false
#mds_session_blacklist_on_evict = false
mds_cache_memory_limit = 17179860387
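
(On 14.2.x the same values can apparently also be set centrally
instead of via ceph.conf, just as a sketch:

   ceph config set mds mds_cache_memory_limit 17179860387
   ceph config set mds mds_bal_fragment_size_max 120000

but I have them in ceph.conf as shown above.)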


All running:
CentOS Linux release 7.9.2009 (Core)
Linux mail04 3.10.0-1160.6.1.el7.x86_64 #1 SMP Tue Nov 17 13:59:11 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


