Re: cephfs/ceph-fuse: mds0: Client XXX:XXX failing to respond to capability release

Burkhard Linke <Burkhard.Linke@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> · Wed, 14 Sep 2016 15:59:03 +0200

Hi,

My cluster is back to HEALTH_OK; the host involved has been restarted
by the user. I will debug further on the host the next time I see
this issue.

PS: For completeness: I stated earlier that this issue was often seen
in my current Jewel environment. I meant to say that it comes up
occasionally (so not that often). But when I *do* have this issue, it
blocks some I/O for clients as a consequence.

That's why I assume that the root cause might be a bug in ceph-fuse.
There's support for page cache in ceph-fuse (not sure whether it is
active by default), and afaik it has to keep the capabilities around as
long as the corresponding file is still in the cache. If another client
wants to access the file, the MDS might need to revoke the capabilities
for cached files (e.g. if one client wants to overwrite a file that has
been read by another client before). The requesting client has to wait
until it is able to acquire the capabilities, resulting in blocked I/O.
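
To make the blocking scenario concrete, here is a rough toy model in
Python. It is only a sketch of the idea as I understand it, not how the
MDS or ceph-fuse is actually implemented; ToyMds, grant_read, release
and request_write are made-up names for illustration.

```python
# Toy model only: ToyMds and its methods are hypothetical names and do
# not correspond to real Ceph classes or calls.
from collections import defaultdict


class ToyMds:
    def __init__(self):
        # path -> set of clients currently holding a read/cache capability
        self.read_caps = defaultdict(set)

    def grant_read(self, client, path):
        """Client reads a file and keeps the capability while it is cached."""
        self.read_caps[path].add(client)

    def release(self, client, path):
        """Client drops the file from its cache and returns the capability."""
        self.read_caps[path].discard(client)

    def request_write(self, client, path):
        """Return the clients whose capabilities must be revoked first.

        An empty list means the write can proceed; otherwise the writer
        stays blocked until every listed client has released.
        """
        return sorted(self.read_caps[path] - {client})


mds = ToyMds()
mds.grant_read("client_a", "/data/file")   # client_a reads and caches the file
blockers = mds.request_write("client_b", "/data/file")
print(blockers)                             # clients that client_b is waiting on

mds.release("client_a", "/data/file")      # client_a finally releases
print(mds.request_write("client_b", "/data/file"))  # nothing left to revoke
```

In this model, a client that never calls release (for example because
the file stays in its page cache) keeps the writer blocked indefinitely,
which would match the "failing to respond to capability release"
warning.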

We had similar problems in the past with ceph-fuse, especially if page
cache support was active. We have switched to kernel-based cephfs in the
meantime (with its own pros and cons).

Regards,
Burkhard
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com