Hi,
My cluster is back to HEALTH_OK, the involved host has been restarted
by the user. But I will debug some more on the host the next time I
see this issue.
PS: For completeness: I stated that this issue is often seen in my
current Jewel environment. I meant to say that it comes up only
sometimes (so not that often). But when I *do* have this issue, it
blocks some I/O for clients as a consequence.
That's why I assume that the root cause might be a bug in ceph-fuse.
There's support for page cache in ceph-fuse (not sure whether it is
active by default), and afaik it has to keep the capabilities around as
long as the corresponding file is still in the cache. If another client
wants to access the file, the MDS might need to revoke the capabilities
for cached files (e.g. if one client wants to overwrite a file that has
been read by another client before). The client has to wait until it is
able to acquire the capabilities, resulting in blocked I/O.
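To illustrate the blocking described above, here is a toy Python model.
It is only a sketch and not Ceph's actual protocol or API: the class
ToyMDS, its methods, the client names and the path are all made up for
this example. It assumes a writer may proceed only once no other client
still holds a capability on the file.

```python
# Toy model of capability revocation. NOT Ceph's real protocol or API;
# every name here (ToyMDS, open_read, release, try_write) is hypothetical.
class ToyMDS:
    def __init__(self):
        # path -> set of client names holding a read capability
        self.read_caps = {}

    def open_read(self, client, path):
        # Grant a read capability; the client may now keep the file cached.
        self.read_caps.setdefault(path, set()).add(client)

    def release(self, client, path):
        # The client drops the file from its cache and returns the capability.
        self.read_caps.get(path, set()).discard(client)

    def try_write(self, client, path):
        # The writer proceeds only if no *other* client still holds a capability.
        holders = self.read_caps.get(path, set()) - {client}
        if holders:
            return False, sorted(holders)  # blocked until these clients release
        return True, []


mds = ToyMDS()
mds.open_read("client_a", "/data/file")          # client_a reads and caches the file
blocked = mds.try_write("client_b", "/data/file")
mds.release("client_a", "/data/file")            # client_a finally gives up its capability
unblocked = mds.try_write("client_b", "/data/file")
print(blocked)
print(unblocked)
```

In this model, client_b stays blocked for as long as client_a keeps the
file cached. That matches the symptom: if a client does not release its
capabilities in time, other clients' I/O on that file stalls.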
We had similar problems in the past with ceph-fuse, especially if page
cache support was active. We have switched to kernel-based CephFS in the
meantime (with its own pros and cons).
Regards,
Burkhard
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com