Hi Dan,

I don't have a solution to the problem; I can only second that we've also been seeing strange problems when more than one node accesses the same file in ceph and at least one of them opens it for writing. I've tried verbose logging on the client (fuse), and it seems that the fuse client sends some cap request to the MDS and sometimes does not get a response. It also looks like the client has some 5 second polling interval, which sometimes (but not always) saves the day, so the client continues with a 5 second-ish delay. This does not happen when multiple processes open the file for reading, but it does when processes open it for writing (even if they never write to the file and only read afterwards).

I have some earlier mailing list messages from a week or two ago describing what we see in more detail (including log outputs). I think the issue has something to do with cap requests being lost/miscommunicated between the client and the MDS.

Andras

On 04/13/2017 01:41 PM, Dan van der Ster wrote:
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com