Dear Cephalopodians,

now that our Ceph cluster is under high I/O load, we are getting user reports of files not being seen on some clients, but showing up after a stat() syscall is forced.

For example, one user added several files to a directory via an NFS client attached to nfs-ganesha (which uses libcephfs). Afterwards, all other nfs-ganesha servers saw the new files, as did 44 of our FUSE clients, but one single client still saw the old contents of the directory, i.e. the files seemed to be missing(!). This happened both when using "ls" on the directory and when trying to access the seemingly missing files directly. I could also confirm this observation in a fresh login shell on the machine.

Then, on the "broken" client, I entered the directory which seemed to contain only the "old" content and created a new file in there. This worked fine, and all other clients saw the file immediately. On the broken client, the metadata was now updated as well and all the other files appeared, i.e. everything was "in sync" again.

There is nothing in the Ceph logs of our MDS, or in the syslogs of the client machine or the MDS.

Another user observed the same thing, but not limited to one machine (it seems random). He now runs a "stat" on the file he expects to exist (but which is not shown by "ls"). The stat returns "No such file", but a subsequent "ls" then lists the file, and it can be accessed normally.

This feels like something is messed up concerning the client caps. These are all 12.2.4 FUSE clients.

Any ideas how to find the cause? It has only started happening recently, and only under high I/O load with many metadata operations.

Cheers,
Oliver
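
P.S. The stat-then-ls check described above can be sketched in Python roughly as follows. This is only an illustration: the function name stat_then_list and the path in the usage line are made up, and it uses nothing Ceph-specific, just os.listdir() and os.stat() on the mounted directory.

```python
import os


def stat_then_list(directory, filename):
    """List the directory, force a stat() on the expected file, list again.

    Returns a tuple (listed_before, stat_ok, listed_after) so that the
    result can be logged and compared across clients.
    """
    # First listing: does this client currently show the file at all?
    listed_before = filename in os.listdir(directory)

    # Force a stat() on the path the user expects to exist.
    try:
        os.stat(os.path.join(directory, filename))
        stat_ok = True
    except FileNotFoundError:
        # Corresponds to the "No such file" answer reported above.
        stat_ok = False

    # Second listing: did the file show up after the stat()?
    listed_after = filename in os.listdir(directory)

    return listed_before, stat_ok, listed_after
```

Calling stat_then_list("/cephfs/some/dir", "expected_file") (hypothetical path) on a client would return three booleans. A client showing the symptom described above would be expected to give False for the first listing and for the stat, and True for the second listing.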
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com