Hi,

The bash script below demonstrates a problem with NFSv4 - maybe two. I'm testing on 3.16-rc2.

It mounts a local filesystem via NFS. The main thread then replaces a particular file on the local filesystem and then accesses it via NFS, checking that the contents are correct. It does this repeatedly.

In parallel, a number of threads repeatedly open/close the same file (using 'grep' rather than 'cat' so tracing in the kernel can see which is which).

With NFSv4, none of these should ever get ESTALE. The server holds the file open while the client has it open, so even if it is unlinked in the local filesystem, the client will still be able to access it.

Yet we do get ESTALE errors.

These are caused by the may_open() call in do_last(). may_open() calls inode_permission(), which calls nfs_permission(), which performs an ACCESS call over NFS, and that call can get ESTALE. This error stops do_last() from calling finish_open() which, in the NFSv4 case, does the final lookup and would, for this test, find the correct, non-stale inode.

When may_open() and then do_last() return ESTALE, do_filp_open() will call path_openat() again, this time with LOOKUP_REVAL, but that doesn't help. nfs_permission() only gets the inode and so cannot d_drop() the dentry or otherwise trigger a revalidation.

I added a d_drop() to may_open() when inode_permission() returns ESTALE and the symptom went away, but I doubt that is the right thing to do.

For NFSv4 I think it is really best to leave all the work to the 'open' call and not perform any access tests beforehand. All the access tests should happen inside the open call once the server knows that the file is 'open'. I have no suggestion for how to fix this properly.

Once that bug is fixed, the script still shows unexpected behaviour. It will eventually report that the file seen over NFS has the old content instead of the new content.
This happens because the file (which has been unlinked on the server via a rename) is still open by one of the background threads, and so can_open_cached() reports that 'cat' doesn't need to actually open the file - it can re-use the open that the 'grep' has.

This seems a little odd. It is a bit like treating an active 'open' of a file as a mini-delegation: you don't need to open it again. However, if it was a real delegation, then when the file was unlinked on the server the delegation would be lost.

I tried running with "lookupcache=none" but I still get the same errors. That certainly seems like an error. With lookupcache=none, opening a file should check the name on the server, not assume that the name cached on the client is correct. But that isn't what happens.

I have no idea how to fix this one either. I'm not even 100% sure which bit is the bug, but something definitely seems wrong.

I changed can_open_cached() to always return 0 and the problem went away, but again I don't think that is a correct fix.

Thanks,
NeilBrown

cnt=${1-10000}
local=${2-/export}
nfs=${3-/mnt}
max_errs=${4-1}

echo "using: $cnt $local $nfs $max_errs"

mount -o vers=4,lookupcache=none localhost:$local $nfs || { echo mount failed ; exit 1; }

rm -f $nfs/afile
touch $nfs/afile

# Background readers: repeatedly open and close the file over NFS.
for i in {1..5}; do
	while [ -f $nfs/afile ]; do
		grep . $nfs/afile > /dev/null 2>&1
	done &
done

i=0
err=0
while [ $i -lt $cnt ]; do
	mydate=$(date +%s.%N)
	want="$i $mydate"
	# Replace the file on the local filesystem via rename ...
	echo $want > $local/bfile
	mv $local/bfile $local/afile
	# ... then read it back over NFS and compare.
	have=$(cat $nfs/afile)
	if [ "$want" == "$have" ]; then
		echo -n -e "$want\r"
	else
		echo fail > /dev/kmsg
		echo "Wanted $want have $have."
		# Wait (up to about 100 seconds) for the NFS view to catch up.
		for x in {1..1000}; do
			sleep 0.1
			grep "$mydate" $nfs/afile && { echo File now correct; break; }
		done
		let err=$err+1
		if [ $err -ge $max_errs ]; then
			echo
			echo $err failures after $i attempts
			rm $local/afile
			umount $nfs
			exit 1
		fi
	fi
	let i=$i+1
done

rm $local/afile
echo
echo $err failures in $i attempts
sleep 2
umount $nfs
exit 0
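
The background loops above throw away grep's stderr, so the ESTALE failures from the first problem are only visible through kernel tracing. As a rough sketch only, the two helpers below could be used to log those errors from user space instead. The names probe_once and count_stale and the log file path are hypothetical and not part of the script above, and the match on the word "stale" assumes grep reports ESTALE using the C library's usual "Stale file handle" or "Stale NFS file handle" wording.

```shell
# Hypothetical helper (not in the original script): read the file once
# and append any error text from grep to a log instead of discarding it.
# LC_ALL=C keeps error messages in English so they can be matched later.
probe_once() {
    # $1: file to read, $2: log file to append errors to
    LC_ALL=C grep . "$1" > /dev/null 2>> "$2"
    return 0
}

# Hypothetical helper: count logged error lines that mention a stale
# handle. Only the word "stale" is matched (case-insensitively), on the
# assumption that the ESTALE message contains it.
count_stale() {
    # $1: log file written by probe_once
    [ -f "$1" ] || { echo 0; return 0; }
    grep -c -i 'stale' "$1" || true
}
```

In the script, the body of the background loop would then become something like "probe_once $nfs/afile /tmp/estale.log" (the log path is arbitrary), and "echo stale errors: $(count_stale /tmp/estale.log)" could be printed after the main loop finishes.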