NFSv4 I/O error when reading a file which was deleted and recreated by another client

Sascha Frey <sfrey@xxxxxxxxxxxxxxxxxxxxxxxx> · Tue, 15 Mar 2016 10:21:04 +0100

Hi list,

we're experiencing a serious NFSv4 client caching issue when a
client reads a file which was deleted and recreated by another client.

Steps to reproduce the problem:

root@client1:~# mount -t nfs -o rw,vers=4,sec=sys,hard,intr 129.70.150.53:/vol/testvol5 /mnt
root@client2:~# mount -t nfs -o rw,vers=4,sec=sys,hard,intr 129.70.150.53:/vol/testvol5 /mnt

user@client1:~$ echo foo > /mnt/bar
user@client2:~$ cat /mnt/bar
foo
user@client1:~$ rm /mnt/bar ; echo quux > /mnt/bar
user@client2:~$ cat /mnt/bar
cat: /mnt/bar: Input/output error

Even after waiting several hours, the I/O error persists on client2.
Running 'ls' on the parent directory fixes the problem immediately:

user@client2:~$ ls /mnt
bar
user@client2:~$ cat /mnt/bar
quux
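
As a stopgap, the 'ls' workaround could be wrapped in a small shell
helper along these lines. This is only a sketch: the function name
read_with_refresh is made up for illustration, and it assumes that
listing the parent directory is enough to make the client drop its
stale cache entry, as in the transcript above. It does not address
the underlying cause.

```shell
# Hypothetical helper (not part of any NFS tooling): read a file and,
# if a trial read fails (e.g. with "Input/output error"), list the
# parent directory once before the real read.
read_with_refresh() {
    file=$1
    # Trial read; output and errors are discarded so nothing is
    # printed twice if the real read below succeeds.
    if ! cat -- "$file" >/dev/null 2>&1; then
        # Listing the directory is what cleared the error for us.
        ls -- "$(dirname -- "$file")" >/dev/null 2>&1
    fi
    # Real read; any remaining error is reported to the caller.
    cat -- "$file"
}
```

Usage on client2 would be 'read_with_refresh /mnt/bar'. The file is
read twice in the common case, so this only makes sense for small files.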

Dropping the dentry and inode caches also works:
root@client2:~# sync
root@client2:~# echo 2 > /proc/sys/vm/drop_caches

We tried different mount options (lookupcache=none, noac, ...),
but nothing helped.

NFS server: EMC ISILON NAS cluster

Clients: Ubuntu 14.04 LTS
Kernels tried: Ubuntu linux-image-3.16.0-40-generic, vanilla 3.18.1,
vanilla 4.4.0

Also affected:
- Debian Jessie (kernel 3.16)
- Ubuntu 16.04 beta/alpha (Ubuntu kernel 4.4.0-12-generic)

Not affected:
- CentOS 6 (kernel 2.6.32)
- Debian Wheezy (kernel 3.2)

This problem occurs only with NFS protocol version 4,
not with vers=3.

I dumped the network traffic between NFS server and client2
(I attached the dump).

Does anybody know what's happening and how to fix this issue?

Cheers,
Sascha
Attachment: tcpdump-client2.cap.gz (application/gzip)