On 08/07/11 22:09, J. Bruce Fields wrote:
With default mount options, the Linux NFS client (like most NFS clients)
assumes that a file has at most one writer at a time. (Applications that
need to do write-sharing over NFS need to use file locking.)
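In practice that means wrapping the shared file's I/O in a POSIX lock. A
minimal sketch of the pattern (not from this thread; the path and record
format are invented for illustration):

    import fcntl
    import os

    def append_record(path, data):
        # O_APPEND alone is not enough for safe sharing over NFS;
        # the lock is what makes the client revalidate its cache.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        try:
            # POSIX (fcntl) lock; over NFSv3 this goes via lockd/NLM.
            fcntl.lockf(fd, fcntl.LOCK_EX)
            os.write(fd, data)
            os.fsync(fd)
        finally:
            fcntl.lockf(fd, fcntl.LOCK_UN)
            os.close(fd)

    append_record("/mnt/nfs/shared.log", b"one record\n")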
The problem is that file locking on V3 isn't passed back down to the
filesystem - hence the issues with NFS vs Samba (or local disk
access(*)) on the same server.
(*) Local disk access includes anything running on other nodes in a
GFS/GFS2 environment. This precludes exporting the same GFS(2)
filesystem on multiple cluster nodes.
The NFS protocol supports higher granularity timestamps. The limitation
is the exported filesystem. If you're using something other than
ext2/3, you're probably getting higher granularity.
GFS/GFS2 in this case...
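A quick way to check what granularity the backing filesystem actually
records (my own sketch; the mount point is a placeholder) is to stat a
freshly written file and look at the nanosecond field:

    import os
    import tempfile

    # Write a probe file on the filesystem in question and inspect its mtime.
    with tempfile.NamedTemporaryFile(dir="/mnt/gfs2", delete=False) as f:
        f.write(b"probe")
        name = f.name

    st = os.stat(name)
    print("mtime (ns):", st.st_mtime_ns)
    # If this is always an exact multiple of 1_000_000_000, the filesystem
    # is probably only storing whole-second timestamps (as ext2/3 do).
    os.unlink(name)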
can (and has) resulted in writes made by non-NFS processes causing NFS
clients which have that file opened read/write to see "stale filehandle"
errors, due to the inode having changed when they weren't expecting it.
Changing file data or attributes won't result in stale filehandle
errors. (Bug reports welcome if you've seen otherwise.)
I'll have to try to repeat the issue, but it's a race condition with a
narrow window at the best of times.
Stale
filehandle errors should only happen when a client attempts to use a
file which no longer exists on the server. (E.g. if another client
deletes a file while your client has it open.)
It's possible this has happened. I have no idea what user batch scripts
are trying to do on the compute nodes, but in the case that was brought
to my attention the file was edited on one node while another had it open.
(This can also happen if
you rename a file across directories on a filesystem exported with the
subtree_check option. The subtree_check option is deprecated, for that
reason.)
All our FSes are exported with no_subtree_check and at the root of the FS.
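For reference, the relevant lines in /etc/exports look something like this
(the paths and hostnames below are placeholders, not the real ones):

    /gfs2/scratch   *.cluster.example.com(rw,sync,no_subtree_check)
    /gfs2/home      *.cluster.example.com(rw,sync,no_subtree_check)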
We (should) all know NFS was a kludge. What's surprising is how much
kludge still remains in the current v2/3 code (which is opaque and
incredibly crufty; much of it dates from the early 1990s or earlier).
Details welcome.
The non-parallelisation of exportfs (leading to race conditions), for
starters. We had to insert flock statements around every call to it in
/usr/share/cluster/nfsclient.sh in order to get reliable service startups.
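The workaround boils down to serialising the exportfs invocations behind
an exclusive lock. The real change was flock calls in the shell script
above; the sketch below just shows the same pattern (lock-file path and
export arguments are made up for illustration):

    import fcntl
    import subprocess

    def exportfs_serialised(*args):
        # Hold an exclusive lock so only one exportfs runs at a time on
        # this node; concurrent invocations are what we saw racing.
        with open("/var/lock/exportfs.lock", "w") as lock:
            fcntl.flock(lock, fcntl.LOCK_EX)
            return subprocess.run(["exportfs", *args], check=True)

    # e.g. add and later remove a client for an already-configured path
    exportfs_serialised("-o", "rw,no_subtree_check", "node1:/gfs2/scratch")
    exportfs_serialised("-u", "node1:/gfs2/scratch")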
There are a number of RH Bugzilla tickets revolving around NFS behaviour
which would be worth looking at.
As I said earlier, V4 is supposed to play a lot nicer.
V4 has a number of improvements, but what I've described above applies
across versions (modulo some technical details about timestamps vs.
change attributes).
Thanks for the input.
NFS has been a major pain point in our organisation for years. If you
have ideas for doing things better then I'm very interested.
Alan
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster