wrong stateid used after flock lock taken

NeilBrown <neilb@xxxxxxxx> · Fri, 30 Sep 2016 12:16:48 +1000

Hi Jeff et al.

I think your patch
Commit: 8003d3c4aaa5 ("nfs4: treat lock owners as opaque values")

introduced a regression ... or maybe exposed a latent problem.

The particular symptom that I can demonstrate is that if I open a file
with NFSv4, take a flock() exclusive lock, and then write to the file,
then the WRITE request uses the stateid returned by OPEN, not the one
returned by LOCK.
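
A minimal reproducer for that sequence might look like the sketch below.
The helper name write_under_flock() is made up for illustration, and the
fsync() is only an assumption about what it takes to get the WRITE onto
the wire while the lock is still held.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/file.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: open path, take an exclusive flock(), then write
 * data while the lock is held.  Returns the number of bytes written,
 * or -1 on error.
 */
static ssize_t write_under_flock(const char *path, const char *data)
{
	ssize_t n;
	int fd;

	fd = open(path, O_RDWR | O_CREAT, 0644);	/* OPEN on NFSv4 */
	if (fd < 0)
		return -1;

	if (flock(fd, LOCK_EX) < 0) {			/* LOCK on NFSv4 */
		close(fd);
		return -1;
	}

	n = write(fd, data, strlen(data));
	/* Assumption: flush now so the WRITE is sent before we unlock. */
	if (n >= 0 && fsync(fd) < 0)
		n = -1;

	flock(fd, LOCK_UN);
	close(fd);
	return n;
}
```

Calling this on a file under an NFSv4 mount while capturing traffic
should show which stateid the WRITE carries, so it can be compared with
the ones returned by OPEN and LOCK.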

The Linux NFS server doesn't have a problem with that, but some NFS
servers do (one returns NFS4ERR_LOCKED, which seems to imply it imposes
mandatory locking!).
In any case, this is the wrong stateid to use.

The patch changed nfs4_copy_lock_stateid() so it was more restrictive in
the stateids it allowed.
I must admit that I find the code that you removed incredibly confusing.
It defined a union field
-               pid_t flock_owner;

and I cannot understand how a pid_t would be relevant for a flock_owner,
as the flock is tied to the 'struct file', not the pid.

Anyway, a write request includes an 'nfs_lock_context' and from that we
need to somehow find the correct stateid.
I'm wondering if nfs4_set_rw_stateid() should call
nfs4_select_rw_stateid() twice, once to look for a flock stateid, and
once to look for a posix-lock stateid .... or something like that.
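
As a rough illustration of that two-pass idea, here is a toy user-space
model.  None of these types or functions exist in the kernel: the toy_*
names, the 'valid' flag and the flock-before-posix order are all
assumptions made for the sketch, not how nfs4_select_rw_stateid() works.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a stateid; not the kernel's nfs4_stateid. */
struct toy_stateid {
	uint32_t seqid;
	bool valid;
};

/* Hypothetical per-open state: one stateid per kind of lock owner. */
struct toy_state {
	struct toy_stateid open_stateid;
	struct toy_stateid flock_stateid;
	struct toy_stateid posix_stateid;
};

enum toy_source { TOY_FROM_FLOCK, TOY_FROM_POSIX, TOY_FROM_OPEN };

/*
 * Pass 1: look for a flock stateid.  Pass 2: look for a posix-lock
 * stateid.  Only if neither is held, fall back to the open stateid.
 */
static enum toy_source toy_select_rw_stateid(const struct toy_state *st,
					     struct toy_stateid *dst)
{
	if (st->flock_stateid.valid) {
		*dst = st->flock_stateid;
		return TOY_FROM_FLOCK;
	}
	if (st->posix_stateid.valid) {
		*dst = st->posix_stateid;
		return TOY_FROM_POSIX;
	}
	*dst = st->open_stateid;
	return TOY_FROM_OPEN;
}
```

The point of the model is only that the open stateid becomes the last
resort, chosen when no lock stateid of either kind is available.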

I'll take a fresh look at the code next week and maybe it will be easier
to understand then, but meanwhile if you have any suggestions I'd be
very happy to hear them.

Thanks,
NeilBrown