Hi Neil

On Fri, 2019-02-22 at 11:58 +1100, NeilBrown wrote:
> Hi,
> I have a report of an NFSv4.1 state management problem in our 4.4
> kernel which appears to be caused by a race between OPEN and
> OPEN_DOWNGRADE, and I don't think the race is fixed in mainline.
>
> The test program creates multiple threads which open a file O_RDWR
> and write to it, then open it O_RDONLY and read from it.
> A network trace which shows the problem suggests that at a point in
> time where there are some O_RDONLY opens and one O_RDWR open, a new
> O_RDWR open is requested just as the O_RDWR open is being closed.
>
> The close of the O_RDWR open clears state->n_rdwr early, so
> can_open_cached() fails for the new O_RDWR open, and it needs to go
> to the server.
> The open for O_RDWR doesn't increment n_rdwr until after the open
> succeeds, so nfs4_close_prepare() sees
>     n_rdwr == 0
>     n_rdonly > 0
>     NFS_O_RDWR_STATE and NFS_O_RDONLY_STATE set
> which causes it to choose an OPEN_DOWNGRADE.
>
> What we see is that an OPEN/share-all and an OPEN_DOWNGRADE/share-
> read request are sent one after the other without waiting for a
> reply. The OPEN is processed first, then the OPEN_DOWNGRADE,
> resulting in a state that only allows reads. Then a WRITE is
> attempted, which fails. This enters an infinite loop with two
> steps:
>  - a WRITE gets NFS4ERR_OPENMODE
>  - a TEST_STATEID succeeds
>
> Once an OPEN/share-all request has been sent, it isn't really
> correct to send an OPEN_DOWNGRADE/share-read request. However, the
> fact that the OPEN has been sent isn't visible to
> nfs4_close_prepare().
>
> There is an asymmetry between open and close w.r.t. updating the
> n_[mode] counters and setting the NFS_O_[mode]_STATE bits.
>
> For close, the counter is decremented, then the server is told,
> then the state bits are cleared.
> For open, the counter and state bits are both updated after the
> server has replied.
>
> I understand that this is probably because the OPEN could fail, and
> incrementing a counter before we are sure of success seems unwise.
> But doing so would allow us to avoid the incorrect OPEN_DOWNGRADE.
>
> Any suggestions on what a good solution would look like? Does it
> ever make sense for an OPEN request to be concurrent with a CLOSE
> or OPEN_DOWNGRADE? Maybe they should be serialized with each other
> (maybe not as fully serialized as in NFSv4.0, but more than they
> currently are in NFSv4.1).

If the CLOSE/OPEN_DOWNGRADE is processed by the server _after_ the
OPEN, then the stateid presented by the client will have an incorrect
seqid, and so the server should reject the operation with
NFS4ERR_OLD_STATEID.

If, on the other hand, the CLOSE/OPEN_DOWNGRADE is processed before
the OPEN, then the client should either see a new stateid with a new
'other' field (CLOSE), or it will see the same stateid but with a
seqid that does not match what it expects (OPEN_DOWNGRADE). In the
current mainline code, the client will then attempt to wait for any
"missing" seqids to get processed (see
nfs_set_open_stateid_locked()).

Cheers
  Trond

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx
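
P.S. In case a concrete illustration helps: below is a minimal
userspace sketch of the seqid ordering rule described above. It is
not the actual kernel code (that logic lives in
nfs_set_open_stateid_locked() and friends); the struct, function and
variable names are invented for the example, and only the
seqid/'other' layout follows the protocol.

/* Hypothetical sketch, NOT the Linux kernel implementation. It only
 * models how a client can classify an incoming open stateid update
 * by comparing the 'other' field and the seqid against what it
 * currently holds. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct stateid {
	uint32_t seqid;
	uint8_t  other[12];	/* opaque "other" field */
};

/* Client side: decide what to do with a stateid from a reply. */
static void client_update(struct stateid *cur, const struct stateid *in)
{
	if (memcmp(cur->other, in->other, sizeof(cur->other)) != 0) {
		/* New 'other' field: a fresh open stateid (e.g. the
		 * reply to an OPEN after a CLOSE); replace outright. */
		*cur = *in;
		printf("new stateid, seqid %u\n", cur->seqid);
	} else if (in->seqid == cur->seqid + 1) {
		/* The expected next update: apply it. */
		cur->seqid = in->seqid;
		printf("in-order update to seqid %u\n", cur->seqid);
	} else if ((int32_t)(in->seqid - cur->seqid) > 0) {
		/* A seqid was skipped: some other reply (e.g. the
		 * OPEN_DOWNGRADE) has not been processed yet, so wait
		 * for the missing seqid(s) before applying this one. */
		printf("seqid %u but expected %u: wait for missing reply\n",
		       in->seqid, cur->seqid + 1);
	} else {
		/* Old news: this reply was overtaken by a later
		 * update we already applied; ignore it. */
		printf("stale seqid %u, ignoring\n", in->seqid);
	}
}

int main(void)
{
	struct stateid cur  = { .seqid = 3, .other = "opaque-id-0" };
	struct stateid open = { .seqid = 5, .other = "opaque-id-0" };

	/* The OPEN reply carries seqid 5 while the client still holds
	 * seqid 3: seqid 4 (the OPEN_DOWNGRADE reply) is missing, so
	 * the client must wait for it rather than apply seqid 5. */
	client_update(&cur, &open);
	return 0;
}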