On Fri, Feb 22 2019, Trond Myklebust wrote:
> Hi Neil,
>
> On Fri, 2019-02-22 at 16:02 +1100, NeilBrown wrote:
>> On Fri, Feb 22 2019, Trond Myklebust wrote:
>>
>> > Hi Neil
>> >
>> > On Fri, 2019-02-22 at 11:58 +1100, NeilBrown wrote:
>> > > Hi,
>> > > I have a report of an NFSv4.1 state management problem in our 4.4
>> > > kernel which appears to be caused by a race between OPEN and
>> > > OPEN_DOWNGRADE, and I don't think the race is fixed in mainline.
>> > >
>> > > The test program creates multiple threads which open a file O_RDWR
>> > > and write to it, then open it O_RDONLY and read from it.
>> > > A network trace which shows the problem suggests that at a point in
>> > > time where there are some O_RDONLY opens and one O_RDWR open, a new
>> > > O_RDWR open is requested just as the O_RDWR open is being closed.
>> > >
>> > > The close of the O_RDWR open clears state->n_rdwr early, so
>> > > can_open_cached() fails for the new O_RDWR open, and it needs to go
>> > > to the server.
>> > > The open for O_RDWR doesn't increment n_rdwr until after the open
>> > > succeeds, so nfs4_close_prepare() sees
>> > >     n_rdwr == 0
>> > >     n_rdonly > 0
>> > >     NFS_O_RDWR_STATE and NFS_O_RDONLY_STATE set
>> > > which causes it to choose an OPEN_DOWNGRADE.
>> > >
>> > > What we see is an OPEN/share-all and an OPEN_DOWNGRADE/share-read
>> > > request sent one after the other without waiting for a reply.
>> > > The OPEN is processed first, then the OPEN_DOWNGRADE, resulting in
>> > > a state that only allows reads.  Then a WRITE is attempted, which
>> > > fails.  This enters an infinite loop with 2 steps:
>> > >  - a WRITE gets NFS4ERR_OPENMODE
>> > >  - a TEST_STATEID succeeds
>> > >
>> > > Once an OPEN/share-all request has been sent, it isn't really
>> > > correct to send an OPEN_DOWNGRADE/share-read request.  However the
>> > > fact that the OPEN has been sent isn't visible to
>> > > nfs4_close_prepare().
>> > >
>> > > There is an asymmetry between open and close w.r.t. updating the
>> > > n_[mode] counter and setting the NFS_O_mode_STATE bits.
>> > >
>> > > For close, the counter is decremented, then the server is told,
>> > > then the state bits are cleared.
>> > > For open, the counter and state bits are both updated after the
>> > > server has been asked.
>> > >
>> > > I understand that this is probably because the OPEN could fail, and
>> > > incrementing a counter before we are sure of success seems unwise.
>> > > But doing so would allow us to avoid the incorrect OPEN_DOWNGRADE.
>> > >
>> > > Any suggestions on what a good solution would look like?  Does it
>> > > ever make sense for an OPEN request to be concurrent with a CLOSE
>> > > or OPEN_DOWNGRADE?  Maybe they should be serialized with each other
>> > > (maybe not as fully serialized as in NFSv4.0, but more than they
>> > > currently are in NFSv4.1).
>>
>> Hi Trond,
>>  thanks for the reply.
>>
>> > If the CLOSE/OPEN_DOWNGRADE is processed by the server _after_ the
>> > OPEN, then the stateid presented by the client will have an incorrect
>> > seqid, and so the server should reject the operation with
>> > NFS4ERR_OLD_STATEID.
>>
>> According to RFC5661, section 18.18.3, the DESCRIPTION of
>> OPEN_DOWNGRADE says:
>>
>>    The seqid argument is not used in NFSv4.1, MAY be any value, and
>>    MUST be ignored by the server.
>>
>> So even though the stateid passed (3) is less than the stateids
>> already returned for some OPEN operations (4 and 5), the server MUST
>> still process the OPEN_DOWNGRADE, not give an error.
>
> That refers to the explicit "seqid" parameter in struct
> OPEN_DOWNGRADE4args.  I'm referring to the seqid that constitutes the
> first 4 bytes of the stateid.
>
> In other words, I'm referring to the mechanism described here:
> https://tools.ietf.org/html/rfc5661#section-8.2.2
>
> Note that the Linux client doesn't ever set the stateid seqid to zero
> on CLOSE or OPEN_DOWNGRADE precisely because we rely on the
> NFS4ERR_OLD_STATEID mechanism to detect races with OPEN.

Ahhh, I had missed that there were two seqids in OPEN_DOWNGRADE, one in
the stateid and one not.  Makes sense now.

With that perspective, it is pretty clear that the server is
misbehaving.  The OPEN_DOWNGRADE presents a seqid of 3, and after
having already replied to two OPEN requests with seqids of 4 and 5,
the server processes that OPEN_DOWNGRADE without error and returns a
seqid of 6.
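
To make the check concrete (for anyone else who, like me, missed the
distinction), here is a rough user-space sketch of the seqid handling
described in RFC 5661, section 8.2.2.  The stateid4 layout (a 4-byte
seqid followed by a 12-byte "other" field) comes from the RFC; the
function name and result codes are made up for illustration, and the
code ignores XDR byte order and seqid wrap-around.

#include <stdint.h>
#include <string.h>

/* Result codes for illustration only; the real NFS4ERR_* values are
 * defined by the protocol. */
enum seqid_check { SEQID_OK, SEQID_OLD_STATEID, SEQID_BAD_STATEID };

struct stateid4 {
	uint32_t seqid;      /* bumped by the server each time this state changes */
	char     other[12];  /* identifies the state itself; fixed for its lifetime */
};

/* The check a server is expected to make before acting on the stateid
 * in an OPEN_DOWNGRADE (or CLOSE, WRITE, ...). */
static enum seqid_check check_stateid(const struct stateid4 *cur,
				      const struct stateid4 *arg)
{
	if (memcmp(cur->other, arg->other, sizeof(cur->other)) != 0)
		return SEQID_BAD_STATEID;
	if (arg->seqid == 0)		/* zero means "don't check the seqid" */
		return SEQID_OK;
	if (arg->seqid < cur->seqid)	/* e.g. 3 arriving after 4 and 5 */
		return SEQID_OLD_STATEID;
	if (arg->seqid > cur->seqid)	/* a seqid the server never issued */
		return SEQID_BAD_STATEID;
	return SEQID_OK;
}

In the trace above, the server's current seqid for the state is 5 when
the OPEN_DOWNGRADE carrying seqid 3 arrives, so it should have returned
NFS4ERR_OLD_STATEID rather than applying the downgrade and replying
with seqid 6.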

Thanks a lot,
NeilBrown

>
>>
>> > If, on the other hand, the CLOSE/OPEN_DOWNGRADE is processed before
>> > the OPEN, then the client should either see a new stateid with a new
>> > 'other' field (CLOSE) or it will see the same stateid but with a
>> > seqid that does not match what it expects (OPEN_DOWNGRADE).  In the
>> > current mainline code, the client will then attempt to wait for any
>> > "missing" seqids to get processed (see nfs_set_open_stateid_locked()).
>>
>> Hmm.. I don't have the waiting in my 4.4 code - I should probably make
>> sure I understand that.  However the tcpdump trace shows that this
>> scenario isn't what is happening, so it isn't immediately relevant.
>>
>
> True. The waiting mechanism was introduced by commit c9399f21c215
> ("NFSv4: Fix OPEN / CLOSE race") in v4.15. We should probably have
> kicked that down the stable patch mechanism.
>
> Cheers
>   Trond
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@xxxxxxxxxxxxxxx
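
PS: for my own notes, the "wait for missing seqids" idea mentioned
above, as I understand the description - this is a user-space sketch
with made-up names, not the code from commit c9399f21c215, and it
ignores locking and seqid wrap-around.  The idea is: if a reply
carries a stateid whose seqid is more than one ahead of the one we
currently hold, some other reply (OPEN, CLOSE or OPEN_DOWNGRADE) for
the same state is still in flight, so wait a bounded time for the gap
to be filled before applying the update, and drop it if it turns out
to be stale.

#include <stdbool.h>
#include <stdint.h>

struct open_state {
	uint32_t cur_seqid;	/* seqid of the open stateid we currently hold */
};

/* Apply a stateid update only once it is the next one expected.
 * wait_for_other_replies() stands in for sleeping until another reply
 * updates cur_seqid; it returns false on timeout. */
static bool update_open_stateid(struct open_state *st, uint32_t new_seqid,
				bool (*wait_for_other_replies)(void))
{
	while (new_seqid != st->cur_seqid + 1) {
		if (new_seqid <= st->cur_seqid)
			return false;		/* stale reply - ignore it */
		if (!wait_for_other_replies())
			break;			/* timed out - apply anyway */
	}
	st->cur_seqid = new_seqid;
	return true;
}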