On Tue, 2020-09-22 at 17:08 -0400, Benjamin Coddington wrote:
> On 22 Sep 2020, at 15:48, Trond Myklebust wrote:
> 
> > On Tue, 2020-09-22 at 15:15 -0400, Benjamin Coddington wrote:
> > > Since commit 0e0cb35b417f ("NFSv4: Handle NFS4ERR_OLD_STATEID in
> > > CLOSE/OPEN_DOWNGRADE") the following livelock may occur if a CLOSE
> > > races with the update of the nfs_state:
> > > 
> > > Process 1             Process 2             Server
> > > =========             =========             ========
> > > OPEN file
> > >                       OPEN file
> > >                                             Reply OPEN (1)
> > >                                             Reply OPEN (2)
> > > Update state (1)
> > > CLOSE file (1)
> > >                                             Reply OLD_STATEID (1)
> > > CLOSE file (2)
> > >                                             Reply CLOSE (-1)
> > >                       Update state (2)
> > >                       wait for state change
> > > OPEN file
> > >                       wake
> > > CLOSE file
> > > OPEN file
> > >                       wake
> > > CLOSE file
> > > ...
> > >                       ...
> > > 
> > > As long as the first process continues updating state, the second
> > > process will fail to exit the loop in nfs_set_open_stateid_locked().
> > > This livelock has been observed in generic/168.
> > > 
> > > Fix this by detecting the case in nfs_need_update_open_stateid() and
> > > exiting the loop if:
> > >  - the state is NFS_OPEN_STATE, and
> > >  - the stateid sequence is > 1, and
> > >  - the stateid doesn't match the current open stateid
> > > 
> > > Fixes: 0e0cb35b417f ("NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE")
> > > Cc: stable@xxxxxxxxxxxxxxx # v5.4+
> > > Signed-off-by: Benjamin Coddington <bcodding@xxxxxxxxxx>
> > > ---
> > >  fs/nfs/nfs4proc.c | 27 +++++++++++++++++----------
> > >  1 file changed, 17 insertions(+), 10 deletions(-)
> > > 
> > > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> > > index 6e95c85fe395..9db29a438540 100644
> > > --- a/fs/nfs/nfs4proc.c
> > > +++ b/fs/nfs/nfs4proc.c
> > > @@ -1588,18 +1588,25 @@ static void nfs_test_and_clear_all_open_stateid(struct nfs4_state *state)
> > >  static bool nfs_need_update_open_stateid(struct nfs4_state *state,
> > >  		const nfs4_stateid *stateid)
> > >  {
> > > -	if (test_bit(NFS_OPEN_STATE, &state->flags) == 0 ||
> > > -	    !nfs4_stateid_match_other(stateid, &state->open_stateid)) {
> > > -		if (stateid->seqid == cpu_to_be32(1))
> > > +	bool state_matches_other = nfs4_stateid_match_other(stateid, &state->open_stateid);
> > > +	bool seqid_one = stateid->seqid == cpu_to_be32(1);
> > > +
> > > +	if (test_bit(NFS_OPEN_STATE, &state->flags)) {
> > > +		/* The common case - we're updating to a new sequence number */
> > > +		if (state_matches_other && nfs4_stateid_is_newer(stateid, &state->open_stateid)) {
> > > +			nfs_state_log_out_of_order_open_stateid(state, stateid);
> > > +			return true;
> > > +		}
> > > +	} else {
> > > +		/* This is the first OPEN */
> > > +		if (!state_matches_other && seqid_one) {
> > 
> > Why are we looking at the value of state_matches_other here? If the
> > NFS_OPEN_STATE flag is not set, then the state->open_stateid is not
> > initialised, so how does it make sense to look at a comparison with
> > the incoming stateid?
> 
> You're right - it doesn't make sense. I failed to clean it up out of my
> decision matrix. I'll send a v3 after it gets tested overnight.
> 
> Thanks for the look, and if you have another moment - what do you think
> about not bumping the seqid in the CLOSE/OPEN_DOWNGRADE path on
> OLD_STATEID if the state's refcount > 1?
> This would optimize away a lot of recovery for races like this, but I
> wonder if it would be possible to end up with two dueling
> OPEN_DOWNGRADEs that would never recover.

It would lead to a return of the infinite loop in cases where the
client misses a reply to an OPEN or CLOSE due to a soft timeout (which
is an important use case for us).

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx
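
[Editor's note] For readers following the thread, below is a small standalone
userspace model of the branch structure under discussion - it is not the
fs/nfs/nfs4proc.c code and not Benjamin's v3, just a sketch that applies
Trond's point by dropping the state_matches_other check from the
not-yet-open branch. The struct and helper names are simplified stand-ins
for the nfs4_state/nfs4_stateid machinery, and the trimmed remainder of the
patched function (including the path that actually breaks the livelock) is
not modeled.

/*
 * Userspace model of the decision discussed above -- NOT the kernel code.
 * Types and helpers are simplified stand-ins so the branch logic can be
 * compiled and read in isolation.  Per Trond's comment, the branch taken
 * when NFS_OPEN_STATE is not yet set only looks at whether the incoming
 * stateid carries seqid 1; it does not compare against the (uninitialised)
 * open_stateid.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <arpa/inet.h>	/* htonl()/ntohl() stand in for cpu_to_be32()/be32_to_cpu() */

struct model_stateid {
	uint32_t seqid;		/* big-endian, as on the wire */
	char	 other[12];	/* opaque "other" field */
};

struct model_state {
	bool			open_state;	/* models the NFS_OPEN_STATE bit */
	struct model_stateid	open_stateid;
};

static bool stateid_match_other(const struct model_stateid *a,
				const struct model_stateid *b)
{
	return memcmp(a->other, b->other, sizeof(a->other)) == 0;
}

/* Same wrap-safe comparison idea as nfs4_stateid_is_newer(). */
static bool stateid_is_newer(const struct model_stateid *a,
			     const struct model_stateid *b)
{
	return (int32_t)(ntohl(a->seqid) - ntohl(b->seqid)) > 0;
}

/*
 * Sketch of the two-branch structure with the flagged check dropped.
 * Returning false here is what, in the real function, sends the caller
 * back to the wait loop in nfs_set_open_stateid_locked().
 */
static bool need_update_open_stateid(const struct model_state *state,
				     const struct model_stateid *stateid)
{
	if (state->open_state) {
		/* Common case: same open, strictly newer sequence number. */
		return stateid_match_other(stateid, &state->open_stateid) &&
		       stateid_is_newer(stateid, &state->open_stateid);
	}
	/* First OPEN: only a stateid that starts at seqid 1 initialises it. */
	return stateid->seqid == htonl(1);
}

int main(void)
{
	struct model_state closed = { .open_state = false };
	struct model_stateid first = { .seqid = htonl(1) };
	struct model_stateid late  = { .seqid = htonl(2) };

	/* seqid 1 initialises a closed state; seqid 2 does not (it waits). */
	printf("first OPEN  (seqid 1): %d\n", need_update_open_stateid(&closed, &first));
	printf("late update (seqid 2): %d\n", need_update_open_stateid(&closed, &late));
	return 0;
}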