Re: [PATCH] nfsd: CB_RECALL can race with FREE_STATEID

On Thu, 2019-04-18 at 16:50 -0400, Scott Mayhew wrote:
> On Thu, 18 Apr 2019, J. Bruce Fields wrote:
> 
> > On Thu, Apr 18, 2019 at 09:24:00AM -0400, Scott Mayhew wrote:
> > > While trying to track down some issues involving large numbers of
> > > delegations being recalled/revoked, I caught the server setting
> > > SEQ4_STATUS_CB_PATH_DOWN while the client was actively responding
> > > to CB_RECALLs.  It turns out that the client had already done a
> > > TEST_STATEID and FREE_STATEID for a delegation being recalled by
> > > the time it received the CB_RECALL.
> > 
> > That's interesting, thanks!
> > 
> > This exception seems awfully narrow, though.
> > 
> > > If we get back any NFS-level error at all, then I think the
> > > callback channel is working (am I wrong?)
> 
> Correct, if the client replies with either NFS4ERR_DELAY or
> NFS4ERR_BAD_STATEID, the server will retry 1 time (see dl_retries).
> After that, we fall thru and nfsd4_cb_recall_done() returns -1 which
> causes the SEQ4_STATUS_CB_PATH_DOWN flag to be set.

There is no handling of NFS4ERR_DELAY in nfsd4_cb_recall_done().

As far as I can see, therefore, if the client returns NFS4ERR_DELAY
(which it usually does if it is already in the process of returning the
delegation) then the recall will fail immediately.
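
To make the difference concrete, here is a rough, self-contained model of
that decision as it is described in this thread. It is a sketch, not the
nfsd source: recall_done_model(), struct recall_model and the verdict names
are hypothetical, and only the two error numbers are taken from the NFSv4
specification. The return convention (1 = done, 0 = retry, -1 = failure,
which leads to CB_PATH_DOWN) follows the description quoted above.

```c
#include <stdio.h>

/* NFSv4 error numbers; everything else in this sketch is hypothetical. */
enum {
    MODEL_NFS4_OK             = 0,
    MODEL_NFS4ERR_DELAY       = 10008,
    MODEL_NFS4ERR_BAD_STATEID = 10025
};

/* Verdicts as described in the thread: 1 = done, 0 = retry, -1 = fail. */
enum cb_verdict { CB_RETRY = 0, CB_DONE = 1, CB_FAIL = -1 };

/* Stand-in for the per-delegation retry budget (cf. dl_retries). */
struct recall_model {
    int retries_left;
};

static enum cb_verdict recall_done_model(struct recall_model *r, int status)
{
    switch (status) {
    case MODEL_NFS4_OK:
        return CB_DONE;
    case MODEL_NFS4ERR_BAD_STATEID:
        /* Retried while the budget lasts, then treated as a failure. */
        if (r->retries_left > 0) {
            r->retries_left--;
            return CB_RETRY;
        }
        return CB_FAIL;
    default:
        /* NFS4ERR_DELAY has no case of its own, so it lands here and
         * fails immediately without consuming a retry. */
        return CB_FAIL;
    }
}

/* Small helper for printing a verdict while experimenting. */
static const char *verdict_name(enum cb_verdict v)
{
    return v == CB_DONE ? "done" : v == CB_RETRY ? "retry" : "fail";
}
```

With a budget of one retry, feeding MODEL_NFS4ERR_DELAY to this model
returns CB_FAIL on the first reply, whereas MODEL_NFS4ERR_BAD_STATEID
returns CB_RETRY once and only then CB_FAIL.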

> > and telling the client to set up a new
> > one is probably not going to help.  The best we can do is probably
> > just give up
> 
> That's what the patch is essentially doing.  Or are you saying don't
> even bother with the checks but still return 1 so we don't set the
> SEQ4_STATUS_CB_PATH_DOWN flag?
> 
> > and let the client deal with the ensuing
> > RECALLABLE_STATE_REVOKED flag.
> 
> The client's already dealing with the RECALLABLE_STATE_REVOKED flag,
> that's why it sent a TEST_STATEID and FREE_STATEID before it got this
> particular CB_RECALL.  The idea behind the patch is to not give the
> state manager on the client additional work by setting CB_PATH_DOWN
> when the callback channel is clearly working...
> 

Either way, the Linux client will ignore any further sequence flags
until it is done with the recovery of the RECALLABLE_STATE_REVOKED
flag. The reason is that the flags are edge triggered (i.e. they don't
clear until the state changes), and so we need to be able to perform a
full recovery before we can check them again.
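
As a rough illustration of that behaviour (a toy model only, not the Linux
client code): struct client_model, on_sequence_flags() and recovery_done()
are hypothetical names. The two flag values are the ones defined for
SEQ4_STATUS_CB_PATH_DOWN and SEQ4_STATUS_RECALLABLE_STATE_REVOKED in
RFC 5661.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag values as defined in RFC 5661; names prefixed to mark the model. */
#define MODEL_SEQ4_CB_PATH_DOWN            0x00000001u
#define MODEL_SEQ4_RECALLABLE_STATE_REVOKED 0x00000040u

/* Hypothetical client-side state for this sketch. */
struct client_model {
    bool     recovering;  /* a recovery pass is in progress */
    uint32_t acted_on;    /* flags that started the current/last recovery */
};

/* Returns true if the flags were acted upon, false if they were ignored
 * because an earlier recovery has not finished yet. */
static bool on_sequence_flags(struct client_model *c, uint32_t flags)
{
    if (c->recovering)
        return false;      /* ignore further flags until recovery is done */
    if (flags == 0)
        return false;      /* nothing to do */
    c->acted_on = flags;
    c->recovering = true;  /* start a full recovery pass */
    return true;
}

/* Called once the recovery pass completes; flags are checked again after. */
static void recovery_done(struct client_model *c)
{
    c->recovering = false;
}
```

In this model, a CB_PATH_DOWN flag that arrives while the client is still
recovering from RECALLABLE_STATE_REVOKED is dropped, and is only noticed if
the server still reports it after recovery_done().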

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx





