Re: How to handle revocation of locking state

Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> · Thu, 28 May 2020 22:24:46 +0000

On Thu, 2020-05-28 at 17:43 -0400, Olga Kornievskaia wrote:
> On Thu, May 28, 2020 at 5:10 PM Trond Myklebust <
> trondmy@xxxxxxxxxxxxxxx> wrote:
> > Hi Olga,
> > 
> > On Thu, 2020-05-28 at 16:42 -0400, Olga Kornievskaia wrote:
> > > Hi folks,
> > > 
> > > Looking for a recommendation on what the client is supposed to do
> > > in the following situation. The client opens a file and holds a
> > > byte-range lock, which returned a locking stateid. The client then
> > > acquires another byte-range lock, using the returned locking
> > > stateid for the 2nd lock. The server returns ADMIN_REVOKED.
> > > 
> > > Currently the client goes into an infinite loop of just resending
> > > the same LOCK operation with the same locking stateid.
> > > 
> > > Is this a recoverable situation? Given that the lock state was
> > > revoked, should it be an automatic EIO since the previous lock is
> > > lost (so why bother going forward)? Or should the client retry the
> > > lock but send it with the open stateid?
> > > 
> > > Thank you.
> > 
> > I think the right behaviour should be to just call
> > nfs_inode_find_state_and_recover(). In principle that will end up
> > either recovering the lock (if the user set the
> > nfs.recover_lost_locks kernel parameter to 'true') or marking it as
> > a lost lock, using NFS_LOCK_LOST.
> 
> Why should acquiring the 2nd lock depend on recovering the lock
> whose stateid it was trying to use? I think the 1st stateid is
> unrecoverably lost?

Agreed. However that means the application needs to know that it may
have corrupt data on its hands. We do know that this is the same
application that took the first lock, because any close of the file
(including due to application crashes) would result in the locks being
returned.

Some *NIX implementations have a special SIGLOST signal that their NFS
clients can use to let the application know its state was lost. Linux
unfortunately does not have such a signal, so we have to rely on error
codes.
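
To illustrate what relying on error codes means for the application, here
is a minimal sketch. It assumes that I/O under a lost lock fails with EIO;
the helper name and its arguments are hypothetical, and EIO can of course
also come from unrelated I/O failures, so this is a hint rather than proof.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical application-side helper, not part of any library.
 * saved_errno: errno captured right after a failed read/write/fcntl call.
 * holds_lock:  whether the application believes it holds a byte-range
 *              lock on that file.
 * Assumption: the client reports lost lock state as EIO.
 */
static bool io_error_may_mean_lost_lock(int saved_errno, bool holds_lock)
{
    return holds_lock && saved_errno == EIO;
}
```

An application that sees this return true should treat any data it wrote
under the lock as suspect, since there is no SIGLOST to tell it earlier.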

> Right now what happens is that the code initiates recovery and an open
> is sent. But the retry of the 2nd lock has the INITIALIZED_LOCK set
> and so it takes the bad lock stateid (how about instead letting it use
> the recovered open stateid?). How about instead do the follow.

NFSv4.1 requires us to call FREE_STATEID on any stateid that is
revoked, in order to let the server know when we've discovered that the
lock was lost. So we also have to go through the recovery machinery to
ensure that happens before we can deal with taking the second lock.
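
As a rough model of that ordering (not the actual client code), something
like the sketch below. Every type and function name here is made up for
illustration; the 'recover_lost_locks' field only stands in for the
nfs.recover_lost_locks parameter and 'lost' for NFS_LOCK_LOST. Returning
-EIO for the second lock once the first is marked lost is one possible
policy, assumed here, not something this thread has settled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; none of these exist in the kernel. */
enum model_stateid_status {
    MODEL_STATEID_OK,
    MODEL_STATEID_ADMIN_REVOKED,
    MODEL_STATEID_FREED,
};

struct model_lock_state {
    enum model_stateid_status status;
    bool lost;                  /* stands in for NFS_LOCK_LOST */
};

struct model_client {
    bool recover_lost_locks;    /* stands in for nfs.recover_lost_locks */
    int free_stateid_calls;     /* counts simulated FREE_STATEID calls */
};

/* Step 1: tell the server we have noticed the revocation. */
static void model_free_stateid(struct model_client *clp,
                               struct model_lock_state *ls)
{
    assert(clp != NULL && ls != NULL);
    clp->free_stateid_calls++;
    ls->status = MODEL_STATEID_FREED;
}

/* Step 2: either reclaim the lock or mark it as lost. */
static void model_recover(struct model_client *clp,
                          struct model_lock_state *ls)
{
    if (ls->status == MODEL_STATEID_ADMIN_REVOKED)
        model_free_stateid(clp, ls);

    if (clp->recover_lost_locks) {
        ls->status = MODEL_STATEID_OK;  /* pretend the reclaim succeeded */
        ls->lost = false;
    } else {
        ls->lost = true;
    }
}

/*
 * Step 3: only after recovery has run do we decide about the second lock.
 * Returns 0 if it may proceed, -EIO if the earlier lock was lost.
 */
static int model_second_lock(struct model_client *clp,
                             struct model_lock_state *ls)
{
    assert(clp != NULL && ls != NULL);
    if (ls->status == MODEL_STATEID_ADMIN_REVOKED)
        model_recover(clp, ls);
    return ls->lost ? -EIO : 0;
}
```

The point of the model is only that the revoked stateid is freed exactly
once before the second lock is considered, so the retry never resends the
revoked stateid in a loop.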

Cheers
  Trond

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx