Re: How to handle revocation of locking state

Olga Kornievskaia <aglo@xxxxxxxxx> · Thu, 28 May 2020 18:50:13 -0400

On Thu, May 28, 2020 at 6:24 PM Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
>
> On Thu, 2020-05-28 at 17:43 -0400, Olga Kornievskaia wrote:
> > On Thu, May 28, 2020 at 5:10 PM Trond Myklebust <
> > trondmy@xxxxxxxxxxxxxxx> wrote:
> > > Hi Olga,
> > >
> > > On Thu, 2020-05-28 at 16:42 -0400, Olga Kornievskaia wrote:
> > > > Hi folks,
> > > >
> > > > Looking for recommendation on what the client is suppose to be
> > > > doing
> > > > in the following situation. Client opens a file and has a byte-
> > > > range
> > > > lock which returned a locking state. Client is acquiring another
> > > > byte
> > > > range lock. It uses the returned locking stated for the 2nd lock.
> > > > Server returns ADMIN_REVOKED.
> > > >
> > > > Currently the client goes into an infinite loop of just resending
> > > > the
> > > > same LOCK operation with
> > > > the same locking stateid.
> > > >
> > > > Is this a recoverable situation? The fact that the lock state was
> > > > revoked, should it be an automatic EIO since previous lock is
> > > > lost
> > > > (so
> > > > why bother going forward)? Or should the client retry the lock
> > > > but
> > > > send it with the open stateid?
> > > >
> > > > Thank you.
> > >
> > > I think the right behaviour should be to just call
> > > nfs_inode_find_state_and_recover(). In principle that will end up
> > > either recovering the lock (if the user set the
> > > nfs.recover_lost_locks
> > > kernel parameter to 'true') or marking it as a lost lock, using
> > > NFS_LOCK_LOST.
> >
> > Why should acquiring of the 2nd lock depend on recovering the lock
> > who's stateid it was trying to use? I think the 1st stateid is lost
> > unrecoverable?
>
> Agreed. However that means the application needs to know that it may
> have corrupt data on its hands. We do know that this is the same
> application that took the first lock, because any close of the file
> (including due to application crashes) would result in the locks being
> returned.
>
> Some *NIX implementations have a special SIGLOST signal that their NFS
> clients can use to let the application know its state was lost. Linux
> unfortunately does not have such a signal, so we have to rely on error
> codes.
>
> > Right now what happens is code initiates recovery. open is sent. But
> > the retry of the 2nd lock has the INITIALIZED_LOCK set and so it
> > takes
> > the bad lock stateid (how about instead letting it use the recovered
> > open stateid?). How about instead do the follow.
>
> NFSv4.1 requires us to call FREE_STATEID on any stateid that is
> revoked, in order to let the server know when we've discovered that the
> lock was lost. So we also have to go through the recovery machinery to
> ensure that happens before we can deal with taking the second lock.

Please bear with me I'm still loss:
1. If you say "application needs to know", the only outcome of this I
see is failing with EIO the 2nd lock. Which was my initial suggestion
saying if this is at all recoverable or should this be a failure
(instead of the infinite loop).
2. In nfs4_handle_setlk_error() we already call
nfs4_schedule_stateid_recovery(), I interpret this as "recovery was
initiated". I'm not sure what you envision the recovery steps are
suppose to be (or are missing. I guess the only one I see is lack of
free_stateid of the 1st lock). If you're saying the recovery includes
recovery of the 1st lock, then that's step#1 (but we don't send
free_stateid() for say a delegation stateid if it was revoked either
(or at least I don't think we do, I can test that)).  But after all
the recovery is done, the 2nd lock request needs to be re-tried and
what I see unless we change something about NFS_LOCK_INITALIZED
setting, it will once again pick a bad locking stateid.

>
> Cheers
>   Trond
>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@xxxxxxxxxxxxxxx
>
>