On Thu, May 28, 2020 at 6:24 PM Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote: > > On Thu, 2020-05-28 at 17:43 -0400, Olga Kornievskaia wrote: > > On Thu, May 28, 2020 at 5:10 PM Trond Myklebust < > > trondmy@xxxxxxxxxxxxxxx> wrote: > > > Hi Olga, > > > > > > On Thu, 2020-05-28 at 16:42 -0400, Olga Kornievskaia wrote: > > > > Hi folks, > > > > > > > > Looking for recommendation on what the client is suppose to be > > > > doing > > > > in the following situation. Client opens a file and has a byte- > > > > range > > > > lock which returned a locking state. Client is acquiring another > > > > byte > > > > range lock. It uses the returned locking stated for the 2nd lock. > > > > Server returns ADMIN_REVOKED. > > > > > > > > Currently the client goes into an infinite loop of just resending > > > > the > > > > same LOCK operation with > > > > the same locking stateid. > > > > > > > > Is this a recoverable situation? The fact that the lock state was > > > > revoked, should it be an automatic EIO since previous lock is > > > > lost > > > > (so > > > > why bother going forward)? Or should the client retry the lock > > > > but > > > > send it with the open stateid? > > > > > > > > Thank you. > > > > > > I think the right behaviour should be to just call > > > nfs_inode_find_state_and_recover(). In principle that will end up > > > either recovering the lock (if the user set the > > > nfs.recover_lost_locks > > > kernel parameter to 'true') or marking it as a lost lock, using > > > NFS_LOCK_LOST. > > > > Why should acquiring of the 2nd lock depend on recovering the lock > > who's stateid it was trying to use? I think the 1st stateid is lost > > unrecoverable? > > Agreed. However that means the application needs to know that it may > have corrupt data on its hands. We do know that this is the same > application that took the first lock, because any close of the file > (including due to application crashes) would result in the locks being > returned. > > Some *NIX implementations have a special SIGLOST signal that their NFS > clients can use to let the application know its state was lost. Linux > unfortunately does not have such a signal, so we have to rely on error > codes. > > > Right now what happens is code initiates recovery. open is sent. But > > the retry of the 2nd lock has the INITIALIZED_LOCK set and so it > > takes > > the bad lock stateid (how about instead letting it use the recovered > > open stateid?). How about instead do the follow. > > NFSv4.1 requires us to call FREE_STATEID on any stateid that is > revoked, in order to let the server know when we've discovered that the > lock was lost. So we also have to go through the recovery machinery to > ensure that happens before we can deal with taking the second lock. Please bear with me I'm still loss: 1. If you say "application needs to know", the only outcome of this I see is failing with EIO the 2nd lock. Which was my initial suggestion saying if this is at all recoverable or should this be a failure (instead of the infinite loop). 2. In nfs4_handle_setlk_error() we already call nfs4_schedule_stateid_recovery(), I interpret this as "recovery was initiated". I'm not sure what you envision the recovery steps are suppose to be (or are missing. I guess the only one I see is lack of free_stateid of the 1st lock). If you're saying the recovery includes recovery of the 1st lock, then that's step#1 (but we don't send free_stateid() for say a delegation stateid if it was revoked either (or at least I don't think we do, I can test that)). But after all the recovery is done, the 2nd lock request needs to be re-tried and what I see unless we change something about NFS_LOCK_INITALIZED setting, it will once again pick a bad locking stateid. > > Cheers > Trond > > -- > Trond Myklebust > Linux NFS client maintainer, Hammerspace > trond.myklebust@xxxxxxxxxxxxxxx > >