On Thu, 2020-05-28 at 18:50 -0400, Olga Kornievskaia wrote: > On Thu, May 28, 2020 at 6:24 PM Trond Myklebust < > trondmy@xxxxxxxxxxxxxxx> wrote: > > On Thu, 2020-05-28 at 17:43 -0400, Olga Kornievskaia wrote: > > > On Thu, May 28, 2020 at 5:10 PM Trond Myklebust < > > > trondmy@xxxxxxxxxxxxxxx> wrote: > > > > Hi Olga, > > > > > > > > On Thu, 2020-05-28 at 16:42 -0400, Olga Kornievskaia wrote: > > > > > Hi folks, > > > > > > > > > > Looking for recommendation on what the client is suppose to > > > > > be > > > > > doing > > > > > in the following situation. Client opens a file and has a > > > > > byte- > > > > > range > > > > > lock which returned a locking state. Client is acquiring > > > > > another > > > > > byte > > > > > range lock. It uses the returned locking stated for the 2nd > > > > > lock. > > > > > Server returns ADMIN_REVOKED. > > > > > > > > > > Currently the client goes into an infinite loop of just > > > > > resending > > > > > the > > > > > same LOCK operation with > > > > > the same locking stateid. > > > > > > > > > > Is this a recoverable situation? The fact that the lock state > > > > > was > > > > > revoked, should it be an automatic EIO since previous lock is > > > > > lost > > > > > (so > > > > > why bother going forward)? Or should the client retry the > > > > > lock > > > > > but > > > > > send it with the open stateid? > > > > > > > > > > Thank you. > > > > > > > > I think the right behaviour should be to just call > > > > nfs_inode_find_state_and_recover(). In principle that will end > > > > up > > > > either recovering the lock (if the user set the > > > > nfs.recover_lost_locks > > > > kernel parameter to 'true') or marking it as a lost lock, using > > > > NFS_LOCK_LOST. > > > > > > Why should acquiring of the 2nd lock depend on recovering the > > > lock > > > who's stateid it was trying to use? I think the 1st stateid is > > > lost > > > unrecoverable? > > > > Agreed. However that means the application needs to know that it > > may > > have corrupt data on its hands. We do know that this is the same > > application that took the first lock, because any close of the file > > (including due to application crashes) would result in the locks > > being > > returned. > > > > Some *NIX implementations have a special SIGLOST signal that their > > NFS > > clients can use to let the application know its state was lost. > > Linux > > unfortunately does not have such a signal, so we have to rely on > > error > > codes. > > > > > Right now what happens is code initiates recovery. open is sent. > > > But > > > the retry of the 2nd lock has the INITIALIZED_LOCK set and so it > > > takes > > > the bad lock stateid (how about instead letting it use the > > > recovered > > > open stateid?). How about instead do the follow. > > > > NFSv4.1 requires us to call FREE_STATEID on any stateid that is > > revoked, in order to let the server know when we've discovered that > > the > > lock was lost. So we also have to go through the recovery machinery > > to > > ensure that happens before we can deal with taking the second lock. > > Please bear with me I'm still loss: > 1. If you say "application needs to know", the only outcome of this I > see is failing with EIO the 2nd lock. Which was my initial suggestion > saying if this is at all recoverable or should this be a failure > (instead of the infinite loop). It should be a failure unless the nfs.recover_lost_locks kernel parameter is set, in which case it should silently recover the lost lock. The default behaviour should therefore be failure. > 2. In nfs4_handle_setlk_error() we already call > nfs4_schedule_stateid_recovery(), I interpret this as "recovery was > initiated". I'm not sure what you envision the recovery steps are > suppose to be (or are missing. I guess the only one I see is lack of > free_stateid of the 1st lock). If you're saying the recovery includes > recovery of the 1st lock, then that's step#1 (but we don't send > free_stateid() for say a delegation stateid if it was revoked either > (or at least I don't think we do, I can test that)). But after all > the recovery is done, the 2nd lock request needs to be re-tried and > what I see unless we change something about NFS_LOCK_INITALIZED > setting, it will once again pick a bad locking stateid. > Recovery of the lost lock only happens in the non-default case described previously. However whether or not we recover the lock, we need to call FREE_STATEID on the stateid that is returning NFS4ERR_ADMIN_REVOKED (or NFS4ERR_DELEG_REVOKED if the stateid is a delegation). After the call to FREE_STATEID, the server will start returning NFS4ERR_BAD_STATEID if we ever present the revoked lock stateid again. So yes, once we're done sending FREE_STATEID, we can clear NFS_LOCK_INITIALIZED and start new locking requests from scratch. -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@xxxxxxxxxxxxxxx