On Mon, Sep 9, 2024 at 5:22 PM Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote: > > On Mon, 2024-09-09 at 16:06 -0700, Oleksandr Tymoshenko wrote: > > On Mon, Sep 9, 2024 at 10:56 AM Trond Myklebust > > <trondmy@xxxxxxxxxxxxxxx> wrote: > > > > > > On Mon, 2024-09-09 at 16:36 +0000, Oleksandr Tymoshenko wrote: > > > > > > nfs41_init_clientid does not signal a failure condition from > > > > > > nfs4_proc_exchange_id and nfs4_proc_create_session to a > > > > > > client > > > > > > which > > > > > > may > > > > > > lead to mount syscall indefinitely blocked in the following > > > > > > stack > > > > > > > > > NACK. This will break all sorts of recovery scenarios, because > > > > > it > > > > > doesn't distinguish between an initial 'mount' and a server > > > > > reboot > > > > > recovery situation. > > > > > Even in the case where we are in the initial mount, it also > > > > > doesn't > > > > > distinguish between transient errors such as NFS4ERR_DELAY or > > > > > reboot > > > > > errors such as NFS4ERR_STALE_CLIENTID, etc. > > > > > > > > > Exactly what is the scenario that is causing your hang? Let's > > > > > try > > > > > to > > > > > address that with a more targeted fix. > > > > > > > > The scenario is as follows: there are several NFS servers and > > > > several > > > > production machines with multiple NFS mounts. This is a > > > > containerized > > > > multi-tennant workflow so every tennant gets its own NFS mount to > > > > access their > > > > data. At some point nfs41_init_clientid fails in the initial > > > > mount.nfs call > > > > and all subsequent mount.nfs calls just hang in > > > > nfs_wait_client_init_complete > > > > until the original one, where nfs4_proc_exchange_id has failed, > > > > is > > > > killed. > > > > > > > > The cause of the nfs41_init_clientid failure in the production > > > > case > > > > is a timeout. > > > > The following error message is observed in logs: > > > > NFS: state manager: lease expired failed on NFSv4 server <ip> > > > > with > > > > error 110 > > > > > > > > > > How about something like the following fix then? > > > 8<----------------------------------------------- > > > From eb402b489bb0d0ada1a3dd9101d4d7e193402e46 Mon Sep 17 00:00:00 > > > 2001 > > > Message-ID: > > > <eb402b489bb0d0ada1a3dd9101d4d7e193402e46.1725904471.git.trond.mykl > > > ebust@xxxxxxxxxxxxxxx> > > > From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> > > > Date: Mon, 9 Sep 2024 13:47:07 -0400 > > > Subject: [PATCH] NFSv4: Fail mounts if the lease setup times out > > > > > > If the server is down when the client is trying to mount, so that > > > the > > > calls to exchange_id or create_session fail, then we should allow > > > the > > > mount system call to fail rather than hang and block other > > > mount/umount > > > calls. > > > > > > Reported-by: Oleksandr Tymoshenko <ovt@xxxxxxxxxx> > > > Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> > > > --- > > > fs/nfs/nfs4state.c | 6 ++++++ > > > 1 file changed, 6 insertions(+) > > > > > > diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c > > > index 30aba1dedaba..59dcdf9bc7b4 100644 > > > --- a/fs/nfs/nfs4state.c > > > +++ b/fs/nfs/nfs4state.c > > > @@ -2024,6 +2024,12 @@ static int > > > nfs4_handle_reclaim_lease_error(struct nfs_client *clp, int status) > > > nfs_mark_client_ready(clp, -EPERM); > > > clear_bit(NFS4CLNT_LEASE_CONFIRM, &clp->cl_state); > > > return -EPERM; > > > + case -ETIMEDOUT: > > > + if (clp->cl_cons_state == NFS_CS_SESSION_INITING) { > > > + nfs_mark_client_ready(clp, -EIO); > > > + return -EIO; > > > + } > > > + fallthrough; > > > case -EACCES: > > > case -NFS4ERR_DELAY: > > > case -EAGAIN: > > > -- > > > > This patch fixes the issue in my simulated environment. ETIMEDOUT is > > the error code that > > was observed in the production env but I guess it's not the only > > possible one. Would it make > > sense to handle all error conditions in the NFS_CS_SESSION_INITING > > state or are there > > some others that are recoverable? > > > > The only other one that I'm thinking might want to be treated similarly > is the above EACCES error. That's because that is only returned if > there is a problem with your RPCSEC_GSS/krb5 credential. I was thinking > of changing that one too in the same patch, but came to the conclusion > it would be better to treat the two issues with separate fixes. > > The other error conditions are all supposed to be transient NFS level > errors. They should not be treated as fatal. Sounds good. Will you submit this patch to the mainline kernel? If so please add me to Cc. Thanks for looking into this.