Re: [PATCH] NFSv4: fix a mount deadlock in NFS v4.1 client

Oleksandr Tymoshenko <ovt@xxxxxxxxxx> · Tue, 10 Sep 2024 14:08:25 -0700

On Mon, Sep 9, 2024 at 5:22 PM Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
>
> On Mon, 2024-09-09 at 16:06 -0700, Oleksandr Tymoshenko wrote:
> > On Mon, Sep 9, 2024 at 10:56 AM Trond Myklebust
> > <trondmy@xxxxxxxxxxxxxxx> wrote:
> > >
> > > On Mon, 2024-09-09 at 16:36 +0000, Oleksandr Tymoshenko wrote:
> > > > > > nfs41_init_clientid does not signal a failure condition from
> > > > > > nfs4_proc_exchange_id and nfs4_proc_create_session to a
> > > > > > client
> > > > > > which
> > > > > > may
> > > > > > lead to mount syscall indefinitely blocked in the following
> > > > > > stack
> > > >
> > > > > NACK. This will break all sorts of recovery scenarios, because
> > > > > it
> > > > > doesn't distinguish between an initial 'mount' and a server
> > > > > reboot
> > > > > recovery situation.
> > > > > Even in the case where we are in the initial mount, it also
> > > > > doesn't
> > > > > distinguish between transient errors such as NFS4ERR_DELAY or
> > > > > reboot
> > > > > errors such as NFS4ERR_STALE_CLIENTID, etc.
> > > >
> > > > > Exactly what is the scenario that is causing your hang? Let's
> > > > > try
> > > > > to
> > > > > address that with a more targeted fix.
> > > >
> > > > The scenario is as follows: there are several NFS servers and
> > > > several
> > > > production machines with multiple NFS mounts. This is a
> > > > containerized
> > > > multi-tennant workflow so every tennant gets its own NFS mount to
> > > > access their
> > > > data. At some point nfs41_init_clientid fails in the initial
> > > > mount.nfs call
> > > > and all subsequent mount.nfs calls just hang in
> > > > nfs_wait_client_init_complete
> > > > until the original one, where nfs4_proc_exchange_id has failed,
> > > > is
> > > > killed.
> > > >
> > > > The cause of the nfs41_init_clientid failure in the production
> > > > case
> > > > is a timeout.
> > > > The following error message is observed in logs:
> > > >   NFS: state manager: lease expired failed on NFSv4 server <ip>
> > > > with
> > > > error 110
> > > >
> > >
> > > How about something like the following fix then?
> > > 8<-----------------------------------------------
> > > From eb402b489bb0d0ada1a3dd9101d4d7e193402e46 Mon Sep 17 00:00:00
> > > 2001
> > > Message-ID:
> > > <eb402b489bb0d0ada1a3dd9101d4d7e193402e46.1725904471.git.trond.mykl
> > > ebust@xxxxxxxxxxxxxxx>
> > > From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> > > Date: Mon, 9 Sep 2024 13:47:07 -0400
> > > Subject: [PATCH] NFSv4: Fail mounts if the lease setup times out
> > >
> > > If the server is down when the client is trying to mount, so that
> > > the
> > > calls to exchange_id or create_session fail, then we should allow
> > > the
> > > mount system call to fail rather than hang and block other
> > > mount/umount
> > > calls.
> > >
> > > Reported-by: Oleksandr Tymoshenko <ovt@xxxxxxxxxx>
> > > Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> > > ---
> > >  fs/nfs/nfs4state.c | 6 ++++++
> > >  1 file changed, 6 insertions(+)
> > >
> > > diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> > > index 30aba1dedaba..59dcdf9bc7b4 100644
> > > --- a/fs/nfs/nfs4state.c
> > > +++ b/fs/nfs/nfs4state.c
> > > @@ -2024,6 +2024,12 @@ static int
> > > nfs4_handle_reclaim_lease_error(struct nfs_client *clp, int status)
> > >                 nfs_mark_client_ready(clp, -EPERM);
> > >                 clear_bit(NFS4CLNT_LEASE_CONFIRM, &clp->cl_state);
> > >                 return -EPERM;
> > > +       case -ETIMEDOUT:
> > > +               if (clp->cl_cons_state == NFS_CS_SESSION_INITING) {
> > > +                       nfs_mark_client_ready(clp, -EIO);
> > > +                       return -EIO;
> > > +               }
> > > +               fallthrough;
> > >         case -EACCES:
> > >         case -NFS4ERR_DELAY:
> > >         case -EAGAIN:
> > > --
> >
> > This patch fixes the issue in my simulated environment. ETIMEDOUT is
> > the error code that
> > was observed in the production env but I guess it's not the only
> > possible one. Would it make
> > sense to handle all error conditions in the NFS_CS_SESSION_INITING
> > state or are there
> > some others that are recoverable?
> >
>
> The only other one that I'm thinking might want to be treated similarly
> is the above EACCES error. That's because that is only returned if
> there is a problem with your RPCSEC_GSS/krb5 credential. I was thinking
> of changing that one too in the same patch, but came to the conclusion
> it would be better to treat the two issues with separate fixes.
>
> The other error conditions are all supposed to be transient NFS level
> errors. They should not be treated as fatal.

Sounds good. Will you submit this patch to the mainline kernel? If so
please add me to Cc. Thanks for looking into this.