Re: [PATCH 1/1] NFSv4.1: fix lone sequence transport assignment

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Apr 17, 2020 at 12:20 PM Trond Myklebust
<trondmy@xxxxxxxxxxxxxxx> wrote:
>
> On Fri, 2020-04-17 at 11:43 -0400, Olga Kornievskaia wrote:
> > Hi Trond,
> >
> > On Fri, Apr 17, 2020 at 11:31 AM Trond Myklebust
> > <trondmy@xxxxxxxxxxxxxxx> wrote:
> > > Hi Olga,
> > >
> > > On Fri, 2020-04-17 at 11:15 -0400, Olga Kornievskaia wrote:
> > > > When nconnect is used, SEQUENCE operation currently isn't bound
> > > > to
> > > > a particular transport. The problem is created on an idle mount,
> > > > where SEQUENCE is the only operation being sent and opened TPC
> > > > connections are slowly being close from the lack of use. If
> > > > SEQUENCE
> > > > is not assigned to the main connection, the main connection can
> > > > be closed and with that so is the back channel bound to that
> > > > connection.
> > > >
> > > > Since the only way client handles callback_path down is by
> > > > sending
> > > > BIND_CONN_TO_SESSION requesting to bind both backchannel and fore
> > > > channel on the connection that was left going, but that
> > > > connection
> > > > was already bound to only forechannel. According to the spec,
> > > > it's
> > > > not allowed to change channel binding after they are done.
> > > >
> > > > The fix is to make sure that a lone SEQUENCE always goes on the
> > > > main connection, keeping backchannel alive.
> > > >
> > > > Fixes: 5a0c257f8 ("NFS: send state management on a single
> > > > connection")
> > > > Signed-off-by: Olga Kornievskaia <kolga@xxxxxxxxxx>
> > > > ---
> > > >  fs/nfs/nfs4proc.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> > > > index 99e9f2e..461f85d 100644
> > > > --- a/fs/nfs/nfs4proc.c
> > > > +++ b/fs/nfs/nfs4proc.c
> > > > @@ -8857,7 +8857,7 @@ static struct rpc_task
> > > > *_nfs41_proc_sequence(struct nfs_client *clp,
> > > >               .rpc_client = clp->cl_rpcclient,
> > > >               .rpc_message = &msg,
> > > >               .callback_ops = &nfs41_sequence_ops,
> > > > -             .flags = RPC_TASK_ASYNC | RPC_TASK_TIMEOUT,
> > > > +             .flags = RPC_TASK_ASYNC | RPC_TASK_TIMEOUT |
> > > > RPC_TASK_NO_ROUND_ROBIN,
> > > >       };
> > > >       struct rpc_task *ret;
> > > >
> > >
> > > This works only in the case where the client is only sending
> > > SEQUENCE
> > > instructions. There are other cases where it could be sending out
> > > other
> > > operations that also renew the lease, but is doing it very
> > > infrequently. Won't that also run into the same problem?
> >
> > Hm... I see so main channel can still go idle and close, when
> > infrequent operations are happening on the other connections before
> > round-robin-ing to the main connection....
> >
> > > Is the fundamental problem here that we're not handling the
> > > SEQ4_STATUS_CB_PATH_DOWN / SEQ4_STATUS_CB_PATH_DOWN_SESSION flags
> > > correctly or is there something else going on?
> >
> > Yes the client doesn't recover properly. But the fix wasn't trivial
> > to
> > me (so I thought my patch was enough but I see it's not). Say client
> > shuts down the main connection because it was idle. Now whatever
> > operations goes on a different connection is going to get callback
> > down. The only way the client can create a new backchannel (according
> > to the spec) is if it creates a brand new connection and sends
> > BIND_CONN_TO_SESSION there (all existing connections are already
> > bound
> > to fore channel and according to the spec you can't modify the
> > existing binding). But then we'd need to make sure that it's the
> > first
> > one in the list of connections we iterate thru (as i think 1st marks
> > the main connection?) as the other operations that supposed to only
> > go
> > on main connection need to know which connection to pick.
> >
> > The reason it's not seen against linux is because it doesn't follow
> > the spec is doesn't reject attempts to bind a backchannel to an
> > already existing connection that was only bound for fore channel.
> >
>
> Oh, I see. So the server is replying NFS4ERR_INVAL in order to let the
> client know that it is trying to change the channel bindings for that
> connection.

Well server isn't failing because client is asking for FORE_OR_BOTH
and it's a choice so server is returning FORE. I'm not sure we can ask
the server to fail the request with ERR_INVAL.... (rather I can ask
but ) rather can we expect the server to do that always?

> Hmm... Is there any reason why we can't just add a handler to
> nfs4_bind_one_conn_to_session_done() that intercepts NFS4ERR_INVAL, and
> disconnects the xprt before retrying?
> We should probably add a wrapper to xprt_force_disconnect() in
> include/linux/sunrpc/clnt.h. Something like the following?
>
>
> static inline void rpc_task_close_connection(struct rpc_task *task)
> {
>         if (task->tk_xprt)
>                 xprt_force_disconnect(task->tk_xprt);
> }
>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@xxxxxxxxxxxxxxx
>
>



[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux