Re: [PATCH 1/1] NFSv4.1: fix lone sequence transport assignment

Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> · Fri, 17 Apr 2020 19:24:24 +0000

On Fri, 2020-04-17 at 15:06 -0400, Olga Kornievskaia wrote:
> On Fri, Apr 17, 2020 at 2:41 PM Trond Myklebust <
> trondmy@xxxxxxxxxxxxxxx> wrote:
> > On Fri, 2020-04-17 at 14:08 -0400, Olga Kornievskaia wrote:
> > > On Fri, Apr 17, 2020 at 12:53 PM Trond Myklebust
> > > <trondmy@xxxxxxxxxxxxxxx> wrote:
> > > > On Fri, 2020-04-17 at 12:46 -0400, Olga Kornievskaia wrote:
> > > > > On Fri, Apr 17, 2020 at 12:20 PM Trond Myklebust
> > > > > <trondmy@xxxxxxxxxxxxxxx> wrote:
> > > > > > On Fri, 2020-04-17 at 11:43 -0400, Olga Kornievskaia wrote:
> > > > > > > Hi Trond,
> > > > > > > 
> > > > > > > On Fri, Apr 17, 2020 at 11:31 AM Trond Myklebust
> > > > > > > <trondmy@xxxxxxxxxxxxxxx> wrote:
> > > > > > > > Hi Olga,
> > > > > > > > 
> > > > > > > > On Fri, 2020-04-17 at 11:15 -0400, Olga Kornievskaia
> > > > > > > > wrote:
> > > > > > > > > When nconnect is used, SEQUENCE operation currently
> > > > > > > > > isn't
> > > > > > > > > bound
> > > > > > > > > to
> > > > > > > > > a particular transport. The problem is created on an
> > > > > > > > > idle
> > > > > > > > > mount,
> > > > > > > > > where SEQUENCE is the only operation being sent and
> > > > > > > > > opened
> > > > > > > > > TPC
> > > > > > > > > connections are slowly being close from the lack of
> > > > > > > > > use.
> > > > > > > > > If
> > > > > > > > > SEQUENCE
> > > > > > > > > is not assigned to the main connection, the main
> > > > > > > > > connection
> > > > > > > > > can
> > > > > > > > > be closed and with that so is the back channel bound
> > > > > > > > > to
> > > > > > > > > that
> > > > > > > > > connection.
> > > > > > > > > 
> > > > > > > > > Since the only way client handles callback_path down
> > > > > > > > > is
> > > > > > > > > by
> > > > > > > > > sending
> > > > > > > > > BIND_CONN_TO_SESSION requesting to bind both
> > > > > > > > > backchannel
> > > > > > > > > and
> > > > > > > > > fore
> > > > > > > > > channel on the connection that was left going, but
> > > > > > > > > that
> > > > > > > > > connection
> > > > > > > > > was already bound to only forechannel. According to
> > > > > > > > > the
> > > > > > > > > spec,
> > > > > > > > > it's
> > > > > > > > > not allowed to change channel binding after they are
> > > > > > > > > done.
> > > > > > > > > 
> > > > > > > > > The fix is to make sure that a lone SEQUENCE always
> > > > > > > > > goes
> > > > > > > > > on
> > > > > > > > > the
> > > > > > > > > main connection, keeping backchannel alive.
> > > > > > > > > 
> > > > > > > > > Fixes: 5a0c257f8 ("NFS: send state management on a
> > > > > > > > > single
> > > > > > > > > connection")
> > > > > > > > > Signed-off-by: Olga Kornievskaia <kolga@xxxxxxxxxx>
> > > > > > > > > ---
> > > > > > > > >  fs/nfs/nfs4proc.c | 2 +-
> > > > > > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > > > > 
> > > > > > > > > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> > > > > > > > > index 99e9f2e..461f85d 100644
> > > > > > > > > --- a/fs/nfs/nfs4proc.c
> > > > > > > > > +++ b/fs/nfs/nfs4proc.c
> > > > > > > > > @@ -8857,7 +8857,7 @@ static struct rpc_task
> > > > > > > > > *_nfs41_proc_sequence(struct nfs_client *clp,
> > > > > > > > >               .rpc_client = clp->cl_rpcclient,
> > > > > > > > >               .rpc_message = &msg,
> > > > > > > > >               .callback_ops = &nfs41_sequence_ops,
> > > > > > > > > -             .flags = RPC_TASK_ASYNC |
> > > > > > > > > RPC_TASK_TIMEOUT,
> > > > > > > > > +             .flags = RPC_TASK_ASYNC |
> > > > > > > > > RPC_TASK_TIMEOUT
> > > > > > > > > RPC_TASK_NO_ROUND_ROBIN,
> > > > > > > > >       };
> > > > > > > > >       struct rpc_task *ret;
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > This works only in the case where the client is only
> > > > > > > > sending
> > > > > > > > SEQUENCE
> > > > > > > > instructions. There are other cases where it could be
> > > > > > > > sending
> > > > > > > > out
> > > > > > > > other
> > > > > > > > operations that also renew the lease, but is doing it
> > > > > > > > very
> > > > > > > > infrequently. Won't that also run into the same
> > > > > > > > problem?
> > > > > > > 
> > > > > > > Hm... I see so main channel can still go idle and close,
> > > > > > > when
> > > > > > > infrequent operations are happening on the other
> > > > > > > connections
> > > > > > > before
> > > > > > > round-robin-ing to the main connection....
> > > > > > > 
> > > > > > > > Is the fundamental problem here that we're not handling
> > > > > > > > the
> > > > > > > > SEQ4_STATUS_CB_PATH_DOWN /
> > > > > > > > SEQ4_STATUS_CB_PATH_DOWN_SESSION
> > > > > > > > flags
> > > > > > > > correctly or is there something else going on?
> > > > > > > 
> > > > > > > Yes the client doesn't recover properly. But the fix
> > > > > > > wasn't
> > > > > > > trivial
> > > > > > > to
> > > > > > > me (so I thought my patch was enough but I see it's not).
> > > > > > > Say
> > > > > > > client
> > > > > > > shuts down the main connection because it was idle. Now
> > > > > > > whatever
> > > > > > > operations goes on a different connection is going to get
> > > > > > > callback
> > > > > > > down. The only way the client can create a new
> > > > > > > backchannel
> > > > > > > (according
> > > > > > > to the spec) is if it creates a brand new connection and
> > > > > > > sends
> > > > > > > BIND_CONN_TO_SESSION there (all existing connections are
> > > > > > > already
> > > > > > > bound
> > > > > > > to fore channel and according to the spec you can't
> > > > > > > modify
> > > > > > > the
> > > > > > > existing binding). But then we'd need to make sure that
> > > > > > > it's
> > > > > > > the
> > > > > > > first
> > > > > > > one in the list of connections we iterate thru (as i
> > > > > > > think
> > > > > > > 1st
> > > > > > > marks
> > > > > > > the main connection?) as the other operations that
> > > > > > > supposed
> > > > > > > to
> > > > > > > only
> > > > > > > go
> > > > > > > on main connection need to know which connection to pick.
> > > > > > > 
> > > > > > > The reason it's not seen against linux is because it
> > > > > > > doesn't
> > > > > > > follow
> > > > > > > the spec is doesn't reject attempts to bind a backchannel
> > > > > > > to
> > > > > > > an
> > > > > > > already existing connection that was only bound for fore
> > > > > > > channel.
> > > > > > > 
> > > > > > 
> > > > > > Oh, I see. So the server is replying NFS4ERR_INVAL in order
> > > > > > to
> > > > > > let
> > > > > > the
> > > > > > client know that it is trying to change the channel
> > > > > > bindings
> > > > > > for
> > > > > > that
> > > > > > connection.
> > > > > 
> > > > > Well server isn't failing because client is asking for
> > > > > FORE_OR_BOTH
> > > > > and it's a choice so server is returning FORE. I'm not sure
> > > > > we
> > > > > can
> > > > > ask
> > > > > the server to fail the request with ERR_INVAL.... (rather I
> > > > > can
> > > > > ask
> > > > > but ) rather can we expect the server to do that always?
> > > > 
> > > > In RFC5661, Section 18.34.3 I found the following normative
> > > > text:
> > > > 
> > > >    Invoking BIND_CONN_TO_SESSION on a connection already
> > > > associated
> > > > with
> > > >    the specified session has no effect, and the server MUST
> > > > respond
> > > > with
> > > >    NFS4_OK, unless the client is demanding changes to the set
> > > > of
> > > >    channels the connection is associated with.  If so, the
> > > > server
> > > > MUST
> > > >    return NFS4ERR_INVAL.
> > > > 
> > > > 
> > > > IOW: it sounds like your server isn't following the spec
> > > > either.
> > > 
> > > Thank you thank you! I missed that. Ok so we'll assume it does
> > > and I
> > > will try to come up with what you are suggesting...
> > > 
> > > Do we need to do reorder the connection list? After we reset the
> > > connection, it will be on some random connection on the list of
> > > connections. But for the NO_ROBIN connections we pick the first
> > > connection in the list (which will not be the backchannel
> > > connection).
> > > Does it matter?
> > 
> > That's why I suggested the rpc_task_close_connection() wrapper
> > below.
> > It should ensure that we close the same connection that got the
> > NFS4ERR_INVAL, and then when we restart the RPC call (using
> > rpc_restart_call_prepare()) it will just automatically create a new
> > connection on that same struct xprt...
> 
> Ok let me try with an example.
> Start with:
> connection1 (first one on the list) (also the backchannel connection)
> connection2
> connection3
> 
> 1. connection1 and 2 go idle and are stopped. Now backchannel is
> down.
> 2. connection3 sends a SEQUENCE (or whatever op) and gets
> callback_path down
> 3. client send BIND_CONN_TO_SESSION on connection3. gets ERR_INVAL.
> Client resets the connection3 and sends new BIND_CONN_TO_SESSION and
> now connection3 is our backchannel and fore channel. But connection3
> is not the first one the list of transports
> 4. state management operations will pick connection1 out of the list
> (but it's not a backchannel connection).
> 
> Is that a problem?
> 

Right now it shouldn't be a problem because
nfs4_proc_bind_conn_to_session() actually iterates through all the
connections and sends a BIND_CONN_TO_SESSION on each one.

It only sets NFS4_CDFC4_FORE_OR_BOTH if the session flags indicate a
back channel is required, and if this is the first connection in the
list (i.e. if xprt == clnt->cl_xprt).

So as far as I can tell, we should actually already be OK. The client
should consistently be setting the same flags on the same struct xprts
every time there is a new connection.
As far as I can tell, we should therefore actually never be seeing any
NFS4ERR_INVAL cases but only be hitting the case that the spec allows.
That said, it can't harm to have a handler for it.

> > > > > > Hmm... Is there any reason why we can't just add a handler
> > > > > > to
> > > > > > nfs4_bind_one_conn_to_session_done() that intercepts
> > > > > > NFS4ERR_INVAL,
> > > > > > and
> > > > > > disconnects the xprt before retrying?
> > > > > > We should probably add a wrapper to xprt_force_disconnect()
> > > > > > in
> > > > > > include/linux/sunrpc/clnt.h. Something like the following?
> > > > > > 
> > > > > > 
> > > > > > static inline void rpc_task_close_connection(struct
> > > > > > rpc_task
> > > > > > *task)
> > > > > > {
> > > > > >         if (task->tk_xprt)
> > > > > >                 xprt_force_disconnect(task->tk_xprt);
> > > > > > }
> > > > > > 
-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx