On Mon, 2020-04-20 at 09:58 -0400, Olga Kornievskaia wrote: > > > On Fri, Apr 17, 2020 at 12:53 PM Trond Myklebust < > trondmy@xxxxxxxxxxxxxxx> wrote: > > On Fri, 2020-04-17 at 12:46 -0400, Olga Kornievskaia wrote: > > > On Fri, Apr 17, 2020 at 12:20 PM Trond Myklebust > > > <trondmy@xxxxxxxxxxxxxxx> wrote: > > > > On Fri, 2020-04-17 at 11:43 -0400, Olga Kornievskaia wrote: > > > > > Hi Trond, > > > > > > > > > > On Fri, Apr 17, 2020 at 11:31 AM Trond Myklebust > > > > > <trondmy@xxxxxxxxxxxxxxx> wrote: > > > > > > Hi Olga, > > > > > > > > > > > > On Fri, 2020-04-17 at 11:15 -0400, Olga Kornievskaia wrote: > > > > > > > When nconnect is used, SEQUENCE operation currently isn't > > > > > > > bound > > > > > > > to > > > > > > > a particular transport. The problem is created on an idle > > > > > > > mount, > > > > > > > where SEQUENCE is the only operation being sent and > > opened > > > > > > > TPC > > > > > > > connections are slowly being close from the lack of use. > > If > > > > > > > SEQUENCE > > > > > > > is not assigned to the main connection, the main > > connection > > > > > > > can > > > > > > > be closed and with that so is the back channel bound to > > that > > > > > > > connection. > > > > > > > > > > > > > > Since the only way client handles callback_path down is > > by > > > > > > > sending > > > > > > > BIND_CONN_TO_SESSION requesting to bind both backchannel > > and > > > > > > > fore > > > > > > > channel on the connection that was left going, but that > > > > > > > connection > > > > > > > was already bound to only forechannel. According to the > > spec, > > > > > > > it's > > > > > > > not allowed to change channel binding after they are > > done. > > > > > > > > > > > > > > The fix is to make sure that a lone SEQUENCE always goes > > on > > > > > > > the > > > > > > > main connection, keeping backchannel alive. > > > > > > > > > > > > > > Fixes: 5a0c257f8 ("NFS: send state management on a single > > > > > > > connection") > > > > > > > Signed-off-by: Olga Kornievskaia <kolga@xxxxxxxxxx> > > > > > > > --- > > > > > > > fs/nfs/nfs4proc.c | 2 +- > > > > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > > > > > > > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c > > > > > > > index 99e9f2e..461f85d 100644 > > > > > > > --- a/fs/nfs/nfs4proc.c > > > > > > > +++ b/fs/nfs/nfs4proc.c > > > > > > > @@ -8857,7 +8857,7 @@ static struct rpc_task > > > > > > > *_nfs41_proc_sequence(struct nfs_client *clp, > > > > > > > .rpc_client = clp->cl_rpcclient, > > > > > > > .rpc_message = &msg, > > > > > > > .callback_ops = &nfs41_sequence_ops, > > > > > > > - .flags = RPC_TASK_ASYNC | RPC_TASK_TIMEOUT, > > > > > > > + .flags = RPC_TASK_ASYNC | RPC_TASK_TIMEOUT > > | > > > > > > > RPC_TASK_NO_ROUND_ROBIN, > > > > > > > }; > > > > > > > struct rpc_task *ret; > > > > > > > > > > > > > > > > > > > This works only in the case where the client is only > > sending > > > > > > SEQUENCE > > > > > > instructions. There are other cases where it could be > > sending > > > > > > out > > > > > > other > > > > > > operations that also renew the lease, but is doing it very > > > > > > infrequently. Won't that also run into the same problem? > > > > > > > > > > Hm... I see so main channel can still go idle and close, when > > > > > infrequent operations are happening on the other connections > > > > > before > > > > > round-robin-ing to the main connection.... > > > > > > > > > > > Is the fundamental problem here that we're not handling the > > > > > > SEQ4_STATUS_CB_PATH_DOWN / SEQ4_STATUS_CB_PATH_DOWN_SESSION > > > > > > flags > > > > > > correctly or is there something else going on? > > > > > > > > > > Yes the client doesn't recover properly. But the fix wasn't > > > > > trivial > > > > > to > > > > > me (so I thought my patch was enough but I see it's not). Say > > > > > client > > > > > shuts down the main connection because it was idle. Now > > whatever > > > > > operations goes on a different connection is going to get > > > > > callback > > > > > down. The only way the client can create a new backchannel > > > > > (according > > > > > to the spec) is if it creates a brand new connection and > > sends > > > > > BIND_CONN_TO_SESSION there (all existing connections are > > already > > > > > bound > > > > > to fore channel and according to the spec you can't modify > > the > > > > > existing binding). But then we'd need to make sure that it's > > the > > > > > first > > > > > one in the list of connections we iterate thru (as i think > > 1st > > > > > marks > > > > > the main connection?) as the other operations that supposed > > to > > > > > only > > > > > go > > > > > on main connection need to know which connection to pick. > > > > > > > > > > The reason it's not seen against linux is because it doesn't > > > > > follow > > > > > the spec is doesn't reject attempts to bind a backchannel to > > an > > > > > already existing connection that was only bound for fore > > channel. > > > > > > > > > > > > > Oh, I see. So the server is replying NFS4ERR_INVAL in order to > > let > > > > the > > > > client know that it is trying to change the channel bindings > > for > > > > that > > > > connection. > > > > > > Well server isn't failing because client is asking for > > FORE_OR_BOTH > > > and it's a choice so server is returning FORE. I'm not sure we > > can > > > ask > > > the server to fail the request with ERR_INVAL.... (rather I can > > ask > > > but ) rather can we expect the server to do that always? > > > > In RFC5661, Section 18.34.3 I found the following normative text: > > > > Invoking BIND_CONN_TO_SESSION on a connection already associated > > with > > the specified session has no effect, and the server MUST respond > > with > > NFS4_OK, unless the client is demanding changes to the set of > > channels the connection is associated with. If so, the server > > MUST > > return NFS4ERR_INVAL. > > > > > > IOW: it sounds like your server isn't following the spec either. 🙂 > > So, because client is sending an "_OR_" that's not a demand change > but rather a request change and therefore follows the other rules > where the server is allowed to just return "FORE" channel binding. > I'll have to do something else rather than ERR_INVAL handle. > The protocol makes no difference here between demanding a change and requesting it. However my point was rather that the client is never trying to change anything. It is always consistently setting the same flags on the same 'struct xprt's, and so therefore if it sends a BIND_CONN_TO_SESSION on an existing connection, it will send the exact same set of flags that it sent when it originally established that connection. IOW: According to the spec, the server MUST always grant the request here because the client isn't changing the channel associations. The server is not allowed to make any changes to the channel associations because (again according to the spec) the call "has no effect". -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@xxxxxxxxxxxxxxx