On Wed, 2018-03-21 at 05:44 +1100, NeilBrown wrote: > On Tue, Mar 20 2018, Trond Myklebust wrote: > > > On Tue, 2018-03-20 at 11:32 +1100, NeilBrown wrote: > > > On Fri, Mar 16 2018, Trond Myklebust wrote: > > > > > > > On Fri, 2018-03-16 at 10:31 +0100, Mkrtchyan, Tigran wrote: > > > > > Hi Neil, > > > > > > > > > > according to rfc5661, NFS4ERR_INVAL is returned by the server > > > > > if > > > > > it > > > > > thinks that client sends an invalid request (e.g. points to a > > > > > client > > > > > bug) > > > > > or server misinterpret it (broken server). > > > > > > > > > > With your change instead of failing the mount, client will > > > > > silently > > > > > go for > > > > > v4.0, even v4.1 mount was requested and produce undesirable > > > > > behavior, > > > > > e.g. > > > > > proxy-io instead of pnfs. I fill prefer fail-fast instead of > > > > > long > > > > > debug > > > > > sessions. > > > > > > > > > > On the other hand, I understand, that it's not always > > > > > possible to > > > > > fix > > > > > server > > > > > or clients in production environment and time-to-time > > > > > workarounds > > > > > are > > > > > necessary. > > > > > > > > > > > > > > > > > > I'd tend to agree with Tigran. Hiding server bugs, should not > > > > be a > > > > priority and particularly not in this case, where the > > > > workaround is > > > > simple: either turn off version negotiation altogether, or edit > > > > /etc/nfsmount.conf to negotiate a different set of versions. > > > > > > Yes, it could be worked-around in nfsmount.conf, but manual > > > configuration should be seen as an optimization or a last > > > resort. If > > > we can make things work without configuration, that provides the > > > best > > > experience. > > > In this case, the kernel has strong evidence that the server > > > isn't responding as expected, but it gives an unhelpful error > > > message. > > > > Server do not spontaneously break their ability to process NFSv4 > > operations and so this is not an issue that we need to worry about > > in > > ordinary operation. It should only ever be an issue when > > > > 1) An insufficiently tested and broken upgrade is applied to an > > existing server, in which case the main workaround should be to > > revert > > the upgrade until it can be fixed. > > 2) A completely new broken server is introduced to the system. > > 3) An upgrade to the client defaults to trying 4.2 first, then 4.1 > and > only then 4.0. Previous client defaulted to 4.0. I'm sorry, but this still does not sounds like a good case for "fix the kernel client". The kernel has no opinion on which NFS version you should try first in a mount attempt: that decision is made in userspace. If you upgraded the nfs-utils to something that now tries 4.2 first, then that is a user space policy issue. > > > > > At the very least, nfs4_discover_server_trunking() should not > > > treat > > > -NFS4ERR_INVAL as unexpect (because there is code in > > > nfs4_check_cl_exchange_flags which explicitly generates it). > > > If it just let this error through, instead of translating it to > > > EIO, > > > then the problem would go away. > > > > The code in nfs4_check_cl_exchange_flags is there to check for > > explicit NFSv4.1 protocol violations. Is it broken? > > It is broken in that it reports -EINVAL when no arguments were > invalid. > This gets translated to -EIO when there was no IO error. The purpose of that function is to ensure that the server is advertising itself as a bona fide NFSv4.1 or newer server, and to ensure that it is not replying to our EXCHANGE_ID request with some flag or service that we did not expect and that might cause us to break. IOW: it is there to check protocol compliance. As far as I can tell, it is doing that, and doing so correctly. > Thanks, > NeilBrown > > > > > > > > > > > > What we might want to do, is make it easier to allow the user > > > > to > > > > detect > > > > that this is indeed a server bug and is not a problem with the > > > > arguments supplied to the "mount" utility. Perhaps we might > > > > have > > > > the > > > > kernel log something in the syslogs? > > > > > > Yes, logging a message might be useful. Most of the messages > > > logged > > > about bad servers are currently going through dprintk(), so they > > > won't > > > often be seen. Is that what we want?? Don't know... > > > > > > Anyway, you point that it "is not a problem with the arguments" > > > is > > > stop-on. If the client gets EINVAL from the server, then it > > > shouldn't > > > blindly report that back to the user as EINVAL means "Invalid > > > argument" and the argements given to the server are probably not > > > the > > > argument given by the user. > > > > > > Following that line of reasoning, I think > > > nfs4_check_cl_exchange_flag() > > > should *not* return -NFS4ERR_INVAL, and _nfs4_proc_exchange_id() > > > shouldn't pass NFS4ERR_INVAL through unchanged. > > > > > > So I propose the following version. > > > > > > Thanks, > > > NeilBrown > > > > > > ------------------------8<--------------------------- > > > From: NeilBrown <neilb@xxxxxxxx> > > > Date: Tue, 20 Mar 2018 11:31:33 +1100 > > > Subject: [PATCH] NFSv4: handle EINVAL from EXCHANGE_ID better. > > > > > > nfs4_proc_exchange_id() can return -EINVAL if the server > > > reported NFS4INVAL (which I have seen in a packet trace), > > > or nfs4_check_cl_exchange_flags() exchange flags detects > > > a problem. > > > Each of these mean that NFSv4.1 and later cannot be used, > > > but they should not prevent fallback to NFSv4.0. Currently > > > they do. > > > > > > Currently this EINVAL error is returned by > > > nfs4_proc_exchange_id() to nfs41_discover_server_trunking(), > > > and thence to nfs4_discover_server_trunking(). > > > nfs4_discover_server_trunking() doesn't understand EINVAL, > > > so converts it to EIO which causes mount.nfs to think > > > something is horribly wrong and to give up. > > > > > > EINVAL is never a sensible error code here. It means "Invalid > > > argument", but is being used when the problem is "Invalid > > > response > > > from the server". If we change these two circumstances to report > > > EPROTONOSUPPORT to the caller (which seems a reasonable > > > assessment > > > when the server gives confusing responses), and if we enhance > > > nfs4_discover_server_trunking() to treat -EPROTONOSUPPORT as an > > > expected error to pass through, then the error reported to user- > > > space > > > will be more representative of the actual fault. > > > > > > A failure to negotiate a client ID clearly shows that NFSv4.1 > > > cannot > > > be supported, but isn't as general a failure as EIO. > > > > > > Signed-off-by: NeilBrown <neilb@xxxxxxxx> > > > --- > > > fs/nfs/nfs4proc.c | 18 +++++++++++++++--- > > > fs/nfs/nfs4state.c | 1 + > > > 2 files changed, 16 insertions(+), 3 deletions(-) > > > > > > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c > > > index 47f3c273245e..97757f646f13 100644 > > > --- a/fs/nfs/nfs4proc.c > > > +++ b/fs/nfs/nfs4proc.c > > > @@ -7364,7 +7364,8 @@ static int nfs4_check_cl_exchange_flags(u32 > > > flags) > > > goto out_inval; > > > return NFS_OK; > > > out_inval: > > > - return -NFS4ERR_INVAL; > > > + dprintk("NFS: server returns invalid flags for > > > EXCHANGE_ID\n"); > > > + return -EPROTONOSUPPORT; > > > } > > > > > > static bool > > > @@ -7741,8 +7742,19 @@ static int _nfs4_proc_exchange_id(struct > > > nfs_client *clp, struct rpc_cred *cred, > > > int status; > > > > > > task = nfs4_run_exchange_id(clp, cred, sp4_how, NULL); > > > - if (IS_ERR(task)) > > > - return PTR_ERR(task); > > > + if (IS_ERR(task)) { > > > + status = PTR_ERR(task); > > > + if (status == -NFS4ERR_INVAL) { > > > + /* If the server think we did something > > > invalid, it is certainly > > > + * not the fault of our caller, so it > > > would > > > wrong to report > > > + * this error back up. So in that case > > > simply acknowledge that > > > + * we don't seem able to support this > > > protocol. > > > + */ > > > + dprintk("NFS: server return > > > NFS4ERR_INVAL to > > > EXCHANGE_ID\n"); > > > + status = -EPROTONOSUPPORT; > > > + } > > > + return status; > > > + } > > > > > > argp = task->tk_msg.rpc_argp; > > > resp = task->tk_msg.rpc_resp; > > > diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c > > > index 91a4d4eeb235..273c032089c4 100644 > > > --- a/fs/nfs/nfs4state.c > > > +++ b/fs/nfs/nfs4state.c > > > @@ -2219,6 +2219,7 @@ int nfs4_discover_server_trunking(struct > > > nfs_client *clp, > > > clnt = clp->cl_rpcclient; > > > goto again; > > > > > > + case -EPROTONOSUPPORT: > > > case -NFS4ERR_MINOR_VERS_MISMATCH: > > > status = -EPROTONOSUPPORT; > > > break; > > > > -- > > Trond Myklebust > > Linux NFS client maintainer, PrimaryData > > trond.myklebust@xxxxxxxxxxxxxxx -- Trond Myklebust Linux NFS client maintainer, PrimaryData trond.myklebust@xxxxxxxxxxxxxxx
Attachment:
signature.asc
Description: This is a digitally signed message part