On Wed, 2023-09-06 at 15:20 +0000, Chuck Lever III wrote: > > > > On Sep 6, 2023, at 10:33 AM, Trond Myklebust <trondmy@xxxxxxxxxx> > > wrote: > > > > On Wed, 2023-09-06 at 13:40 +0000, Chuck Lever III wrote: > > > > > > > > > > On Sep 5, 2023, at 9:03 PM, trondmy@xxxxxxxxxx wrote: > > > > > > > > From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> > > > > > > > > This reverts commit 0701214cd6e66585a999b132eb72ae0489beb724. > > > > > > > > The premise of this commit was incorrect. There are exactly 2 > > > > cases > > > > where rpcauth_checkverf() will return an error: > > > > > > > > 1) If there was an XDR decode problem (i.e. garbage data). > > > > 2) If gss_validate() had a problem verifying the RPCSEC_GSS > > > > MIC. > > > > > > There's also the AUTH_TLS probe: > > > > > > https://www.rfc-editor.org/rfc/rfc9289.html#section-4.1-7 > > > > > > That was the purpose of 0701214cd6e6. > > > > > > Reverting this commit is likely to cause problems when our > > > TLS-capable client interacts with a server that knows > > > nothing of AUTH_TLS. > > > > The patch completely broke the semantics of the header validation > > code. > > If that were truly the case, it's amazing that the client > has hobbled along for the past 14 months with no-one > noticing the breakage until now. > > Seriously, though, treating a bad verifier as garbage args > is not intuitive. If it's that critical there needs to be > a comment in the code explaining why. > It is necessary because of the peculiarities of RPCSEC_GSS and the session semantics it implements. See https://datatracker.ietf.org/doc/html/rfc2203#section-5.3.3.1 and in particular, the paragraph discussing retransmissions by the client. > > There is no discussion about whether or not it needs to be > > reverted. > > The patch description is wrong, though, to exclude AUTH_TLS. > > The reverting patch description claims to be an exhaustive > list of all the cases, but it doesn't mention the AUTH_TLS > case at all. > > > > If the TLS code needs special treatment, then a separate patch is > > needed to fix tls_validate() to return something that can be caught > > by > > rpc_decode_header and interpreted differently to the EIO and EACCES > > error codes currently being returned by RPCSEC_GSS, AUTH_SYS and > > others. > > That could have been brought up when 0701214cd6e6 was first > posted for review. Interesting that the decoder currently > does not distinguish between EIO and EACCES. > > Thanks for the suggestion, I'll have a look. > Now that I look at it, I think your approach to satisfying RFC9289 is not correct. Since this is a transport level issue, why should we not just mark the xprt for disconnection, and then retry? It is entirely possible that some load balancer/floating IP has just moved the connection to some node that was not expecting to do TLS. The only case where that should not be assumed is the case where the error happens right at the very beginning of the mount, when disconnecting should normally suffice to trigger the RPC_TASK_SOFTCONN code anyway. > > > > > In the second case, there are again 2 subcases: > > > > > > > > a) The GSS context expires, in which case gss_validate() will > > > > force > > > > a > > > > new context negotiation on retry by invalidating the cred. > > > > b) The sequence number check failed because an RPC call timed > > > > out, > > > > and > > > > the client retransmitted the request using a new sequence > > > > number, > > > > as required by RFC2203. > > > > > > > > In neither subcase is this a fatal error. > > > > > > > > Reported-by: Russell Cattelan <cattelan@xxxxxxxxxxx> > > > > Fixes: 0701214cd6e6 ("SUNRPC: Fail faster on bad verifier") > > > > Cc: stable@xxxxxxxxxxxxxxx > > > > Signed-off-by: Trond Myklebust > > > > <trond.myklebust@xxxxxxxxxxxxxxx> > > > > --- > > > > net/sunrpc/clnt.c | 2 +- > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c > > > > index 12c46e129db8..5a7de7e55548 100644 > > > > --- a/net/sunrpc/clnt.c > > > > +++ b/net/sunrpc/clnt.c > > > > @@ -2724,7 +2724,7 @@ rpc_decode_header(struct rpc_task *task, > > > > struct xdr_stream *xdr) > > > > > > > > out_verifier: > > > > trace_rpc_bad_verifier(task); > > > > - goto out_err; > > > > + goto out_garbage; > > > > > > > > out_msg_denied: > > > > error = -EACCES; > > > > -- > > > > 2.41.0 > > > > > > > > > > -- > > > Chuck Lever > > > > > > > > > > -- > > Trond Myklebust > > Linux NFS client maintainer, Hammerspace > > trond.myklebust@xxxxxxxxxxxxxxx > > > -- > Chuck Lever > >