On Wed, 2023-09-06 at 18:05 +0000, Chuck Lever III wrote: > > > > On Sep 6, 2023, at 12:18 PM, Trond Myklebust <trondmy@xxxxxxxxxx> > > wrote: > > > > On Wed, 2023-09-06 at 15:20 +0000, Chuck Lever III wrote: > > > > > > > > > > On Sep 6, 2023, at 10:33 AM, Trond Myklebust > > > > <trondmy@xxxxxxxxxx> > > > > wrote: > > > > > > > > On Wed, 2023-09-06 at 13:40 +0000, Chuck Lever III wrote: > > > > > > > > > > > > > > > > On Sep 5, 2023, at 9:03 PM, trondmy@xxxxxxxxxx wrote: > > > > > > > > > > > > From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> > > > > > > > > > > > > This reverts commit > > > > > > 0701214cd6e66585a999b132eb72ae0489beb724. > > > > > > > > > > > > The premise of this commit was incorrect. There are exactly > > > > > > 2 > > > > > > cases > > > > > > where rpcauth_checkverf() will return an error: > > > > > > > > > > > > 1) If there was an XDR decode problem (i.e. garbage data). > > > > > > 2) If gss_validate() had a problem verifying the RPCSEC_GSS > > > > > > MIC. > > > > > > > > > > There's also the AUTH_TLS probe: > > > > > > > > > > https://www.rfc-editor.org/rfc/rfc9289.html#section-4.1-7 > > > > > > > > > > That was the purpose of 0701214cd6e6. > > > > > > > > > > Reverting this commit is likely to cause problems when our > > > > > TLS-capable client interacts with a server that knows > > > > > nothing of AUTH_TLS. > > > > > > > > The patch completely broke the semantics of the header > > > > validation > > > > code. > > > > > > If that were truly the case, it's amazing that the client > > > has hobbled along for the past 14 months with no-one > > > noticing the breakage until now. > > > > > > Seriously, though, treating a bad verifier as garbage args > > > is not intuitive. If it's that critical there needs to be > > > a comment in the code explaining why. > > > > > > > It is necessary because of the peculiarities of RPCSEC_GSS and the > > session semantics it implements. > > See > > https://datatracker.ietf.org/doc/html/rfc2203#section-5.3.3.1 and > > in particular, the paragraph discussing retransmissions by the > > client. > > Retrying is fine. > > But the counter in the client is called "garbage_retries". > That's not what is going on the EACCES case, though the > behavior is close enough -- it's code re-use (good) without > appropriate documentation (bad). > > The decoder treats EIO and EACCES exactly the same way. > Again, code reuse (good) without appropriate documentation > (bad). > > I tried to address that in my RFC patch by adding a small > explanatory comment and by adding an API contract for > rpcauth_checkverf(). > > > > > > There is no discussion about whether or not it needs to be > > > > reverted. > > > > > > The patch description is wrong, though, to exclude AUTH_TLS. > > > > > > The reverting patch description claims to be an exhaustive > > > list of all the cases, but it doesn't mention the AUTH_TLS > > > case at all. > > > > > > > > > > If the TLS code needs special treatment, then a separate patch > > > > is > > > > needed to fix tls_validate() to return something that can be > > > > caught > > > > by > > > > rpc_decode_header and interpreted differently to the EIO and > > > > EACCES > > > > error codes currently being returned by RPCSEC_GSS, AUTH_SYS > > > > and > > > > others. > > > > > > That could have been brought up when 0701214cd6e6 was first > > > posted for review. Interesting that the decoder currently > > > does not distinguish between EIO and EACCES. > > > > > > Thanks for the suggestion, I'll have a look. > > > > > > > Now that I look at it, I think your approach to satisfying RFC9289 > > is > > not correct. > > I'm not following what aspect of the implementation is problematic. > I'm going to assume you mean the implementation of opportunism. > > > > Since this is a transport level issue, why should we not just mark > > the > > xprt for disconnection, and then retry? It is entirely possible > > that > > some load balancer/floating IP has just moved the connection to > > some > > node that was not expecting to do TLS. > > Depending on the security policy chosen by the client's > administrator, > that could either be a security problem or a "don't care" situation. > > If the administrator wants the client to _require_ TLS, then > connecting to a load balancer where TLS suddenly becomes unavailable > after a reconnect is a hard error. This prevents STRIPTLS attacks. > That's good security. > > If the administrator wants to allow operation to continue even if TLS > is not available, then the client can recover by not using TLS. > That's > rather terrible security, but can be desirable to improve backward > compatibility. > > > > The only case where that should > > not be assumed is the case where the error happens right at the > > very > > beginning of the mount, when disconnecting should normally suffice > > to > > trigger the RPC_TASK_SOFTCONN code anyway. > > If TLS goes away after a reconnect, that's a problem. Whether > further operation should stop depends on the administrator's > chosen security policy. > > The security policies are NFS-level settings (eg, mount options). > RPC just indicates to NFS whether the traffic is protected or not. > > When NFS asks RPC to ensure the communication channel is protected, > that means every reconnect is protected. Communication with that > security policy cannot happen without protection. > > Trust me, the security community will have it no other way. > > If you need opportunism in this case, then I can add back the > "xprtsec=auto" mount option, which you asked me to remove a while > back. I don't see how described behaviour would cause the operation to proceed without TLS. I'm saying disconnect and then retry TLS negotiation, and then eventually fail. > > > > > > > > In the second case, there are again 2 subcases: > > > > > > > > > > > > a) The GSS context expires, in which case gss_validate() > > > > > > will > > > > > > force > > > > > > a > > > > > > new context negotiation on retry by invalidating the > > > > > > cred. > > > > > > b) The sequence number check failed because an RPC call > > > > > > timed > > > > > > out, > > > > > > and > > > > > > the client retransmitted the request using a new sequence > > > > > > number, > > > > > > as required by RFC2203. > > > > > > > > > > > > In neither subcase is this a fatal error. > > > > > > > > > > > > Reported-by: Russell Cattelan <cattelan@xxxxxxxxxxx> > > > > > > Fixes: 0701214cd6e6 ("SUNRPC: Fail faster on bad verifier") > > > > > > Cc: stable@xxxxxxxxxxxxxxx > > > > > > Signed-off-by: Trond Myklebust > > > > > > <trond.myklebust@xxxxxxxxxxxxxxx> > > > > > > --- > > > > > > net/sunrpc/clnt.c | 2 +- > > > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > > > > > diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c > > > > > > index 12c46e129db8..5a7de7e55548 100644 > > > > > > --- a/net/sunrpc/clnt.c > > > > > > +++ b/net/sunrpc/clnt.c > > > > > > @@ -2724,7 +2724,7 @@ rpc_decode_header(struct rpc_task > > > > > > *task, > > > > > > struct xdr_stream *xdr) > > > > > > > > > > > > out_verifier: > > > > > > trace_rpc_bad_verifier(task); > > > > > > - goto out_err; > > > > > > + goto out_garbage; > > > > > > > > > > > > out_msg_denied: > > > > > > error = -EACCES; > > > > > > -- > > > > > > 2.41.0 > > > > > > > > > > > > > > > > -- > > > > > Chuck Lever > > > > > > > > > > > > > > > > > > -- > > > > Trond Myklebust > > > > Linux NFS client maintainer, Hammerspace > > > > trond.myklebust@xxxxxxxxxxxxxxx > > > > > > > > > -- > > > Chuck Lever > > > -- > Chuck Lever > > -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@xxxxxxxxxxxxxxx