On Thu, 2023-09-28 at 10:58 -0400, Olga Kornievskaia wrote: > On Wed, Sep 27, 2023 at 3:35 PM <trondmy@xxxxxxxxxx> wrote: > > > > From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> > > > > If the TCP connection attempt fails without ever establishing a > > connection, then assume the problem may be the server is rejecting > > us > > due to port reuse. > > Doesn't this break 4.0 replay cache? Seems too general to assume that > any unsuccessful SYN was due to a server reboot and it's ok for the > client to change the port. This is where things get interesting. Yes, if we change the port number, then it will almost certainly break NFSv3 and NFSv4.0 replay caching on the server. However the problem is that once we get stuck in the situation where we cannot connect, then each new connection attempt is just causing the server's TCP layer to push back and recall that the connection from this port was closed. IOW: the problem is that once we're in this situation, we cannot easily exit without doing one of the following. Either we have to 1. Change the port number, so that the TCP layer allows us to connect. 2. Or.. Wait for long enough that the TCP layer has forgotten altogether about the previous connection. The problem is that option (2) is subject to livelock, and so has a potential infinite time out. I've seen this livelock in action, and I'm not seeing a solution that has predictable results. So unless there is a solution for the problems in (2), I don't see how we can avoid defaulting to option (1) at some point, in which case the only question is "when do we switch ports?". > > > > > Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> > > --- > > net/sunrpc/xprtsock.c | 10 +++++++++- > > 1 file changed, 9 insertions(+), 1 deletion(-) > > > > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c > > index 71848ab90d13..1a96777f0ed5 100644 > > --- a/net/sunrpc/xprtsock.c > > +++ b/net/sunrpc/xprtsock.c > > @@ -62,6 +62,7 @@ > > #include "sunrpc.h" > > > > static void xs_close(struct rpc_xprt *xprt); > > +static void xs_reset_srcport(struct sock_xprt *transport); > > static void xs_set_srcport(struct sock_xprt *transport, struct > > socket *sock); > > static void xs_tcp_set_socket_timeouts(struct rpc_xprt *xprt, > > struct socket *sock); > > @@ -1565,8 +1566,10 @@ static void xs_tcp_state_change(struct sock > > *sk) > > break; > > case TCP_CLOSE: > > if (test_and_clear_bit(XPRT_SOCK_CONNECTING, > > - &transport->sock_state)) > > + &transport->sock_state)) { > > + xs_reset_srcport(transport); > > xprt_clear_connecting(xprt); > > + } > > clear_bit(XPRT_CLOSING, &xprt->state); > > /* Trigger the socket release */ > > xs_run_error_worker(transport, > > XPRT_SOCK_WAKE_DISCONNECT); > > @@ -1722,6 +1725,11 @@ static void xs_set_port(struct rpc_xprt > > *xprt, unsigned short port) > > xs_update_peer_port(xprt); > > } > > > > +static void xs_reset_srcport(struct sock_xprt *transport) > > +{ > > + transport->srcport = 0; > > +} > > + > > static void xs_set_srcport(struct sock_xprt *transport, struct > > socket *sock) > > { > > if (transport->srcport == 0 && transport->xprt.reuseport) > > -- > > 2.41.0 > > -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@xxxxxxxxxxxxxxx