Re: [PATCH] SUNRPC: Don't retry using the same source port if connection failed

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, 2023-09-30 at 18:36 -0400, Olga Kornievskaia wrote:
> On Fri, Sep 29, 2023 at 10:57 PM Trond Myklebust
> <trondmy@xxxxxxxxxxxxxxx> wrote:
> > 
> > On Thu, 2023-09-28 at 10:58 -0400, Olga Kornievskaia wrote:
> > > On Wed, Sep 27, 2023 at 3:35 PM <trondmy@xxxxxxxxxx> wrote:
> > > > 
> > > > From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> > > > 
> > > > If the TCP connection attempt fails without ever establishing a
> > > > connection, then assume the problem may be the server is
> > > > rejecting
> > > > us
> > > > due to port reuse.
> > > 
> > > Doesn't this break 4.0 replay cache? Seems too general to assume
> > > that
> > > any unsuccessful SYN was due to a server reboot and it's ok for
> > > the
> > > client to change the port.
> > 
> > This is where things get interesting. Yes, if we change the port
> > number, then it will almost certainly break NFSv3 and NFSv4.0
> > replay
> > caching on the server.
> > 
> > However the problem is that once we get stuck in the situation
> > where we
> > cannot connect, then each new connection attempt is just causing
> > the
> > server's TCP layer to push back and recall that the connection from
> > this port was closed.
> > IOW: the problem is that once we're in this situation, we cannot
> > easily
> > exit without doing one of the following. Either we have to
> > 
> >    1. Change the port number, so that the TCP layer allows us to
> >       connect.
> >    2. Or.. Wait for long enough that the TCP layer has forgotten
> >       altogether about the previous connection.
> > 
> > The problem is that option (2) is subject to livelock, and so has a
> > potential infinite time out. I've seen this livelock in action, and
> > I'm
> > not seeing a solution that has predictable results.
> > 
> > So unless there is a solution for the problems in (2), I don't see
> > how
> > we can avoid defaulting to option (1) at some point, in which case
> > the
> > only question is "when do we switch ports?".
> 
> I'm not sure how one can justify that regression that will come out
> of
> #1 will be less of a problem then the problem in #2.
> 
> I think I'm still not grasping why the NFS server would
> (legitimately)
> be closing a connection that is re-using the port. Can you present a
> sequence of events that would lead to this?
> 

Yes. It is essentially the problem described in this blog:
https://blog.davidvassallo.me/2010/07/13/time_wait-and-port-reuse/

...and as you can see, it is nothing to do with NFS. This is the TCP
protocol working as expected.

> But can't we at least arm ourselves in not unnecessarily breaking the
> reply cache by at least imposing some timeout/number of retries
> before
> resetting? If the client was retrying to unsuccessfully re-establish
> connection for a (fixed) while, then 4.0 client's lease would expire
> and switching the port after the lease expires makes no difference.
> There isn't a solution in v3 unfortunately. But a time-based approach
> would at least separate these 'peculiar' servers vs normal servers.
> And if this is a 4.1 client, we can reset the port without a timeout.
> 

This is not a 'peculiar server' vs 'normal server' problem. The reuse
of ports in this way violates the TCP protocol, and has been a problem
for NFS/TCP since the beginning. However, it was never a problem for
the older connectionless UDP protocol, which is where the practice of
tying the replay cache to the source port began in the first place.

NFSv4.1 does not have this problem because it deliberately does not
reuse TCP ports, and the reason is precisely to avoid the TIME_WAIT
state problems.

NFSv3 tries to avoid it by doing an incremental back off, but we
recently saw that does not suffice to avoid live lock, after a system
got stuck for several hours in this state.

> Am I correct that every unsuccessful SYN causes a new source point to
> be taken? If so, then a server reboot where multiple SYNs are sent
> prior to connection re-establishment (times number of mounts) might
> cause source port exhaustion?
> 

No. Not every unsuccessful SYN: It is every unsuccessful sequence of
SYNs. If the server is not replying to our SYN packets, then the TCP
layer will back off and retransmit. So there is already a backoff-retry
happening at that level.

> 
> > 
> > > 
> > > > 
> > > > Signed-off-by: Trond Myklebust
> > > > <trond.myklebust@xxxxxxxxxxxxxxx>
> > > > ---
> > > >  net/sunrpc/xprtsock.c | 10 +++++++++-
> > > >  1 file changed, 9 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> > > > index 71848ab90d13..1a96777f0ed5 100644
> > > > --- a/net/sunrpc/xprtsock.c
> > > > +++ b/net/sunrpc/xprtsock.c
> > > > @@ -62,6 +62,7 @@
> > > >  #include "sunrpc.h"
> > > > 
> > > >  static void xs_close(struct rpc_xprt *xprt);
> > > > +static void xs_reset_srcport(struct sock_xprt *transport);
> > > >  static void xs_set_srcport(struct sock_xprt *transport, struct
> > > > socket *sock);
> > > >  static void xs_tcp_set_socket_timeouts(struct rpc_xprt *xprt,
> > > >                 struct socket *sock);
> > > > @@ -1565,8 +1566,10 @@ static void xs_tcp_state_change(struct
> > > > sock
> > > > *sk)
> > > >                 break;
> > > >         case TCP_CLOSE:
> > > >                 if (test_and_clear_bit(XPRT_SOCK_CONNECTING,
> > > > -                                       &transport-
> > > > >sock_state))
> > > > +                                      &transport->sock_state))
> > > > {
> > > > +                       xs_reset_srcport(transport);
> > > >                         xprt_clear_connecting(xprt);
> > > > +               }
> > > >                 clear_bit(XPRT_CLOSING, &xprt->state);
> > > >                 /* Trigger the socket release */
> > > >                 xs_run_error_worker(transport,
> > > > XPRT_SOCK_WAKE_DISCONNECT);
> > > > @@ -1722,6 +1725,11 @@ static void xs_set_port(struct rpc_xprt
> > > > *xprt, unsigned short port)
> > > >         xs_update_peer_port(xprt);
> > > >  }
> > > > 
> > > > +static void xs_reset_srcport(struct sock_xprt *transport)
> > > > +{
> > > > +       transport->srcport = 0;
> > > > +}
> > > > +
> > > >  static void xs_set_srcport(struct sock_xprt *transport, struct
> > > > socket *sock)
> > > >  {
> > > >         if (transport->srcport == 0 && transport-
> > > > >xprt.reuseport)
> > > > --
> > > > 2.41.0
> > > > 
> > 
> > --
> > Trond Myklebust Linux NFS client maintainer, Hammerspace
> > trond.myklebust@xxxxxxxxxxxxxxx

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx






[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux