Re: [PATCH] SUNRPC: Don't retry using the same source port if connection failed

On Fri, Sep 29, 2023 at 10:57 PM Trond Myklebust
<trondmy@xxxxxxxxxxxxxxx> wrote:
>
> On Thu, 2023-09-28 at 10:58 -0400, Olga Kornievskaia wrote:
> > On Wed, Sep 27, 2023 at 3:35 PM <trondmy@xxxxxxxxxx> wrote:
> > >
> > > From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> > >
> > > If the TCP connection attempt fails without ever establishing a
> > > connection, then assume the problem may be the server is rejecting
> > > us
> > > due to port reuse.
> >
> > Doesn't this break 4.0 replay cache? Seems too general to assume that
> > any unsuccessful SYN was due to a server reboot and it's ok for the
> > client to change the port.
>
> This is where things get interesting. Yes, if we change the port
> number, then it will almost certainly break NFSv3 and NFSv4.0 replay
> caching on the server.
>
> However the problem is that once we get stuck in the situation where we
> cannot connect, then each new connection attempt is just causing the
> server's TCP layer to push back and recall that the connection from
> this port was closed.
> IOW: the problem is that once we're in this situation, we cannot easily
> exit without doing one of the following. Either we have to
>
>    1. Change the port number, so that the TCP layer allows us to
>       connect.
>    2. Or.. Wait for long enough that the TCP layer has forgotten
>       altogether about the previous connection.
>
> The problem is that option (2) is subject to livelock, and so has a
> potential infinite time out. I've seen this livelock in action, and I'm
> not seeing a solution that has predictable results.
>
> So unless there is a solution for the problems in (2), I don't see how
> we can avoid defaulting to option (1) at some point, in which case the
> only question is "when do we switch ports?".

I'm not sure how one can justify that the regression that will come
out of #1 will be less of a problem than the problem in #2.

I think I'm still not grasping why the NFS server would (legitimately)
be closing a connection that is re-using the port. Can you present a
sequence of events that would lead to this?

But can't we at least guard against unnecessarily breaking the reply
cache by imposing some timeout/number of retries before resetting? If
the client had been unsuccessfully retrying to re-establish the
connection for a (fixed) while, then the 4.0 client's lease would have
expired anyway, and switching the port after the lease expires makes
no difference. There isn't a solution for v3, unfortunately. But a
time-based approach would at least separate these 'peculiar' servers
from normal servers. And if this is a 4.1 client, we can reset the
port without a timeout.

Am I correct that every unsuccessful SYN causes a new source port to
be taken? If so, then a server reboot where multiple SYNs are sent
prior to connection re-establishment (times the number of mounts)
might cause source port exhaustion?


>
> >
> > >
> > > Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> > > ---
> > >  net/sunrpc/xprtsock.c | 10 +++++++++-
> > >  1 file changed, 9 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> > > index 71848ab90d13..1a96777f0ed5 100644
> > > --- a/net/sunrpc/xprtsock.c
> > > +++ b/net/sunrpc/xprtsock.c
> > > @@ -62,6 +62,7 @@
> > >  #include "sunrpc.h"
> > >
> > >  static void xs_close(struct rpc_xprt *xprt);
> > > +static void xs_reset_srcport(struct sock_xprt *transport);
> > > static void xs_set_srcport(struct sock_xprt *transport, struct socket *sock);
> > >  static void xs_tcp_set_socket_timeouts(struct rpc_xprt *xprt,
> > >                 struct socket *sock);
> > > @@ -1565,8 +1566,10 @@ static void xs_tcp_state_change(struct sock *sk)
> > >                 break;
> > >         case TCP_CLOSE:
> > >                 if (test_and_clear_bit(XPRT_SOCK_CONNECTING,
> > > -                                       &transport->sock_state))
> > > +                                      &transport->sock_state)) {
> > > +                       xs_reset_srcport(transport);
> > >                         xprt_clear_connecting(xprt);
> > > +               }
> > >                 clear_bit(XPRT_CLOSING, &xprt->state);
> > >                 /* Trigger the socket release */
> > >                 xs_run_error_worker(transport, XPRT_SOCK_WAKE_DISCONNECT);
> > > @@ -1722,6 +1725,11 @@ static void xs_set_port(struct rpc_xprt *xprt, unsigned short port)
> > >         xs_update_peer_port(xprt);
> > >  }
> > >
> > > +static void xs_reset_srcport(struct sock_xprt *transport)
> > > +{
> > > +       transport->srcport = 0;
> > > +}
> > > +
> > >  static void xs_set_srcport(struct sock_xprt *transport, struct socket *sock)
> > >  {
> > >         if (transport->srcport == 0 && transport->xprt.reuseport)
> > > --
> > > 2.41.0
> > >
>
> --
> Trond Myklebust Linux NFS client maintainer, Hammerspace
> trond.myklebust@xxxxxxxxxxxxxxx
