Re: [PATCH] NFS: add a sysctl for disable the reconnect delay

On 04/13/2010 06:25 AM, Mi Jinlong wrote:
Hi Chuck,

   Sorry for replying to your message so late.

Chuck Lever wrote:
Hi Mi-

On 03/18/2010 06:11 AM, Mi Jinlong wrote:
If a network partition or some other problem forces a reconnect, the
reconnect cannot succeed immediately once the environment recovers, but
sometimes the client wants to reconnect promptly.

This patch provides a proc file
(/proc/sys/fs/nfs/nfs_disable_reconnect_delay) that allows the client to
disable the reconnect delay (reestablish_timeout) when using NFS.

It's only useful for NFS.

There's a good reason for the connection re-establishment delay, and
only very few instances where you'd want to disable it.  A sysctl is the
wrong place for this, as it would disable the reconnect delay across the
board, instead of for just those occasions when it is actually necessary
to connect immediately.

   Yes, I agree with you.


I assume that because the grace period has a time limit, you would want
the client to reconnect at all costs?  I think that this is actually
when a client should take care not to spuriously reconnect: during a
server reboot, a server may be sluggish or not completely ready to
accept client requests.  It's not a time when a client should be
showering a server with connection attempts.

The reconnect delay is an exponential backoff that starts at 3 seconds,
so if the server is really ready to accept connections, the actual
connection delay ought to be quick.
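
To make the backoff concrete, here is a minimal sketch of such a schedule. Only the 3-second starting value comes from the message above; the doubling factor and the 300-second cap are assumptions chosen for illustration, and reconnect_delays is a hypothetical name, not a kernel function.

```python
from itertools import islice

INITIAL_DELAY_S = 3   # from the message: the backoff starts at 3 seconds
MAX_DELAY_S = 300     # assumption: illustrative cap, not stated in this thread


def reconnect_delays(initial=INITIAL_DELAY_S, maximum=MAX_DELAY_S):
    """Yield successive reconnect delays, doubling each time up to the cap.

    The doubling factor is an assumption made for this sketch.
    """
    delay = initial
    while True:
        yield delay
        delay = min(delay * 2, maximum)


if __name__ == "__main__":
    # Show the first few delays a client would wait between attempts.
    print(list(islice(reconnect_delays(), 8)))
```

Under these assumptions the first attempts are only seconds apart, which is why a server that is really ready should see a quick reconnect.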

We're already considering shortening the maximum amount of time the
client can wait before trying a reconnect.  And, it might possibly be
that the network layer itself is interfering with the backoff logic that
is already built into the RPC client.  (If true, that would be the real
bug in this case).  I'm not interested in a workaround when we really
should fix any underlying issues to make this work correctly.

Perhaps the RPC client needs to distinguish between connection refusal
(where a lengthening exponential backoff between connection attempts
makes sense) and no server response (where we want the client's network
layer to keep sending SYN requests so that it can reconnect as soon as
possible).
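
As a rough illustration of that distinction, the sketch below maps an errno value to a reconnect policy. It is a userspace toy, not RPC client code: the policy names and the reconnect_policy function are hypothetical, and treating ETIMEDOUT, EHOSTUNREACH and ENETUNREACH as "no server response" is an assumption made for the example.

```python
import errno

# Hypothetical policy names, used only in this illustration.
BACKOFF = "backoff"              # lengthen the delay between connection attempts
KEEP_TRYING = "keep-trying"      # let the network layer keep retransmitting SYNs
UNCLASSIFIED = "unclassified"    # anything this sketch does not cover

# The server actively refused the connection.
REFUSED = {errno.ECONNREFUSED}

# Assumption: these errors are treated as "no server response".
NO_RESPONSE = {errno.ETIMEDOUT, errno.EHOSTUNREACH, errno.ENETUNREACH}


def reconnect_policy(err):
    """Pick a reconnect policy for a failed connection attempt."""
    if err in REFUSED:
        return BACKOFF
    if err in NO_RESPONSE:
        return KEEP_TRYING
    return UNCLASSIFIED


if __name__ == "__main__":
    for code in (errno.ECONNREFUSED, errno.EHOSTUNREACH, errno.EPIPE):
        print(errno.errorcode[code], "->", reconnect_policy(code))
```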

   While reading the kernel code and testing, I found three cases:

   A. Network partition:
      Because the client can't communicate with the server's rpcbind,
      there is no effect.

   B. The server's NFS service is stopped:
      The client calls xprt_connect to connect, but gets an error (111: Connection refused).

   C. The server's NFS service is stopped, and the NIC is taken down (ifdown) after about 60s:
      At first, while the NIC is up, xprt_connect gets an error (111: Connection refused), as in case B.

      After the NIC is down, xprt_connect gets an error (113: No route to host).

  When connecting fails, the sunrpc level only gets an ETIMEDOUT or EAGAIN error, and it
  calls xprt_connect again to reconnect.
  If we make the network layer keep sending SYN requests, more requests will be delayed
  in the request queue, and reestablish_timeout will also increase.

  Can we distinguish these refusals at the sunrpc level rather than at the xprt level?
  If we can do that, the problem will be solved easily.

  [NOTE]
    the testing process:
          client                    server
    1.   mount nfs (OK)
    2.     df (OK)
    3.                             nfs stop
    4.     df (hang)

   I got these messages through rpcdebug.

We have a matrix of cases. "soft" v. "hard" RPCs, ECONNREFUSED v. no response, connection previously closed by server disconnect v. client idle timeout.
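
To show the size of that matrix, the sketch below simply enumerates the combinations of the three dimensions named above. The labels are taken from the sentence itself; case_matrix is a hypothetical helper, and the sketch says nothing about what the correct behavior is in each case.

```python
from itertools import product

# The three dimensions named in the message.
RPC_KIND = ("soft", "hard")
FAILURE = ("ECONNREFUSED", "no response")
CLOSED_BY = ("server disconnect", "client idle timeout")


def case_matrix():
    """Return every (RPC kind, failure mode, close reason) combination."""
    return list(product(RPC_KIND, FAILURE, CLOSED_BY))


if __name__ == "__main__":
    for kind, failure, closed_by in case_matrix():
        print(f"{kind:4} | {failure:12} | {closed_by}")
```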

I've found at least one major bug in this logic, and that is that the 60 second transport connect timer is clobbered in the ECONNREFUSED case, so soft RPCs never time out if the server refuses a connection, for example. I handed all of this off to Trond.
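
A toy model of that bug, under stated assumptions: the 60-second timeout comes from the message above, while the fixed 3-second retry interval, the 600-second simulation horizon, and the name time_until_timeout are made up for illustration. This is not the kernel logic, only a sketch of why restarting the timer on every refusal keeps a soft RPC from ever timing out.

```python
def time_until_timeout(retry_interval_s, timeout_s=60, horizon_s=600,
                       reset_on_refusal=False):
    """Simulate a soft RPC whose every connection attempt is refused.

    Returns the simulated time at which the RPC gives up, or None if it
    is still waiting when the simulation horizon is reached.
    """
    deadline = timeout_s
    now = 0
    while now < horizon_s:
        if now >= deadline:
            return now
        # The connection attempt made at `now` is refused.
        if reset_on_refusal:
            # Clobbered timer: the refusal restarts the timeout,
            # so the deadline keeps moving out of reach.
            deadline = now + timeout_s
        now += retry_interval_s
    return None


if __name__ == "__main__":
    print("timer preserved:", time_until_timeout(3))
    print("timer clobbered:", time_until_timeout(3, reset_on_refusal=True))
```

With the timer preserved, the simulated RPC gives up once the 60 seconds have elapsed; with the timer restarted on each refusal, it is still waiting at the end of the simulation.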

The second scenario might disable the reconnect timer so that only one
->connect() call would be outstanding until the network layer tells us
it's given up on SYN retries.

   I think that's a good idea, but implementing it may be a lot of work.

thanks,
Mi Jinlong



--
chuck[dot]lever[at]oracle[dot]com