On Apr 14, 2010, at 6:30 AM, Mi Jinlong wrote:

> Chuck Lever wrote:
>> On 04/13/2010 06:25 AM, Mi Jinlong wrote:
>>> Hi Chuck,
>>>
>>> Sorry for replying to your message so late.
>>>
>>> Chuck Lever wrote:
>>>> Hi Mi-
>>>>
>>>> On 03/18/2010 06:11 AM, Mi Jinlong wrote:
>>>>> If a network partition or some other problem causes a reconnect,
>>>>> the client cannot reconnect immediately when the environment
>>>>> recovers, but sometimes the client wants to connect promptly.
>>>>>
>>>>> This patch provides a proc file
>>>>> (/proc/sys/fs/nfs/nfs_disable_reconnect_delay) that allows the
>>>>> client to disable the reconnect delay (reestablish_timeout) when
>>>>> using NFS.
>>>>>
>>>>> It's only useful for NFS.
>>>>
>>>> There's a good reason for the connection re-establishment delay, and
>>>> only very few instances where you'd want to disable it. A sysctl is
>>>> the wrong place for this, as it would disable the reconnect delay
>>>> across the board, instead of for just those occasions when it is
>>>> actually necessary to connect immediately.
>>>
>>> Yes, I agree with you.
>>>
>>>> I assume that because the grace period has a time limit, you would
>>>> want the client to reconnect at all costs? I think this is actually
>>>> when a client should take care not to reconnect spuriously: during a
>>>> server reboot, a server may be sluggish or not completely ready to
>>>> accept client requests. It's not a time when a client should be
>>>> showering a server with connection attempts.
>>>>
>>>> The reconnect delay is an exponential backoff that starts at 3
>>>> seconds, so if the server is really ready to accept connections, the
>>>> actual connection delay ought to be quick.
>>>>
>>>> We're already considering shortening the maximum amount of time the
>>>> client can wait before trying a reconnect. And it might be that the
>>>> network layer itself is interfering with the backoff logic that is
>>>> already built into the RPC client. (If true, that would be the real
>>>> bug in this case.)
>>>> I'm not interested in a workaround when we really should fix any
>>>> underlying issues to make this work correctly.
>>>>
>>>> Perhaps the RPC client needs to distinguish between connection
>>>> refusal (where a lengthening exponential backoff between connection
>>>> attempts makes sense) and no server response (where we want the
>>>> client's network layer to keep sending SYN requests so that it can
>>>> reconnect as soon as possible).
>>>
>>> While reading the kernel code and testing, I found there are three
>>> cases:
>>>
>>> A. Network partition:
>>>    Because the client can't communicate with the server's rpcbind,
>>>    there is no influence.
>>>
>>> B. Server's NFS service stopped:
>>>    The client calls xprt_connect to connect, but gets an error
>>>    (111: Connection refused).
>>>
>>> C. Server's NFS service stopped, and the NIC brought down (ifdown)
>>>    after about 60s:
>>>    At first, while the NIC is still up, xprt_connect gets an error
>>>    (111: Connection refused), as in case B.
>>>    After the NIC is down, xprt_connect gets an error
>>>    (113: No route to host).
>>>
>>> When the connection fails, the sunrpc level only sees an ETIMEDOUT or
>>> EAGAIN error, and it just calls xprt_connect again to reconnect.
>>> If we make the network layer keep sending SYN requests, more requests
>>> will be delayed in the request queue, and the reestablish_timeout
>>> will also keep increasing.
>>>
>>> Can we distinguish those refusals at the sunrpc level, rather than at
>>> the xprt level?
>
> What do you think of what I showed yesterday?

In xprtsock.c, these reconnection errors are distinguished. In the
generic sunrpc client (xprt.c) they are not -- when an RPC transmission
is sent, xprtsock.c returns ENOTCONN for any connection error. Trond
made this change after 2.6.18. The differences matter in how the client
re-establishes the connection, and that logic is all in xprtsock.c.

So, the RPC client already makes this distinction, but the logic may
have bugs.

>>> If we can do that, the problem will be solved easily.
>>>
>>> [NOTE]
>>> The testing process:
>>>          client               server
>>>     1.   mount nfs (OK)
>>>     2.   df (OK)
>>>     3.                        nfs stop
>>>     4.   df (hang)
>>>
>>> I got these messages through rpcdebug.
>>
>> We have a matrix of cases: "soft" v. "hard" RPCs, ECONNREFUSED v. no
>> response, connection previously closed by server disconnect v. client
>> idle timeout.
>
> Connection previously closed by server disconnect v. client idle
> timeout? Can you explain that to me in a bit more detail? Maybe it's
> useful for me. Thanks.

If the server closed the connection, the client should use the
re-establish timeout to delay the reconnection in order to prevent a
hard loop of client connection retries. If the client idled the
connection out, then the client should reconnect immediately.

>> I've found at least one major bug in this logic, and that is that the
>> 60 second transport connect timer is clobbered in the ECONNREFUSED
>> case, so soft RPCs never time out if the server refuses a connection,
>> for example. I handed all of this off to Trond.
>
> Really?
> I mounted the NFS filesystem soft (-o soft), and then used the "df"
> command to look at the mount information after the server's NFS
> service stopped. The "df" returned with error -5 (Input/output error);
> maybe it's the RPC timeout that caused df to return?

RPC timeouts generally cause an EIO. However, if the server continues to
refuse a connection, the timeout never occurs.

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com