Re: [PATCH/RFC] Add simple backoff logic when reconnecting to a server that recently initiated a connection close

Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> · Mon, 3 Mar 2014 18:02:08 -0500

On Mar 3, 2014, at 17:10, Scott Mayhew <smayhew@xxxxxxxxxx> wrote:

> On Mon, 03 Mar 2014, Trond Myklebust wrote:
> 
>> 
>> On Mar 3, 2014, at 11:13, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
>> 
>>> On Fri, 28 Feb 2014 17:29:56 -0500
>>> Scott Mayhew <smayhew@xxxxxxxxxx> wrote:
>>> 
>>>> From 2e3902fc0c66bda360a8e40e3e64d82e312a20d4 Mon Sep 17 00:00:00 2001
>>>> From: Scott Mayhew <smayhew@xxxxxxxxxx>
>>>> Date: Fri, 28 Feb 2014 15:23:50 -0500
>>>> Subject: [PATCH] sunrpc: reintroduce xprt->shutdown with a new purpose (option 2)
>>>> 
>>>> If a server is behaving pathologically and accepting our connections
>>>> only to close the socket on the first RPC operation it receives, then
>>>> we should probably delay when trying to reconnect.
>>>> 
>>>> This patch reintroduces the xprt->shutdown field (this time as two
>>>> bits).  Previously this field was used to indicate that the transport
>>>> was in the process of being shut down, but now it will just be used to
>>>> indicate that a shutdown was initiated by the server.
>>>> 
>>>> If the server closes the connection 3 times without us having received
>>>> an RPC reply in the interim, then we'll delay before attempting to
>>>> connect again.
>>>> ---
>>>> include/linux/sunrpc/xprt.h |  3 ++-
>>>> net/sunrpc/clnt.c           |  2 ++
>>>> net/sunrpc/xprtsock.c       | 13 +++++++++++++
>>>> 3 files changed, 17 insertions(+), 1 deletion(-)
>>>> 
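The counting rule in the patch description above (back off only after three server-initiated closes with no RPC reply in between) can be sketched as follows. This is a minimal, hypothetical illustration, not the code from the patch: `struct sketch_xprt`, the `sketch_*` helpers and `SERVER_CLOSE_LIMIT` are invented names, and the length of the delay is left out because the description does not state it.

```c
#include <stdbool.h>

/* Hypothetical threshold: delay reconnects after this many consecutive
 * server-initiated closes with no RPC reply received in between. */
#define SERVER_CLOSE_LIMIT 3u

/* Hypothetical, simplified transport state; NOT the kernel's struct rpc_xprt. */
struct sketch_xprt {
	unsigned int server_closes : 2;	/* 2-bit counter, can hold 0..3 */
};

/* Called when the server (not the client) closes the connection. */
static void sketch_server_closed(struct sketch_xprt *x)
{
	/* Saturate at the limit: a plain ++ on a 2-bit field would wrap 3 -> 0. */
	if (x->server_closes < SERVER_CLOSE_LIMIT)
		x->server_closes++;
}

/* Called when an RPC reply arrives: the server is answering, so reset. */
static void sketch_reply_received(struct sketch_xprt *x)
{
	x->server_closes = 0;
}

/* True if the next reconnect attempt should be delayed. */
static bool sketch_should_delay_reconnect(const struct sketch_xprt *x)
{
	return x->server_closes >= SERVER_CLOSE_LIMIT;
}
```

Because any reply resets the counter, a single close caused by a temporary hiccup never triggers the delay, which is the property Jeff points to below in favor of this option. The saturation check matters only because the counter is two bits wide; without it the fourth close would silently cancel the backoff.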
>>> 
>>> This patch seems a little more reasonable than the other one if only
>>> because it shouldn't cause artificial delays when there is some
>>> temporary hiccup that causes the server to shut down the connection.
>>> 
>>> That said, this seems to be squarely a server-side bug so I'm not sure
>>> we ought to go to any great lengths to work around it.
>> 
>> So this is about a broken server that accepts connection requests and then immediately closes them?
> 
> That's correct.
> 
>> If so, then I agree with Jeff, it really isn't something we need to fix on the client.
> 
> Not even for the sake of 'politeness' (for lack of a better word)?  This
> was in a grid environment and there were apparently a few thousand clients
> doing this, not just a single client.
> 

No. If the problem is a broken server, then we fix the server. No politeness needed… :-)

_________________________________
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@xxxxxxxxxxxxxxx
