On Feb 5, 2010, at 14:47, "Chuck Lever" <chuck.lever@xxxxxxxxxx> wrote:
On 02/05/2010 05:14 PM, Batsakis, Alexandros wrote:
Yeah sure,
So imagine that for a specific connection the remaining major timeout is
30 seconds. xs_connect has a default delay of 60 seconds before attempting
to reconnect. The user (NFS) expects to "hear back" from the RPC layer
within the timeout, because in many cases, e.g. lease renewal, there is no
benefit in an operation reaching the server later and missing the critical
deadline just because the transport was sleeping for an arbitrary amount
of time.
My patch ensures that the reconnection attempts happen within the
user-specified limits by replacing the 60-second default delay with a
value that is less than the time left before the major timeout expires.
If this is not ideal, the user should adjust the timeout value.
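To make the numbers concrete, here is a minimal userspace sketch of the
clamping (clamp_reconnect_delay is an illustrative name, not the function
in the patch; the real code works in jiffies against rq_majortimeo):

/* Illustration only: the clamping idea in userspace terms.  The 5-second
 * fallback and the 2/3 factor mirror xprt_set_timeout_sane() in the patch. */
#include <stdio.h>

static unsigned long clamp_reconnect_delay(unsigned long default_delay,
                                           unsigned long remaining)
{
        if (remaining == 0)
                return 5;                 /* major timeout already passed */
        if (default_delay > remaining)
                return 2 * remaining / 3; /* stay inside the deadline */
        return default_delay;
}

int main(void)
{
        /* 30 seconds left until the major timeout, 60-second default
         * reconnect delay: unclamped, the request goes out 30 seconds
         * too late; clamped, the reconnect fires after 20 seconds. */
        printf("delay = %lus\n", clamp_reconnect_delay(60, 30));
        return 0;
}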
Does it make sense?
That's better, but this explanation needs to accompany the patch, as
the long description. It's hard for reviewers who are not familiar
with the constraints on NFSv4 RENEW to understand what you're trying
to fix. Same comment applies to the other patches in your series, I
think. (And see further comments in the code below).
Yeah, you are right about that.
You are changing a generic part of the RPC client to fix an issue
with one specific NFSv4 procedure, it seems to me. Did you actually
observe a situation where the reconnect delay caused the server to
miss a RENEW request?
Yeah, it's very easy to reproduce, too.
There are good reasons why there is a timeout before reestablishing
a connection. Have you tested this patch with NFSv3 servers that
are going up and down repeatedly, for example? I think skipping the
reconnect delay could have consequences for the cases which the
original xs_connect logic is supposed to address, and it's not
something we want in many other cases.
I am not skipping the reconnect delay. What I am saying is that it seems
wrong to me to hard-code the reconnection delay. Why 60 seconds, for
example? To me it seems that this value should be related to the timeout.
Do you disagree?
Perhaps a better idea would be to mark these particular RPCs with
some kind of indication that the RPC client has to connect
immediately for this request, if possible. Similar to
RPC_TASK_SOFTCONN.
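Roughly, a sketch of what such a flag check might look like
(RPC_TASK_URGENT and the surrounding types are invented for illustration,
not existing sunrpc code):

/* Hypothetical sketch: a per-task "connect now" flag in the spirit of
 * RPC_TASK_SOFTCONN.  RPC_TASK_URGENT and struct fake_task are invented
 * for illustration; this is not the sunrpc API. */
#include <stdio.h>

#define RPC_TASK_URGENT 0x0400          /* invented flag value */

struct fake_task {
        unsigned long tk_flags;
};

/* How long an xs_connect-style path would wait before reconnecting. */
static unsigned long reconnect_delay(const struct fake_task *task,
                                     unsigned long reestablish_timeout)
{
        if (task->tk_flags & RPC_TASK_URGENT)
                return 0;               /* connect immediately */
        return reestablish_timeout;     /* normal delayed reconnect */
}

int main(void)
{
        struct fake_task renew = { .tk_flags = RPC_TASK_URGENT };
        struct fake_task other = { .tk_flags = 0 };

        printf("renew: %lus, other: %lus\n",
               reconnect_delay(&renew, 60), reconnect_delay(&other, 60));
        return 0;
}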
In general, sunrpc.ko has a problem with this kind of "urgent" RPC
request. If the RPC backlog queue is large for a particular
rpc_clnt, it can often take many seconds (like longer than the major
timeout) for a request to actually get down to the transport and get
sent. I don't see that these timeout changes necessarily address
that at all.
But if the timeout has expired, rpc_execute will quit the task anyway.
What is the downside of waking up and polling the server at intervals
that actually make some sense to the client, instead of sleeping for a
long, arbitrary period?
I would suggest that the best solution might be a separate transport
for such requests, but that has other consequences. Maybe our RPC
client needs a separate "urgent" request queue that bypasses the
normal backlog queue for such time-dependent requests?
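Something along these lines, as a sketch only (all names invented, not the
existing xmit path):

/* Sketch of the "urgent queue" idea: deadline-bound requests are drained
 * before the ordinary backlog.  All names are invented for illustration. */
#include <stdio.h>

struct fake_req {
        const char *name;
        struct fake_req *next;
};

struct fake_queues {
        struct fake_req *urgent;        /* time-critical requests */
        struct fake_req *backlog;       /* everything else, FIFO */
};

static struct fake_req *pick_next(struct fake_queues *q)
{
        struct fake_req **head = q->urgent ? &q->urgent : &q->backlog;
        struct fake_req *r = *head;

        if (r)                          /* urgent entries bypass the backlog */
                *head = r->next;
        return r;
}

int main(void)
{
        struct fake_req write_req = { "WRITE", NULL };
        struct fake_req renew_req = { "RENEW", NULL };
        struct fake_queues q = { .urgent = &renew_req, .backlog = &write_req };

        printf("first to the transport: %s\n", pick_next(&q)->name);
        return 0;
}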
A separate transport for requests that will fail as soon as the timeout
expires, and not later? Because, as I said, the request is going to fail
anyway.
I don't think that this should be treated as a special case, but that's
just my opinion; maybe your experience says differently.
I'll address the code comments when we agree on how to proceed.
Thanks!
-alexandros
On Feb 5, 2010, at 12:14, "Chuck Lever" <chuck.lever@xxxxxxxxxx> wrote:
Hi Alexandros-
I think we need a little more than the short description for these
patches. Can you explain why the extra logic in xs_connect is
necessary? What exactly happens if the RENEW doesn't reach the server
because the transport can't be reconnected?
In other words, what problem are you trying to address here?
On 02/02/2010 07:06 PM, Alexandros Batsakis wrote:
Signed-off-by: Alexandros Batsakis<batsakis@xxxxxxxxxx>
---
net/sunrpc/clnt.c | 2 +-
net/sunrpc/sunrpc.h | 2 ++
net/sunrpc/xprt.c | 18 +++++++++++++++---
net/sunrpc/xprtsock.c | 12 ++++++++++--
4 files changed, 28 insertions(+), 6 deletions(-)
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 154034b..8e552cd 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1019,7 +1019,7 @@ call_bind(struct rpc_task *task)
 	task->tk_action = call_connect;
 	if (!xprt_bound(xprt)) {
 		task->tk_action = call_bind_status;
-		task->tk_timeout = xprt->bind_timeout;
+		xprt_set_timeout_sane(task, xprt->bind_timeout);
 		xprt->ops->rpcbind(task);
 	}
 }
diff --git a/net/sunrpc/sunrpc.h b/net/sunrpc/sunrpc.h
index 90c292e..87c7aaa 100644
--- a/net/sunrpc/sunrpc.h
+++ b/net/sunrpc/sunrpc.h
@@ -37,6 +37,8 @@ struct rpc_buffer {
 	char data[];
 };
 
+void xprt_set_timeout_sane(struct rpc_task *task, unsigned long timeout);
+
 static inline int rpc_reply_expected(struct rpc_task *task)
 {
 	return (task->tk_msg.rpc_proc != NULL) &&
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 469de29..f6f3988 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -482,7 +482,7 @@ void xprt_wait_for_buffer_space(struct rpc_task *task, rpc_action action)
 	struct rpc_rqst *req = task->tk_rqstp;
 	struct rpc_xprt *xprt = req->rq_xprt;
 
-	task->tk_timeout = req->rq_timeout;
+	xprt_set_timeout_sane(task, req->rq_timeout);
 	rpc_sleep_on(&xprt->pending, task, action);
 }
 EXPORT_SYMBOL_GPL(xprt_wait_for_buffer_space);
@@ -518,7 +518,7 @@ EXPORT_SYMBOL_GPL(xprt_write_space);
  */
 void xprt_set_retrans_timeout_def(struct rpc_task *task)
 {
-	task->tk_timeout = task->tk_rqstp->rq_timeout;
+	xprt_set_timeout_sane(task, task->tk_rqstp->rq_timeout);
 }
 EXPORT_SYMBOL_GPL(xprt_set_retrans_timeout_def);
@@ -682,6 +682,18 @@ out_abort:
 	spin_unlock(&xprt->transport_lock);
 }
 
+void xprt_set_timeout_sane(struct rpc_task *task, unsigned long timeout)
xprt_set_timeout_sane? That's amusing, but it isn't very descriptive
to someone who doesn't know why this extra function is needed.
+{
+	unsigned long majorto = task->tk_rqstp->rq_majortimeo;
+
+	if (majorto <= jiffies)
+		task->tk_timeout = 5 * HZ;
+	else if (timeout > majorto - jiffies)
+		task->tk_timeout = 2 * (majorto - jiffies) / 3;
+	else
+		task->tk_timeout = timeout;
+}
+
 /**
  * xprt_connect - schedule a transport connect operation
  * @task: RPC task that is requesting the connect
@@ -710,7 +722,7 @@ void xprt_connect(struct rpc_task *task)
 		if (task->tk_rqstp)
 			task->tk_rqstp->rq_bytes_sent = 0;
 
-		task->tk_timeout = xprt->connect_timeout;
+		xprt_set_timeout_sane(task, xprt->connect_timeout);
 		rpc_sleep_on(&xprt->pending, task, xprt_connect_status);
 		xprt->stat.connect_start = jiffies;
 		xprt->ops->connect(task);
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 721bafd..7fcc97f 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -2020,9 +2020,17 @@ static void xs_connect(struct rpc_task *task)
 		return;
 
 	if (transport->sock != NULL && !RPC_IS_SOFTCONN(task)) {
+		unsigned long majorto = task->tk_rqstp->rq_majortimeo;
+
+		if (majorto <= jiffies)
+			xprt->reestablish_timeout = 5 * HZ;
+		else if (xprt->reestablish_timeout > majorto - jiffies)
+			xprt->reestablish_timeout = 2 * (majorto - jiffies)
+				/ 3;
This looks like the code you added above in _sane. Can you reuse that
code here?
+
 		dprintk("RPC:       xs_connect delayed xprt %p for %lu "
-				"seconds\n",
-				xprt, xprt->reestablish_timeout / HZ);
+				"seconds\n",
+				xprt, xprt->reestablish_timeout + (HZ - 1) / HZ);
 		queue_delayed_work(rpciod_workqueue,
 				   &transport->connect_worker,
 				   xprt->reestablish_timeout);
--
chuck[dot]lever[at]oracle[dot]com