Re: NFSv4 mounts take longer the fail from ENETUNREACH than NFSv3 mounts.

Chuck Lever <chuck.lever@xxxxxxxxxx> · Thu, 21 Oct 2010 10:31:57 -0400

On Oct 21, 2010, at 10:05 AM, Trond Myklebust wrote:

> On Thu, 2010-10-21 at 14:25 +1100, Neil Brown wrote:
>> On Wed, 20 Oct 2010 20:45:32 -0400
>> Jeff Layton <jlayton@xxxxxxxxxx> wrote:
>> 
>>> On Thu, 21 Oct 2010 07:40:28 +1100
>>> Neil Brown <neilb@xxxxxxx> wrote:
>>> 
>>>>> 
>>>>> Then what happens is that xs_tcp_send_request gets called again to try
>>>>> to resend the packet. In the EHOSTUNREACH case, that returns
>>>>> EHOSTUNREACH which eventually causes an rpc_exit with that error. In
>>>>> the ENETUNREACH case that returns EPIPE, which makes the state machine
>>>>> move next to call_bind and the whole thing starts over again.
>>>> 
>>>> This confuses me.  Why would  xs_tcp_send_request (aka ->send_request) get
>>>> called before the connect has succeeded?  Can you make sense of that?
>>>> 
>>> 
>>> It confuses me too. I suspect that this may actually be a bug...
>>> 
>>> So EINPROGRESS makes the connect_worker task clear the connecting bit
>>> and return. Eventually, the EHOSTUNREACH error is reported to
>>> xs_error_report. That function does this:
>>> 
>>>        xprt_wake_pending_tasks(xprt, -EAGAIN);
>>> 
>>> The task that was waiting on the connect_worker is then woken up.
>>> call_connect_status does this:
>>> 
>>>        if (status >= 0 || status == -EAGAIN) {
>>>                clnt->cl_stats->netreconn++;
>>>                task->tk_action = call_transmit;
>>>                return;
>>>        }
>>> 
>>> ...and we end up in call_transmit without the socket being connected.
>>> 
>>> So I understand how this happened, but I don't really understand the
>>> design of the connect mechanism well enough to know whether this is
>>> by design or not.
>>> 
>> 
>> Now that *is* interesting.....
>> 
>> I thought that code in call_connect_status was hard to understand too, so I
>> asked git who to blame it on. It said:
>> 
>> commit 2a4919919a97911b0aa4b9f5ac1eab90ba87652b
>> Author: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>
>> Date:   Wed Mar 11 14:38:00 2009 -0400
>> 
>>    SUNRPC: Return EAGAIN instead of ENOTCONN when waking up xprt->pending
>> 
>>    While we should definitely return socket errors to the task that is
>>    currently trying to send data, there is no need to propagate the same error
>>    to all the other tasks on xprt->pending. Doing so actually slows down
>>    recovery, since it causes more than one tasks to attempt socket recovery.
>> 
>>    Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>
>> 
>> 
>> 
>> That commit not only adds the "status == -EAGAIN" test, but also introduced
>> some of the xprtsock.c code that I suggested changing in a previous patch.
> 
> That is because ENOTCONN was trying to report the current state of the
> socket to a bunch of tasks on the 'pending' list. The problem is that
> because none of those tasks hold any form of lock, then by the time they
> get round to processing that ENOTCONN, the state of the socket can (and
> usually will) have changed.
> 
>> So there seem to be some correlation between that commit and the present
>> problem.
>> 
>> I tried compiling the kernel just prior to that commit, and 
>>  mount -t nfs4 unrouteable.ip.addres:/ /mnt
>> 
>> took 3 seconds to time fail.
>> 
>> I then stepped forward to that commit and the same command took 3 *minutes* to
>> time out.  So something isn't right there.  Unfortunately I don't know what.
>> 
>> Trond: can you comment on this - maybe explain the reasoning behind that
>> commit better, and suggest how we can get ENOTCONN to fail SOFTCONN
>> connections faster without undoing the things this patch tried to achieve?
> 
> It seems to me that the problem is basically that RPC_IS_SOFTCONN is
> poorly defined. IMO, the definition should really be that
> 'RPC_IS_SOFTCONN' tasks MUST exit with an error if they ever have to run
> call_bind or a call_connect more than once (i.e. if they ever have to
> loop back).

I think you are suggesting that we define the places where RPC_IS_SOFTCONN should be tested as exactly those places that might want to restart the RPC call via rpc_force_rebind, tk_action = call_refresh, and so on.  Yes?

> With that in mind, I really don't see why the RPC_IS_SOFTCONN case is
> being handled in call_transmit_status() instead of in call_status().

I tried it in call_status(), but it didn't work as expected.  My notes aren't specific about the problem.  One thing is for sure, there seems to be some redundant error handling logic in both of those states.  Unraveling it might be more than Neil bargained for, but would help "future generations."

> I furthermore don't see why ECONNRESET, ENOTCONN and EPIPE should be
> treated any differently from ECONNREFUSED, EHOSTDOWN, EHOSTUNREACH and
> ENETUNREACH.

I can get behind that.  It looks like call_bind_status already combines all these error codes, so it makes some sense.

-- 
chuck[dot]lever[at]oracle[dot]com

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html