Re: NFS TCP race condition with SOCK_ASYNC_NOSPACE

Andrew Cooper <andrew.cooper3@xxxxxxxxxx> · Tue, 22 Nov 2011 12:34:34 +0000

On 22/11/11 12:22, Trond Myklebust wrote:
> On Tue, 2011-11-22 at 12:16 +0000, Andrew Cooper wrote: 
>> On 22/11/11 12:10, Trond Myklebust wrote:
>>> On Tue, 2011-11-22 at 12:02 +0000, Andrew Cooper wrote: 
>>>> On 22/11/11 11:38, Trond Myklebust wrote:
>>>>> On Mon, 2011-11-21 at 18:14 +0000, Andrew Cooper wrote: 
>>>>>> Following some debugging, I believe that the attached patch fixes the
>>>>>> problem.
>>>>>>
>>>>>> Simply returning EAGAIN is not sufficient, as the task does not get
>>>>>> requeued, and times out 13 seconds later (as per our mount options). 
>>>>>> Setting the SOCK_ASYNC_NOSPACE bit causes the requeue to happen.
>>>>>>
>>>>>> I realize that this is a gross hack and I should probably not be using
>>>>>> SOCK_ASYNC_NOSPACE in that way.  Is there a better way to achieve the
>>>>>> same solution?
>>>>>>
>>>>> What you are doing will cause the request to be put to sleep with no
>>>>> guarantee that it will ever be woken up. Why would we want to do that if
>>>>> there is no report of a tcp window/buffer space congestion?
>>>> But the reason we get to this code is because there was a report of
>>>> space collision.  What would you suggest instead?  Changing
>>>> xs_{tcp,udp}_send_request() to retry in this case would defeat the point
>>>> of having xs_nospace().
>>> I suggest doing absolutely nothing: do what you originally proposed,
>>> which is to report the EAGAIN so that the client state machine retries
>>> the socket write.
>>>
>>> My point is that this is a context which is _not_ atomic with the
>>> original report of tcp window/buffer space congestion. There are no
>>> locks or anything else that will guarantee that the congestion still
>>> exists, and the fact that the SOCK_ASYNC_NOSPACE flag is now clear
>>> indicates that this is the case.
>>> The whole purpose of xs_nospace() is to wait until a congestion
>>> condition clears. If the congestion clears before we get here, then we
>>> have no reason to do anything special other than retry.
>>>
>>> Trond
>> I am slightly confused as to what you mean now.
>>
>> When you take out the if(test_bit test and always set ret to EAGAIN and
>> requeue the request, the next time it wakes up is when it is killed due
>> to timeout.  This results in substantially worse effects for the
>> userspace, as the NFS session is killed.
> What is putting the request to sleep? It should be awake when it enters
> xs_nospace(), and nothing in or after that function should be putting it
> to sleep until we've retried with call_transmit().
>

I presume it is the call to xprt_wait_for_buffer_space() which calls
rpc_sleep_on().  There is xs_tcp_write_space which appears to wake it up
based on sk->sk_write_space which is triggered on the sock gaining more
space, which has already happened in this specific case.

Sorry if I am being a bit slow here - I am still learning my way round
an unfamiliar codebase.

>> Did you mean something else when you said "always report EAGAIN"?
> Nope.
>

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html