Re: [PATCH 1/1] NFSv4.1 fix a kswapd nfs4_state_manager race

"Myklebust, Trond" <Trond.Myklebust@xxxxxxxxxx> · Mon, 25 Nov 2013 20:57:58 +0000

On Nov 25, 2013, at 15:51, Adamson, Andy <William.Adamson@xxxxxxxxxx> wrote:

> 
> On Nov 25, 2013, at 3:29 PM, "Adamson, Andy" <William.Adamson@xxxxxxxxxx>
> wrote:
> 
>> 
>> On Nov 25, 2013, at 3:20 PM, "Myklebust, Trond" <Trond.Myklebust@xxxxxxxxxx>
>> wrote:
>> 
>>> 
>>> On Nov 25, 2013, at 15:10, Adamson, Andy <William.Adamson@xxxxxxxxxx> wrote:
>>> 
>>>> 
>>>> On Nov 25, 2013, at 2:53 PM, "Myklebust, Trond" <Trond.Myklebust@xxxxxxxxxx>
>>>> wrote:
>>>> 
>>>>> 
>>>>> On Nov 25, 2013, at 14:27, Adamson, Andy <William.Adamson@xxxxxxxxxx> wrote:
>>>>> 
>>>>>> 
>>>>>> On Nov 25, 2013, at 1:33 PM, "Myklebust, Trond" <Trond.Myklebust@xxxxxxxxxx>
>>>>>> wrote:
>>>>>> 
>>>>>>> 
>>>>>>> On Nov 25, 2013, at 13:13, Myklebust, Trond <Trond.Myklebust@xxxxxxxxxx> wrote:
>>>>>>> 
>>>>>>>> 
>>>>>>>> On Nov 25, 2013, at 12:57, <andros@xxxxxxxxxx> <andros@xxxxxxxxxx> wrote:
>>>>>>>> 
>>>>>>>>> From: Andy Adamson <andros@xxxxxxxxxx>
>>>>>>>>> 
>>>>>>>>> The state manager is recovering expired state and recovery OPENs are being
>>>>>>>>> processed. If kswapd is pruning inodes at the same time, a deadlock can occur
>>>>>>>>> when kswapd calls evict_inode on an NFSv4.1 inode with a layout, and the
>>>>>>>>> resultant layoutreturn gets an error that the state manager is to handle,
>>>>>>>>> causing the layoutreturn to wait on the (NFS client) cl_rpcwaitq.
>>>>>>>>> 
>>>>>>>>> At the same time an open is waiting for the inode deletion to complete in
>>>>>>>>> __wait_on_freeing_inode.
>>>>>>>>> 
>>>>>>>>> If the open is either the open called by the state manager, or an open from
>>>>>>>>> the same open owner that is holding the NFSv4.0 sequence id which causes the
>>>>>>>>> OPEN from the state manager to wait for the sequence id on the Seqid_waitqueue,
>>>>>>>>> then the state is deadlocked with kswapd.
>>>>>>>>> 
>>>>>>>>> Do not handle LAYOUTRETURN errors when called from nfs4_evict_inode.
>>>>>>>> 
>>>>>>>> Why are we waiting for recovery in LAYOUTRETURN at all? Layouts are automatically lost when the server reboots or when the lease is otherwise lost.
>>>>>>>> 
>>>>>>>> IOW: Is there any reason why we need to special-case nfs4_evict_inode? Shouldn’t we just bail out on error in _all_ cases?
>>>>>>> 
>>>>>>> BTW: Is it possible that we might have a similar problem with delegreturn? That too can be called from nfs4_evict_inode…
>>>>>> 
>>>>>> Yes, good point.  kswapd could be waiting on a delegreturn that hit an error, in the same scenario where sys_open and the state manager are running.
>>>>>> 
>>>>>> With delegreturn, we most definitely want to limit 'no error handling' to the evict inode case.
>>>>> 
>>>>> Ah… I forgot that the delegreturn in nfs4_evict_inode is asynchronous and doesn’t wait for completion, so it shouldn’t be a problem here.
>>>> 
>>>> Except we just changed that to fix a different state manager hang:
>>>> 
>>>> commit 4a82fd7c4e78a1b7a224f9ae8bb7e1fd95f670e0
>>>> Author: Andy Adamson <andros@xxxxxxxxxx>
>>>> Date:   Fri Nov 15 16:36:16 2013 -0500
>>>> 
>>>> NFSv4 wait on recovery for async session errors
>>> 
>>> Right, but that won’t prevent nfs4_evict_inode from completing.
>> 
>> Ah - I was thinking of the synchronous handler's call to nfs4_wait_clnt_recover - so yes, no problem
> 
> In fact, this issue is NOT an upstream issue!  RHEL6.5-pre has nfs4_proc_layoutreturn as a SYNC RPC call, and _that_ is the bug that is fixed upstream.
> 
> Really sorry for the confusion. I'll back port a solution for RHEL6.5

Are you sure? As far as I can tell, the upstream nfs4_proc_layoutreturn is also synchronous. I therefore suspect that we still have the same problem.
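
For reference, the wait-for cycle described in the original patch can be sketched as a toy model. This is only an illustration, not kernel code: the task labels, has_wait_cycle() and deadlock_demo() are hypothetical names invented here, and each task is assumed to block on at most one other task.

```c
/* Toy model of the wait-for cycle from the patch description.
 * All names are hypothetical illustrations, not kernel symbols. */
enum task { KSWAPD, STATE_MANAGER, SEQID_HOLDER_OPEN, NTASKS };
#define NONE (-1)

/* Follow the waits_for[] chain from 'start'.
 * Returns 1 if the chain leads back to 'start' (deadlock), else 0. */
static int has_wait_cycle(const int *waits_for, int start)
{
    int cur = waits_for[start];
    int steps;

    for (steps = 0; steps < NTASKS && cur != NONE; steps++) {
        if (cur == start)
            return 1;
        cur = waits_for[cur];
    }
    return 0;
}

/* Build the scenario. If evict_waits_for_recovery is nonzero, the
 * synchronous layoutreturn issued from inode eviction sleeps on
 * cl_rpcwaitq until the state manager finishes recovery. */
static int deadlock_demo(int evict_waits_for_recovery)
{
    int waits_for[NTASKS];

    /* kswapd -> state manager (via layoutreturn error handling) */
    waits_for[KSWAPD] = evict_waits_for_recovery ? STATE_MANAGER : NONE;
    /* state manager's recovery OPEN -> open holding the sequence id */
    waits_for[STATE_MANAGER] = SEQID_HOLDER_OPEN;
    /* that open -> kswapd (sleeping in __wait_on_freeing_inode) */
    waits_for[SEQID_HOLDER_OPEN] = KSWAPD;

    return has_wait_cycle(waits_for, KSWAPD);
}
```

In this model deadlock_demo(1) reports a cycle and deadlock_demo(0) does not, which is the effect of not waiting on recovery in the layoutreturn path.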

--
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@xxxxxxxxxx
www.netapp.com
