Re: [PATCH 1/1] NFSv4.1 fix a kswapd nfs4_state_manager race

On Nov 25, 2013, at 1:28 PM, "Myklebust, Trond" <Trond.Myklebust@xxxxxxxxxx>
 wrote:

> 
> On Nov 25, 2013, at 13:17, Adamson, Andy <William.Adamson@xxxxxxxxxx> wrote:
> 
>> 
>> On Nov 25, 2013, at 1:13 PM, "Myklebust, Trond" <Trond.Myklebust@xxxxxxxxxx>
>> wrote:
>> 
>>> 
>>> On Nov 25, 2013, at 12:57, <andros@xxxxxxxxxx> <andros@xxxxxxxxxx> wrote:
>>> 
>>>> From: Andy Adamson <andros@xxxxxxxxxx>
>>>> 
>>>> The state manager is recovering expired state and recovery OPENs are being
>>>> processed. If kswapd is pruning inodes at the same time, a deadlock can occur
>>>> when kswapd calls evict_inode on an NFSv4.1 inode with a layout, and the
>>>> resultant layoutreturn gets an error that the state manager is to handle,
>>>> causing the layoutreturn to wait on the (NFS client) cl_rpcwaitq.
>>>> 
>>>> At the same time an open is waiting for the inode deletion to complete in
>>>> __wait_on_freeing_inode.
>>>> 
>>>> If the open is either the open called by the state manager, or an open from
>>>> the same open owner that is holding the NFSv4.0 sequence id which causes the
>>>> OPEN from the state manager to wait for the sequence id on the Seqid_waitqueue,
>>>> then the state is deadlocked with kswapd.
>>>> 
>>>> Do not handle LAYOUTRETURN errors when called from nfs4_evict_inode.
>>> 
>>> Why are we waiting for recovery in LAYOUTRETURN at all? Layouts are automatically lost when the server reboots or when the lease is otherwise lost.
>>> 
>>> IOW: Is there any reason why we need to special-case nfs4_evict_inode? Shouldn’t we just bail out on error in _all_ cases?
>> 
>> Yeah, I was thinking about this as well - perhaps recovering from session-level errors or grace/delay errors would be useful for the block client.
> 
> NFS4ERR_DELAY, probably, yes.
> 
> NFS4ERR_GRACE, no… That’s a reboot situation
> 
> As for session level errors, I’d say that complicates things too much, since several of those can basically end up masking a NFS4ERR_STALE_CLIENTID error.
> 
> 
> Either way, all the layout types (including blocks) should be able to continue on even if we miss a layout return or two. The server has to be coded to expect a forgetful client.

OK - I'll resend the patch.
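
For reference, the error policy discussed above could be sketched roughly as follows. This is only an illustration, not the resent patch: the helper name layoutreturn_should_retry() is hypothetical and is not a kernel function, and the status codes are written as the positive values numbered in RFC 5661 rather than the negated form used inside the kernel.

```c
#include <stdbool.h>

/* NFSv4.1 status codes, numbered as in RFC 5661. */
enum {
	NFS4_OK                = 0,
	NFS4ERR_DELAY          = 10008,
	NFS4ERR_GRACE          = 10013,
	NFS4ERR_STALE_CLIENTID = 10022,
	NFS4ERR_BADSESSION     = 10052,
};

/*
 * Hypothetical helper (an assumption for illustration only).
 *
 * Returns true only when a LAYOUTRETURN is worth resending:
 *  - NFS4ERR_DELAY: the server asked us to try again later.
 *
 * Everything else means "bail out and forget the layout":
 *  - NFS4ERR_GRACE: reboot situation, the layout is already lost.
 *  - session-level errors: may mask NFS4ERR_STALE_CLIENTID, so
 *    they are not handled here either.
 * The caller never waits on the state manager.
 */
bool layoutreturn_should_retry(int status)
{
	return status == NFS4ERR_DELAY;
}
```

The point of keeping this a pure classification is that nothing in the LAYOUTRETURN path then sleeps waiting for recovery, whether it is called from nfs4_evict_inode or anywhere else.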

-->Andy
> 
> --
> Trond Myklebust
> Linux NFS client maintainer
> 
> NetApp
> Trond.Myklebust@xxxxxxxxxx
> www.netapp.com
> 
