Re: [PATCH v7 13/31] NFSv4.1: Ensure we always run TEST/FREE_STATEID on locks

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 11/10/2016 03:18 PM, Benjamin Coddington wrote:
> 
> On 10 Nov 2016, at 10:58, Benjamin Coddington wrote:
> 
>> Hi Anna,
>>
>> On 10 Nov 2016, at 10:01, Anna Schumaker wrote:
>>> Do you have an estimate for when this patch will be ready?  I want to include it in my next bugfix pull request for 4.9.
>>
>> I haven't posted because I am still trying to get to the bottom of another
>> problem where the client gets stuck in a loop sending the same stateid over
>> and over on NFS4ERR_OLD_STATEID.  I want to make sure this problem isn't
>> caused by this fix -- which I don't think it is, but I'd rather make sure.
>> If I don't make any progress on this problem by the end of today, I'll post
>> what I have.
>>
>> Read on if interested in this new problem:
>>
>> It looks like racing opens with the same openowner can be returned out of
>> order by the server, so the client sees stateid seqid of 2 before 1.  Then a
>> LOCK sent with seqid 1 is endlessly retried if sent while doing recovery.
>>
>> It's hard to tell if I was able to capture all the moving parts to describe
>> this problem, though.  As it takes a very long time for me to reproduce, and
>> the packet captures were dropping frames.  I'm working on manually
>> reproducing it now.
> 
> Anna,
> 
> I haven't gotten to the bottom of it, and so I'm not confident it isn't a
> problem created by the fix I've been testing, which is:
> 
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index e809498..2aa9d86 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -2564,12 +2564,15 @@ static void nfs41_check_delegation_stateid(struct
> nfs4_state *state)
>  static int nfs41_check_expired_locks(struct nfs4_state *state)
>  {
>         int status, ret = NFS_OK;
> -       struct nfs4_lock_state *lsp;
> +       struct nfs4_lock_state *lsp, *tmp;
>         struct nfs_server *server = NFS_SERVER(state->inode);
> 
>         if (!test_bit(LK_STATE_IN_USE, &state->flags))
>                 goto out;
> -       list_for_each_entry(lsp, &state->lock_states, ls_locks) {
> +       spin_lock(&state->state_lock);
> +       list_for_each_entry_safe(lsp, tmp, &state->lock_states, ls_locks) {
> +               atomic_inc(&lsp->ls_count);
> +               spin_unlock(&state->state_lock);
>                 if (test_bit(NFS_LOCK_INITIALIZED, &lsp->ls_flags)) {
>                         struct rpc_cred *cred =
> lsp->ls_state->owner->so_cred;
> 
> @@ -2588,7 +2591,10 @@ static int nfs41_check_expired_locks(struct
> nfs4_state *state)
>                                 break;
>                         }
>                 }
> -       };
> +               nfs4_put_lock_state(lsp);
> +               spin_lock(&state->state_lock);
> +       }
> +       spin_unlock(&state->state_lock);
>  out:
>         return ret;
>  }
> 
> http://people.redhat.com/bcodding/old_stateid_loop is tshark output of my
> only good wirecapture of the problem.  Without this patch, generic/089
> crashes long before this problem is reproduced, so I am stuck figuring it
> out, I'm afraid.  Don't wait on my account.
> 
> I plan on trying a bit more to reproduce tomorrow, and if I cannot, I'll
> write about it under separate cover.

Sounds good.  Thanks for the update!

Anna

> 
> Ben

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux