Re: [PATCH v7 13/31] NFSv4.1: Ensure we always run TEST/FREE_STATEID on locks

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 10 Nov 2016, at 10:58, Benjamin Coddington wrote:

Hi Anna,

On 10 Nov 2016, at 10:01, Anna Schumaker wrote:
Do you have an estimate for when this patch will be ready? I want to include it in my next bugfix pull request for 4.9.

I haven't posted because I am still trying to get to the bottom of another problem where the client gets stuck in a loop sending the same stateid over and over on NFS4ERR_OLD_STATEID. I want to make sure this problem isn't caused by this fix -- which I don't think it is, but I'd rather make sure. If I don't make any progress on this problem by the end of today, I'll post
what I have.

Read on if interested in this new problem:

It looks like racing opens with the same openowner can be returned out of order by the server, so the client sees stateid seqid of 2 before 1. Then a LOCK sent with seqid 1 is endlessly retried if sent while doing recovery.

It's hard to tell if I was able to capture all the moving parts to describe this problem, though. As it takes a very long time for me to reproduce, and
the packet captures were dropping frames.  I'm working on manually
reproducing it now.

Anna,

I haven't gotten to the bottom of it, and so I'm not confident it isn't a
problem created by the fix I've been testing, which is:

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index e809498..2aa9d86 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -2564,12 +2564,15 @@ static void nfs41_check_delegation_stateid(struct
nfs4_state *state)
 static int nfs41_check_expired_locks(struct nfs4_state *state)
 {
        int status, ret = NFS_OK;
-       struct nfs4_lock_state *lsp;
+       struct nfs4_lock_state *lsp, *tmp;
        struct nfs_server *server = NFS_SERVER(state->inode);

        if (!test_bit(LK_STATE_IN_USE, &state->flags))
                goto out;
-       list_for_each_entry(lsp, &state->lock_states, ls_locks) {
+       spin_lock(&state->state_lock);
+ list_for_each_entry_safe(lsp, tmp, &state->lock_states, ls_locks) {
+               atomic_inc(&lsp->ls_count);
+               spin_unlock(&state->state_lock);
                if (test_bit(NFS_LOCK_INITIALIZED, &lsp->ls_flags)) {
                        struct rpc_cred *cred =
lsp->ls_state->owner->so_cred;

@@ -2588,7 +2591,10 @@ static int nfs41_check_expired_locks(struct
nfs4_state *state)
                                break;
                        }
                }
-       };
+               nfs4_put_lock_state(lsp);
+               spin_lock(&state->state_lock);
+       }
+       spin_unlock(&state->state_lock);
 out:
        return ret;
 }

http://people.redhat.com/bcodding/old_stateid_loop is tshark output of my
only good wirecapture of the problem.  Without this patch, generic/089
crashes long before this problem is reproduced, so I am stuck figuring it
out, I'm afraid.  Don't wait on my account.

I plan on trying a bit more to reproduce tomorrow, and if I cannot, I'll
write about it under separate cover.

Ben
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux