On 10 Nov 2016, at 10:58, Benjamin Coddington wrote:
Hi Anna,
On 10 Nov 2016, at 10:01, Anna Schumaker wrote:
Do you have an estimate for when this patch will be ready? I want to
include it in my next bugfix pull request for 4.9.
I haven't posted because I am still trying to get to the bottom of
another
problem where the client gets stuck in a loop sending the same stateid
over
and over on NFS4ERR_OLD_STATEID. I want to make sure this problem
isn't
caused by this fix -- which I don't think it is, but I'd rather make
sure.
If I don't make any progress on this problem by the end of today, I'll
post
what I have.
Read on if interested in this new problem:
It looks like racing opens with the same openowner can be returned out
of
order by the server, so the client sees stateid seqid of 2 before 1.
Then a
LOCK sent with seqid 1 is endlessly retried if sent while doing
recovery.
It's hard to tell if I was able to capture all the moving parts to
describe
this problem, though. As it takes a very long time for me to
reproduce, and
the packet captures were dropping frames. I'm working on manually
reproducing it now.
Anna,
I haven't gotten to the bottom of it, and so I'm not confident it isn't
a
problem created by the fix I've been testing, which is:
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index e809498..2aa9d86 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -2564,12 +2564,15 @@ static void
nfs41_check_delegation_stateid(struct
nfs4_state *state)
static int nfs41_check_expired_locks(struct nfs4_state *state)
{
int status, ret = NFS_OK;
- struct nfs4_lock_state *lsp;
+ struct nfs4_lock_state *lsp, *tmp;
struct nfs_server *server = NFS_SERVER(state->inode);
if (!test_bit(LK_STATE_IN_USE, &state->flags))
goto out;
- list_for_each_entry(lsp, &state->lock_states, ls_locks) {
+ spin_lock(&state->state_lock);
+ list_for_each_entry_safe(lsp, tmp, &state->lock_states,
ls_locks) {
+ atomic_inc(&lsp->ls_count);
+ spin_unlock(&state->state_lock);
if (test_bit(NFS_LOCK_INITIALIZED, &lsp->ls_flags)) {
struct rpc_cred *cred =
lsp->ls_state->owner->so_cred;
@@ -2588,7 +2591,10 @@ static int nfs41_check_expired_locks(struct
nfs4_state *state)
break;
}
}
- };
+ nfs4_put_lock_state(lsp);
+ spin_lock(&state->state_lock);
+ }
+ spin_unlock(&state->state_lock);
out:
return ret;
}
http://people.redhat.com/bcodding/old_stateid_loop is tshark output of
my
only good wirecapture of the problem. Without this patch, generic/089
crashes long before this problem is reproduced, so I am stuck figuring
it
out, I'm afraid. Don't wait on my account.
I plan on trying a bit more to reproduce tomorrow, and if I cannot, I'll
write about it under separate cover.
Ben
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html