Re: [PATCH] NFSv4: fix stateid refreshing when CLOSE racing with OPEN

On 4 Sep 2020, at 10:14, Chuck Lever wrote:

On Sep 4, 2020, at 6:55 AM, Benjamin Coddington <bcodding@xxxxxxxxxx> wrote:

On 3 Sep 2020, at 23:04, Murphy Zhou wrote:

Hi Benjamin,

On Thu, Sep 03, 2020 at 01:54:26PM -0400, Benjamin Coddington wrote:

On 11 Oct 2019, at 10:14, Trond Myklebust wrote:
On Fri, 2019-10-11 at 16:49 +0800, Murphy Zhou wrote:
On Thu, Oct 10, 2019 at 02:46:40PM +0000, Trond Myklebust wrote:
On Thu, 2019-10-10 at 15:40 +0800, Murphy Zhou wrote:
...
@@ -3367,14 +3368,16 @@ static bool nfs4_refresh_open_old_stateid(nfs4_stateid *dst,
			break;
		}
		seqid_open = state->open_stateid.seqid;
-		if (read_seqretry(&state->seqlock, seq))
-			continue;

		dst_seqid = be32_to_cpu(dst->seqid);
-		if ((s32)(dst_seqid - be32_to_cpu(seqid_open)) >= 0)
+		if ((s32)(dst_seqid - be32_to_cpu(seqid_open)) > 0)
			dst->seqid = cpu_to_be32(dst_seqid + 1);

This negates the whole intention of the patch you reference in the
'Fixes:', which was to allow us to CLOSE files even if seqid bumps have
been lost due to interrupted RPC calls, e.g. when using 'soft' or
'softerr' mounts.

With the above change, the check could just be tossed out altogether,
because dst_seqid will never become larger than seqid_open.

Hmm.. I got it wrong. Thanks for the explanation.

So to be clear: I'm not saying that what you describe is not a problem.
I'm just saying that the fix you propose is really no better than
reverting the entire patch. I'd prefer not to do that, and would rather
see us look for ways to fix both problems, but if we can't find such a
fix then that would be the better solution.

Hi Trond and Murphy Zhou,

Sorry to resurrect this old thread, but I'm wondering if any progress was
made on this front.

In my records, this failure stopped showing up as of the v5.6-rc1
release cycle. Can you reproduce it on the latest upstream kernel?

I'm seeing it with generic/168 on a v5.8 client against a v5.3 knfsd
server. When I test against a v5.8 server, the test takes longer to
complete and I have yet to reproduce the livelock.

- on the v5.3 server it takes ~50 iterations to reproduce; each test
  completes in ~40 seconds
- on the v5.8 server my test has run ~750 iterations without hitting the
  livelock; each test takes ~60 seconds

I suspect recent changes to the server have changed the timing of open
replies such that the problem isn't reproduced on the client.

The Linux NFS server in v5.4 does behave differently than earlier
kernels with NFSv4.0, and it is performance-related. The filecache
went into v5.4, and that seems to change the frequency at which
the server offers delegations.

Just a point of reference - finally reproduced it on a v5.8 server after
4900 runs.  This took several days, and helped to heat the basement.

Ben