On Mon, Nov 29, 2021 at 09:13:16AM -0800, dai.ngo@xxxxxxxxxx wrote:
> Hi Bruce,
>
> On 11/21/21 7:04 PM, dai.ngo@xxxxxxxxxx wrote:
> >
> >On 11/17/21 4:34 PM, J. Bruce Fields wrote:
> >>On Wed, Nov 17, 2021 at 01:46:02PM -0800, dai.ngo@xxxxxxxxxx wrote:
> >>>On 11/17/21 9:59 AM, dai.ngo@xxxxxxxxxx wrote:
> >>>>On 11/17/21 6:14 AM, J. Bruce Fields wrote:
> >>>>>On Tue, Nov 16, 2021 at 03:06:32PM -0800, dai.ngo@xxxxxxxxxx wrote:
> >>>>>>Just a reminder that this patch is still waiting for your review.
> >>>>>Yeah, I was procrastinating and hoping you'd figure out the pynfs
> >>>>>failure for me....
> >>>>Last time I ran 4.0 OPEN18 test by itself and it passed. I will run
> >>>>all OPEN tests together with 5.15-rc7 to see if the problem you've
> >>>>seen still there.
> >>>I ran all tests in nfsv4.1 and nfsv4.0 with courteous and non-courteous
> >>>5.15-rc7 server.
> >>>
> >>>Nfs4.1 results are the same for both courteous and
> >>>non-courteous server:
> >>>>Of those: 0 Skipped, 0 Failed, 0 Warned, 169 Passed
> >>>Results of nfs4.0 with non-courteous server:
> >>>>Of those: 8 Skipped, 1 Failed, 0 Warned, 577 Passed
> >>>test failed: LOCK24
> >>>
> >>>Results of nfs4.0 with courteous server:
> >>>>Of those: 8 Skipped, 3 Failed, 0 Warned, 575 Passed
> >>>tests failed: LOCK24, OPEN18, OPEN30
> >>>
> >>>OPEN18 and OPEN30 test pass if each is run by itself.
> >>Could well be a bug in the tests, I don't know.
> >
> >The reason OPEN18 failed was because the test timed out waiting for
> >the reply of an OPEN call. The RPC connection used for the test was
> >configured with 15 secs timeout. Note that OPEN18 only fails when
> >the tests were run with 'all' option, this test passes if it's run
> >by itself.
> >
> >With courteous server, by the time OPEN18 runs, there are about 1026
> >courtesy 4.0 clients on the server and all of these clients have opened
> >the same file X with WRITE access. These clients were created by the
> >previous tests. After each test completed, since 4.0 does not have
> >session, the client states are not cleaned up immediately on the
> >server and are allowed to become courtesy clients.
> >
> >When OPEN18 runs (about 20 minutes after the 1st test started), it
> >sends OPEN of file X with OPEN4_SHARE_DENY_WRITE which causes the
> >server to check for conflicts with courtesy clients. The loop that
> >checks 1026 courtesy clients for share/access conflict took less
> >than 1 sec. But it took about 55 secs, on my VM, for the server
> >to expire all 1026 courtesy clients.
> >
> >I modified pynfs to configure the 4.0 RPC connection with 60 seconds
> >timeout and OPEN18 now consistently passed. The 4.0 test results are
> >now the same for courteous and non-courteous server:
> >
> >8 Skipped, 1 Failed, 0 Warned, 577 Passed
> >
> >Note that 4.1 tests do not suffer this timeout problem because the
> >4.1 clients and sessions are destroyed after each test completes.
>
> Do you want me to send the patch to increase the timeout for pynfs?
> or is there any other things you think we should do?

I don't know.

55 seconds to clean up 1026 clients is about 50ms per client, which is
pretty slow.  I wonder why.  I guess it's probably updating the stable
storage information.  Is /var/lib/nfs/ on your server backed by a hard
drive or an SSD or something else?

I wonder if that's an argument for limiting the number of courtesy
clients.

--b.
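
For reference, a back-of-the-envelope sketch of the numbers above. It
uses only the figures reported in this thread (1026 clients, 55 seconds,
15 and 60 second RPC timeouts). The function names are made up for
illustration, and the linear-cost model is an assumption that nobody in
the thread has measured:

```python
# Rough arithmetic on the figures reported in this thread.
# Assumption (not measured): expiring courtesy clients costs the same
# amount of time per client, i.e. total time is linear in client count.

COURTESY_CLIENTS = 1026      # courtesy 4.0 clients present when OPEN18 ran
EXPIRE_SECONDS = 55.0        # time the server took to expire all of them
RPC_TIMEOUT_ORIGINAL = 15.0  # pynfs 4.0 RPC timeout, as reported
RPC_TIMEOUT_MODIFIED = 60.0  # timeout after the local pynfs change


def per_client_ms(clients, seconds):
    """Average expiry cost per client, in milliseconds."""
    return seconds / clients * 1000.0


def max_clients_within(timeout, clients, seconds):
    """Largest client count that could be expired before `timeout`
    elapses, under the linear-cost assumption stated above."""
    return int(timeout / (seconds / clients))


if __name__ == "__main__":
    cost = per_client_ms(COURTESY_CLIENTS, EXPIRE_SECONDS)
    print(f"per-client expiry cost: {cost:.1f} ms")
    for timeout in (RPC_TIMEOUT_ORIGINAL, RPC_TIMEOUT_MODIFIED):
        limit = max_clients_within(timeout, COURTESY_CLIENTS, EXPIRE_SECONDS)
        print(f"{timeout:.0f}s timeout: up to ~{limit} courtesy clients")
```

If the linear assumption roughly holds, a 15 second timeout would only
tolerate a few hundred courtesy clients at this per-client cost. That is
the sort of estimate a cap on courtesy clients would have to be weighed
against.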