Re: [pynfs PATCH 1/4] nfs4.1: add some reboot tests

"'J. Bruce Fields'" <bfields@xxxxxxxxxxxx> · Mon, 18 Mar 2019 10:57:29 -0400

On Mon, Mar 18, 2019 at 07:30:20AM -0700, Frank Filz wrote:
> > On Thu, Mar 14, 2019 at 05:12:07PM -0400, Scott Mayhew wrote:
> > > +def testRebootWithManyManyManyClients(t, env):
> > > +    """Reboot with many many many clients
> > > +
> > > +    FLAGS: reboot
> > > +    CODE: REBT2c
> > > +    """
> > > +    return doTestRebootWithNClients(t, env, 1000)
> > 
> > My test server uses a 15 second lease time, mainly just to speed up tests.
> > That's not enough for pynfs to send out reclaims for 1000 clients.
> > 
> > So I'm wondering whether that's a reasonable test or not.
> > 
> > On the one hand, we should be able to handle 1000 clients, and a 15 second
> > lease is probably unrealistically short.  And maybe we could choose more
> > patient behavior for the server (currently it will wait at most 2 grace
> > periods while reclaims continue to arrive).
> > 
> > On the other hand, real clients will send their reclaims simultaneously
> > rather than one at a time.  And from a trace it looks like most of the
> > time's spent waiting for pynfs to send the next request rather than
> > waiting for replies.  So this is a bit unusual.
> > 
> > I'm inclined to drop the "many many many clients" tests.  It's easy enough
> > for someone doing reboot testing to patch the tests if they need to.
> > 
> > By the way, the longest round trip time I see is the RECLAIM_COMPLETE.
> > I assume that's doing a commit to disk.  It looks like there's nothing on
> > the server to prevent processing RECLAIM_COMPLETEs in parallel so as long
> > as that's true I suppose we're OK.
> 
> How about having the many many many clients tests under a different flag so
> they are still available but easy to pick or not pick?

That might be OK.

Or it might also be possible to make the test a little smarter; e.g., if
reclaims start to fail with NOGRACE after a lease period, keep going and
maybe have the test WARN instead of failing.
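
A rough, untested sketch of that idea, just to show the control flow. It
does not use the real pynfs helpers: reclaim_all(), the reclaim callback,
and the "PASS"/"WARN" return strings are all made up for illustration, and
the caller is assumed to know the server's lease time. Only the two status
values come from the NFSv4.1 spec.

```python
import time

NFS4_OK = 0
NFS4ERR_NO_GRACE = 10033  # status value as defined by the NFSv4.1 spec


def reclaim_all(clients, reclaim, lease_time, clock=time.monotonic):
    """Hypothetical helper: reclaim state for each client in turn.

    `reclaim` is an assumed callback taking one client and returning an
    NFSv4 status code.  NO_GRACE seen only after one lease period has
    elapsed is recorded instead of treated as fatal; any other failure
    (including NO_GRACE before the lease period is up) still fails.
    """
    start = clock()
    reclaimed, late = [], []
    for client in clients:
        status = reclaim(client)
        if status == NFS4_OK:
            reclaimed.append(client)
        elif status == NFS4ERR_NO_GRACE and clock() - start > lease_time:
            # Grace probably ended because the test was too slow; keep going.
            late.append(client)
        else:
            raise AssertionError(
                "reclaim for %r failed with status %d" % (client, status))
    # Downgrade to a warning when some clients simply ran out of time.
    outcome = "WARN" if late else "PASS"
    return outcome, reclaimed, late
```

The caller would then map "WARN" onto whatever warning mechanism the test
framework provides, so a slow test client shows up as a warning and a real
server-side reclaim failure still shows up as a failure.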

--b.

> Considering that CID5 with the huge number of client-ids it creates but
> doesn't clean up (so they all eventually expire) has caught bugs in Ganesha,
> I like the idea of messy big tests being available for QE to run...