> On Thu, Mar 14, 2019 at 05:12:07PM -0400, Scott Mayhew wrote:
> > +def testRebootWithManyManyManyClients(t, env):
> > +    """Reboot with many many many clients
> > +
> > +    FLAGS: reboot
> > +    CODE: REBT2c
> > +    """
> > +    return doTestRebootWithNClients(t, env, 1000)
>
> My test server uses a 15 second lease time, mainly just to speed up
> tests. That's not enough for pynfs to send out reclaims for 1000
> clients.
>
> So I'm wondering whether that's a reasonable test or not.
>
> On the one hand, we should be able to handle 1000 clients, and a 15
> second lease is probably unrealistically short. And maybe we could
> choose more patient behavior for the server (currently it will wait at
> most 2 grace periods while reclaims continue to arrive).
>
> On the other hand, real clients will send their reclaims simultaneously
> rather than one at a time. And from a trace it looks like most of the
> time's spent waiting for pynfs to send the next request rather than
> waiting for replies. So this is a bit unusual.
>
> I'm inclined to drop the "many many many clients" tests. It's easy
> enough for someone doing reboot testing to patch the tests if they need
> to.
>
> By the way, the longest round trip time I see is the RECLAIM_COMPLETE.
> I assume that's doing a commit to disk. It looks like there's nothing
> on the server to prevent processing RECLAIM_COMPLETEs in parallel so as
> long as that's true I suppose we're OK.

How about having the many many many clients tests under a different flag
so they are still available but easy to pick or not pick?

Considering that CID5 with the huge number of client-ids it creates but
doesn't clean up (so they all eventually expire) has caught bugs in
Ganesha, I like the idea of messy big tests being available for QE to
run...

Frank