> On Sep 4, 2022, at 12:02 PM, Chuck Lever III <chuck.lever@xxxxxxxxxx> wrote:
>
> Hi-
>
>> On Sep 4, 2022, at 9:15 AM, Zorro Lang <zlang@xxxxxxxxxx> wrote:
>>
>> On Sat, Sep 03, 2022 at 06:43:29PM +0000, Chuck Lever III wrote:
>>> While investigating some of the other issues that have been
>>> reported lately, I've found that my v6.0-rc3 NFS/TCP client
>>> goes off the rails often (but not always) during generic/650.
>>>
>>> This is the test that runs a workload while offlining and
>>> onlining CPUs. My test client has 12 physical cores.
>>>
>>> The test appears to start normally, but then after a bit
>>> the NFS server workload drops to zero and the NFS mount
>>> disappears. I can't run programs (sudo, for example) on
>>> the client. Can't log in, even on the console. The console
>>> has a constant stream of "can't rotate log: Input/Output
>>> error" type messages.
>>>
>>> I haven't looked further into this yet. Actually I'm not
>>> quite sure where to start looking.
>>>
>>> I recently switched this client from a local /home to an
>>> NFS-mounted one, and that's where the xfstests are built
>>> and run from, fwiw.
>>
>> If most of users complain generic/650, I'd like to exclude g/650 from the
>> "auto" default run group. Any more points?
>
> Well generic/650 was passing for me before v6.0-rc, and IMO
> it is a tough but reasonable test, considering the ubiquitous
> use of workqueues and other scheduling primitives in our
> filesystems.
>
> So I think I caught a real bug, but I need a couple more days
> to work it out before deciding generic/650 is throwing false
> negatives and is thus not worth running in the "auto" group.

Following up. I can't reproduce it any more. I've heard more than
one report that this failure can happen on non-NFS configurations.
I'd therefore conclude that I haven't caught a bug in something I'm
actively testing. Carry on!

> I can't really say whether Ted's failing tests are the
> result of an interaction with the GCE platform or the test
> itself. Ie, his patch might be the right approach -- exclude
> it based on the test platform.

--
Chuck Lever