Re: generic/650 makes v6.0-rc client unusable


> On Sep 4, 2022, at 12:02 PM, Chuck Lever III <chuck.lever@xxxxxxxxxx> wrote:
> 
> Hi-
> 
>> On Sep 4, 2022, at 9:15 AM, Zorro Lang <zlang@xxxxxxxxxx> wrote:
>> 
>> On Sat, Sep 03, 2022 at 06:43:29PM +0000, Chuck Lever III wrote:
>>> While investigating some of the other issues that have been
>>> reported lately, I've found that my v6.0-rc3 NFS/TCP client
>>> goes off the rails often (but not always) during generic/650.
>>> 
>>> This is the test that runs a workload while offlining and
>>> onlining CPUs. My test client has 12 physical cores.
>>> 
>>> The test appears to start normally, but then after a bit
>>> the NFS server workload drops to zero and the NFS mount
>>> disappears. I can't run programs (sudo, for example) on
>>> the client. Can't log in, even on the console. The console
>>> has a constant stream of "can't rotate log: Input/Output
>>> error" type messages.
>>> 
>>> I haven't looked further into this yet. Actually I'm not
>>> quite sure where to start looking.
>>> 
>>> I recently switched this client from a local /home to an
>>> NFS-mounted one, and that's where the xfstests are built
>>> and run from, fwiw.
>> 
>> If most users complain about generic/650, I'd like to exclude g/650 from the
>> "auto" default run group. Any more points?
> 
> Well generic/650 was passing for me before v6.0-rc, and IMO
> it is a tough but reasonable test, considering the ubiquitous
> use of workqueues and other scheduling primitives in our
> filesystems.
> 
> So I think I caught a real bug, but I need a couple more days
> to work it out before deciding generic/650 is throwing false
> negatives and is thus not worth running in the "auto" group.

Following up. I can't reproduce it any more. I've heard more
than one report that this failure can happen on non-NFS
configurations. I'd therefore conclude that I haven't caught
a bug in something I'm actively testing.

Carry on!


> I can't really say whether Ted's failing tests are the
> result of an interaction with the GCE platform or the test
> itself. I.e., his patch might be the right approach -- exclude
> it based on the test platform.

--
Chuck Lever






