Re: [PATCH igt] lib: Check and report if a subtest triggers a new kernel taint

Quoting Szwichtenberg, Radoslaw (2017-11-29 13:14:52)
> On Wed, 2017-11-29 at 12:40 +0000, Chris Wilson wrote:
> > Quoting Chris Wilson (2017-11-29 12:30:23)
> > > Checking for a tainted kernel is a convenient way to see if the test
> > > generated a critical error such as an oops or a machine check.
> > > 
> > > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > > Cc: Daniel Vetter <daniel.vetter@xxxxxxxx>
> > > Cc: Radoslaw Szwichtenberg <radoslaw.szwichtenberg@xxxxxxxxx>
> > > ---
> > > diff --git a/lib/igt_kernel_taint.c b/lib/igt_kernel_taint.c
> > > new file mode 100644
> > > index 00000000..86d9cd20
> > > --- /dev/null
> > > +++ b/lib/igt_kernel_taint.c
> > > @@ -0,0 +1,95 @@
> > > +/*
> > > + * Copyright 2017 Intel Corporation
> > > + *
> > > + * Permission is hereby granted, free of charge, to any person obtaining a
> > > + * copy of this software and associated documentation files (the "Software"),
> > > + * to deal in the Software without restriction, including without limitation
> > > + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> > > + * and/or sell copies of the Software, and to permit persons to whom the
> > > + * Software is furnished to do so, subject to the following conditions:
> > > + *
> > > + * The above copyright notice and this permission notice (including the next
> > > + * paragraph) shall be included in all copies or substantial portions of the
> > > + * Software.
> > > + *
> > > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> > > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> > > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> > > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> > > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> > > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> > > + * IN THE SOFTWARE.
> > > + */
> > > +
> > > +#include <unistd.h>
> > > +#include <fcntl.h>
> > > +
> > > +#include "igt.h"
> > > +#include "igt_kernel_taint.h"
> > > +#include "igt_sysfs.h"
> > > +
> > > +#define BIT(x) (1ul << (x))
> > > +
> > > +static const struct kernel_taint {
> > > +       const char *msg;
> > > +       unsigned int flags;
> > > +} taints[] = {
> > > +       { "Non-GPL module loaded" },
> > > +       { "Forced module load" },
> > > +       { "Unsafe SMP processor" },
> > > +       { "Forced module unload" },
> > > +       { "Machine Check Exception", TAINT_WARN },
> > > +       { "Bad page detected", TAINT_ERROR },
> > > +       { "Tainted by user request", TAINT_WARN },
> > 
> > Since unsafe modparams generate these and we are still using them
> > extensively, we should probably ignore this one.
> > 
> > > +       { "System is on fire", TAINT_ERROR },
> > > +       { "ACPI DSDT has been overridden by user" },
> > > +       { "OOPS", TAINT_ERROR },
> > > +       { "Staging driver loaded; are you mad?" },
> > > +       { "Severe firmware bug workaround active", TAINT_WARN },
> > > +       { "Out-of-tree module loaded" },
> > > +       { "Unsigned module loaded" },
> > > +       { "Soft-lockup detected", TAINT_WARN },
> > > +       { "Kernel has been live patched" },
> > > +};
> > > +
> > > +unsigned long igt_read_kernel_taint(void)
> > 
> > One thing I haven't checked is whether we can clear the kernel taints.
> > At the moment, once we see an oops, we never report a second test
> > generating another oops.
> > -Chris
> 
> I guess that clearing kernel taints is not needed when you hit oops - you
> probably should stop executing tests and reboot the machine, right?

Oops in the driver tends to stop igt pretty hard. A good rule of thumb
is indeed to abandon all hope and reboot. I'm thinking that with this
sort of early-warning detection in place, we can use the kernel_taint
when we do detect a persistent error, e.g. abandon the run if one flip
times out, or if we fail to park or reset the GPU. All to make that
catastrophic error stand out and not pollute other test results.
-Chris
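
For reference, a minimal sketch of the two points raised in this thread:
taints are sticky, so a subtest can only be blamed for bits that were clear
before it ran, and the "Tainted by user request" bit should be ignored while
unsafe modparams are in use. This is not the code from the patch. The helper
names read_taint_file() and new_taints() are hypothetical, and it assumes the
mask is read as a decimal number from /proc/sys/kernel/tainted, with bit
positions following the order of the taints[] table above.

```c
#include <assert.h>
#include <stdio.h>

/* Bit positions follow the order of the taints[] table in the patch. */
#define TAINT_BIT(x) (1ul << (x))
#define TAINT_USER TAINT_BIT(6) /* "Tainted by user request" */

/*
 * Hypothetical helper: parse the decimal taint mask from a procfs-style
 * file such as /proc/sys/kernel/tainted. Returns 0 on success, -1 on error.
 */
static int read_taint_file(const char *path, unsigned long *mask)
{
	FILE *file;
	int ok;

	assert(path && mask);

	file = fopen(path, "r");
	if (!file)
		return -1;

	ok = fscanf(file, "%lu", mask) == 1;
	fclose(file);

	return ok ? 0 : -1;
}

/*
 * Hypothetical helper: bits that are set after a subtest but were clear
 * before it, with the bits in 'ignore' masked out. A bit that was already
 * set beforehand can never show up here, which is why a second oops goes
 * unreported unless the taint can be cleared.
 */
static unsigned long new_taints(unsigned long before, unsigned long after,
				unsigned long ignore)
{
	return after & ~before & ~ignore;
}
```

A caller would sample the mask before and after each subtest and pass
TAINT_USER as the ignore mask; a non-zero result means the subtest added a
new taint.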
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx



