On 10/1/20 1:39 AM, Thorsten Leemhuis wrote:
> Tell users early in the process to check the taint flag, as that will
> prevent them from investing time into a report that might be worthless.
> That way users for example will notice that the issue they face is in
> fact caused by an add-on kernel module or and Oops that happened
> earlier.
>
> This approach has a downside: users will later have to check the flag
> again with the mainline kernel the guide tells them to install. But that
> is an acceptable trade-off here, as checking only takes a few seconds
> and can easily prevent wasting time in useless testing and debugging.
>
> Signed-off-by: Thorsten Leemhuis <linux@xxxxxxxxxxxxx>
> ---
>
> = RFC =
>
> Should "disable DKMS" come before this step? But then the backup step right
> before that one would need to be moved as well, as disabling DKMS can mix things
> up.
> ---
>  Documentation/admin-guide/reporting-bugs.rst  | 59 +++++++++++++++++++
>  Documentation/admin-guide/tainted-kernels.rst |  2 +
>  2 files changed, 61 insertions(+)
>
> diff --git a/Documentation/admin-guide/reporting-bugs.rst b/Documentation/admin-guide/reporting-bugs.rst
> index 430a0c3ee0ad..61b6592ddf74 100644
> --- a/Documentation/admin-guide/reporting-bugs.rst
> +++ b/Documentation/admin-guide/reporting-bugs.rst
> @@ -311,6 +311,65 @@ fatal error where the kernels stop itself) with a 'Oops' (a recoverable error),
>  as the kernel remains running after an 'Oops'.
>
>
> +Check 'taint' flag
> +------------------
> +
> +    *Check if your kernel was 'tainted' when the issue occurred, as the event
> +    that made the kernel set this flag might be causing the issue you face.*
> +
> +The kernel marks itself with a 'taint' flag when something happens that might
> +lead to follow-up errors that look totally unrelated. The issue you face might
> +be such an error if your kernel is tainted. That's why it's in your interest to
> +rule this out early before investing more time into this process.
> +This is the only reason why this step is here, as this process later will
> +tell you to install the latest mainline kernel and check its taint flag, as
> +that's the kernel the report will be mainly about.
> +
> +On a running system is easy to check if the kernel tainted itself: it's not
> +tainted if ``cat /proc/sys/kernel/tainted`` returns '0'. Checking that file is
> +impossible in some situations, that's why the kernel also mentions the taint
                      situations;
> +status when it reports an internal problem (a 'kernel bug'), a recoverable
> +error (a 'kernel Oops') or a non-recoverable error before halting operation (a
> +'kernel panic'). Look near the top of the error messages printed when one of
> +these occurs and search for a line starting with 'CPU:'. It should end with
> +'Not tainted' if the kernel was not tainted beforehand; it was tainted if you
> +see 'Tainted:' followed by a few spaces and some letters.
> +
> +If your kernel is tainted study
                     tainted, study
> +:ref:`Documentation/admin-guide/tainted-kernels.rst <taintedkernels>` to find
> +out why and try to eliminate the reason. Often it's because a recoverable error
> +(a 'kernel Oops') occurred and the kernel tainted itself, as the kernel knows
> +it might misbehave in strange ways after that point. In that case check your
> +kernel or system log and look for a section that starts with this::
> +
> +    Oops: 0000 [#1] SMP
> +
> +That's the first Oops since boot-up, as the '#1' between the brackets shows.
> +Every Oops and any other problem that happen after that point might be a
> +follow-up problem to that first Oops, even if they look totally unrelated. Try
> +to rule this out by getting rid of that Oops and reproducing the issue
> +afterwards. Sometimes simply restarting will be enough, sometimes a change to
> +the configuration followed by a reboot can eliminate the Oops.
> +But don't invest too much time into this at this point of the process, as the
> +cause for the Oops might already be fixed in the newer Linux kernel version
> +you are going to install later in this process.
> +
> +Quite a few kernels are also tainted because an unsuitable kernel modules was
                                                                     module
> +loaded. This for example is the case if you use Nvidias proprietary graphics
                                                   Nvidia's
> +driver, VirtualBox, or other software that installs its own kernel modules: you
> +will have to remove these modules and reboot the system, as they might in fact
> +be causing the issue you face. You will need to reboot the system and try to
> +reproduce the issue without loading any of these proprietary modules.
> +
> +The kernel also taints itself when it's loading a module that resists in the
                                                                 resides
> +staging tree of the Linux kernel source. That's a special area for code (mostly
> +drivers) that does not yet fulfill the normal Linux kernel quality standards.
> +When you report an issue with such a module it's obviously okay if the kernel is
> +tainted, just make sure the module in question is the only reason for the taint.
   tainted;
> +If the issue happens in an unrelated area reboot and temporary block the module
                                                        temporarily
> +from being loaded by specifying ``foo.blacklist=1`` as kernel parameter (replace
> +'foo' with the name of the module in question).
> +
> +
>  .. ############################################################################
>  .. Temporary marker added while this document is rewritten. Sections above
>  .. are new and dual-licensed under GPLv2+ and CC-BY 4.0, those below are old.
> diff --git a/Documentation/admin-guide/tainted-kernels.rst b/Documentation/admin-guide/tainted-kernels.rst
> index abf804719890..2900f477f42f 100644
> --- a/Documentation/admin-guide/tainted-kernels.rst
> +++ b/Documentation/admin-guide/tainted-kernels.rst
> @@ -1,3 +1,5 @@
> +.. _taintedkernels:
> +
>  Tainted kernels
>  ---------------
>
>

-- 
~Randy