[RFC PATCH v1 1/2] docs: reporting-issue: rework the detailed guide

Thorsten Leemhuis <linux@xxxxxxxxxxxxx> · Tue, 26 Mar 2024 13:21:28 +0100

Rework the detailed step-by-step guide for various reasons:

* Simplify the search with the help of lore.kernel.org/all/, which did
  not exist when the text was written.

* Make use of the recently added document
  Documentation/admin-guide/verify-bugs-and-bisect-regressions.rst,
  which covers many steps this text partly covered way better.

* The 'quickly report a stable regression to the stable team' approach
  hardly worked out: most of the time the regression was not known yet.
  Try a different approach using the regressions list.

* Reports about stable/longterm regressions most of the time were
  greeted with a brief reply along the lines of 'Is mainline affected as
  well?'; this is needed to determine who is responsible, so we might as
  well make the reporter check that before sending the report (which
  verify-bugs-and-bisect-regressions.rst already tells them to do, too).

* A lot of fine tuning after seeing what people were struggling with.

FIXME: adjust the entries in the reference section to match these
changes.

Not-signed-off-by: Thorsten Leemhuis <linux@xxxxxxxxxxxxx>
---
 .../admin-guide/reporting-issues.rst          | 391 ++++++++++--------
 1 file changed, 210 insertions(+), 181 deletions(-)

diff --git a/Documentation/admin-guide/reporting-issues.rst b/Documentation/admin-guide/reporting-issues.rst
index 2fd5a030235ad0..e6083946c146e8 100644
--- a/Documentation/admin-guide/reporting-issues.rst
+++ b/Documentation/admin-guide/reporting-issues.rst
@@ -48,187 +48,216 @@ Once the report is out, answer any questions that come up and help where you
 can. That includes keeping the ball rolling by occasionally retesting with newer
 releases and sending a status update afterwards.
 
-Step-by-step guide how to report issues to the kernel maintainers
-=================================================================
-
-The above TL;DR outlines roughly how to report issues to the Linux kernel
-developers. It might be all that's needed for people already familiar with
-reporting issues to Free/Libre & Open Source Software (FLOSS) projects. For
-everyone else there is this section. It is more detailed and uses a
-step-by-step approach. It still tries to be brief for readability and leaves
-out a lot of details; those are described below the step-by-step guide in a
-reference section, which explains each of the steps in more detail.
-
-Note: this section covers a few more aspects than the TL;DR and does things in
-a slightly different order. That's in your interest, to make sure you notice
-early if an issue that looks like a Linux kernel problem is actually caused by
-something else. These steps thus help to ensure the time you invest in this
-process won't feel wasted in the end:
-
- * Are you facing an issue with a Linux kernel a hardware or software vendor
-   provided? Then in almost all cases you are better off to stop reading this
-   document and reporting the issue to your vendor instead, unless you are
-   willing to install the latest Linux version yourself. Be aware the latter
-   will often be needed anyway to hunt down and fix issues.
-
- * Perform a rough search for existing reports with your favorite internet
-   search engine; additionally, check the archives of the `Linux Kernel Mailing
-   List (LKML) <https://lore.kernel.org/lkml/>`_. If you find matching reports,
-   join the discussion instead of sending a new one.
-
- * See if the issue you are dealing with qualifies as regression, security
-   issue, or a really severe problem: those are 'issues of high priority' that
-   need special handling in some steps that are about to follow.
-
- * Make sure it's not the kernel's surroundings that are causing the issue
-   you face.
-
- * Create a fresh backup and put system repair and restore tools at hand.
-
- * Ensure your system does not enhance its kernels by building additional
-   kernel modules on-the-fly, which solutions like DKMS might be doing locally
-   without your knowledge.
-
- * Check if your kernel was 'tainted' when the issue occurred, as the event
-   that made the kernel set this flag might be causing the issue you face.
-
- * Write down coarsely how to reproduce the issue. If you deal with multiple
-   issues at once, create separate notes for each of them and make sure they
-   work independently on a freshly booted system. That's needed, as each issue
-   needs to get reported to the kernel developers separately, unless they are
-   strongly entangled.
-
- * If you are facing a regression within a stable or longterm version line
-   (say something broke when updating from 5.10.4 to 5.10.5), scroll down to
-   'Dealing with regressions within a stable and longterm kernel line'.
-
- * Locate the driver or kernel subsystem that seems to be causing the issue.
-   Find out how and where its developers expect reports. Note: most of the
-   time this won't be bugzilla.kernel.org, as issues typically need to be sent
-   by mail to a maintainer and a public mailing list.
-
- * Search the archives of the bug tracker or mailing list in question
-   thoroughly for reports that might match your issue. If you find anything,
-   join the discussion instead of sending a new report.
-
-After these preparations you'll now enter the main part:
-
- * Unless you are already running the latest 'mainline' Linux kernel, better
-   go and install it for the reporting process. Testing and reporting with
-   the latest 'stable' Linux can be an acceptable alternative in some
-   situations; during the merge window that actually might be even the best
-   approach, but in that development phase it can be an even better idea to
-   suspend your efforts for a few days anyway. Whatever version you choose,
-   ideally use a 'vanilla' build. Ignoring these advices will dramatically
-   increase the risk your report will be rejected or ignored.
-
- * Ensure the kernel you just installed does not 'taint' itself when
-   running.
-
- * Reproduce the issue with the kernel you just installed. If it doesn't show
-   up there, scroll down to the instructions for issues only happening with
-   stable and longterm kernels.
-
- * Optimize your notes: try to find and write the most straightforward way to
-   reproduce your issue. Make sure the end result has all the important
-   details, and at the same time is easy to read and understand for others
-   that hear about it for the first time. And if you learned something in this
-   process, consider searching again for existing reports about the issue.
-
- * If your failure involves a 'panic', 'Oops', 'warning', or 'BUG', consider
-   decoding the kernel log to find the line of code that triggered the error.
-
- * If your problem is a regression, try to narrow down when the issue was
-   introduced as much as possible.
-
- * Start to compile the report by writing a detailed description about the
-   issue. Always mention a few things: the latest kernel version you installed
-   for reproducing, the Linux Distribution used, and your notes on how to
-   reproduce the issue. Ideally, make the kernel's build configuration
-   (.config) and the output from ``dmesg`` available somewhere on the net and
-   link to it. Include or upload all other information that might be relevant,
-   like the output/screenshot of an Oops or the output from ``lspci``. Once
-   you wrote this main part, insert a normal length paragraph on top of it
-   outlining the issue and the impact quickly. On top of this add one sentence
-   that briefly describes the problem and gets people to read on. Now give the
-   thing a descriptive title or subject that yet again is shorter. Then you're
-   ready to send or file the report like the MAINTAINERS file told you, unless
-   you are dealing with one of those 'issues of high priority': they need
-   special care which is explained in 'Special handling for high priority
-   issues' below.
-
- * Wait for reactions and keep the thing rolling until you can accept the
-   outcome in one way or the other. Thus react publicly and in a timely manner
-   to any inquiries. Test proposed fixes. Do proactive testing: retest with at
-   least every first release candidate (RC) of a new mainline version and
-   report your results. Send friendly reminders if things stall. And try to
-   help yourself, if you don't get any help or if it's unsatisfying.
-
-
-Reporting regressions within a stable and longterm kernel line
---------------------------------------------------------------
-
-This subsection is for you, if you followed above process and got sent here at
-the point about regression within a stable or longterm kernel version line. You
-face one of those if something breaks when updating from 5.10.4 to 5.10.5 (a
-switch from 5.9.15 to 5.10.5 does not qualify). The developers want to fix such
-regressions as quickly as possible, hence there is a streamlined process to
-report them:
-
- * Check if the kernel developers still maintain the Linux kernel version
-   line you care about: go to the  `front page of kernel.org
-   <https://kernel.org/>`_ and make sure it mentions
-   the latest release of the particular version line without an '[EOL]' tag.
-
- * Check the archives of the `Linux stable mailing list
-   <https://lore.kernel.org/stable/>`_ for existing reports.
-
- * Install the latest release from the particular version line as a vanilla
-   kernel. Ensure this kernel is not tainted and still shows the problem, as
-   the issue might have already been fixed there. If you first noticed the
-   problem with a vendor kernel, check a vanilla build of the last version
-   known to work performs fine as well.
-
- * Send a short problem report to the Linux stable mailing list
-   (stable@xxxxxxxxxxxxxxx) and CC the Linux regressions mailing list
-   (regressions@xxxxxxxxxxxxxxx); if you suspect the cause in a particular
-   subsystem, CC its maintainer and its mailing list. Roughly describe the
-   issue and ideally explain how to reproduce it. Mention the first version
-   that shows the problem and the last version that's working fine. Then
-   wait for further instructions.
-
-The reference section below explains each of these steps in more detail.
-
-
-Reporting issues only occurring in older kernel version lines
--------------------------------------------------------------
-
-This subsection is for you, if you tried the latest mainline kernel as outlined
-above, but failed to reproduce your issue there; at the same time you want to
-see the issue fixed in a still supported stable or longterm series or vendor
-kernels regularly rebased on those. If that the case, follow these steps:
-
- * Prepare yourself for the possibility that going through the next few steps
-   might not get the issue solved in older releases: the fix might be too big
-   or risky to get backported there.
-
- * Perform the first three steps in the section "Dealing with regressions
-   within a stable and longterm kernel line" above.
-
- * Search the Linux kernel version control system for the change that fixed
-   the issue in mainline, as its commit message might tell you if the fix is
-   scheduled for backporting already. If you don't find anything that way,
-   search the appropriate mailing lists for posts that discuss such an issue
-   or peer-review possible fixes; then check the discussions if the fix was
-   deemed unsuitable for backporting. If backporting was not considered at
-   all, join the newest discussion, asking if it's in the cards.
-
- * One of the former steps should lead to a solution. If that doesn't work
-   out, ask the maintainers for the subsystem that seems to be causing the
-   issue for advice; CC the mailing list for the particular subsystem as well
-   as the stable mailing list.
-
-The reference section below explains each of these steps in more detail.
+The detailed step-by-step guide on reporting Linux kernel issues
+================================================================
+
+The short guide above might be all needed for people already familiar
+with reporting issues to Free/Libre & Open Source Software projects. For
+everyone else there is this more detailed step-by-step guide. It still tries to
+be brief and leaves a lot of details occasionally relevant to a reference
+section, which holds additional information for almost all of the steps.
+
+Note: this step-by-step guide covers more aspects than the short guide above and
+does things in a slightly different order; that is done in the reader's interest,
+to make sure you notice early on when  on the wrong track.
+
+* Be aware you must have or install a fresh vanilla mainline kernel for
+  reporting; you furthermore must remove any software that builds or relies on
+  externally developed kernel modules possibly installed. There is also a decent
+  chance you will have to build a patched kernel yourself to help resolve the
+  issue.
+
+  In case that sounds do demanding to you, better report the issue to the vendor
+  who built your kernel (usually your Linux distributor or hardware manufacturer).
+
+* Skim the output of ``journalctl -k`` for any indicators of problems that might
+  lead to your bug.
+
+* Check if the kernel was already 'tainted' when the issue first occurred: the
+  event that led to this flag being set might cause your issue, even if it looks
+  totally unrelated.
+
+* Consider some glitch in your kernel's environment makes it misbehave -- like
+  a hardware defect, a mis-configured system firmware, an overclocked component,
+  a broken initramfs, an inconsistent file system, broken firmware files,
+  a pre-release compiler, or a malfunctioning/misconfigured Linux distribution.
+
+* If you deal with multiple issues at once, process them separately from now on.
+  If there is even a small chance they are related, briefly mention the other
+  issues in each of the reports later, ideally while linking to the others.
+
+* Search for fixes and earlier reports referring to an issue like yours. Start
+  by checking `lore <https://lore.kernel.org/all/>`_. Then perform a general
+  internet search. Consult :ref:`MAINTAINERS <maintainers>` to determine where
+  developers of the affected code expect bugs to be submitted to; if in a doubt,
+  use your best guess to determine the driver or kernel subsystem. If its
+  developers have a dedicated mailing list not archived on lore, search its
+  archives; when they are among the few that uses one of
+  various bug trackers, search it as well. Note, bugzilla.kernel.org
+  is the right place to file bugs only for a small percentage of the kernel; if
+  you submit bugs for other code there it most likely will be ignored.
+
+  If you find fixes, try them. If you find matching reports, evaluate whatever
+  is wiser: joining the discussion or reporting the problem anew. In the latter
+  case mention and link to the related report you found; after you submit it,
+  add a note to the related report along the lines of 'I have a problem that
+  might be the same or related, for details see <link_to_your_report>'.
+
+* Are you facing a regression? One still occurring with a less than two
+  (ideally: one) weeks old kernel from the affected series? A kernel that is
+  vanilla or close to it? Then send a brief (one or two short paragraphs) email
+  to <regressions@xxxxxxxxxxxxxxx> asking if the problem is known already.
+  Consider proceeding with this guide immediately to confine the problem and
+  report it properly; definitely do so, if you don't receive any helpful
+  answer within three days.
+
+* Evaluate if the issue you are dealing with qualifies as regression, security
+  issue, or a really severe problem: those need special handling in some of the
+  following steps.
+
+* Write down coarsely how to reproduce the issue on a freshly booted system.
+
+* Verify the bug and potentially bisect any regression as described in
+  Documentation/admin-guide/verify-bugs-and-bisect-regressions.rst;
+  alternatively handle the tasks it covers on your own:
+
+  * Verify the bug occurs with an up-to-date kernel. For regressions within a
+    still supported stable or longterm series this means the latest release from
+    that series. In all other cases, this means a mainline release, pre-release, or
+    snapshot ideally less than one week old and two at maximum; the latest release
+    from the newest stable series might work as well, especially if the series
+    is based on a mainline version released in the past two weeks.
+
+  * In case of a regression, consider bisecting it. If it is one within a stable
+    or longterm series, you must verify if current mainline is affected as well.
+
+  * All kernels used for verifying and reporting bugs must be free of externally
+    developed modules (like Nvidia's graphics drivers, OpenZFS, or VirtualBox's
+    host drivers). The kernels also should be built from pristine (aka 'vanilla')
+    Linux sources, but lightly patched might work, too. The kernels furthermore
+    should not be 'tainted' when the issue occurs.
+
+  Note, don't skip this step or take its demands lightheartedly, as there is a
+  decent chance your report otherwise will be ignored or welcomed brusquely.
+
+* If you learned anything new about the bug while following this guide so far,
+  consider searching once more for earlier reports and fixes.
+
+* Were you unable to reproduce a bug with a current mainline kernel you want to
+  see fixed in a stable or longterm series? A bug that is not a regression? Then
+  move over to ‘Resolving non-regressions only occurring in stable or longterm
+  kernels’.
+
+* Optional: if your failure involves a 'panic', 'Oops', 'warning', or 'BUG',
+  ideally decode the included stack trace.
+
+* Prepare the report by writing a detailed description of the issue.
+
+  Always mention the Linux distribution and the kernel version used for the
+  verification; also include your notes on how to reproduce the issue. If your
+  failure involves a 'panic', 'Oops', 'warning', or 'BUG', include a copy or
+  photo of it.
+
+  Most of the time you also want to describe relevant aspects of your
+  environment, like the machine's model name, the relevant hardware components,
+  or the version of related userspace drivers. Often you want to also save the
+  output of ``journalctl -k`` to a file you later attach to your report or
+  upload somewhere and link to.
+
+  If there other aspects about the environment likely are relevant, attach or
+  upload & link detailed information about is as well, like the output from
+  commands as ``lsblk``, ``lspci``, ``lsusb.py`` and
+  ``grep -s '' /sys/class/dmi/id/*``.
+
+  If anything in the attached or linked files is certainly relevant, ensure
+  to copy that part to the body of the report to make it easily accessible.
+  Furthermore make sure to not overload the report with many or huge
+  attachments: developers will ask for additional data when needed.
+
+  Ensure both the subject and the first sentence of the report outlines the core
+  of the problem and gets people interested enough to read on.
+
+  When finished, review and optimize the report once more to make it as
+  straightforward as possible and the core of the problem easy to grasp.
+
+* Submit your report in the appropriate way, which depends on the outcome of the
+  verification:
+
+  * In case you deal with a security issue, follow the instructions in
+    Documentation/process/security-bugs.rst.
+
+  * Are you facing a regression within a stable or longterm kernel series you
+    were unable to reproduce with a fresh mainline kernel? Then report it by
+    email to the stable team while CCing the regressions lists (To:
+    Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>,
+    Sasha Levin <sashal@xxxxxxxxxx>; CC: stable@xxxxxxxxxxxxxxx,
+    regressions@xxxxxxxxxxxxxxx).
+
+  * In all other cases, submit the report as specified in MAINTAINERS. In case
+    of a regression you have to report by mail, CC the regressions list
+    (regressions@xxxxxxxxxxxxxxx); when you know the culprit, also CC everyone
+    in its 'Signed-off-by' chain. In case of a regression you had to file in a
+    bug tracker, write a short heads-up email with a link to the report to the
+    list and everyone that signed the patch off, if the culprit is known.
+
+    Did you send the brief inquiry about a regression mentioned earlier? Then in
+    both of these cases keep it involved: either send your report as a reply to
+    the earlier inquiry while adding relevant recipients or send a quick note
+    with a link to the proper report.
+
+* Wait for reactions and keep the ball rolling until you can accept the outcome
+  in one way or the other. That among others means:
+
+  * React publicly and in a timely manner to any inquiries.
+
+  * Try to quickly test proposed fixes.
+
+  * Perform proactive testing: retest with at least every first release
+    candidate (e.g. -rc1) of a new mainline version and report your findings in
+    a reply to your report.
+
+  * If things stall for more than three or four weeks, check if that happened
+    due to an inadequate report of yours; if not, send a friendly inquiry.
+
+  * Be aware that nobody is obliged to help you, unless it is a recent
+    regression, a security issue, or a really severe problem; hence try to help
+    yourself, if you don't receive any or only unsatisfying help.
+
+Resolving non-regressions only occurring in stable or longterm kernels
+----------------------------------------------------------------------
+
+Are you facing an issue in a still supported stable or longterm series you were
+unable to reproduce with a fresh mainline kernel? An issue that is also not a
+regression and still happens in the series latest release? In that case follow
+these steps:
+
+* Prepare yourself for the possibility that trying to resolve the issue resolved
+  in the affected stable or longterm series might not work out: the fix might be
+  too big or risky to include there.
+
+* Search Linux' mainline Git repository or lore for the change that resolved the
+  issue; when unsuccessful, consider using a bisection to find it. Then check
+  the description of the fix for a 'stable tag', e.g, a line like
+  'Cc: <stable@xxxxxxxxxxxxxxx>':
+
+ * In case there is such a tag the change is already scheduled for backporting.
+   Usually it will be picked up within two or three weeks after being merged to
+   mainline. Note, a version number after the tag might limit backporting to a
+   series that is newer than the one you care for; plans to backport a change
+   sometimes are also discarded. In such cases search lore or contact the
+   involved developers for details, but you likely are out of luck.
+
+ * If there was no stable tag, search the mailing list archives if backporting
+   nevertheless is in the works. If not, search for the review of the fix and
+   check if backporting to stable and longterm kernels is planned or was
+   rejected. If it's neither, send a reply asking the developers if backporting
+   to the series is an option. Note, they might greenlight it, but unwilling to
+   handle the job themselves -- in that case consider testing and submitting the
+   fix and everything it depends on as explained in
+   Documentation/process/stable-kernel-rules.rst.
+
+ In case you have trouble locating the fix or the discussion about it, consider
+ asking the maintainers and developers of the affected subsystem for advice.
 
 
 Reference section: Reporting issues to the kernel maintainers
-- 
2.44.0