[PATCH v1 5/6] docs: 6.Followthrough.rst: more specific advice on fixing regressions

Thorsten Leemhuis <linux@xxxxxxxxxxxxx> · Tue, 10 Dec 2024 11:15:14 +0100

Provide more concrete advice on fixing regressions in a few places, as
telling people to "expedite" fixing those that reached a release deemed
for end users is pretty vague. But every situation is different, so use
soft phrases like "aim for" and leave loopholes.

This removes equivalent paragraphs from a section in
Documentation/process/handling-regressions.rst, which will become mostly
obsolete through this and follow-up changes.

Signed-off-by: Thorsten Leemhuis <linux@xxxxxxxxxxxxx>
---

Note, the bits that remain like the "past year" aspect are based on
statements from Linus. Other parts are made up and need to be discussed,
especially the "past six weeks" thing followed by "aim to mainline a fix
before the end of the week after the next" and the "[…] taking one more
week is fine." shortly after.
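
To make those targets easier to discuss, the following Python sketch
turns them into calendar dates. It assumes that weeks end on Sunday and
reads "the week after the next" as the second week following the current
one; both readings, like the function names and the culprit_age_days
parameter, are assumptions made for illustration and not part of the
patch.

```python
from datetime import date, timedelta

def end_of_week(day, weeks_ahead=0):
    """Sunday closing the week that lies weeks_ahead after day's week."""
    days_to_sunday = 6 - day.weekday()  # Monday is 0, Sunday is 6
    return day + timedelta(days=days_to_sunday, weeks=weeks_ahead)

def target_date(known, culprit_age_days):
    """Date to aim for, given when the culprit became known.

    culprit_age_days: days since the culprit reached a release deemed
    for end users (hypothetical parameter, assumed for this sketch).
    """
    if culprit_age_days <= 6 * 7:     # reached users in the past six weeks
        return end_of_week(known, 2)  # end of the week after the next
    if culprit_age_days <= 365:       # reached users during the past year
        return end_of_week(known, 3)  # taking one more week is fine
    return None                       # older: no weekly target in the text

# Culprit identified on Tuesday, 10 Dec 2024, four weeks after release.
print(target_date(date(2024, 12, 10), 28))  # 2024-12-29
```

Under these assumptions a culprit identified early in a week leaves
close to three weeks, one identified on a Sunday exactly two.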

Those targets are derived from experience gained during regression
tracking and from the text currently in
Documentation/process/handling-regressions.rst that was ACKed by Greg --
just like an earlier version which had even shorter time spans. They came
into being by calculating backwards how long users would be exposed to
regressions. For that you need to include "someone with the right set of
skills needs to notice, bisect and report the problem to the right
developer", which often will take at least two to four days. You
furthermore need to take into account how long it takes for the fix to
reach regular users through a new stable release, which usually happens
three to six days after the fix made it to a new mainline -rc release.
That's why users might be exposed to a regression for three to four weeks
in total, even if it's fixed after a bit more than two weeks in mainline.
That "after a new mainline -rc release" is also one reason why the text
uses the phrase "before the end of the week", as regression fixes that are
mainlined on Monday might just miss a stable -rc, which often will delay
pickup by a whole week (and thus means ~10+ days to reach users).
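
The backwards calculation above can be reproduced with a few lines of
Python. This is only a rough sketch: the two-to-four and three-to-six day
ranges come from the paragraph above, while the 15 to 17 days standing in
for "a bit more than two weeks" and all names are assumptions made for
this example.

```python
def exposure_days(report, fix, stable, missed_rc=False):
    """Return the (low, high) number of days users live with a regression.

    Each argument is a (low, high) range in days; missed_rc adds the
    extra week a fix waits when it just misses a stable -rc.
    """
    delay = 7 if missed_rc else 0
    low = report[0] + fix[0] + stable[0] + delay
    high = report[1] + fix[1] + stable[1] + delay
    return low, high

REPORT = (2, 4)   # notice, bisect and report the problem
FIX = (15, 17)    # assumed reading of "a bit more than two weeks"
STABLE = (3, 6)   # stable release after the mainline -rc with the fix

low, high = exposure_days(REPORT, FIX, STABLE)
print(f"{low}-{high} days, about {low / 7:.1f}-{high / 7:.1f} weeks")
# prints: 20-27 days, about 2.9-3.9 weeks

# A fix that just misses a stable -rc needs at least this many days
# to travel from mainline to users:
print(STABLE[0] + 7)  # 10
```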
---
 Documentation/process/6.Followthrough.rst     | 30 ++++++++++----
 .../process/handling-regressions.rst          | 40 -------------------
 2 files changed, 23 insertions(+), 47 deletions(-)

diff --git a/Documentation/process/6.Followthrough.rst b/Documentation/process/6.Followthrough.rst
index 2ba16a71aba9b4..587e80578f83a9 100644
--- a/Documentation/process/6.Followthrough.rst
+++ b/Documentation/process/6.Followthrough.rst
@@ -198,16 +198,32 @@ maintainers and other developers will take note if you fail to handle regression
 appropriately, especially if they then have to fix the problem themselves: this
 could well make it harder for you to incorporate future changes.
 
-On timing:
+On timing once the mainline change causing the regression became known:
 
- - Expedite fixing regressions that reached releases deemed for end users
-   through new mainline releases or stable backports during the past year.
+ - If the regression is severe or reported by many people within a short
+   timeframe, aim to mainline a fix within two or three days and ideally before
+   the end of the week.
 
- - If the culprit was mainlined during the current development cycle and not
-   backported to stable, fix the regression before -rc6.
+ - Expedite fixing regressions that recently reached releases deemed for end
+   users through new mainline releases or stable backports.  If the culprit
+   reached it in the past six weeks, aim to mainline a fix before the end of the
+   week after the next; if it landed during the past year, taking one more week
+   is fine.  Whenever possible, try to resolve the issue faster -- but it's also
+   okay to take more time if there are strong reasons and a revert is no option.
 
- - If a proper regression fix is unlikely to become ready in a reasonable
-   timeframe, resolve the regression by reverting the culprit.  This is
+ - If the culprit was mainlined during the current development cycle and not
+   backported to stable, fix the regression before -rc6. But try to resolve it
+   faster whenever possible -- especially if the issue is either reported
+   multiple times or prevents CI systems or multiple users from testing, as that
+   might mask other bugs and drive testers away.
+
+ - Try your best to mainline all regression fixes before the current
+   development cycle ends, unless the culprit was committed more than a year
+   ago: then it is acceptable to queue a fix for the next merge window, which
+   is even advisable in case the change bears bigger risks.
+
+ - If mainlining a proper fix within the timeframes outlined above looks
+   unlikely, resolve the regression by reverting the culprit.  This is
    considered an reputable approach, as it allows working out and merging an
    improved variant of the change calmly.
 
diff --git a/Documentation/process/handling-regressions.rst b/Documentation/process/handling-regressions.rst
index da53e12fc6d96c..581f99675a9d52 100644
--- a/Documentation/process/handling-regressions.rst
+++ b/Documentation/process/handling-regressions.rst
@@ -156,46 +156,6 @@ only these options:
 How to realize that in practice depends on various factors. Use the rules of
 thumb outlined in Documentation/process/6.Followthrough.rst as a guide.
 
-In general:
-
- * Prioritize work on regressions over all other Linux kernel work, unless the
-   latter concerns a severe issue (e.g. acute security vulnerability, data loss,
-   bricked hardware, ...).
-
- * Do not consider regressions from the current cycle as something that can wait
-   till the end of the cycle, as the issue might discourage or prevent users and
-   CI systems from testing mainline now or generally.
-
- * Work with the required care to avoid additional or bigger damage, even if
-   resolving an issue then might take longer than outlined below.
-
-On timing once the culprit of a regression is known:
-
- * Aim to mainline a fix within two or three days, if the issue is severe or
-   bothering many users -- either in general or in prevalent conditions like a
-   particular hardware environment, distribution, or stable/longterm series.
-
- * Aim to mainline a fix by Sunday after the next, if the culprit made it
-   into a recent mainline, stable, or longterm release (either directly or via
-   backport); if the culprit became known early during a week and is simple to
-   resolve, try to mainline the fix within the same week.
-
- * For other regressions, aim to mainline fixes before the hindmost Sunday
-   within the next three weeks. One or two Sundays later are acceptable, if the
-   regression is something people can live with easily for a while -- like a
-   mild performance regression.
-
- * It's strongly discouraged to delay mainlining regression fixes till the next
-   merge window, except when the fix is extraordinarily risky or when the
-   culprit was mainlined more than a year ago.
-
-On procedure:
-
- * Try to resolve any regressions introduced in mainline during the past
-   twelve months before the current development cycle ends: Linus wants such
-   regressions to be handled like those from the current cycle, unless fixing
-   bears unusual risks.
-
 On patch flow:
 
  * Developers, when trying to reach the time periods mentioned above, remember
-- 
2.45.0