Re: [Ksummit-discuss] RFC: create mailing list "linux-issues" focussed on issues/bugs and regressions

On Mon, Mar 22, 2021 at 1:25 PM Thorsten Leemhuis <linux@xxxxxxxxxxxxx> wrote:
> On 22.03.21 19:32, Linus Torvalds wrote:
> > On Mon, Mar 22, 2021 at 8:18 AM Thorsten Leemhuis <linux@xxxxxxxxxxxxx> wrote:
> >>
> >>     I even requested a
> >> "linux-regressions@xxxxxxxxxxxxxxx" a while later, but didn't hear
> >> anything back; and, sadly, about the same time I started having trouble
> >> finding spare time for working on regression tracking. :-/
> >
> > Honestly, I'd much prefer the name 'linux-regressions' as being much
> > more targeted than 'linux-issues'.
> That only solves one of the two problems I'm trying to solve (albeit
> the one that is more important to me). That way users still have no
> easy way to query for reports about issues that are not regressions,
> say when something is broken and they have no idea if it once worked
> or never worked at all.

Without a known baseline of what works OK, an issue cannot easily be
categorized as a regression. This "problem", I think, deserves its own
attention.

There are some kernel-ci solutions out there which report "issues" and
help develop such baselines. What we need, however, is a community
visible list of their sum: a list of *known issues upstream* represents
a baseline. The easiest way to develop such baselines is with
respective tests. We obviously also need to accept new user reported
issues as candidate baseline issues, for which perhaps there are no
known tests yet available to reproduce them.

Then there is also the consideration that some distribution issues,
which can be part of a distribution baseline, might fit into the
circle of upstream known issues, i.e. the upstream baseline, as well.
But not all issues that are part of a distribution baseline are part
of the upstream baseline; an example is a botched backport. Most
distribution baselines tend to be private, but I'd like to see that
changed, using OpenSUSE/SLE as an example, in order to help with the
upstream baseline effort. I'd like to encourage other distributions to
follow suit.

Test frameworks help develop a baseline, and so working on them helps
reduce the scope of this problem. We have many test frameworks. What I
have not seen is a public generic baseline "list" for each of these. I
have spent a bit of time on this problem and have come up with a
generic format for issues on test frameworks as part of kdevops [0],
in the hope that it can be used to easily grep for known issues
against upstream kernels / distribution releases. The format is:
mcgrof@bicho ~/kdevops (git::master)$ cat
block/009 # korg#212305 failure rate 1/669

The korg#212305 tag refers to kernel.org bugzilla bug ID 212305.
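
The point of the plain text annotations is that checking whether a
failure is already known becomes a one-liner. A sketch of the idea
(the file name below is a stand-in for wherever the baseline list
lives, not necessarily kdevops' actual layout):

mcgrof@bicho ~/kdevops (git::master)$ grep block/009 known-issues.txt
block/009 # korg#212305 failure rate 1/669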

Distribution issues:

mcgrof@bicho ~/kdevops (git::master)$ cat
nbd/003 # causes a hang after running a few times

I have support for blktests and fstests, and will add selftests soon.
I tend to work on a Debian baseline as a public demo for this work.
The OpenSUSE Leap 15.3 baseline will be reflective of the real SLE
15.3 baseline.

The nice thing about having a public baseline is that we can then be
really confident in labelling a new issue that comes up as a possible
regression. However, confidence is subjective, and so one must also
define confidence clearly. You associate confidence to a baseline by
the number of full test runs you have completed against it with a
respective test framework. Borrowing IO stabilizing terms, I'm using a
test "steady state goal" for this: it means how many times you have
run all possible tests against a known baseline without a new failure.
So a steady state goal of 100 for blktests means your confidence in
the baseline you have developed is worth 100 full runs. A higher
steady state goal however means more time is required to test, and so
sometimes you might be confined to a lower steady state goal, but then
use side workers to run random tests with a higher test count. For
instance, the failure rate of the issue reported on korg#212305 is
defined by the average number of times one must run the test in order
for it to fail. If your baseline steady state goal was just 100,
chances are you would not have run into that issue.
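
To put rough numbers on that (my arithmetic, not something kdevops
computes for you): with a failure rate of 1/669 per run, the chance of
hitting the failure at least once over a steady state goal of 100 full
runs is 1 - (1 - 1/669)^100, roughly 14%; a goal of 1000 runs raises
that to about 78%. A quick sanity check:

mcgrof@bicho ~/kdevops (git::master)$ awk 'BEGIN { printf "%.4f\n", 1 - (1 - 1/669)^100 }'
0.1389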

Are there other known collections of public baselines easily grep'able
for different test frameworks? Where can we contribute and collaborate
on such a thing?

PS. My current steady state goal for upstream is 1000 for blktests,
and 100 for fstests per filesystem.
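
For anyone wanting to chase such a goal by hand, a minimal sketch of
the idea (it assumes blktests' ./check is run from its source tree and
that passing a group name like "block" selects the tests you care
about; kdevops automates this with far more machinery):

#!/bin/bash
# Run the blktests "block" group repeatedly; stop as soon as the
# baseline "breaks", i.e. any run reports a failure.
STEADY_STATE_GOAL=100
for i in $(seq 1 "$STEADY_STATE_GOAL"); do
        if ! ./check block; then
                echo "baseline broke on run $i"
                exit 1
        fi
done
echo "steady state goal of $STEADY_STATE_GOAL full runs reached"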



