Re: [Ksummit-discuss] RFC: create mailing list "linux-issues" focussed on issues/bugs and regressions

On Mon, Mar 22, 2021 at 1:25 PM Thorsten Leemhuis <linux@xxxxxxxxxxxxx> wrote:
> On 22.03.21 19:32, Linus Torvalds wrote:
> > On Mon, Mar 22, 2021 at 8:18 AM Thorsten Leemhuis <linux@xxxxxxxxxxxxx> wrote:
> >>
> >>     I even requested a
> >> "linux-regressions@xxxxxxxxxxxxxxx" a while later, but didn't hear
> >> anything back; and, sadly, about the same time I started having trouble
> >> finding spare time for working on regression tracking. :-/
> >
> > Honestly, I'd much prefer the name 'linux-regressions' as being much
> > more targeted than 'linux-issues'.
> That only solves one of the two problems I'm trying to solve (albeit
> the one that is more important to me). That way users still have no
> easy way to query for reports about issues that are not regressions,
> say when something is broken and they have no idea if it once worked
> or never worked at all.

Without a known baseline of what works OK, an issue cannot easily be
categorized as a regression. This "problem", I think, deserves its own
attention.

There are some kernel-ci solutions out there which report "issues" and
help develop such baselines. What we need, however, is a community
visible list of their sum: a list of *known issues upstream* represents
a baseline. The easiest way to develop such baselines is with
respective tests. We obviously also need to accept new user reported
issues as candidate baseline issues, for which perhaps there are no
known tests yet available to reproduce them.

Then there is also the consideration that some distribution issues,
which can be part of a distribution baseline, might fit into the
circle of upstream known issues, i.e. the upstream baseline, as well.
But not all issues that are part of a distribution baseline are part
of the upstream baseline; an example is a botched backport. Most
distribution baselines tend to be private, but I'd like to see that
changed, using OpenSUSE/SLE as an example, in order to help with the
upstream baseline effort. I'd like to encourage other distributions to
follow suit.

Test frameworks help develop a baseline, and so working on them helps
reduce the scope of this problem. We have many test frameworks. What I
have not seen is a public generic baseline "list" for each of these. I
have spent a bit of time on this problem and have come up with a
generic format for issues on test frameworks as part of kdevops [0],
in the hope that it can be used to easily grep for known issues
against upstream kernels / distribution releases. The format is:
mcgrof@bicho ~/kdevops (git::master)$ cat
block/009 # korg#212305 failure rate 1/669

The korg#212305 tag refers to kernel.org bugzilla bug ID 212305.
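
The point of the plain text annotations is that checking whether a
failure is already known becomes a one-liner. A sketch of the idea
(the file name below is a stand-in for wherever the baseline list
lives, not necessarily kdevops' actual layout):

mcgrof@bicho ~/kdevops (git::master)$ grep block/009 known-issues.txt
block/009 # korg#212305 failure rate 1/669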

Distribution issues:

mcgrof@bicho ~/kdevops (git::master)$ cat
nbd/003 # causes a hang after running a few times

I have support for blktests and fstests, and will add selftests soon.
I tend to work on a Debian baseline as a public demo for this work.
The OpenSUSE Leap 15.3 baseline will be reflective of the real SLE
15.3 baseline.

The nice thing about having a public baseline is that we can then be
really confident in labelling a new issue that comes up as a possible
regression. However, confidence is subjective, and so one must also
define confidence clearly. You associate confidence to a baseline by
the number of full test runs you have completed against it with a
respective test framework. Borrowing IO stabilizing terms, I'm using a
test "steady state goal" for this: it means how many times you have
run all possible tests against a known baseline without a new failure.
So a steady state goal of 100 for blktests means your confidence in
the baseline you have developed is worth 100 full runs. A higher
steady state goal however means more time is required to test, and so
sometimes you might be confined to a lower steady state goal, but then
use side workers to run random tests with a higher test count. For
instance, the failure rate of the issue reported on korg#212305 is
defined by the average number of times one must run the test in order
for it to fail. If your baseline steady state goal was just 100,
chances are you would not have run into that issue.
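
To put rough numbers on that (my arithmetic, not something kdevops
computes for you): with a failure rate of 1/669 per run, the chance of
hitting the failure at least once over a steady state goal of 100 full
runs is 1 - (1 - 1/669)^100, roughly 14%; a goal of 1000 runs raises
that to about 78%. A quick sanity check:

mcgrof@bicho ~/kdevops (git::master)$ awk 'BEGIN { printf "%.4f\n", 1 - (1 - 1/669)^100 }'
0.1389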

Are there other known collections of public baselines easily grep'able
for different test frameworks? Where can we contribute and collaborate
on such a thing?

PS. My current steady state goal for upstream is 1000 for blktests,
and 100 for fstests per filesystem.
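
For anyone wanting to chase such a goal by hand, a minimal sketch of
the idea (it assumes blktests' ./check is run from its source tree and
that passing a group name like "block" selects the tests you care
about; kdevops automates this with far more machinery):

#!/bin/bash
# Run the blktests "block" group repeatedly; stop as soon as the
# baseline "breaks", i.e. any run reports a failure.
STEADY_STATE_GOAL=100
for i in $(seq 1 "$STEADY_STATE_GOAL"); do
        if ! ./check block; then
                echo "baseline broke on run $i"
                exit 1
        fi
done
echo "steady state goal of $STEADY_STATE_GOAL full runs reached"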



