On Wed, Feb 13, 2019 at 01:07:54PM -0500, Theodore Y. Ts'o wrote:
> 
> 2) Documenting what are known failures should be for various tests on
> different file systems and kernel versions. I think we all have our
> own way of excluding tests which are known to fail. One extreme case
> is where the test case was added to xfstests (generic/484), but the
> patch to fix it got hung up because it was somewhat controversial, so
> it was failing on all file systems.
> 
> Other cases might be when fixing a particular test failure is too
> complex to backport to stable (maybe because it would drag in all
> sorts of other changes in other subsystems), so that test is Just
> Going To Fail for a particular stable kernel series.
> 
> It probably doesn't make sense to do this in xfstests, which is why we
> all have our own individual test runners that are layered on top of
> xfstests. But if we want to automate running xfstests for stable
> kernel series, some way of annotating fixes for different kernel
> versions would be useful, perhaps some kind of centralized clearing
> house of this information would be useful.

I think the first step could be to require a new test to go in "after"
the respective kernel fix, and, related to that, to require the test to
include a well-defined tag (preferably both in the test itself and in
the commit description) saying which commit fixed this particular
problem. It does not solve all the problems, but it would be a huge
help (a rough sketch of scanning for such a tag is appended below).

We could also update old tests regularly with new tags as problems are
introduced and fixed, but that's a bit more involved. One thing that
would help with this would be to tag a kernel commit that fixes a
problem for which we already have a test with the respective test
number.

Another thing I have been planning to do forever is to create a
standard machine-readable output, the ability to construct a database
of the results, and a way to present it in an easily browsable format,
such as a set of HTML pages with a bit of JavaScript (the database part
is also sketched below). I never got around to it, but it would be nice
to be able to compare historical data, kernel versions, options, or
even file systems, to identify tests that often fail or never fail, and
to see how the run times differ.

That might also help one construct a fast, quick-fail set of tests from
one's own historical data. It would open up some interesting
possibilities.

-Lukas
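
A strawman of the tag scanning, in python, just to make the idea
concrete. To be clear, the "Fixed-by-kernel-commit:" tag name, its
placement in the test's comment header, and the tests/*/NNN layout are
assumptions made up for this sketch, not anything xfstests defines
today:

#!/usr/bin/env python3
"""Collect hypothetical "Fixed-by-kernel-commit:" tags from test files."""
import json
import re
import sys
from pathlib import Path

# Assumed tag format inside a test's comment header, e.g.:
#   # Fixed-by-kernel-commit: 1234567890ab ("ext4: fix something")
TAG_RE = re.compile(r'^#\s*Fixed-by-kernel-commit:\s*([0-9a-f]{7,40})\b')

def collect_tags(tests_dir):
    """Return {test_id: [commit, ...]} for every tagged test file."""
    tags = {}
    for path in Path(tests_dir).glob('*/[0-9][0-9][0-9]'):
        test_id = f'{path.parent.name}/{path.name}'   # e.g. generic/484
        for line in path.read_text(errors='replace').splitlines():
            m = TAG_RE.match(line)
            if m:
                tags.setdefault(test_id, []).append(m.group(1))
    return tags

if __name__ == '__main__':
    tests_dir = sys.argv[1] if len(sys.argv) > 1 else 'tests'
    # Emit a machine-readable mapping that a runner or a stable
    # maintainer could cross-reference against a kernel tree.
    print(json.dumps(collect_tags(tests_dir), indent=2, sort_keys=True))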
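
And a minimal sketch of the results database half, again only to
illustrate the idea. The schema and the record_run()/failure_rates()
helpers are made up for this example; parsing the actual runner output
and the HTML/JS presentation are deliberately left out:

#!/usr/bin/env python3
"""Sketch of a results database for comparing runs over time."""
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS result (
    run_id  TEXT,  -- one identifier per test run
    kernel  TEXT,  -- e.g. uname -r
    fstype  TEXT,  -- e.g. ext4, xfs, btrfs
    config  TEXT,  -- mkfs/mount options used for the run
    test    TEXT,  -- e.g. generic/484
    status  TEXT,  -- pass / fail / notrun
    runtime REAL   -- seconds
);
"""

def record_run(db, run_id, kernel, fstype, config, results):
    """Store one run; results is an iterable of (test, status, runtime)."""
    db.executemany("INSERT INTO result VALUES (?, ?, ?, ?, ?, ?, ?)",
                   [(run_id, kernel, fstype, config, t, s, r)
                    for t, s, r in results])
    db.commit()

def failure_rates(db, kernel=None):
    """Per-test failure rate and average runtime, optionally per kernel."""
    where = "WHERE kernel = ?" if kernel else ""
    args = (kernel,) if kernel else ()
    return db.execute(f"""
        SELECT test,
               AVG(status = 'fail') AS fail_rate,
               AVG(runtime)         AS avg_runtime,
               COUNT(*)             AS samples
        FROM result {where}
        GROUP BY test
        ORDER BY fail_rate DESC, avg_runtime ASC""", args).fetchall()

if __name__ == '__main__':
    db = sqlite3.connect('fstests-results.db')
    db.executescript(SCHEMA)
    # Made-up sample data; a real harness would feed parsed results here.
    record_run(db, 'run-1', '5.0.0-rc6', 'ext4', '-b 4096',
               [('generic/001', 'pass', 3.1), ('generic/484', 'fail', 1.2)])
    for test, rate, avg, n in failure_rates(db):
        print(f'{test}: fail rate {rate:.0%} over {n} runs, ~{avg:.1f}s')

Sorting by failure rate and then by run time is also roughly what you
would want when picking a fast, quick-fail subset from your own history.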