On Fri, 21 Jul 2017, Daniel Vetter <daniel@xxxxxxxx> wrote:
> On Thu, Jul 20, 2017 at 6:23 PM, Martin Peres
> <martin.peres@xxxxxxxxxxxxxxx> wrote:
>> Hi everyone,
>>
>> As some of you may already know, we have made great strides in making
>> our CI system usable, especially in the last 6 months when everything
>> started clicking together.
>>
>> The CI team is no longer overwhelmed with fires and bug reports, so we
>> have started working on increasing the coverage from just fast-feedback
>> to a bigger set of IGT tests.
>>
>> As some of you may know, running IGT has been a challenge that few
>> manage to overcome. Not only is the execution time counted in machine
>> months, but it can also lead to disk corruption, which does not
>> encourage developers to run it either. One test takes 21 days on its
>> own, and it is a subset of another test which we have never run, for
>> obvious reasons.
>>
>> I would thus like to get the CI team and developers to work together to
>> sharply decrease the execution time of IGT, and to get these tests run
>> multiple times per day!
>>
>> There are three usages that the CI team envisions (up for debate):
>> - Basic acceptance testing: meant for developers and CI to check
>>   quickly that a patch series is not completely breaking the world
>>   (< 10 minutes, with a timeout per test of 30s)
>> - Full run: meant to be run overnight by developers and users
>>   (< 6 hours)
>> - Stress tests: they can stay in the test suite as a way to catch rare
>>   issues, but they cannot be part of the default run mode. They should
>>   likely be run on a case-by-case basis, on demand from a developer.
>>   Each such test could be allowed to take up to 1h.
>>
>> There are multiple ways of getting to this situation (up for debate):
>>
>> 1) All the tests exposed by default are fast and meant to be run:
>>    - Fast-feedback is provided by a testlist, for BAT
>>    - Stress tests are run using a special command, kept for on-demand
>>      testing
>>
>> 2) All tests are tagged with information about their execution time:
>>    - igt@basic@.*: meant for BAT
>>    - igt@complete@.*: meant for FULL
>>    - igt@stress@.*: the stress tests

Ugh. I don't want any scheme that relies on modifying or renaming the
tests themselves to categorize them. IMO the names of tests should only
be informative; categorization should be external to that.

>>
>> 3) Testlists all the way:
>>    - fast-feedback: for BAT
>>    - all: the tests that people are expected to run (CI will run them)
>>    - Stress tests will not be part of any testlist.
>>
>> Whichever option is accepted, the CI team is mandating global timeouts
>> for both BAT and FULL testing, in order to guarantee throughput. This
>> will require the team as a whole to agree on time quotas per subsystem,
>> and to enforce them.
>>
>> Can we try to have some healthy debate and reach a consensus on this?
>> Our CI efforts are being limited by this issue right now, and we will
>> be doing whatever we can until the test suite becomes saner and
>> runnable, but this may be unfair to some developers.
>>
>> Looking forward to some constructive feedback and intelligent
>> discussions!
>> Martin
>
> Imo the critical bit for the full run (which should regression test all
> features while being fast enough that we can use it for pre-merge
> testing) must be the default set. Default here means what you get
> without any special cmdline options (to either the test or piglit), and
> without any special testlists that are separately maintained. Default
> also means that a new testcase will be included by default. There are
> two reasons for that:
>
> - Maintaining a separate test list is a pain.
> Also, it encourages adding tons of tests that no one runs.
>
> - If tests aren't run by default, we can't test them pre-merge before
>   they land in igt and wreak havoc.

I agree the goal should be to run all tests by default. And this means
we should start being more critical of the tests we add.

For stress tests, I would like to look more into splitting up the tests
in a way that lets you run one iteration fast (as part of the default
set) and repeat the iterations for more stress and coverage. I don't
know how feasible this is, or whether it requires carrying state over
from one iteration to the next, but I like the goal of also running some
of this by default. It would better catch silly bugs in the tests, too.
(We discussed this offline with Martin and Tomi.)

> Second, we must have a reasonable runtime, and a reasonable runtime
> here means a few hours of machine time for everything, total. There are
> two reasons for that:
>
> - Only pre-merge is early enough to catch regressions. We can lament
>   all day long, but the fact is that post-merge regressions don't get
>   fixed or handled in a timely manner, except when they're really
>   serious. This means any testing strategy that depends upon lots of
>   post-merge testing, or expects such post-merge testing to work, is
>   bound to fail. Either we can test everything pre-merge, or there's
>   no regression testing at all.

It's rarely as black and white as you make it out to be, but it's easy
to agree that pre-merge is the thing that really motivates people to
figure stuff out, because it blocks their patch from being merged.

> - We can't mix multiple patch series together, because autobisecting
>   is too unreliable. I've been promised an autobisector for 3 years by
>   about 4 different teams now; making that happen in a reliable way is
>   _really_ hard. Blocking CI on this is not reasonable.
>
> Also, the testsuite really should be fast enough that developers can
> run it locally on their machines in a work day.
> The current plan is that we can only test on HSW for now, until more
> budget appears (again, we can lament about this, but it's not going to
> change), which means developers _must_ be able to run stuff on e.g.
> SKL in a reasonable amount of time.
>
> Right now we have a runtime for the gem|prime tests of around 24 days,
> and an estimated 10 months with the stress tests included. I think the
> actual machine time we'll have available in the near future on this
> HSW farm is going to allow 2-3h for the gem tests. That's the time
> budget for this default set of regression tests.
>
> Wrt actually implementing it: I don't care, as long as it fulfills the
> above. So tagging, per-test cmdline options, outright deleting all the
> tests we can't run anyway, disabling them in the build system, or
> whatever else is all fine with me, as long as the default set doesn't
> require any special action. For tags this would mean that untagged
> tests are _all_ included.

Off-topic, but IMO test lists are just an implementation of tags. In
that sense, we already have tagging.

BR,
Jani.

-- 
Jani Nikula, Intel Open Source Technology Center
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
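[Editor's sketch] The stress-test split discussed in the thread (run one
iteration fast by default, repeat it on demand for stress coverage) can be
sketched as follows. This is plain Python for illustration, not IGT code;
the STRESS_ITERATIONS variable and both function names are hypothetical.
The key constraint is that each iteration is self-contained, so a single
iteration is a valid fast test on its own.

```python
import os

def iteration_count(default=1):
    """Number of iterations to run; defaults to a single fast pass.

    STRESS_ITERATIONS is a hypothetical knob, not a real IGT or piglit
    environment variable.
    """
    try:
        n = int(os.environ.get("STRESS_ITERATIONS", default))
    except ValueError:
        n = default
    return max(n, 1)

def run_stress(test_body):
    """Run the test body once by default; repeat it for stress coverage.

    Each call of test_body(i) must not depend on state left behind by a
    previous iteration, otherwise the one-iteration default run would
    exercise a different code path than the full stress run.
    """
    n = iteration_count()
    for i in range(n):
        test_body(i)
    return n
```

With the knob unset this gives the fast default run; setting it (e.g.
STRESS_ITERATIONS=1000) turns the same test into the on-demand stress
variant without renaming or duplicating it.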