On Fri, Jul 21, 2017 at 11:39 AM, Daniel Vetter <daniel@xxxxxxxx> wrote:
> On Thu, Jul 20, 2017 at 6:23 PM, Martin Peres
> <martin.peres@xxxxxxxxxxxxxxx> wrote:
>> Hi everyone,
>>
>> As some of you may already know, we have made great strides in making
>> our CI system usable, especially in the last 6 months when everything
>> started clicking together.
>>
>> The CI team is no longer overwhelmed with fires and bug reports, so we
>> have started working on increasing the coverage from just fast-feedback
>> to a bigger set of IGT tests.
>>
>> As some of you may know, running IGT has been a challenge that few
>> manage to overcome. Not only is the execution time counted in machine
>> months, but it can also lead to disk corruption, which does not
>> encourage developers to run it either. One test takes 21 days on its
>> own, and it is a subset of another test which we have never run, for
>> obvious reasons.
>>
>> I would thus like to get the CI team and developers to work together to
>> sharply decrease the execution time of IGT, and get these tests run
>> multiple times per day!
>>
>> There are three usages that the CI team envisions (up for debate):
>> - Basic acceptance testing: meant for developers and CI to check
>>   quickly whether a patch series is not completely breaking the world
>>   (< 10 minutes total, timeout of 30s per test)
>> - Full run: meant to be run overnight by developers and users
>>   (< 6 hours)
>> - Stress tests: they can stay in the test suite as a way to catch rare
>>   issues, but they cannot be part of the default run mode. They should
>>   likely be run on a case-by-case basis, on demand of a developer. Each
>>   test could be allowed to take up to 1h.
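[Editor's note: the per-test and global budgets proposed above lend themselves to mechanical enforcement by the runner. A minimal sketch, assuming a hypothetical `run_with_budget` helper; the `subprocess` call runs a shell no-op where a real runner would exec the IGT binary for each test, and the 30s/10min numbers are the BAT budgets quoted above:]

```python
import subprocess
import time

def run_with_budget(tests, per_test_timeout, global_budget):
    """Run each test with a per-test timeout, and stop scheduling new
    tests once the global time budget for the whole run is exhausted."""
    results = {}
    deadline = time.monotonic() + global_budget
    for test in tests:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            results[test] = "skipped (global budget exhausted)"
            continue
        try:
            # Hypothetical invocation: a real runner would exec the IGT
            # binary for `test` here instead of a shell no-op.
            proc = subprocess.run(["true"],
                                  timeout=min(per_test_timeout, remaining))
            results[test] = "pass" if proc.returncode == 0 else "fail"
        except subprocess.TimeoutExpired:
            results[test] = "timeout"
    return results

# BAT budgets from the proposal: 30s per test, 10 minutes total.
print(run_with_budget(["igt@example@a", "igt@example@b"], 30, 600))
```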
>>
>> There are multiple ways of getting to this situation (up for debate):
>>
>> 1) All the tests exposed by default are fast and meant to be run:
>>    - Fast feedback is provided by a testlist, for BAT
>>    - Stress tests are run using a special command, kept for on-demand
>>      testing
>>
>> 2) Tests are all tagged with information about their execution time:
>>    - igt@basic@.*: meant for BAT
>>    - igt@complete@.*: meant for FULL
>>    - igt@stress@.*: the stress tests
>>
>> 3) Testlists all the way:
>>    - fast-feedback: for BAT
>>    - all: the tests that people are expected to run (CI will run them)
>>    - Stress tests will not be part of any testlist.
>>
>> Whatever decision is accepted, the CI team is mandating global timeouts
>> for both BAT and FULL testing, in order to guarantee throughput. This
>> will require the team as a whole to agree on time quotas per subsystem,
>> and to enforce them.
>>
>> Can we try to get some healthy debate going and reach a consensus on
>> this? Our CI efforts are being limited by this issue right now, and we
>> will be doing whatever we can until the test suite becomes saner and
>> runnable, but this may be unfair to some developers.
>>
>> Looking forward to some constructive feedback and intelligent
>> discussions!
>> Martin
>
> Imo the critical bit for the full run (which should regression-test all
> features while being fast enough that we can use it for pre-merge
> testing) must be the default set. Default here means what you get
> without any special cmdline options (to either the test or piglit), and
> without any special testlists that are separately maintained. Default
> also means that a new testcase will be included by default. There are
> two reasons for that:
>
> - Maintaining a separate test list is a pain. It also encourages adding
>   tons of tests that no one runs.
>
> - If tests aren't run by default, we can't test them pre-merge before
>   they land in igt and wreak havoc.
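[Editor's note: option 2 amounts to a purely name-based classification, so a runner could select a tier with nothing more than a prefix match on the test name. A minimal sketch under that assumption; the concrete test names are made up for illustration, and untagged tests defaulting to the full run matches the "untagged tests are _all_ included" rule stated later in the thread:]

```python
def tier_of(test_name):
    """Classify a test by the tag embedded in its name, as in option 2:
    igt@basic@.*, igt@complete@.*, igt@stress@.*."""
    for tier in ("basic", "complete", "stress"):
        if test_name.startswith(f"igt@{tier}@"):
            return tier
    return "complete"  # untagged tests default into the full run

def select(tests, run_mode):
    """Pick the tests belonging to a run mode: BAT runs only the basic
    tier, FULL runs basic + complete, stress tests are on-demand only."""
    allowed = {"BAT": {"basic"},
               "FULL": {"basic", "complete"},
               "STRESS": {"stress"}}[run_mode]
    return [t for t in tests if tier_of(t) in allowed]

tests = ["igt@basic@gem-mmap",          # hypothetical names
         "igt@complete@gem-exec-big",
         "igt@stress@gem-concurrent-all"]
print(select(tests, "BAT"))  # → ['igt@basic@gem-mmap']
```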
>
> Second, we must have a reasonable runtime, and reasonable runtime here
> means a few hours of machine time for everything, total. There are two
> reasons for that:
>
> - Only pre-merge is early enough to catch regressions. We can lament
>   all day long, but the fact is that post-merge regressions don't get
>   fixed or handled in a timely manner, except when they're really
>   serious. This means any testing strategy that depends upon lots of
>   post-merge testing, or expects such post-merge testing to work, is
>   bound to fail. Either we can test everything pre-merge, or there's no
>   regression testing at all.
>
> - We can't mix multiple patch series together, because autobisecting is
>   too unreliable. I've been promised an autobisector for 3 years by
>   about 4 different teams now; making that happen in a reliable way is
>   _really_ hard. Blocking CI on this is not reasonable.

I forgot one: randomized test running is also not an acceptable solution
for pre-merge testing, because it's guaranteed to make the results more
noisy. It also guarantees that we'll miss some important regression
tests.

> Also, the test suite really should be fast enough that developers can
> run it locally on their machines within a work day. The current plan is
> that we can only test on HSW for now, until more budget appears (again,
> we can lament about this, but it's not going to change), which means
> developers _must_ be able to run stuff on e.g. SKL in a reasonable
> amount of time.
>
> Right now we have a runtime for the gem|prime tests of around 24 days,
> and an estimated 10 months with the stress tests included. I think the
> actual machine time we'll have available in the near future on this HSW
> farm is going to allow 2-3h for the gem tests. That's the time budget
> for this default set of regression tests.
>
> Wrt actually implementing it: I don't care, as long as it fulfills the
> above.
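[Editor's note: the gap between those two figures is worth spelling out: 24 days of machine time against a 2-3h budget means the gem test runtime has to shrink by roughly a factor of 190-290, not by some percentage. A quick check of that arithmetic:]

```python
current_runtime_h = 24 * 24         # ~24 days of gem|prime tests, in hours
budget_low_h, budget_high_h = 2, 3  # the 2-3h HSW-farm budget quoted above

print(f"required speedup: {current_runtime_h / budget_high_h:.0f}x "
      f"to {current_runtime_h / budget_low_h:.0f}x")
# → required speedup: 192x to 288x
```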
> So tagging, per-test cmdline options, outright deleting all the tests
> we can't run anyway, disabling them in the build system, or whatever
> else is all fine with me, as long as the default set doesn't require
> any special action. For tags this would mean that untagged tests are
> _all_ included.

Just to make it clear: these are the hard constraints we have. Demanding
that we have more machines to run more tests, demanding that we have
better post-regression tracking, or anything else isn't a constructive
approach here. The challenge is to engineer a test suite that fits
within the hard constraints, and refusing to do that is simply not
proper software engineering. And the current gem/prime regression test
suite we do have entirely fails that reality check.

-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx