Since the discussion on this died and I believe that everyone are scared that the dodge suggestion would require someone to do some work. I could Ack the patch if: + igt_skip_on_f(IS_SANDYBRIDGE(intel_get_drm_devid(fd)), + "random incompletes in CI with this test\n"); + Was replaced with an igt_warn /Marta > -----Original Message----- > From: Martin Peres [mailto:martin.peres@xxxxxxxxxxxxxxx] > Sent: Monday, October 23, 2017 12:29 PM > To: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>; Daniel Vetter: > <daniel.vetter@xxxxxxxx>; Intel Graphics Development <intel- > gfx@xxxxxxxxxxxxxxxxxxxxx>; Latvala, Petri <petri.latvala@xxxxxxxxx> > Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>; Lofstedt, Marta > <marta.lofstedt@xxxxxxxxx> > Subject: Re: [PATCH i-g-t] tests/gem_eio: Skip in-flight-suspend on snb > > On 20/10/17 12:26, Joonas Lahtinen wrote: > > + Petri > > > > On Thu, 2017-10-19 at 16:29 +0300, Martin Peres wrote: > >> On 19/10/17 12:51, Daniel Vetter wrote: > >>> CI gets upset about it resulting in an incomplete, let's skip it > >>> until that's fixed to avoid havoc in the CI farm. Of course this > >>> should/will be reverted as soon as we have a fix (similar to how we > >>> dealt with the snb-dies-in-blt-hangs issue). > >>> > >>> Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx> > >>> Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > >>> Cc: "Lofstedt, Marta" <marta.lofstedt@xxxxxxxxx> > >>> Cc: Martin Peres <martin.peres@xxxxxxxxxxxxxxx> > >>> References: > >>> https://intel-gfx-ci.01.org/tree/drm-tip/igt@gem_eio@in-flight-suspe > >>> nd.html > >>> References: https://bugs.freedesktop.org/show_bug.cgi?id=103289 > >>> Signed-off-by: Daniel Vetter <daniel.vetter@xxxxxxxx> > > > > <SNIP> > > > >> So, let's recap the problem here: > >> - Any incomplete in sharded runs mean that the platform is unfit > >> for pre-merge (because any other test after will go from pass to notrun) > >> - We can't fix issues immediately, especially for old platforms > >> > >> This patch is sweeping the test under the rug by using the skip > >> output, which is not only hard to track, it is also misleading. > >> > >> After discussing with Marta, Arek and Petri, we found some consensus > >> on the following proposal (terminology is up for debate): > >> > >> - Introduce igt_dodge_on(cond, label): Report a pre-emptive 'fail' > >> when the condition is true. Make sure this is over-ridable with > >> IGT_DODGE=0 so as we can easily run these tests without recompiling > them. > > > > Make this igt_skip_on_ci(cond) and require IGT_CI=1 to activate them. > > Much like with simulation. > > replace skip with fail, and we agree. Skips are too easy to ignore! > > > > > Still, a BIOS update to one of the CI machines might mean (if it's not > > now the case, not very far fetched for the future) that we go churn in > > the IGT codebase to drop bunch of these. That's not the optimal > > workflow I can think of when we're discussing a separate mailing list > > for IGT discussion and patches to make it more self-contained. Then we > > bind that new mailing list to our CI farm contents, and bind making > > fixes to the CI farm operation directly to the IGT reviewing bandwidth? > > Isn't what we are proposing doing exactly this? By changing the source code > of IGT, we allow people to send patches to remove some workarounds and > see if they pass or fail in the same way they would propose any change to > IGT. Moreover, we make the running of IGT in our farm as transparent as > possible. > > > > > I'm still thinking best way would be that CI would mask the known > > problematic ones from the failure/pass criteria, and then somebody > > actually looks at the masked on after their testing coverage is > > prioritized. I think IGT should try to provide a wide range of tests > > that are supposed to work on any certain hardware. If they don't, it's > > not a reason to change the tests itself. > > This is true that some skips will be highly-machine specific, but isn't our role > as developers to know what and what doesn't work? By pushing this to an > external whitelist only for CI, we miss an opportunity to improve this CI > blacklist. > > Now, let me remind you that this blacklist is *only* for tests that hang the > machine or leave it in an inconsistant state which will lead to a crash later. > > > > > With the filter, we can grow the testing coverage for the new > > platforms, even if CI happens to have odd machines that may not pass > > those tests (and we may not have the resources to immediately fix > > those). All this without churning on the IGT codebase. > > You are describing what cibuglog is already doing. Failing tests cases are > suppressed in pre-merge, and associated bugs[1]. > > See above for what this proposal is about. > > [1] https://intel-gfx-ci.01.org/cibuglog/ > > > > > But if this is the only technically viable solution in short-term, > > then so be it. I just see better options too. > > Maybe we need a write up our workflow. This time, a public one! > > I hope _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx