> -----Original Message----- > From: Lofstedt, Marta > Sent: Thursday, October 26, 2017 1:57 PM > To: 'Martin Peres' <martin.peres@xxxxxxxxxxxxxxx>; Joonas Lahtinen > <joonas.lahtinen@xxxxxxxxxxxxxxx>; Daniel Vetter <daniel.vetter@xxxxxxxx>; > Intel Graphics Development <intel-gfx@xxxxxxxxxxxxxxxxxxxxx>; Latvala, Petri > <petri.latvala@xxxxxxxxx> > Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > Subject: RE: [PATCH i-g-t] tests/gem_eio: Skip in-flight-suspend on snb > > Since the discussion on this died and I believe that everyone are scared that > the dodge suggestion would require someone to do some work. > I could Ack the patch if: > > + igt_skip_on_f(IS_SANDYBRIDGE(intel_get_drm_devid(fd)), > + "random incompletes in CI with this test\n"); > + > > Was replaced with an igt_warn > Forgot to write, it should be igt_warn paired with success on the test. This will produce the WARN result, note this is not the same as dmesg-warn. This is quite rare and it will be noticed. > /Marta > > > -----Original Message----- > > From: Martin Peres [mailto:martin.peres@xxxxxxxxxxxxxxx] > > Sent: Monday, October 23, 2017 12:29 PM > > To: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>; Daniel Vetter: > > <daniel.vetter@xxxxxxxx>; Intel Graphics Development <intel- > > gfx@xxxxxxxxxxxxxxxxxxxxx>; Latvala, Petri <petri.latvala@xxxxxxxxx> > > Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>; Lofstedt, Marta > > <marta.lofstedt@xxxxxxxxx> > > Subject: Re: [PATCH i-g-t] tests/gem_eio: Skip in-flight-suspend on > > snb > > > > On 20/10/17 12:26, Joonas Lahtinen wrote: > > > + Petri > > > > > > On Thu, 2017-10-19 at 16:29 +0300, Martin Peres wrote: > > >> On 19/10/17 12:51, Daniel Vetter wrote: > > >>> CI gets upset about it resulting in an incomplete, let's skip it > > >>> until that's fixed to avoid havoc in the CI farm. Of course this > > >>> should/will be reverted as soon as we have a fix (similar to how > > >>> we dealt with the snb-dies-in-blt-hangs issue). > > >>> > > >>> Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx> > > >>> Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > > >>> Cc: "Lofstedt, Marta" <marta.lofstedt@xxxxxxxxx> > > >>> Cc: Martin Peres <martin.peres@xxxxxxxxxxxxxxx> > > >>> References: > > >>> https://intel-gfx-ci.01.org/tree/drm-tip/igt@gem_eio@in-flight-sus > > >>> pe > > >>> nd.html > > >>> References: https://bugs.freedesktop.org/show_bug.cgi?id=103289 > > >>> Signed-off-by: Daniel Vetter <daniel.vetter@xxxxxxxx> > > > > > > <SNIP> > > > > > >> So, let's recap the problem here: > > >> - Any incomplete in sharded runs mean that the platform is unfit > > >> for pre-merge (because any other test after will go from pass to notrun) > > >> - We can't fix issues immediately, especially for old platforms > > >> > > >> This patch is sweeping the test under the rug by using the skip > > >> output, which is not only hard to track, it is also misleading. > > >> > > >> After discussing with Marta, Arek and Petri, we found some > > >> consensus on the following proposal (terminology is up for debate): > > >> > > >> - Introduce igt_dodge_on(cond, label): Report a pre-emptive 'fail' > > >> when the condition is true. Make sure this is over-ridable with > > >> IGT_DODGE=0 so as we can easily run these tests without recompiling > > them. > > > > > > Make this igt_skip_on_ci(cond) and require IGT_CI=1 to activate them. > > > Much like with simulation. > > > > replace skip with fail, and we agree. Skips are too easy to ignore! > > > > > > > > Still, a BIOS update to one of the CI machines might mean (if it's > > > not now the case, not very far fetched for the future) that we go > > > churn in the IGT codebase to drop bunch of these. That's not the > > > optimal workflow I can think of when we're discussing a separate > > > mailing list for IGT discussion and patches to make it more > > > self-contained. Then we bind that new mailing list to our CI farm > > > contents, and bind making fixes to the CI farm operation directly to the > IGT reviewing bandwidth? > > > > Isn't what we are proposing doing exactly this? By changing the source > > code of IGT, we allow people to send patches to remove some > > workarounds and see if they pass or fail in the same way they would > > propose any change to IGT. Moreover, we make the running of IGT in our > > farm as transparent as possible. > > > > > > > > I'm still thinking best way would be that CI would mask the known > > > problematic ones from the failure/pass criteria, and then somebody > > > actually looks at the masked on after their testing coverage is > > > prioritized. I think IGT should try to provide a wide range of tests > > > that are supposed to work on any certain hardware. If they don't, > > > it's not a reason to change the tests itself. > > > > This is true that some skips will be highly-machine specific, but > > isn't our role as developers to know what and what doesn't work? By > > pushing this to an external whitelist only for CI, we miss an > > opportunity to improve this CI blacklist. > > > > Now, let me remind you that this blacklist is *only* for tests that > > hang the machine or leave it in an inconsistant state which will lead to a > crash later. > > > > > > > > With the filter, we can grow the testing coverage for the new > > > platforms, even if CI happens to have odd machines that may not pass > > > those tests (and we may not have the resources to immediately fix > > > those). All this without churning on the IGT codebase. > > > > You are describing what cibuglog is already doing. Failing tests cases > > are suppressed in pre-merge, and associated bugs[1]. > > > > See above for what this proposal is about. > > > > [1] https://intel-gfx-ci.01.org/cibuglog/ > > > > > > > > But if this is the only technically viable solution in short-term, > > > then so be it. I just see better options too. > > > > Maybe we need a write up our workflow. This time, a public one! > > > > I hope _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx