Re: [PATCH i-g-t] tests/initial_state: Add a test to capture the state of the GPU


 



On Tue, May 16, 2017 at 08:54:51AM +0000, Lofstedt, Marta wrote:
> 
> 
> > -----Original Message-----
> > From: Chris Wilson [mailto:chris@xxxxxxxxxxxxxxxxxx]
> > Sent: Tuesday, May 16, 2017 11:21 AM
> > To: Lofstedt, Marta <marta.lofstedt@xxxxxxxxx>
> > Cc: Daniel Vetter <daniel@xxxxxxxx>; Martin Peres
> > <martin.peres@xxxxxxxxxxxxxxx>; intel-gfx@xxxxxxxxxxxxxxxxxxxxx
> > Subject: Re:  [PATCH i-g-t] tests/initial_state: Add a test to capture
> > the state of the GPU
> > 
> > On Tue, May 16, 2017 at 07:42:51AM +0000, Lofstedt, Marta wrote:
> > > I hereby withdraw this patch.
> > > The idea was to know whether we were already wedged at the beginning of
> > testing; that would tell us how to interpret silly results, such
> > as tests starting to get skipped and/or dmesg-warns/incompletes on
> > tests that should normally be skipped.
> > > Also, we are planning to soon deploy a piglit.conf solution where testing
> > will be terminated on wedged, so I agree that my test isn't really needed.
> > 
> > Not everything is broken by wedged; internally we just use it as an
> > indicator that GEM is hosed. KMS should still work: we must still be able to
> > drive the displays to show the error and keep the servers alive until the data
> > is saved (and hopefully degrade gracefully so that we don't have to interrupt
> > the user's immediate session).
> 
> It doesn't matter whether it is broken or not; if we are terminally wedged, the rest of the results may be silly. Look, for example, at CI_DRM_2612: fi-elk-e7500 is wedged at igt@gem_busy@basic-hang-default, then all tests are skipped until gem_exec_reloc@basic-cpu-gtt-noreloc, where the machine hangs. But that is a GEM test, so it should have been skipped, right? My conclusion from seeing this pattern multiple times is that after a terminal wedge silly things can happen, i.e. we can't trust the results, and since we don't want silly bugs, CI testing should be stopped.

The machine didn't hang, it was remotely killed because the run timed out.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx



