On Fri, Aug 04, 2017 at 06:05:10PM +0100, Chris Wilson wrote:
> Quoting Daniel Vetter (2017-08-04 17:07:22)
> > We now have full (or a lot at least) igt running in beta CI, and snb
> > blt hangs are really unhappy:
> >
> > - drv_hangman@error-state-capture-blt and gem_exec_capture@capture-blt
> >   reliably result in insta-machine death when we try to reset the gpu,
> >   both on the CI snb and the one I have here.
> >
> > - Other testcases also randomly (and sometimes rather rarely) die on
> >   snb.
> >
> > We can't use the endless batch because that results in a reset failure
> > and wedged gpu, so also not really better.
>
> It shouldn't be the recursion, but the invalid instruction we use to try
> and trigger the hang quicker (otherwise hangcheck may see the advancing
> ACTHD and give us longer to escape the loop).
>
> In gem_exec_capture we shouldn't even need that invalid instruction, we
> just need the busy batch as we pull the trigger ourselves, and if that
> fails to reset a simple recursive batch we have some issues to resolve.

Using an endless loop to trigger the hang results in a reset failure on
blt, as described in the commit message. We end up with a permanent and
unrecoverable -EIO, which is as deadly to CI as outright killing the
machine.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx