Quoting Daniel Vetter (2017-08-08 10:01:59)
> On Mon, Aug 7, 2017 at 6:34 PM, Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> wrote:
> > Quoting Daniel Vetter (2017-08-07 17:26:56)
> >> On Fri, Aug 04, 2017 at 06:05:10PM +0100, Chris Wilson wrote:
> >> > Quoting Daniel Vetter (2017-08-04 17:07:22)
> >> > > We now have full (or a lot at least) igt running in beta CI, and snb
> >> > > blt hangs are really unhappy:
> >> > >
> >> > > - drv_hangman@error-state-capture-blt and gem_exec_capture@capture-blt
> >> > >   reliably result in insta-machine death when we try to reset the gpu,
> >> > >   both on the CI snb and the one I have here.
> >> > >
> >> > > - Other testcases also randomly (and sometimes rather rarely) die on
> >> > >   snb.
> >> > >
> >> > > We can't use the endless batch because that results in a reset failure
> >> > > and wedged gpu, so also not really better.
> >> >
> >> > It shouldn't be the recursion, but the invalid instruction we use to try
> >> > and trigger the hang quicker (otherwise hangcheck may see the advancing
> >> > ACTHD and give us longer to escape the loop).
> >> >
> >> > In gem_exec_capture we shouldn't even need that invalid instruction, we
> >> > just need the busy batch as we pull the trigger ourselves, and if that
> >> > fails to reset a simple recursive batch we have some issues to resolve.
> >>
> >> Endless loop for hanging results in a reset failure on blt as described in
> >> the commit message. We end up with a permanent and unrecoverable -EIO,
> >> which is as deadly to CI as outright killing the machine.
> >
> > No, it doesn't. snb-gt1 exhibiting the machine death on invalid blt
> > instruction as reported, after fixes:
>
> Well my gt2 disagreed, but I guess we can push your patches to igt and
> then ask CI whether we need more.

Fine, dug out the snb-gt2,

[ickle@huronriver tests]$ sudo ./drv_hangman
IGT-Version: 1.19-gcfd42d1 (i686) (Linux: 4.12.0+ i686)
Subtest error-state-sysfs-entry: SUCCESS (0.000s)
Subtest error-state-basic: SUCCESS (0.004s)
Subtest error-state-capture-render: SUCCESS (13.711s)
Subtest error-state-capture-bsd: SUCCESS (8.006s)
Test requirement not met in function test_error_state_capture, file drv_hangman.c:187:
Test requirement: gem_has_ring(device, ring_id)
Subtest error-state-capture-bsd1: SKIP (0.000s)
Test requirement not met in function test_error_state_capture, file drv_hangman.c:187:
Test requirement: gem_has_ring(device, ring_id)
Subtest error-state-capture-bsd2: SKIP (0.000s)
Subtest error-state-capture-blt: SUCCESS (6.049s)
Test requirement not met in function test_error_state_capture, file drv_hangman.c:187:
Test requirement: gem_has_ring(device, ring_id)
Subtest error-state-capture-vebox: SKIP (0.000s)
Test requirement not met in function hangcheck_unterminated, file drv_hangman.c:213:
Test requirement: gem_uses_full_ppgtt(device)
Subtest hangcheck-unterminated: SKIP (0.000s)

[ickle@huronriver tests]$ sudo ./gem_exec_capture
IGT-Version: 1.19-gcfd42d1 (i686) (Linux: 4.12.0+ i686)
Subtest capture-render: SUCCESS (0.009s)
Test requirement not met in function __real_main175, file gem_exec_capture.c:202:
Test requirement: gem_can_store_dword(fd, e->exec_id | e->flags)
Subtest capture-bsd: SKIP (0.000s)
Test requirement not met in function gem_require_ring, file ioctl_wrappers.c:1642:
Test requirement: gem_has_ring(fd, ring)
Subtest capture-bsd1: SKIP (0.000s)
Test requirement not met in function gem_require_ring, file ioctl_wrappers.c:1642:
Test requirement: gem_has_ring(fd, ring)
Subtest capture-bsd2: SKIP (0.000s)
Subtest capture-blt: SUCCESS (0.005s)
Test requirement not met in function gem_require_ring, file ioctl_wrappers.c:1642:
Test requirement: gem_has_ring(fd, ring)
Subtest capture-vebox: SKIP (0.000s)

Seems solid to me.
-Chris