Quoting Tvrtko Ursulin (2018-02-19 10:58:25)
>
> On 19/02/2018 10:26, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-02-19 09:57:20)
> >>
> >> On 19/02/2018 09:27, Chris Wilson wrote:
> >>> Quoting Tvrtko Ursulin (2018-02-19 09:19:47)
> >>>>
> >>>> Do you have a link to BSW hang? Is that obviously related to PMU?
> >>>
> >>> It's only occurring in this test, just looks like an issue with the
> >>> spinner:
> >>>
> >>> [bsw] https://intel-gfx-ci.01.org/tree/drm-tip/kasan_2/fi-bsw-n3050/igt@perf_pmu@xxxxxxxxxxxxxxxxxxxxxxxxx
> >>
> >> ...
> >> <0>[ 681.022677] perf_pmu-1516 1..s1 282520414us : execlists_submission_tasklet: bcs0 in[0]: ctx=3.1, seqno=a
> >> <0>[ 681.022838] perf_pmu-1516 1..s1 282520580us : execlists_submission_tasklet: bcs0 cs-irq head=5 [5?], tail=0 [0?]
> >> <0>[ 681.023001] perf_pmu-1516 1..s1 282520594us : execlists_submission_tasklet: bcs0 csb[0]: status=0x00000001:0x00000000, active=0x1
> >> <0>[ 681.023168] kworker/-338 1.... 298087910us : reset_common_ring: bcs0 seqno=a
> >> <0>[ 681.023321] ksoftirq-17 1..s. 298088483us : execlists_submission_tasklet: bcs0 in[0]: ctx=3.1, seqno=a
> >> <0>[ 681.023482] ksoftirq-17 1..s. 298088575us : execlists_submission_tasklet: bcs0 cs-irq head=0 [0], tail=1 [1]
> >> <0>[ 681.023644] ksoftirq-17 1..s. 298088579us : execlists_submission_tasklet: bcs0 csb[1]: status=0x00000018:0x00000003, active=0x1
> >> <0>[ 681.023811] ksoftirq-17 1..s. 298088581us : execlists_submission_tasklet: bcs0 out[0]: ctx=3.1, seqno=a
> >>
> >> Everything stops.
> >>
> >>> [kbl] https://intel-gfx-ci.01.org/tree/drm-tip/kasan_2/fi-kbl-7560u/igt@perf_pmu@xxxxxxxxxxxxxxxxxxxxxxxxx
> >>
> >> ...
> >> <0>[ 506.745332] perf_pmu-1544 3..s1 107905835us : execlists_submission_tasklet: bcs0 in[0]: ctx=3.1, seqno=a
> >> <0>[ 506.745397] <idle>-0 2..s1 107905980us : execlists_submission_tasklet: bcs0 cs-irq head=2 [1?], tail=3 [3?]
> >> <0>[ 506.745440] <idle>-0 2..s1 107905983us : execlists_submission_tasklet: bcs0 csb[3]: status=0x00000001:0x00000000, active=0x1
> >> <0>[ 506.745498] kworker/-30 3.... 120840583us : reset_common_ring: bcs0 seqno=a
> >> <0>[ 506.745547] ksoftirq-29 3..s. 120840688us : execlists_submission_tasklet: bcs0 in[0]: ctx=3.1, seqno=a
> >> <0>[ 506.745598] in:imklo-499 2..s1 120840710us : execlists_submission_tasklet: bcs0 cs-irq head=0 [0], tail=1 [1]
> >> <0>[ 506.745637] in:imklo-499 2..s1 120840712us : execlists_submission_tasklet: bcs0 csb[1]: status=0x00000018:0x00000003, active=0x1
> >> <0>[ 506.745676] in:imklo-499 2..s1 120840713us : execlists_submission_tasklet: bcs0 out[0]: ctx=3.1, seqno=a
> >>
> >> Everything stops here.
> >>
> >> I have no idea what's happening here. In both cases I would expect the test
> >> to have exited after the GPU hang (or at least attempted to exit!), since it
> >> would detect it overran the timeout.
> >>
> >> Could it be stuck in gem_sync after the reset? Or somewhere else?
> >
> > I think it's that we will be throwing the calibration off if it hangs.
> > If busy_ns = 10s, won't that generate a target idle time of 500s?
>
> Indeed, well spotted. I'll need to add a hang detector of some sort.

Oh, I think I know why it's hanging. As the buffer will be idle, the
kernel is allowed to move it, and __submit_spin_batch() doesn't tell the
kernel to preserve the original address (so the kernel assumes that the
relocations are relative to the passed-in address and moves the buffer
to match). I should have noticed that before, given the discussion around
EXEC_OBJECT_PINNED for the spinner.

I think there's an easy enough patch...
-Chris
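
For reference, the calibration concern in the quoted exchange (a hung busy
phase inflating the idle target) can be sketched roughly as below. This is
only an illustration under assumptions: target_idle_ns() and busy_overran()
are hypothetical helpers, not functions from the test, the formula is one
plausible way to hold a fixed duty cycle, and the 2x overrun threshold is
arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ull

/*
 * Hypothetical helper (not from the test source): how long to stay idle so
 * that busy / (busy + idle) comes back to target_busy_pct percent.
 */
static uint64_t target_idle_ns(uint64_t total_busy_ns, uint64_t total_idle_ns,
			       unsigned int target_busy_pct)
{
	uint64_t want_total_ns, elapsed_ns;

	assert(target_busy_pct > 0 && target_busy_pct <= 100);

	/* Total runtime at which the accumulated busy time equals the target. */
	want_total_ns = 100 * total_busy_ns / target_busy_pct;
	elapsed_ns = total_busy_ns + total_idle_ns;

	/* Already at or below the target duty cycle: no extra idling needed. */
	return want_total_ns > elapsed_ns ? want_total_ns - elapsed_ns : 0;
}

/*
 * Hypothetical hang detector: flag a busy phase that ran far longer than
 * requested, so the caller can bail out instead of feeding the value into
 * the calibration. The factor of 2 is an assumption chosen for illustration.
 */
static bool busy_overran(uint64_t measured_busy_ns, uint64_t expected_busy_ns)
{
	return measured_busy_ns > 2 * expected_busy_ns;
}
```

Assuming a 2% busy target (the thread does not state the percentage), a busy
phase stretched to 10s by a hang gives target_idle_ns(10 * NSEC_PER_SEC, 0, 2)
= 490s, in line with the ~500s estimate above. Checking busy_overran() before
computing the idle target would let the loop stop early in that case.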