Re: [igt-dev] [PATCH i-g-t v6] tests/perf_pmu: Verify engine busyness accuracy

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 19/02/2018 09:27, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-02-19 09:19:47)
>>
>> Do you have a link to BSW hang? Is that obviously related to PMU?
> 
> It's only occurring in this test, just looks like an issue with the
> spinner:
> 
> [bsw] https://intel-gfx-ci.01.org/tree/drm-tip/kasan_2/fi-bsw-n3050/igt@perf_pmu@xxxxxxxxxxxxxxxxxxxxxxxxx

...
<0>[  681.022677] perf_pmu-1516    1..s1 282520414us : execlists_submission_tasklet: bcs0 in[0]:  ctx=3.1, seqno=a
<0>[  681.022838] perf_pmu-1516    1..s1 282520580us : execlists_submission_tasklet: bcs0 cs-irq head=5 [5?], tail=0 [0?]
<0>[  681.023001] perf_pmu-1516    1..s1 282520594us : execlists_submission_tasklet: bcs0 csb[0]: status=0x00000001:0x00000000, active=0x1
<0>[  681.023168] kworker/-338     1.... 298087910us : reset_common_ring: bcs0 seqno=a
<0>[  681.023321] ksoftirq-17      1..s. 298088483us : execlists_submission_tasklet: bcs0 in[0]:  ctx=3.1, seqno=a
<0>[  681.023482] ksoftirq-17      1..s. 298088575us : execlists_submission_tasklet: bcs0 cs-irq head=0 [0], tail=1 [1]
<0>[  681.023644] ksoftirq-17      1..s. 298088579us : execlists_submission_tasklet: bcs0 csb[1]: status=0x00000018:0x00000003, active=0x1
<0>[  681.023811] ksoftirq-17      1..s. 298088581us : execlists_submission_tasklet: bcs0 out[0]: ctx=3.1, seqno=a

Everything stops.

> [kbl] https://intel-gfx-ci.01.org/tree/drm-tip/kasan_2/fi-kbl-7560u/igt@perf_pmu@xxxxxxxxxxxxxxxxxxxxxxxxx

...
<0>[  506.745332] perf_pmu-1544    3..s1 107905835us : execlists_submission_tasklet: bcs0 in[0]:  ctx=3.1, seqno=a
<0>[  506.745397]   <idle>-0       2..s1 107905980us : execlists_submission_tasklet: bcs0 cs-irq head=2 [1?], tail=3 [3?]
<0>[  506.745440]   <idle>-0       2..s1 107905983us : execlists_submission_tasklet: bcs0 csb[3]: status=0x00000001:0x00000000, active=0x1
<0>[  506.745498] kworker/-30      3.... 120840583us : reset_common_ring: bcs0 seqno=a
<0>[  506.745547] ksoftirq-29      3..s. 120840688us : execlists_submission_tasklet: bcs0 in[0]:  ctx=3.1, seqno=a
<0>[  506.745598] in:imklo-499     2..s1 120840710us : execlists_submission_tasklet: bcs0 cs-irq head=0 [0], tail=1 [1]
<0>[  506.745637] in:imklo-499     2..s1 120840712us : execlists_submission_tasklet: bcs0 csb[1]: status=0x00000018:0x00000003, active=0x1
<0>[  506.745676] in:imklo-499     2..s1 120840713us : execlists_submission_tasklet: bcs0 out[0]: ctx=3.1, seqno=a

Everything stops here.

I have not idea what's happening here. In both cases I would expect the test
to have exited after the GPU hang (or at least attempt to exit!), since it
would detect it overran the timeout.

Could it be stuck in gem_sync after the reset? Or somewhere else?

Could we add "echo t > /proc/sysrq-trigger" equivalent when owatch triggers?

Or it would overflow some buffer? Should work in cases like this one, when
it is not a machine hang.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux