[RFC] How to assign blame when multiple rings are hung


Hi,

I am working on a patchset [1] which originally aimed to fix
how we identify the guilty batches when ppgtt is in use.

But during the review it became clear that I don't have a clear
idea of what the behaviour should be when multiple rings encounter
a problematic batch at the same time.

The following i-g-t patch adds a test which asserts that
both contexts get blamed for having a (problematic) batch active
during the hang.

The patchset [1] fails this test case, as it blames only
the first context that injected the hang.
We would need to change the test as follows for it to pass:
-       assert_reset_status(fd[1], 0, RS_BATCH_ACTIVE);
+       assert_reset_status(fd[1], 0, RS_BATCH_PENDING);

I lean towards both contexts getting their batch_active count
increased, as other rings might gain contexts and we could
already reset individual rings instead of the whole GPU.
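To make the two options concrete, here is a minimal, hypothetical model of
the per-context counters in the hang-render-and-<ring> scenario. The structs,
the blame_policy enum and assign_blame() are made up for illustration and are
not the i915 or i-g-t code. "First" is approximated here as the lowest ring
index, which is an assumption; the patchset may order hangs differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-context reset statistics, loosely mirroring the
 * batch_active / batch_pending counters discussed above. Not the i915 structs. */
struct ctx_stats {
	unsigned int batch_active;  /* times this context was blamed for a hang */
	unsigned int batch_pending; /* times it had work on a reset ring but was not blamed */
};

/* Hypothetical snapshot of one ring at reset time. */
struct ring_state {
	struct ctx_stats *active_ctx; /* context whose batch was executing, NULL if idle */
	bool hung;                    /* did hangcheck declare this ring hung? */
};

enum blame_policy {
	BLAME_FIRST_HUNG, /* only the first hung ring's context is guilty (patchset [1]) */
	BLAME_ALL_HUNG,   /* every hung ring's context is guilty (proposal above) */
};

/* Walk the rings in index order and bump the counters according to policy. */
static void assign_blame(struct ring_state *rings, size_t nrings,
			 enum blame_policy policy)
{
	bool guilty_found = false;

	assert(rings != NULL || nrings == 0);

	for (size_t i = 0; i < nrings; i++) {
		struct ctx_stats *ctx = rings[i].active_ctx;

		if (ctx == NULL)
			continue; /* idle ring: nobody to blame or spare */

		if (rings[i].hung && (policy == BLAME_ALL_HUNG || !guilty_found)) {
			ctx->batch_active++;
			guilty_found = true;
		} else {
			/* Innocent victim of the reset (or a later hung ring
			 * under BLAME_FIRST_HUNG). */
			ctx->batch_pending++;
		}
	}
}
```

With two hung rings each running one context's batch, BLAME_FIRST_HUNG leaves
the second context with batch_pending == 1 (what the RS_BATCH_PENDING assertion
in the diff above expects), while BLAME_ALL_HUNG gives both contexts
batch_active == 1 (what the new test asserts).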

But we need to pick one, hence this RFC.
Thoughts?

--
[1]: https://github.com/mkuoppal/linux/commits/one_guilty

Mika Kuoppala (1):
  tests/gem_reset_stats: add subtest hang-render-and-<ring>

 tests/gem_reset_stats.c |   34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

-- 
1.7.9.5

_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx