Re: [PATCH] drm/i915/selftests: Allow engine reset failure to do a GT reset in hangcheck selftest

On 10/22/21 20:09, John Harrison wrote:
> And to be clear, the engine reset is not supposed to fail. Whether issued by GuC or i915, the GDRST register is supposed to self clear according to the bspec. If we are being sent the G2H notification for an engine reset failure then the assumption is that the hardware is broken. This is not a situation that is ever intended to occur in a production system. Therefore, it is not something we should spend huge amounts of effort on making a perfect selftest for.

I don't agree. Selftests are there to verify that the assumptions made and the contracts in the code hold, and that the hardware behaves as intended or assumed. Ideally, no selftest should ever trigger in a production driver or system. That doesn't mean we can remove selftests or skip updating them when assumptions or contracts change. I think it's important to acknowledge that this selftest and the perf selftest have found two problems that need to be considered for fixing in a production system.


> The current theory is that the timeout in GuC is not quite long enough for DG1. Given that the bspec does not specify any kind of timeout, it is only a best guess anyway! Once that has been tuned correctly, we should never hit this case again. Not ever. Not in a selftest, not in an end-user use case, just not ever.

...until we introduce new hardware for which the tuning no longer holds, or somebody in two years wants to lower the timeout, wondering why it was set so long?

/Thomas




