Re: [PATCH 2/2] drm/amdgpu: Mark ctx as guilty in ring_soft_recovery path

On 15.01.24 18:54, Michel Dänzer wrote:
On 2024-01-15 18:26, Friedrich Vock wrote:
[snip]
The fundamental problem here is that not telling applications that
something went wrong when you just canceled their work midway is an
out-of-spec hack.
When there is a report of real-world apps breaking because of that hack,
reports of different apps working (even if it's convenient that they
work) don't justify keeping the broken code.
If the breaking apps hit multiple soft resets in a row, I've laid out a pragmatic solution which covers both cases.
Hitting soft reset every time is the lucky path. Once GPU work is interrupted out of nowhere, all bets are off and it might as well trigger a full system hang next time. No hang recovery should be able to cause that under any circumstance.

If mutter needs to be robust against faults it caused itself, it should be robust
against GPU resets.
It's unlikely that the hangs I've seen were caused by mutter itself, more likely Mesa or amdgpu.

Anyway, this will happen at some point; the reality is that it hasn't yet, though.


