Re: [PATCH 1/2] drm/amdgpu: Reset IH OVERFLOW_CLEAR bit after writing rptr

On 22.01.24 at 23:39, Joshua Ashton wrote:
[SNIP]

Most work submissions in practice submit more waves than the number of
wave slots the GPU has.
As far as I understand soft recovery, the only thing it does is kill all
active waves. This frees up the CUs so more waves are launched, which
can fault again, and that leads to potentially lots of faults for a
single wave slot in the end.
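
The multiplication effect described above can be put into a rough model. This is only a sketch under simplifying assumptions that are not taken from the driver: every submitted wave faults once, each kill pass frees all slots at the same time, and the freed slots are refilled immediately from the pending waves. The helper name faults_per_slot is hypothetical.

```c
#include <assert.h>

/*
 * Hypothetical model, not driver code: upper bound on how many faulting
 * waves pass through a single wave slot when a submission has more waves
 * than the GPU has slots and every kill pass lets the next batch launch.
 */
static unsigned int faults_per_slot(unsigned int submitted_waves,
                                    unsigned int wave_slots)
{
    assert(wave_slots > 0);

    /* Ceiling division, written so the intermediate sum cannot overflow. */
    return submitted_waves / wave_slots +
           (submitted_waves % wave_slots != 0);
}
```

With made-up numbers, faults_per_slot(4096, 512) evaluates to 8: each slot is refilled and can fault eight times before the submission drains.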

Exactly that, but killing each wave takes a moment since we do that in a loop with a bit of delay in there.

So the interrupt handler should at least in theory have time to catch up.

I don't think there is any delay in that loop, is there?

Mhm, looks like I remember that incorrectly.


    while (!dma_fence_is_signaled(fence) &&
           ktime_to_ns(ktime_sub(deadline, ktime_get())) > 0)
        ring->funcs->soft_recovery(ring, vmid);

(soft_recovery function does not have a delay/sleep/whatever either)
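
To make the shape of that loop easier to reason about, here is a small userspace simulation of it. Everything in it is an assumption for illustration: struct sim_ring, its fields, the sim_* helpers and the per-call cost are hypothetical and do not exist in amdgpu. The simulated clock only advances by the assumed cost of each kill request, which mirrors the point that the loop itself contains no sleep.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simulated ring: every field is hypothetical, not an amdgpu structure. */
struct sim_ring {
    int64_t now_ns;            /* simulated monotonic clock */
    int64_t kill_cost_ns;      /* assumed cost of one kill request */
    unsigned int kills_needed; /* kill passes until the fence signals */
    unsigned int kills_issued; /* kill passes issued so far */
};

static bool sim_fence_is_signaled(const struct sim_ring *ring)
{
    return ring->kills_issued >= ring->kills_needed;
}

/* Stand-in for ring->funcs->soft_recovery(): no sleep, it only costs time. */
static void sim_soft_recovery(struct sim_ring *ring)
{
    ring->kills_issued++;
    ring->now_ns += ring->kill_cost_ns;
}

/* Same shape as the quoted loop: spin until signaled or past the deadline. */
static bool sim_soft_recover(struct sim_ring *ring, int64_t timeout_ns)
{
    int64_t deadline = ring->now_ns + timeout_ns;

    assert(ring->kill_cost_ns > 0); /* otherwise simulated time never advances */

    while (!sim_fence_is_signaled(ring) && deadline - ring->now_ns > 0)
        sim_soft_recovery(ring);

    return sim_fence_is_signaled(ring);
}
```

With an assumed cost of 1 us per call, a 100 ms budget allows at most 100000 back-to-back kill requests in this model. Whether that is enough depends on how many passes the hardware actually needs, which the simulation does not know.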

FWIW, two other changes we made in SteamOS to make recovery more reliable on VANGOGH were:

1) Move the timeout determination after the spinlock setting the fence error.

Well that should not really have any effect.


2) Raise the timeout from 0.1s to 1s.

Well that's not necessarily a good idea. If the SQ isn't able to respond within 100ms, then I would really go into a hard reset.

Waiting one extra second is way too long here.

Regards,
Christian.


- Joshie 🐸✨



Regards,
Christian.


Regards,
Friedrich




