Re: [PATCH v2 03/10] drm/amdgpu: abort fence poll if reset is started


 



On 29.05.24 at 15:44, Li, Yunxiang (Teddy) wrote:
[AMD Official Use Only - AMD Internal Distribution Only]

I don't think trying to add some reset handling here makes sense in the first place.
Part of the reset/recovery procedure is to signal all fences, and that includes the one we are waiting for here.
So this wait should return immediately during a reset anyway.
As far as I can tell, the fence_ptr values that get polled are not packaged into a fence object, and in practice I see waits of tens of seconds before they time out and the reset can begin. Also, after the reset there is often a long wait, up to 2 minutes, for all the tlb_fence_work to time out (not addressed by this patch; I still haven't figured out what's going on there).

The problem is that we don't force-complete the non-scheduler rings, e.g. MES, KIQ, etc.

Try removing this check from the loop in amdgpu_device_pre_asic_reset():

                if (!amdgpu_ring_sched_ready(ring))
                        continue;
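
To illustrate the point, here is a minimal userspace sketch (a toy model, not the driver code). The names struct toy_ring, toy_force_completion() and toy_pre_reset() are made up for illustration; only amdgpu_ring_sched_ready() and amdgpu_device_pre_asic_reset() refer to real driver symbols. Under those assumptions it shows how skipping rings that are not scheduler-ready leaves their fences unsignaled, so anything polling them would have to run into its timeout:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a ring; NOT the real struct amdgpu_ring. */
struct toy_ring {
	const char *name;
	bool sched_ready;       /* models the result of amdgpu_ring_sched_ready() */
	uint32_t last_emitted;  /* last fence sequence number emitted */
	uint32_t last_signaled; /* last fence sequence number signaled */
};

/* Hypothetical helper: mark every outstanding fence on the ring as signaled. */
static void toy_force_completion(struct toy_ring *ring)
{
	ring->last_signaled = ring->last_emitted;
}

/*
 * Toy version of the pre-reset loop. When skip_unready is true, rings that
 * are not scheduler-ready are skipped (the check under discussion).
 * Returns the number of rings that still have pending fences afterwards.
 */
static size_t toy_pre_reset(struct toy_ring *rings, size_t n, bool skip_unready)
{
	size_t pending = 0;

	for (size_t i = 0; i < n; i++) {
		struct toy_ring *ring = &rings[i];

		if (!(skip_unready && !ring->sched_ready))
			toy_force_completion(ring);

		if (ring->last_signaled != ring->last_emitted)
			pending++;
	}

	return pending;
}
```

In this model, a ring such as KIQ with sched_ready == false keeps its pending fence when skip_unready is set; with the check removed, every ring is completed before the reset proceeds.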

Regards,
Christian.



Teddy



