Re: [PATCH] drm/amdgpu: Fix two reset triggered in a row

On 23.04.24 at 05:13, Li, Yunxiang (Teddy) wrote:
[Public]

>> We can't do this technically as there are cases where we skip full device reset (even then amdgpu_in_reset will return true). The better thing to do is to move amdgpu_device_stop_pending_resets() later in gpu_recover(): if a device has undergone full reset, then cancel all pending resets. Presently it's happening earlier, which could be why this issue is seen.
>
> This sounds like a design issue then, if different reset workers expect different resets to be triggered but they all use the same flag. I wonder if the other places that check this flag are correct. FWIW I was testing with SRIOV, where it always does a full reset, and ran into this issue.

Lijo is correct. The idea here is that all reset sources which have been covered by a reset are canceled directly after the reset is completed.

The approach with checking amdgpu_in_reset() is broken because it can still happen that multiple sources signal at the same time that a reset is necessary.

Regards,
Christian.
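
For readers following the thread, the ordering discussed above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the amdgpu code: `struct model_dev`, `enum reset_source`, `request_reset()`, `stop_pending_resets()` and `recover()` are hypothetical names invented for this example, and the real driver uses work items and locking that are omitted here. The point it shows is that pending requests are canceled only after a full reset has completed, so a second source that signaled at the same time is covered instead of triggering another reset.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reset sources; the names are illustrative, not the driver's. */
enum reset_source { SRC_JOB_TIMEOUT, SRC_RAS, SRC_HOST_FLR, NUM_SOURCES };

struct model_dev {
    bool pending[NUM_SOURCES]; /* one queued reset request per source */
    int  full_resets;          /* number of full resets that actually ran */
};

/* A source signals that it wants a reset. */
static void request_reset(struct model_dev *d, enum reset_source src)
{
    assert(src < NUM_SOURCES);
    d->pending[src] = true;
}

/* Drop every queued request: all of them are covered by the reset that just ran. */
static void stop_pending_resets(struct model_dev *d)
{
    for (int i = 0; i < NUM_SOURCES; i++)
        d->pending[i] = false;
}

/* Handle one request. Returns true if some recovery was performed. */
static bool recover(struct model_dev *d, enum reset_source src, bool full_reset)
{
    if (!d->pending[src])
        return false;            /* already covered by an earlier full reset */
    d->pending[src] = false;
    if (full_reset) {
        d->full_resets++;
        stop_pending_resets(d);  /* cancel only after the full reset is done */
    }
    return true;
}
```

In this model, if two sources request a reset and the first one is handled with a full reset, the second call to `recover()` finds nothing pending and returns without resetting again. When the first recovery skips the full reset, the other requests stay queued, which is the case a plain in-reset flag cannot distinguish.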


