On 29.05.24 at 16:31, Li, Yunxiang (Teddy) wrote:
[Public]
The problem is that we don't force-complete the non-scheduler rings, e.g. MES,
KIQ, etc.
Try to remove this check here from the loop in
amdgpu_device_pre_asic_reset():
	if (!amdgpu_ring_sched_ready(ring))
		continue;
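
For illustration, here is a rough, self-contained model of what dropping that check changes. It is not the driver code: `model_ring`, `model_device`, `model_force_completion()` and `model_pre_reset()` are hypothetical stand-ins, and the fence state is reduced to two sequence counters. It also assumes unused ring slots are NULL, so a plain NULL check has to stay once the scheduler check is gone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_RINGS 4 /* hypothetical; stands in for the driver's ring array size */

/* Hypothetical stand-in for a ring; not the real struct amdgpu_ring. */
struct model_ring {
	bool sched_ready;      /* true if a scheduler is attached and ready */
	unsigned int last_seq; /* last fence sequence number emitted */
	unsigned int done_seq; /* last fence sequence number signaled */
};

struct model_device {
	struct model_ring *rings[MAX_RINGS];
};

/* Mark every outstanding fence on the ring as signaled. */
static void model_force_completion(struct model_ring *ring)
{
	ring->done_seq = ring->last_seq;
}

/*
 * Model of the reset loop. With skip_non_sched set, rings without a
 * ready scheduler are skipped (the check quoted above). With it cleared,
 * every allocated ring is force-completed.
 * Returns the number of rings that were force-completed.
 */
static int model_pre_reset(struct model_device *dev, bool skip_non_sched)
{
	int i, completed = 0;

	assert(dev != NULL);

	for (i = 0; i < MAX_RINGS; ++i) {
		struct model_ring *ring = dev->rings[i];

		if (!ring) /* assumed: unused slots are NULL */
			continue;
		if (skip_non_sched && !ring->sched_ready)
			continue;

		model_force_completion(ring);
		completed++;
	}
	return completed;
}
```

With the check in place, a ring without a scheduler (KIQ-like in this model) keeps its pending fences across the reset. Without it, those fences are signaled as well. The model only covers fences tracked per ring.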
Ah, I see. Though I don't think this would work for the MES case, since each submission grabs its own wb address rather than using the ring's.
Yeah, I know. That's one of the reasons I pointed out, on the patch
adding that, that this behavior is actually completely broken.
If you run into issues with the MES because of this then please suggest
a revert of that patch.
Regards,
Christian.