I agree. Removing the call to pre-reset probably breaks GPU reset for KFD.
We call the KFD suspend function in pre-reset, which uses the HIQ to
stop any user mode queues still running. If that is not possible because
the HIQ is hanging, it should fail with a timeout. There may be
something we can do if we know that the HIQ is hanging, so we only
update the KFD-internal queue state without actually sending anything to
the HIQ.
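
To illustrate the idea above, here is a minimal standalone sketch in plain C. It is not driver code: struct sim_dqm, the hiq_hung and hiq_hang_known flags, and both sim_* functions are hypothetical stand-ins invented for this example, and ETIMEDOUT is used only as a portable placeholder for the timeout error. It assumes the driver has some way of learning that the HIQ is hung, which is exactly the open question.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_MAX_QUEUES 4

/* Hypothetical stand-ins; these are NOT the real amdkfd structures. */
enum sim_queue_state { QUEUE_ACTIVE = 0, QUEUE_STOPPED };

struct sim_dqm {
	bool hiq_hung;            /* actual (simulated) hardware condition */
	bool hiq_hang_known;      /* assumption: driver was told the HIQ is hung */
	size_t hiq_packets_sent;  /* number of unmap requests submitted to the HIQ */
	size_t num_queues;
	enum sim_queue_state queues[SIM_MAX_QUEUES];
};

/* Simulated HIQ submission: a hung HIQ never answers, so we report a timeout. */
int sim_hiq_unmap_queues(struct sim_dqm *dqm)
{
	if (dqm->hiq_hung)
		return -ETIMEDOUT;
	dqm->hiq_packets_sent++;
	return 0;
}

/*
 * Simulated pre-reset stop of user mode queues.
 * - HIQ known to be hung: skip the hardware entirely and only update the
 *   internal queue state, since the reset will discard HW state anyway.
 * - Otherwise: ask the HIQ to unmap the queues; on timeout, return the
 *   error and leave the internal state untouched.
 */
int sim_pre_reset_stop_queues(struct sim_dqm *dqm)
{
	size_t i;
	int ret;

	assert(dqm != NULL);
	assert(dqm->num_queues <= SIM_MAX_QUEUES);

	if (!dqm->hiq_hang_known) {
		ret = sim_hiq_unmap_queues(dqm);
		if (ret)
			return ret;
	}

	for (i = 0; i < dqm->num_queues; i++)
		dqm->queues[i] = QUEUE_STOPPED;

	return 0;
}
```

With hiq_hang_known set, sim_pre_reset_stop_queues() marks every queue stopped and sends nothing to the HIQ. With a hung HIQ that the driver does not know about, it returns the timeout error and leaves the queue state as it was.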
Regards,
Felix
On 2019-12-17 10:37, shaoyunl wrote:
I think the amdkfd side depends on this call to stop the user queues.
Without this call, the user queues can submit to HW during the reset,
which could cause a hang again ...
Do we know the root cause of why this function would ruin MEC? From the
logic, I think this function should be called before FLR, since we need
to disable user queue submission first.
As I remember, the function should use the HIQ to communicate with HW and
shouldn't use the KIQ to access HW registers. Has this been changed?
Regards
shaoyun.liu
On 2019-12-17 5:19 a.m., Monk Liu wrote:
issues:
MEC is ruined by amdkfd_pre_reset() after the VF FLR is done
fix:
amdkfd_pre_reset() ruins MEC if it runs after the hypervisor has finished
the VF FLR. The correct sequence is to do amdkfd_pre_reset() before the
VF FLR, but there is a limitation that blocks this sequence:
if we do pre_reset() before the VF FLR, it goes the KIQ way to do register
access and gets stuck there, because KIQ probably won't work by that time
(e.g. you already made GFX hang)
so the best way right now is to simply remove it.
Signed-off-by: Monk Liu <Monk.Liu@xxxxxxx>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 605cef6..ae962b9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3672,8 +3672,6 @@ static int amdgpu_device_reset_sriov(struct amdgpu_device *adev,
 	if (r)
 		return r;
 
-	amdgpu_amdkfd_pre_reset(adev);
-
 	/* Resume IP prior to SMC */
 	r = amdgpu_device_ip_reinit_early_sriov(adev);
 	if (r)
_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx