[AMD Official Use Only - AMD Internal Distribution Only]

Nack to the revert.

The FLR sequence is defined as follows (host-initiated reset):
1) Host sends FLR_NOTIFICATION
2) Guest gets interrupt and queues FLR work item
3) Guest sends READY_TO_RESET
4) Host sends FLR_NOTIFICATION_COMPLETION
5) Guest starts recovery

In RAS FED, guest interrupts are disabled, so the guest won't receive #1.
Consequently #2 & #4 will break. It doesn't make sense to re-use this
sequence as-is in the FED scenario.

On the other hand, the KFD reset work item performs the guest-initiated reset:
1) Guest waits for mailbox to work (handles the FED disable mailbox)
2) Guest sends REQ_GPU_RESET_ACCESS
3) Host acks back
4) Guest starts recovery

We should keep this commit until a proper guest FED reset work item is
implemented.

Thanks,
Victor

> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of
> Yunxiang Li
> Sent: Tuesday, May 28, 2024 1:24 PM
> To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> Cc: Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Koenig, Christian
> <Christian.Koenig@xxxxxxx>; Gao, Likun <Likun.Gao@xxxxxxx>; Zhang,
> Hawking <Hawking.Zhang@xxxxxxx>; Li, Yunxiang (Teddy)
> <Yunxiang.Li@xxxxxxx>
> Subject: [PATCH v2 10/10] Revert "drm/amdgpu: Queue KFD reset workitem in
> VF FED"
>
> This reverts commit 2149ee697a7a3091a16447c647d4a30f7468553a.
>
> The issue is already fixed by
> fa5a7f2ccb7e ("drm/amdgpu: Fix two reset triggered in a row")
>
> Signed-off-by: Yunxiang Li <Yunxiang.Li@xxxxxxx>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> index 44450507c140..4bacbf1db9e5 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> @@ -616,7 +616,7 @@ static void
> amdgpu_virt_update_vf2pf_work_item(struct work_struct *work)
>                     amdgpu_sriov_runtime(adev)) {
>                         amdgpu_ras_set_fed(adev, true);
>                         if (amdgpu_reset_domain_schedule(adev->reset_domain,
> -                                                         &adev->kfd.reset_work))
> +                                                         &adev->virt.flr_work))
>                                 return;
>                         else
>                                 dev_err(adev->dev, "Failed to queue work! at %s", __func__);
> --
> 2.34.1
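
For reference, the two handshakes described in the reply can be sketched as a toy user-space simulation. This is only an illustration of the ordering argument, not driver code: `struct guest`, its fields, `host_initiated_flr()`, `guest_initiated_reset()` and the `max_polls` parameter are all hypothetical names, and modeling the "wait for mailbox" step as bounded polling is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Toy guest state; field names are illustrative, not amdgpu fields. */
struct guest {
	bool irq_enabled;        /* false while RAS FED has guest interrupts disabled */
	int polls_until_mailbox; /* polls needed before the mailbox responds again */
};

/* Host-initiated FLR (steps 1-5 in the mail). Returns true if recovery starts. */
static bool host_initiated_flr(const struct guest *g)
{
	assert(g != NULL);
	puts("host:  FLR_NOTIFICATION");            /* step 1 */
	if (!g->irq_enabled) {
		/* The guest never sees the notification, so nothing is queued. */
		puts("guest: no interrupt, sequence stalls");
		return false;
	}
	puts("guest: queue FLR work item");         /* step 2 */
	puts("guest: READY_TO_RESET");              /* step 3 */
	puts("host:  FLR_NOTIFICATION_COMPLETION"); /* step 4 */
	puts("guest: start recovery");              /* step 5 */
	return true;
}

/*
 * Guest-initiated reset (steps 1-4 in the mail). The wait for the mailbox
 * is modeled as bounded polling, which is an assumption of this sketch.
 */
static bool guest_initiated_reset(const struct guest *g, int max_polls)
{
	assert(g != NULL);
	if (g->polls_until_mailbox > max_polls) {
		puts("guest: mailbox never came back, giving up");
		return false;
	}
	puts("guest: mailbox is working");          /* step 1 */
	puts("guest: REQ_GPU_RESET_ACCESS");        /* step 2 */
	puts("host:  ack");                         /* step 3 */
	puts("guest: start recovery");              /* step 4 */
	return true;
}
```

With `irq_enabled = false`, `host_initiated_flr()` stops after step 1, while `guest_initiated_reset()` on the same guest still reaches recovery once the mailbox responds within `max_polls`.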