This is a note to let you know that I've just added the patch titled drm/amdgpu: Queue KFD reset workitem in VF FED to the 6.10-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary The filename of the patch is: drm-amdgpu-queue-kfd-reset-workitem-in-vf-fed.patch and it can be found in the queue-6.10 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable@xxxxxxxxxxxxxxx> know about it. commit 11e53f5e26c8b031e1e31a9d841ce5c7fb149159 Author: Victor Skvortsov <victor.skvortsov@xxxxxxx> Date: Sun May 19 10:39:43 2024 -0400 drm/amdgpu: Queue KFD reset workitem in VF FED [ Upstream commit 5434bc03f52de2ec57d6ce684b1853928f508cbc ] The guest recovery sequence is buggy in Fatal Error when both FLR & KFD reset workitems are queued at the same time. In addition, FLR guest recovery sequence is out of order when PF/VF communication breaks due to a GPU fatal error As a temporary work around, perform a KFD style reset (Initiate reset request from the guest) inside the pf2vf thread on FED. Signed-off-by: Victor Skvortsov <victor.skvortsov@xxxxxxx> Reviewed-by: Zhigang Luo <zhigang.luo@xxxxxxx> Signed-off-by: Alex Deucher <alexander.deucher@xxxxxxx> Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c index 761fff80ec1f..923d51f16ec8 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c @@ -602,7 +602,7 @@ static void amdgpu_virt_update_vf2pf_work_item(struct work_struct *work) amdgpu_sriov_runtime(adev) && !amdgpu_in_reset(adev)) { amdgpu_ras_set_fed(adev, true); if (amdgpu_reset_domain_schedule(adev->reset_domain, - &adev->virt.flr_work)) + &adev->kfd.reset_work)) return; else dev_err(adev->dev, "Failed to queue work! at %s", __func__);