On 1/9/2025 8:27 PM, Kim, Jonathan wrote: > [Public] > >> -----Original Message----- >> From: Lazar, Lijo <Lijo.Lazar@xxxxxxx> >> Sent: Thursday, January 9, 2025 1:14 AM >> To: Kim, Jonathan <Jonathan.Kim@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx >> Cc: Kasiviswanathan, Harish <Harish.Kasiviswanathan@xxxxxxx> >> Subject: Re: [PATCH] drm/amdgpu: fix gpu recovery disable with per queue reset >> >> >> >> On 1/9/2025 1:31 AM, Jonathan Kim wrote: >>> Per queue reset should be bypassed when gpu recovery is disabled >>> with module parameter. >>> >>> Signed-off-by: Jonathan Kim <jonathan.kim@xxxxxxx> >>> --- >>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 6 ++++++ >>> 1 file changed, 6 insertions(+) >>> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c >>> index cc66ebb7bae1..441568163e20 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c >>> @@ -1131,6 +1131,9 @@ uint64_t kgd_gfx_v9_hqd_get_pq_addr(struct >> amdgpu_device *adev, >>> uint32_t low, high; >>> uint64_t queue_addr = 0; >>> >>> + if (!amdgpu_gpu_recovery) >>> + return 0; >>> + >>> kgd_gfx_v9_acquire_queue(adev, pipe_id, queue_id, inst); >>> amdgpu_gfx_rlc_enter_safe_mode(adev, inst); >>> >>> @@ -1179,6 +1182,9 @@ uint64_t kgd_gfx_v9_hqd_reset(struct amdgpu_device >> *adev, >>> uint32_t low, high, pipe_reset_data = 0; >>> uint64_t queue_addr = 0; >>> >>> + if (!amdgpu_gpu_recovery) >>> + return 0; >>> + >> >> I think the right place for this check is not inside callback, should be >> from the place where the callback gets called. > > I don't think it really matters. We're going to have different reset types in the future that my come from different callers. > It's probably easier to remember to put the bypass where the reset is actually happening. > If I want to define something like amdgpu_gpu_recovery=2 (don't do queue reset but perform other resets), then it matters. Since this is a callback, keeping it at the wrapper place may be more maintainable rather than keeping the check for gfx10/11/12 etc. Thanks, Lijo > Jon > >> >> Thanks, >> Lijo >> >>> kgd_gfx_v9_acquire_queue(adev, pipe_id, queue_id, inst); >>> amdgpu_gfx_rlc_enter_safe_mode(adev, inst); >>> >