Re: [PATCH 1/7] drm/amd/amdgpu: Simplify SDMA reset mechanism by removing dynamic callbacks

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Am 12.03.25 um 08:59 schrieb Jesse.zhang@xxxxxxx:
> From: "Jesse.zhang@xxxxxxx" <Jesse.zhang@xxxxxxx>
>
> Since KFD no longer registers its own callbacks for SDMA resets, and only KGD uses the reset mechanism,
> we can simplify the SDMA reset flow by directly calling the ring's `stop_queue` and `start_queue` functions.
> This patch removes the dynamic callback mechanism and prepares for its eventual deprecation.
>
> The current SDMA reset mechanism uses a dynamic callback registration system to allow both KFD and KGD to register pre-reset and post-reset functions.
> However, KFD no longer registers its callbacks, and the reset logic is now entirely handled by KGD.
> This makes the dynamic callback mechanism unnecessary and adds complexity to the code.
>
> 1. **Remove Dynamic Callbacks**:
>    - The `pre_reset` and `post_reset` callback invocations in `amdgpu_sdma_reset_engine` are removed.
>    - Instead, the ring's `stop_queue` and `start_queue` functions are called directly during the reset process.
>
> 2. **Add `stop_queue` and `start_queue` to Ring Functions**:
>    - The `amdgpu_ring_funcs` structure is extended to include `stop_queue` and `start_queue` functions.
>    - These functions are implemented in the SDMA version-specific code (e.g., `sdma_v4_4_2_stop_queue` and `sdma_v4_4_2_restore_queue`).
>
> 3. **Prepare for Deprecation of Dynamic Mechanism**:
>    - By removing the callback invocations, this patch prepares the codebase for the eventual removal of the dynamic callback registration mechanism.
>
> Signed-off-by: Jesse Zhang <Jesse.Zhang@xxxxxxx>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |  2 ++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c | 33 ++----------------------
>  drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c |  2 ++
>  3 files changed, 6 insertions(+), 31 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> index b4fd1e17205e..1c52ff92ea26 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> @@ -237,6 +237,8 @@ struct amdgpu_ring_funcs {
>  	void (*patch_ce)(struct amdgpu_ring *ring, unsigned offset);
>  	void (*patch_de)(struct amdgpu_ring *ring, unsigned offset);
>  	int (*reset)(struct amdgpu_ring *ring, unsigned int vmid);
> +	int (*stop_queue)(struct amdgpu_device *adev, uint32_t instance_id);
> +	int (*start_queue)(struct amdgpu_device *adev, uint32_t instance_id);
>  	void (*emit_cleaner_shader)(struct amdgpu_ring *ring);
>  	bool (*is_guilty)(struct amdgpu_ring *ring);
>  };
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c
> index 39669f8788a7..b1f4a11ad0c9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c
> @@ -534,16 +534,10 @@ void amdgpu_sdma_register_on_reset_callbacks(struct amdgpu_device *adev, struct
>   * @instance_id: ID of the SDMA engine instance to reset
>   * @suspend_user_queues: check if suspend user queue.
>   *
> - * This function performs the following steps:
> - * 1. Calls all registered pre_reset callbacks to allow KFD and AMDGPU to save their state.
> - * 2. Resets the specified SDMA engine instance.
> - * 3. Calls all registered post_reset callbacks to allow KFD and AMDGPU to restore their state.
> - *
>   * Returns: 0 on success, or a negative error code on failure.
>   */
>  int amdgpu_sdma_reset_engine(struct amdgpu_device *adev, uint32_t instance_id, bool suspend_user_queues)
>  {
> -	struct sdma_on_reset_funcs *funcs;
>  	int ret = 0;
>  	struct amdgpu_sdma_instance *sdma_instance = &adev->sdma.instance[instance_id];;
>  	struct amdgpu_ring *gfx_ring = &sdma_instance->ring;
> @@ -571,19 +565,7 @@ int amdgpu_sdma_reset_engine(struct amdgpu_device *adev, uint32_t instance_id, b
>  		page_sched_stopped = true;
>  	}
>  
> -	/* Invoke all registered pre_reset callbacks */
> -	list_for_each_entry(funcs, &adev->sdma.reset_callback_list, list) {
> -		if (funcs->pre_reset) {
> -			ret = funcs->pre_reset(adev, instance_id);
> -			if (ret) {
> -				dev_err(adev->dev,
> -				"beforeReset callback failed for instance %u: %d\n",
> -					instance_id, ret);
> -				goto exit;
> -			}
> -		}
> -	}
> -
> +	gfx_ring->funcs->stop_queue(adev, instance_id);

Yeah that starts to look good. Question here is who is calling amdgpu_sdma_reset_engine()?

If this call comes from engine specific code we might not need the start/stop_queue callbacks all together.

Regards,
Christian.

>  	/* Perform the SDMA reset for the specified instance */
>  	ret = amdgpu_dpm_reset_sdma(adev, 1 << instance_id);
>  	if (ret) {
> @@ -591,18 +573,7 @@ int amdgpu_sdma_reset_engine(struct amdgpu_device *adev, uint32_t instance_id, b
>  		goto exit;
>  	}
>  
> -	/* Invoke all registered post_reset callbacks */
> -	list_for_each_entry(funcs, &adev->sdma.reset_callback_list, list) {
> -		if (funcs->post_reset) {
> -			ret = funcs->post_reset(adev, instance_id);
> -			if (ret) {
> -				dev_err(adev->dev,
> -				"afterReset callback failed for instance %u: %d\n",
> -					instance_id, ret);
> -				goto exit;
> -			}
> -		}
> -	}
> +	gfx_ring->funcs->start_queue(adev, instance_id);
>  
>  exit:
>  	/* Restart the scheduler's work queue for the GFX and page rings
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
> index fd34dc138081..c1f7ccff9c4e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
> @@ -2132,6 +2132,8 @@ static const struct amdgpu_ring_funcs sdma_v4_4_2_ring_funcs = {
>  	.emit_reg_wait = sdma_v4_4_2_ring_emit_reg_wait,
>  	.emit_reg_write_reg_wait = amdgpu_ring_emit_reg_write_reg_wait_helper,
>  	.reset = sdma_v4_4_2_reset_queue,
> +	.stop_queue = sdma_v4_4_2_stop_queue,
> +	.start_queue = sdma_v4_4_2_restore_queue,
>  	.is_guilty = sdma_v4_4_2_ring_is_guilty,
>  };
>  




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux