Re: [PATCH 2/2] drm/amd/amdgpu: use the default reset for ras recovery


 



On Mon, Apr 29, 2024 at 4:07 AM Kenneth Feng <kenneth.feng@xxxxxxx> wrote:
>
> use the default reset for ras recovery
>
> Signed-off-by: Kenneth Feng <kenneth.feng@xxxxxxx>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> index a037e8fba29f..f92b2c4f0d5c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> @@ -2437,6 +2437,7 @@ static void amdgpu_ras_do_recovery(struct work_struct *work)
>         struct amdgpu_device *adev = ras->adev;
>         struct list_head device_list, *device_list_handle =  NULL;
>         struct amdgpu_hive_info *hive = amdgpu_get_xgmi_hive(adev);
> +       int save_reset_method = amdgpu_reset_method;
>
>         if (hive) {
>                 atomic_set(&hive->ras_recovery, 1);
> @@ -2501,7 +2502,13 @@ static void amdgpu_ras_do_recovery(struct work_struct *work)
>                         }
>                 }
>
> +               if (amdgpu_gpu_recovery == 2)
> +                       amdgpu_reset_method = -1;
> +
>                 amdgpu_device_gpu_recover(ras->adev, NULL, &reset_context);
> +
> +               if (amdgpu_gpu_recovery == 2)
> +                       amdgpu_reset_method = save_reset_method;

This is racy.  amdgpu_reset_method (like amdgpu_gpu_recovery) is a global
variable and will be referenced by all of the AMD GPUs in the system that
are using amdgpu, so temporarily overwriting it during one device's RAS
recovery also changes the reset method for any other device that resets
at the same time.  To handle this properly, we should store the selected
reset method in the adev structure and set it from the module parameter
at driver bind time.  Then, if we need to change the reset method at
runtime, we can change the device-specific copy in adev.  Maybe it would
be better to have two variables in adev, e.g., default_reset_method
and override_reset_method.  In cases where we have to use the default
method, we can use that; in other cases, we can use the override
method.

Alex


>         }
>         atomic_set(&ras->in_recovery, 0);
>         if (hive) {
> --
> 2.34.1
>




