Re: [PATCH 6/7] drm/amdgpu: Check gmc requirement for reset on init

On 9/26/2024 2:31 PM, Xu, Feifei wrote:
> [AMD Official Use Only - AMD Internal Distribution Only]
> 
>>> +     refresh = (adev->init_lvl->level != AMDGPU_INIT_LEVEL_MINIMAL_XGMI) &&
>>> +               (adev->gmc.reset_flags & AMDGPU_GMC_INIT_RESET_NPS);
> 
> Is there a corner case where reloading with a different TOS version and refreshing for an NPS change coexist?
> 

I guess you are referring to the corner case below:
	1) Place NPS request
	2) Unload driver
	3) Reinstall driver with a different TOS (possible but quite unlikely)
	4) Reload driver
	5) Driver checks TOS version first and goes for a reset
	6) GMC reset_flags is not set, hence the NPS range is not refreshed


I think changing the order in soc15_need_reset_on_init() to check for
the NPS request before the TOS version check will solve this.
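
To illustrate the ordering issue, here is a rough standalone sketch. It is
not driver code: the mock_* names and struct are made up for illustration,
and it assumes the GMC need_reset_on_init callback is what sets
AMDGPU_GMC_INIT_RESET_NPS in reset_flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors AMDGPU_GMC_INIT_RESET_NPS from the patch. */
#define GMC_INIT_RESET_NPS (1u << 0)

/* Hypothetical stand-in for struct amdgpu_device; only what the sketch needs. */
struct mock_dev {
	bool tos_changed;         /* driver was reinstalled with a different TOS */
	bool nps_request_pending; /* an NPS change was requested before unload */
	uint32_t reset_flags;     /* models adev->gmc.reset_flags */
};

/* Models amdgpu_psp_tos_reload_needed(). */
static bool mock_tos_reload_needed(const struct mock_dev *dev)
{
	return dev->tos_changed;
}

/*
 * Models amdgpu_gmc_need_reset_on_init(). ASSUMPTION: the callback records
 * the NPS reason in reset_flags as a side effect of being called.
 */
static bool mock_gmc_need_reset_on_init(struct mock_dev *dev)
{
	if (dev->nps_request_pending) {
		dev->reset_flags |= GMC_INIT_RESET_NPS;
		return true;
	}
	return false;
}

/* Order as posted in the patch: the TOS check returns before GMC is asked. */
static bool need_reset_tos_first(struct mock_dev *dev)
{
	if (mock_tos_reload_needed(dev))
		return true;
	if (mock_gmc_need_reset_on_init(dev))
		return true;
	return false;
}

/* Proposed order: GMC is asked first, so the NPS flag is always recorded. */
static bool need_reset_nps_first(struct mock_dev *dev)
{
	if (mock_gmc_need_reset_on_init(dev))
		return true;
	if (mock_tos_reload_needed(dev))
		return true;
	return false;
}

/* Runs the corner case (TOS changed and NPS request pending) and returns
 * the resulting reset_flags for the chosen check order. */
static uint32_t corner_case_reset_flags(bool nps_first)
{
	struct mock_dev dev = {
		.tos_changed = true,
		.nps_request_pending = true,
		.reset_flags = 0,
	};
	bool reset = nps_first ? need_reset_nps_first(&dev)
			       : need_reset_tos_first(&dev);

	/* Either order asks for a reset here; only the flag differs. */
	assert(reset);
	return dev.reset_flags;
}
```

With the TOS check first, the flag stays clear in this scenario, so the
refresh condition in amdgpu_gmc_get_nps_memranges() would evaluate to false;
with the NPS check first, the flag is set before the function returns.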

Thanks,
Lijo

> Thanks,
> Feifei
> 
> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of Lijo Lazar
> Sent: Tuesday, September 24, 2024 1:57 PM
> To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> Cc: Zhang, Hawking <Hawking.Zhang@xxxxxxx>; Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Koenig, Christian <Christian.Koenig@xxxxxxx>; Bhardwaj, Rajneesh <Rajneesh.Bhardwaj@xxxxxxx>; Errabolu, Ramesh <Ramesh.Errabolu@xxxxxxx>
> Subject: [PATCH 6/7] drm/amdgpu: Check gmc requirement for reset on init
> 
> Add a callback to check if there is any condition detected by GMC block for reset on init. One case is if a pending NPS change request is detected. If reset is done because of NPS switch, refresh NPS info from discovery table.
> 
> Signed-off-by: Lijo Lazar <lijo.lazar@xxxxxxx>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 13 ++++++++++++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h |  5 +++++
>  drivers/gpu/drm/amd/amdgpu/soc15.c      |  2 ++
>  3 files changed, 19 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> index 21f1e65c9dc9..011fe3a847d0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> @@ -1261,12 +1261,15 @@ int amdgpu_gmc_get_nps_memranges(struct amdgpu_device *adev,
>         struct amdgpu_gmc_memrange *ranges;
>         int range_cnt, ret, i, j;
>         uint32_t nps_type;
> +       bool refresh;
> 
>         if (!mem_ranges)
>                 return -EINVAL;
> 
> +       refresh = (adev->init_lvl->level != AMDGPU_INIT_LEVEL_MINIMAL_XGMI) &&
> +                 (adev->gmc.reset_flags & AMDGPU_GMC_INIT_RESET_NPS);
>         ret = amdgpu_discovery_get_nps_info(adev, &nps_type, &ranges,
> -                                           &range_cnt, false);
> +                                           &range_cnt, refresh);
> 
>         if (ret)
>                 return ret;
> @@ -1392,3 +1395,11 @@ void amdgpu_gmc_prepare_nps_mode_change(struct amdgpu_device *adev)
>                         adev->dev,
>                         "NPS mode change request done, reload driver to complete the change\n");
>  }
> +
> +bool amdgpu_gmc_need_reset_on_init(struct amdgpu_device *adev)
> +{
> +       if (adev->gmc.gmc_funcs->need_reset_on_init)
> +               return adev->gmc.gmc_funcs->need_reset_on_init(adev);
> +
> +       return false;
> +}
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
> index b13d6adb5efd..d4cd247fe574 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
> @@ -78,6 +78,8 @@ enum amdgpu_memory_partition {
>          BIT(AMDGPU_NPS3_PARTITION_MODE) | BIT(AMDGPU_NPS4_PARTITION_MODE) | \
>          BIT(AMDGPU_NPS6_PARTITION_MODE) | BIT(AMDGPU_NPS8_PARTITION_MODE))
> 
> +#define AMDGPU_GMC_INIT_RESET_NPS  BIT(0)
> +
>  /*
>   * GMC page fault information
>   */
> @@ -169,6 +171,7 @@ struct amdgpu_gmc_funcs {
>         /* Request NPS mode */
>         int (*request_mem_partition_mode)(struct amdgpu_device *adev,
>                                           int nps_mode);
> +       bool (*need_reset_on_init)(struct amdgpu_device *adev);
>  };
> 
>  struct amdgpu_xgmi_ras {
> @@ -314,6 +317,7 @@ struct amdgpu_gmc {
>         const struct amdgpu_gmc_funcs   *gmc_funcs;
>         enum amdgpu_memory_partition    requested_nps_mode;
>         uint32_t supported_nps_modes;
> +       uint32_t reset_flags;
> 
>         struct amdgpu_xgmi xgmi;
>         struct amdgpu_irq_src   ecc_irq;
> @@ -468,5 +472,6 @@ int amdgpu_gmc_get_nps_memranges(struct amdgpu_device *adev,
>  int amdgpu_gmc_request_memory_partition(struct amdgpu_device *adev,
>                                         int nps_mode);
>  void amdgpu_gmc_prepare_nps_mode_change(struct amdgpu_device *adev);
> +bool amdgpu_gmc_need_reset_on_init(struct amdgpu_device *adev);
> 
>  #endif
> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
> index 619933f252aa..97ca4931a7ef 100644
> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
> @@ -833,6 +833,8 @@ static bool soc15_need_reset_on_init(struct amdgpu_device *adev)
> 
>         if (amdgpu_psp_tos_reload_needed(adev))
>                 return true;
> +       if (amdgpu_gmc_need_reset_on_init(adev))
> +               return true;
>         /* Just return false for soc15 GPUs.  Reset does not seem to
>          * be necessary.
>          */
> --
> 2.25.1
> 


