On 9/27/2024 9:26 AM, Xu, Feifei wrote: >>>I guess you are referring to the below corner case > >>> 1) Place NPS request > >>> 2) Unload Driver > >>> 3) Reinstall driver with a different TOS (possible but quite > unlikely) > >>> 4) Driver reload > >>> 5) Driver checks TOS version first and goes for a reset > >>> 6) reset_flag of GMC is not set, hence it doesn't refresh > the NPS range. > > > > > >>>I think changing the order in soc15_need_reset_on_init() to check for > NPS request before TOS version check will solve this. > > Yes, I was thinking of reset_flag and tOS reloading > (adev->init_lvl->level set to AMDGPU_INIT_LEVEL_MINIMAL_XGMI) changing > at the same time. And NPS refresh will be ignored. Though might be > likely in debugging or regression isolation cases which changing driver > packaged with different TOS. And yes making NPS refresh checking before > TOS version checking will solve this. > > And if we do not return ahead when checking NPS request before tOS > version change in soc15_need_reset_on_init(), we can drop > (adev->init_lvl->level != AMDGPU_INIT_LEVEL_MINIMAL_XGMI) check in below > refresh checking: > This check is used when NPS request change is identified. During swinit part it will be at MINIMAL_XGMI level, but at that point there is no need to refresh this information as reset is pending. It is refreshed after a reset when init level returns to default. Thanks, Lijo > + refresh = (adev->init_lvl->level != AMDGPU_INIT_LEVEL_MINIMAL_XGMI) && > + (adev->gmc.reset_flags & AMDGPU_GMC_INIT_RESET_NPS); > > refresh = (adev->gmc.reset_flags & AMDGPU_GMC_INIT_RESET_NPS); > > Thanks > Feifei > > On 9/26/2024 5:01 PM, Xu, Feifei wrote: >>>> + refresh = (adev->init_lvl->level != AMDGPU_INIT_LEVEL_MINIMAL_XGMI) && >>>> + (adev->gmc.reset_flags & AMDGPU_GMC_INIT_RESET_NPS); >> Is there a corner case that reloading with a different version tos and refreshing nps change co-exist? >> >> Thanks, >> Feifei >> >> -----Original Message----- >> From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of Lijo Lazar >> Sent: Tuesday, September 24, 2024 1:57 PM >> To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx >> Cc: Zhang, Hawking <Hawking.Zhang@xxxxxxx>; Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Koenig, Christian <Christian.Koenig@xxxxxxx>; Bhardwaj, Rajneesh <Rajneesh.Bhardwaj@xxxxxxx>; Errabolu, Ramesh <Ramesh.Errabolu@xxxxxxx> >> Subject: [PATCH 6/7] drm/amdgpu: Check gmc requirement for reset on init >> >> Add a callback to check if there is any condition detected by GMC block for reset on init. One case is if a pending NPS change request is detected. If reset is done because of NPS switch, refresh NPS info from discovery table. >> >> Signed-off-by: Lijo Lazar <lijo.lazar@xxxxxxx> >> --- >> drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 13 ++++++++++++- drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h | 5 +++++ >> drivers/gpu/drm/amd/amdgpu/soc15.c | 2 ++ >> 3 files changed, 19 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c >> index 21f1e65c9dc9..011fe3a847d0 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c >> @@ -1261,12 +1261,15 @@ int amdgpu_gmc_get_nps_memranges(struct amdgpu_device *adev, >> struct amdgpu_gmc_memrange *ranges; >> int range_cnt, ret, i, j; >> uint32_t nps_type; >> + bool refresh; >> >> if (!mem_ranges) >> return -EINVAL; >> >> + refresh = (adev->init_lvl->level != AMDGPU_INIT_LEVEL_MINIMAL_XGMI) && >> + (adev->gmc.reset_flags & AMDGPU_GMC_INIT_RESET_NPS); >> ret = amdgpu_discovery_get_nps_info(adev, &nps_type, &ranges, >> - &range_cnt, false); >> + &range_cnt, refresh); >> >> if (ret) >> return ret; >> @@ -1392,3 +1395,11 @@ void amdgpu_gmc_prepare_nps_mode_change(struct amdgpu_device *adev) >> adev->dev, >> "NPS mode change request done, reload driver to complete the change\n"); } >> + >> +bool amdgpu_gmc_need_reset_on_init(struct amdgpu_device *adev) { >> + if (adev->gmc.gmc_funcs->need_reset_on_init) >> + return adev->gmc.gmc_funcs->need_reset_on_init(adev); >> + >> + return false; >> +} >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h >> index b13d6adb5efd..d4cd247fe574 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h >> @@ -78,6 +78,8 @@ enum amdgpu_memory_partition { >> BIT(AMDGPU_NPS3_PARTITION_MODE) | BIT(AMDGPU_NPS4_PARTITION_MODE) | \ >> BIT(AMDGPU_NPS6_PARTITION_MODE) | BIT(AMDGPU_NPS8_PARTITION_MODE)) >> >> +#define AMDGPU_GMC_INIT_RESET_NPS BIT(0) >> + >> /* >> * GMC page fault information >> */ >> @@ -169,6 +171,7 @@ struct amdgpu_gmc_funcs { >> /* Request NPS mode */ >> int (*request_mem_partition_mode)(struct amdgpu_device *adev, >> int nps_mode); >> + bool (*need_reset_on_init)(struct amdgpu_device *adev); >> }; >> >> struct amdgpu_xgmi_ras { >> @@ -314,6 +317,7 @@ struct amdgpu_gmc { >> const struct amdgpu_gmc_funcs *gmc_funcs; >> enum amdgpu_memory_partition requested_nps_mode; >> uint32_t supported_nps_modes; >> + uint32_t reset_flags; >> >> struct amdgpu_xgmi xgmi; >> struct amdgpu_irq_src ecc_irq; >> @@ -468,5 +472,6 @@ int amdgpu_gmc_get_nps_memranges(struct amdgpu_device *adev, int amdgpu_gmc_request_memory_partition(struct amdgpu_device *adev, >> int nps_mode); >> void amdgpu_gmc_prepare_nps_mode_change(struct amdgpu_device *adev); >> +bool amdgpu_gmc_need_reset_on_init(struct amdgpu_device *adev); >> >> #endif >> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c >> index 619933f252aa..97ca4931a7ef 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c >> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c >> @@ -833,6 +833,8 @@ static bool soc15_need_reset_on_init(struct amdgpu_device *adev) >> >> if (amdgpu_psp_tos_reload_needed(adev)) >> return true; >> + if (amdgpu_gmc_need_reset_on_init(adev)) >> + return true; >> /* Just return false for soc15 GPUs. Reset does not seem to >> * be necessary. >> */ >> -- >> 2.25.1 >>