On 9/26/2024 2:31 PM, Xu, Feifei wrote: > [AMD Official Use Only - AMD Internal Distribution Only] > >>> + refresh = (adev->init_lvl->level != AMDGPU_INIT_LEVEL_MINIMAL_XGMI) && >>> + (adev->gmc.reset_flags & AMDGPU_GMC_INIT_RESET_NPS); > > Is there a corner case that reloading with a different version tos and refreshing nps change co-exist? > I guess you are referring to the below corner case 1) Place NPS request 2) Unload Driver 3) Reinstall driver with a different TOS (possible but quite unlikely) 4) Driver reload 5) Driver checks TOS version first and goes for a reset 6) reset_flag of GMC is not set, hence it doesn't refresh the NPS range. I think changing the order in soc15_need_reset_on_init() to check for NPS request before TOS version check will solve this. Thanks, Lijo > Thanks, > Feifei > > -----Original Message----- > From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of Lijo Lazar > Sent: Tuesday, September 24, 2024 1:57 PM > To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx > Cc: Zhang, Hawking <Hawking.Zhang@xxxxxxx>; Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Koenig, Christian <Christian.Koenig@xxxxxxx>; Bhardwaj, Rajneesh <Rajneesh.Bhardwaj@xxxxxxx>; Errabolu, Ramesh <Ramesh.Errabolu@xxxxxxx> > Subject: [PATCH 6/7] drm/amdgpu: Check gmc requirement for reset on init > > Add a callback to check if there is any condition detected by GMC block for reset on init. One case is if a pending NPS change request is detected. If reset is done because of NPS switch, refresh NPS info from discovery table. > > Signed-off-by: Lijo Lazar <lijo.lazar@xxxxxxx> > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 13 ++++++++++++- drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h | 5 +++++ > drivers/gpu/drm/amd/amdgpu/soc15.c | 2 ++ > 3 files changed, 19 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c > index 21f1e65c9dc9..011fe3a847d0 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c > @@ -1261,12 +1261,15 @@ int amdgpu_gmc_get_nps_memranges(struct amdgpu_device *adev, > struct amdgpu_gmc_memrange *ranges; > int range_cnt, ret, i, j; > uint32_t nps_type; > + bool refresh; > > if (!mem_ranges) > return -EINVAL; > > + refresh = (adev->init_lvl->level != AMDGPU_INIT_LEVEL_MINIMAL_XGMI) && > + (adev->gmc.reset_flags & AMDGPU_GMC_INIT_RESET_NPS); > ret = amdgpu_discovery_get_nps_info(adev, &nps_type, &ranges, > - &range_cnt, false); > + &range_cnt, refresh); > > if (ret) > return ret; > @@ -1392,3 +1395,11 @@ void amdgpu_gmc_prepare_nps_mode_change(struct amdgpu_device *adev) > adev->dev, > "NPS mode change request done, reload driver to complete the change\n"); } > + > +bool amdgpu_gmc_need_reset_on_init(struct amdgpu_device *adev) { > + if (adev->gmc.gmc_funcs->need_reset_on_init) > + return adev->gmc.gmc_funcs->need_reset_on_init(adev); > + > + return false; > +} > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h > index b13d6adb5efd..d4cd247fe574 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h > @@ -78,6 +78,8 @@ enum amdgpu_memory_partition { > BIT(AMDGPU_NPS3_PARTITION_MODE) | BIT(AMDGPU_NPS4_PARTITION_MODE) | \ > BIT(AMDGPU_NPS6_PARTITION_MODE) | BIT(AMDGPU_NPS8_PARTITION_MODE)) > > +#define AMDGPU_GMC_INIT_RESET_NPS BIT(0) > + > /* > * GMC page fault information > */ > @@ -169,6 +171,7 @@ struct amdgpu_gmc_funcs { > /* Request NPS mode */ > int (*request_mem_partition_mode)(struct amdgpu_device *adev, > int nps_mode); > + bool (*need_reset_on_init)(struct amdgpu_device *adev); > }; > > struct amdgpu_xgmi_ras { > @@ -314,6 +317,7 @@ struct amdgpu_gmc { > const struct amdgpu_gmc_funcs *gmc_funcs; > enum amdgpu_memory_partition requested_nps_mode; > uint32_t supported_nps_modes; > + uint32_t reset_flags; > > struct amdgpu_xgmi xgmi; > struct amdgpu_irq_src ecc_irq; > @@ -468,5 +472,6 @@ int amdgpu_gmc_get_nps_memranges(struct amdgpu_device *adev, int amdgpu_gmc_request_memory_partition(struct amdgpu_device *adev, > int nps_mode); > void amdgpu_gmc_prepare_nps_mode_change(struct amdgpu_device *adev); > +bool amdgpu_gmc_need_reset_on_init(struct amdgpu_device *adev); > > #endif > diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c > index 619933f252aa..97ca4931a7ef 100644 > --- a/drivers/gpu/drm/amd/amdgpu/soc15.c > +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c > @@ -833,6 +833,8 @@ static bool soc15_need_reset_on_init(struct amdgpu_device *adev) > > if (amdgpu_psp_tos_reload_needed(adev)) > return true; > + if (amdgpu_gmc_need_reset_on_init(adev)) > + return true; > /* Just return false for soc15 GPUs. Reset does not seem to > * be necessary. > */ > -- > 2.25.1 >