Re: [v1,2/3] drm/amd/pm: send the SMT-enable message to pmfw

"Lazar, Lijo" <Lijo.Lazar@xxxxxxx> · Sat, 25 Mar 2023 01:49:06 +0000

[AMD Official Use Only - General]

The notifier block is embedded in smu context of a device. If there are 4 devices and 3 of them are interested, they register using the notifier block within their smu context. Notifier is called
 in a chain and when received they use the container_of to get the smu context and communicate with the respective device's FW on the actions to do.

BTW, I don't know why dGPU PMFW would be interested in SMT change. On AMD APU + dGPU we already have things like smartshift and it will take care if communicated to APU alone.

Thanks,

Lijo

From: Limonciello, Mario <Mario.Limonciello@xxxxxxx>

Sent: Friday, March 24, 2023 11:19:55 PM

To: Lazar, Lijo <Lijo.Lazar@xxxxxxx>; Yang, WenYou <WenYou.Yang@xxxxxxx>; Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Koenig, Christian <Christian.Koenig@xxxxxxx>; Pan, Xinhui <Xinhui.Pan@xxxxxxx>

Cc: Li, Ying <YING.LI@xxxxxxx>; Liu, Kun <Kun.Liu2@xxxxxxx>; Liang, Richard qi <Richardqi.Liang@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx <amd-gfx@xxxxxxxxxxxxxxxxxxxxx>

Subject: RE: [v1,2/3] drm/amd/pm: send the SMT-enable message to pmfw

[AMD Official Use Only - General]

> -----Original Message-----

> From: Lazar, Lijo <Lijo.Lazar@xxxxxxx>

> Sent: Thursday, March 23, 2023 21:29

> To: Limonciello, Mario <Mario.Limonciello@xxxxxxx>; Yang, WenYou

> <WenYou.Yang@xxxxxxx>; Deucher, Alexander

> <Alexander.Deucher@xxxxxxx>; Koenig, Christian

> <Christian.Koenig@xxxxxxx>; Pan, Xinhui <Xinhui.Pan@xxxxxxx>

> Cc: Li, Ying <YING.LI@xxxxxxx>; Liu, Kun <Kun.Liu2@xxxxxxx>; Liang,

> Richard qi <Richardqi.Liang@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx

> Subject: Re: [v1,2/3] drm/amd/pm: send the SMT-enable message to pmfw

> 

> 

> 

> On 3/23/2023 11:36 PM, Limonciello, Mario wrote:

> > On 3/23/2023 12:41, Limonciello, Mario wrote:

> >> On 3/22/2023 00:48, Wenyou Yang wrote:

> >>> When the CPU SMT status change in the fly, sent the SMT-enable

> >>> message to pmfw to notify it that the SMT status changed.

> >>>

> >>> Signed-off-by: Wenyou Yang <WenYou.Yang@xxxxxxx>

> >>> ---

> >>>   drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c     | 41

> +++++++++++++++++++

> >>>   drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h |  5 +++

> >>>   2 files changed, 46 insertions(+)

> >>>

> >>> diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c

> >>> b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c

> >>> index b5d64749990e..5cd85a9d149d 100644

> >>> --- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c

> >>> +++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c

> >>> @@ -22,6 +22,7 @@

> >>>   #define SWSMU_CODE_LAYER_L1

> >>> +#include <linux/cpu.h>

> >>>   #include <linux/firmware.h>

> >>>   #include <linux/pci.h>

> >>> @@ -69,6 +70,14 @@ static int smu_set_fan_speed_rpm(void *handle,

> >>> uint32_t speed);

> >>>   static int smu_set_gfx_cgpg(struct smu_context *smu, bool enabled);

> >>>   static int smu_set_mp1_state(void *handle, enum pp_mp1_state

> >>> mp1_state);

> >>> +static int smt_notifier_callback(struct notifier_block *nb, unsigned

> >>> long action, void *data);

> >>> +

> >>> +extern struct raw_notifier_head smt_notifier_head;

> >>> +

> >>> +static struct notifier_block smt_notifier = {

> >>> +    .notifier_call = smt_notifier_callback,

> >>> +};

> >>> +

> >>>   static int smu_sys_get_pp_feature_mask(void *handle,

> >>>                          char *buf)

> >>>   {

> >>> @@ -625,6 +634,8 @@ static int smu_set_funcs(struct amdgpu_device

> *adev)

> >>>       return 0;

> >>>   }

> >>> +static struct smu_context *current_smu;

> >>> +

> >>>   static int smu_early_init(void *handle)

> >>>   {

> >>>       struct amdgpu_device *adev = (struct amdgpu_device *)handle;

> >>> @@ -645,6 +656,7 @@ static int smu_early_init(void *handle)

> >>>       mutex_init(&smu->message_lock);

> >>>       adev->powerplay.pp_handle = smu;

> >>> +    current_smu = smu;

> >

> > Although this series is intended for the Van Gogh case right now, I

> > dont't think this would scale well for multiple GPUs in a system.

> >

> > I think that instead you may want to move the notifier callback to be a

> > level "higher" in amdgpu.  Perhaps amdgpu_device.c?  Then when that

> > notifier call is received you'll want to walk through the PCI device

> > space to find any GPUs that are bound with AMDGPU a series of

> > wrappers/calls that end up calling smu_set_cpu_smt_enable with the

> > approriate arguments.

> >

> 

> This is not required when the notifier is registered only within Vangogh

> ppt function. Then follow Evan's suggestion of keeping the notifier

> block inside smu. From the notifier block, it can find the smu block and

> then call cpu_smt_enable/disable. That way notifier callback comes only

> once even with multiple dGPUs + Vangogh and processed for the

> corresponding smu.

> 

> This notifier doesn't need to be registered for platforms only with

> dGPUs or APUs which don't need this.

They don't right now, but I was thinking how this could scale to other

APUs or dGPUs if they are interested in adding support for this message

too.

> 

> Thanks,

> Lijo

> 

> >

> >>>       adev->powerplay.pp_funcs = &swsmu_pm_funcs;

> >>>       r = smu_set_funcs(adev);

> >>> @@ -1105,6 +1117,8 @@ static int smu_sw_init(void *handle)

> >>>       if (!smu->ppt_funcs->get_fan_control_mode)

> >>>           smu->adev->pm.no_fan = true;

> >>> +    raw_notifier_chain_register(&smt_notifier_head, &smt_notifier);

> >>> +

> >>>       return 0;

> >>>   }

> >>> @@ -1122,6 +1136,8 @@ static int smu_sw_fini(void *handle)

> >>>       smu_fini_microcode(smu);

> >>> +    raw_notifier_chain_unregister(&smt_notifier_head, &smt_notifier);

> >>> +

> >>>       return 0;

> >>>   }

> >>> @@ -3241,3 +3257,28 @@ int smu_send_hbm_bad_channel_flag(struct

> >>> smu_context *smu, uint32_t size)

> >>>       return ret;

> >>>   }

> >>> +

> >>> +static int smu_set_cpu_smt_enable(struct smu_context *smu, bool

> enable)

> >>> +{

> >>> +    int ret = -EINVAL;

> >>> +

> >>> +    if (smu->ppt_funcs && smu->ppt_funcs->set_cpu_smt_enable)

> >>> +        ret = smu->ppt_funcs->set_cpu_smt_enable(smu, enable);

> >>> +

> >>> +    return ret;

> >>> +}

> >>> +

> >>> +static int smt_notifier_callback(struct notifier_block *nb,

> >>> +                 unsigned long action, void *data)

> >>> +{

> >>> +    struct smu_context *smu = current_smu;

> >>> +    int ret = NOTIFY_OK;

> >>

> >> This initialization is pointless, it's clobbered in the next line.

> >>

> >>> +

> >>> +    ret = (action == SMT_ENABLED) ?

> >>> +                smu_set_cpu_smt_enable(smu, true) :

> >>> +                smu_set_cpu_smt_enable(smu, false);

> >>

> >> How about this instead, it should be more readable:

> >>

> >>      ret = smu_set_cpu_smt_enable(smu, action == SMT_ENABLED);

> >>

> >>> +    if (ret)

> >>> +        ret = NOTIFY_BAD;

> >>> +

> >>> +    return ret;

> >>

> >> How about instead:

> >>

> >>      dev_dbg(adev->dev, "failed to %sable SMT: %d\n", action ="">
> >> SMT_ENABLED ? "en" : "dis", ret);

> >>

> >>      return ret ? NOTIFY_BAD : NOTIFY_OK;

> >>

> >>> +}

> >>> diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h

> >>> b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h

> >>> index 09469c750a96..7c6594bba796 100644

> >>> --- a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h

> >>> +++ b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h

> >>> @@ -1354,6 +1354,11 @@ struct pptable_funcs {

> >>>        * @init_pptable_microcode: Prepare the pptable microcode to

> >>> upload via PSP

> >>>        */

> >>>       int (*init_pptable_microcode)(struct smu_context *smu);

> >>> +

> >>> +    /**

> >>> +     * @set_cpu_smt_enable: Set the CPU SMT status

> >>> +     */

> >>> +    int (*set_cpu_smt_enable)(struct smu_context *smu, bool enable);

> >>>   };

> >>>   typedef enum {

> >>

> >