RE: [PATCH 2/2] drm/amdgpu: perform p-state switch after the whole hive initialized

"Kim, Jonathan" <Jonathan.Kim@xxxxxxx> · Wed, 6 Nov 2019 03:49:06 +0000

No problem.  With the incoming patch referenced below, this patch should be ok since it doesn’t break the build.
 
Reviewed-by:  Jonathan Kim <Jonathan.Kim@xxxxxxx>
 


From: Quan, Evan <Evan.Quan@xxxxxxx> 

Sent: Tuesday, November 5, 2019 10:11 PM

To: Kim, Jonathan <Jonathan.Kim@xxxxxxx>; Strawbridge, Michael <Michael.Strawbridge@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx

Subject: RE: [PATCH 2/2] drm/amdgpu: perform p-state switch after the whole hive initialized


 
Thanks Jon.
Per discussed in another mail thread, this should be applied after the patch below
https://lists.freedesktop.org/archives/amd-gfx/2019-November/042160.html
Sorry for missing this important information.
 
Regards,
Evan



From: Kim, Jonathan <Jonathan.Kim@xxxxxxx>


Sent: Wednesday, November 6, 2019 10:18 AM

To: Strawbridge, Michael <Michael.Strawbridge@xxxxxxx>; Quan, Evan <Evan.Quan@xxxxxxx>;
amd-gfx@xxxxxxxxxxxxxxxxxxxxx

Subject: RE: [PATCH 2/2] drm/amdgpu: perform p-state switch after the whole hive initialized


 
Taking a look at this again, it’s not an indexing issue, it’s a placement problem.  I don’t think this solution will work if we expect to pstate switch all gpus.
 
From amdgpu_kms.c,  amdgpu_register_gpu_instance comes after amdgpu_device_init.  This means we’ll never reach mgpu_info.num_gpu == adev->gmc.xgmi.num_physical_nodes until in this code path.
 


From: Kim, Jonathan 

Sent: Tuesday, November 5, 2019 12:07 PM

To: Strawbridge, Michael <Michael.Strawbridge@xxxxxxx>; Quan, Evan <Evan.Quan@xxxxxxx>;
amd-gfx@xxxxxxxxxxxxxxxxxxxxx

Subject: RE: [PATCH 2/2] drm/amdgpu: perform p-state switch after the whole hive initialized


 
+                       for (i = 0; i < mgpu_info.num_gpu; i++) {
Also while num_physical_nodes should be used here instead.  It doesn’t make sense to have a pre-condition check i.e. (if (mgpu_info.num_dgpu == adev->gmc.xgmi.num_physical_nodes
 - 1) ) against the total number of nodes while the loop checks for the current num gpus from mgpu_info.  This will fail to set the last node to low pstate for the same 0-indexing issue once the pre-condition passes.



+                               gpu_instance = &(mgpu_info.gpu_ins[i]);

+                               if (gpu_instance->adev->flags & AMD_IS_APU)

+                                       continue;

+
 


From: Kim, Jonathan 

Sent: Tuesday, November 5, 2019 11:42 AM

To: Strawbridge, Michael <Michael.Strawbridge@xxxxxxx>; Quan, Evan <Evan.Quan@xxxxxxx>;
amd-gfx@xxxxxxxxxxxxxxxxxxxxx

Subject: RE: [PATCH 2/2] drm/amdgpu: perform p-state switch after the whole hive initialized


 
Yes that’s correct.  This should fix the issue.
 
Jon
 


From: Strawbridge, Michael <Michael.Strawbridge@xxxxxxx>


Sent: Tuesday, November 5, 2019 11:40 AM

To: Kim, Jonathan <Jonathan.Kim@xxxxxxx>; Quan, Evan <Evan.Quan@xxxxxxx>;
amd-gfx@xxxxxxxxxxxxxxxxxxxxx

Subject: Re: [PATCH 2/2] drm/amdgpu: perform p-state switch after the whole hive initialized


 

Hi Jon,


 


Does that mean we can simply use this instead?


 


+               if (mgpu_info.num_dgpu == adev->gmc.xgmi.num_physical_nodes - 1) {


 


Thanks,


Michael





From: Kim, Jonathan <Jonathan.Kim@xxxxxxx>

Sent: 05 November 2019 11:32 AM

To: Quan, Evan <Evan.Quan@xxxxxxx>;
amd-gfx@xxxxxxxxxxxxxxxxxxxxx <amd-gfx@xxxxxxxxxxxxxxxxxxxxx>

Cc: Strawbridge, Michael <Michael.Strawbridge@xxxxxxx>

Subject: RE: [PATCH 2/2] drm/amdgpu: perform p-state switch after the whole hive initialized


 




Please see inline.



Jon



-----Original Message-----

From: Quan, Evan <Evan.Quan@xxxxxxx> 

Sent: Tuesday, November 5, 2019 5:24 AM

To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx

Cc: Kim, Jonathan <Jonathan.Kim@xxxxxxx>; Strawbridge, Michael <Michael.Strawbridge@xxxxxxx>; Quan, Evan <Evan.Quan@xxxxxxx>

Subject: [PATCH 2/2] drm/amdgpu: perform p-state switch after the whole hive initialized



P-state switch should be performed after all devices from the hive get initialized.



Change-Id: Ifc7cac9ef0cf250447d2a412da35d601e2ac79ec

Signed-off-by: Evan Quan <evan.quan@xxxxxxx>

---

 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 47 ++++++++++++++++------

 1 file changed, 35 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

index e6ce949670e5..2d72d206cead 100644

--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

@@ -2090,6 +2090,7 @@ static int amdgpu_device_enable_mgpu_fan_boost(void)

  */

 static int amdgpu_device_ip_late_init(struct amdgpu_device *adev)  {

+       struct amdgpu_gpu_instance *gpu_instance;

         int i = 0, r;

 

         for (i = 0; i < adev->num_ip_blocks; i++) { @@ -2115,6 +2116,40 @@ static int amdgpu_device_ip_late_init(struct amdgpu_device *adev)

         if (r)

                 DRM_ERROR("enable mgpu fan boost failed (%d).\n", r);

 

+

+       if (adev->gmc.xgmi.num_physical_nodes > 1) {

+               mutex_lock(&mgpu_info.mutex);

+

+               /*

+                * Reset device p-state to low as this was booted with high.

+                *

+                * This should be performed only after all devices from the same

+                * hive get initialized.

+                *

+                * However, it's unknown how many device in the hive in advance.

+                * As this is counted one by one during devices initializations.

+                *

+                * So, we wait for all XGMI interlinked devices initialized.

+                * This may bring some delays as those devices may come from

+                * different hives. But that should be OK.

+                */

+               if (mgpu_info.num_dgpu == adev->gmc.xgmi.num_physical_nodes) {

[JK] This condition will never succeed.  mgpu_info.num_dgpu is 0-indexed while num_physical_nodes give total nodes.



+                       for (i = 0; i < mgpu_info.num_gpu; i++) {

+                               gpu_instance = &(mgpu_info.gpu_ins[i]);

+                               if (gpu_instance->adev->flags & AMD_IS_APU)

+                                       continue;

+

+                               r = amdgpu_xgmi_set_pstate(gpu_instance->adev, 0);

+                               if (r) {

+                                       DRM_ERROR("pstate setting failed (%d).\n", r);

+                                       break;

+                               }

+                       }

+               }

+

+               mutex_unlock(&mgpu_info.mutex);

+       }

+

         return 0;

 }

 

@@ -2227,18 +2262,6 @@ static void amdgpu_device_delayed_init_work_handler(struct work_struct *work)

         r = amdgpu_ib_ring_tests(adev);

         if (r)

                 DRM_ERROR("ib ring test failed (%d).\n", r);

-

-       /*

-        * set to low pstate by default

-        * This should be performed after all devices from

-        * XGMI finish their initializations. Thus it's moved

-        * to here.

-        * The time delay is 2S. TODO: confirm whether that

-        * is enough for all possible XGMI setups.

-        */

-       r = amdgpu_xgmi_set_pstate(adev, 0);

-       if (r)

-               DRM_ERROR("pstate setting failed (%d).\n", r);

 }

 

 static void amdgpu_device_delay_enable_gfx_off(struct work_struct *work)

--

2.23.0






_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx