On 9/29/2022 11:37 PM, Felix Kuehling wrote:
On 2022-09-29 07:10, Lazar, Lijo wrote:
On 9/29/2022 2:18 PM, Sharma, Shashank wrote:
On 9/28/2022 11:51 PM, Alex Deucher wrote:
On Wed, Sep 28, 2022 at 4:57 AM Sharma, Shashank
<shashank.sharma@xxxxxxx> wrote:
On 9/27/2022 10:40 PM, Alex Deucher wrote:
On Tue, Sep 27, 2022 at 11:38 AM Sharma, Shashank
<shashank.sharma@xxxxxxx> wrote:
On 9/27/2022 5:23 PM, Felix Kuehling wrote:
Am 2022-09-27 um 10:58 schrieb Sharma, Shashank:
Hello Felix,
Thanks for the review comments.
On 9/27/2022 4:48 PM, Felix Kuehling wrote:
Am 2022-09-27 um 02:12 schrieb Christian König:
Am 26.09.22 um 23:40 schrieb Shashank Sharma:
This patch switches the GPU workload mode to/from compute mode when submitting a compute workload.
Signed-off-by: Alex Deucher <alexander.deucher@xxxxxxx>
Signed-off-by: Shashank Sharma <shashank.sharma@xxxxxxx>
Feel free to add my acked-by, but Felix should probably take
a look
as well.
This looks OK purely from a compute perspective. But I'm concerned about the interaction of compute with graphics, or multiple graphics contexts submitting work concurrently. They would constantly override or disable each other's workload hints.
For example, you have an amdgpu_ctx with AMDGPU_CTX_WORKLOAD_HINT_COMPUTE (maybe Vulkan compute) and a KFD process that also wants the compute profile. Those could be different processes belonging to different users. Say, KFD enables the compute profile first. Then the graphics context submits a job. At the start of the job, the compute profile is enabled. That's a no-op because KFD already enabled the compute profile. When the job finishes, it disables the compute profile for everyone, including KFD. That's unexpected.
In this case, it will not disable the compute profile, as the reference counter will not be zero. The reset_profile() will only act if the reference counter is 0.
OK, I missed the reference counter.
But I would be happy to get any inputs about a policy which could be more sustainable and give better results. For example: should we disallow a profile change if a PP mode is already applied, and keep it on a first-come-first-served basis?

Policy A:
- Job A sets the profile to compute.
- Job B tries to set the profile to 3D, but we do not allow it, as Job A has not finished yet.

Or Policy B (the current one):
- Job A sets the profile to compute.
- Job B tries to set the profile to 3D, and we allow it; Job A also runs in PP 3D.
- Job B finishes, but does not reset the PP, as the reference count is not zero due to compute.
- Job A finishes; the profile is reset to NONE.
I think this won't work. As I understand it, the amdgpu_dpm_switch_power_profile enables and disables individual profiles. Disabling the 3D profile doesn't disable the compute profile at the same time. I think you'll need one refcount per profile.
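For illustration, the one-refcount-per-profile idea could look roughly like this. This is a minimal userspace sketch, not the real driver interface; `profile_ref`, `hw_mask`, and the enum values are invented names:

```c
#include <assert.h>

/* Sketch: one refcount per profile instead of a single global counter.
 * All names here are made up for illustration. */
enum { PROFILE_3D, PROFILE_VIDEO, PROFILE_VR, PROFILE_COMPUTE, PROFILE_COUNT };

static int profile_ref[PROFILE_COUNT];  /* one refcount per profile */
static unsigned int hw_mask;            /* bits currently enabled */

static void switch_power_profile(int type, int en)
{
    if (en) {
        if (profile_ref[type]++ == 0)
            hw_mask |= 1u << type;      /* first user enables the bit */
    } else {
        assert(profile_ref[type] > 0);
        if (--profile_ref[type] == 0)
            hw_mask &= ~(1u << type);   /* last user disables the bit */
    }
}
```

With per-profile counts, the VCN example works: two contexts enabling VIDEO keep the bit set until the last one disables it, and disabling VIDEO never touches COMPUTE.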
Regards,
Felix
Thanks, this is exactly what I was looking for. I think Alex's initial idea was around it, but I was under the assumption that there is only one HW profile in the SMU which keeps getting overwritten. This can solve our problems, as I can create an array of reference counters and will disable only the profile whose reference counter goes to 0.
It's been a while since I paged any of this code into my head, but I believe the actual workload message in the SMU is a mask where you can specify multiple workload types at the same time, and the SMU will arbitrate between them internally. E.g., the most aggressive one will be selected out of the ones specified. I think in the driver we just set one bit at a time using the current interface. It might be better to change the interface and just refcount the hint types, and then when we call the set function, look at the refcounts for each hint type and set the mask as appropriate.
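A rough sketch of that interface change, refcounting each hint type and rebuilding the mask from the counts. All names (`hint_ref`, `build_workload_mask`, `HINT_TYPES`) are invented for illustration, not the real driver code:

```c
/* Sketch: refcount each workload hint type; the mask sent to the SMU is
 * rebuilt from the refcounts on every change, and the SMU arbitrates
 * internally between the set bits. Illustrative names only. */
#define HINT_TYPES 7

static int hint_ref[HINT_TYPES];        /* refcount per hint type */

static unsigned int build_workload_mask(void)
{
    unsigned int mask = 0;
    for (int i = 0; i < HINT_TYPES; i++)
        if (hint_ref[i] > 0)
            mask |= 1u << i;            /* every active hint contributes a bit */
    return mask;
}
```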
Alex
Hey Alex,
Thanks for your comment. If that is the case, this current patch series works straightforwardly, and no changes would be required. Please let me know if my understanding is correct:

Assumption: order of aggression: 3D > Media > Compute
- Job 1 requests mode compute: PP changed to compute, ref count 1.
- Job 2 requests mode media: PP changed to media, ref count 2.
- Job 3 requests mode 3D: PP changed to 3D, ref count 3.
- Job 1 finishes, drops ref count to 2; doesn't reset the PP as ref > 0, PP still 3D.
- Job 3 finishes, drops ref count to 1; doesn't reset the PP as ref > 0, PP still 3D.
- Job 2 finishes, drops ref count to 0; PP changed to NONE.

In this way, every job will be operating in the power profile of the desired aggression or higher, and this API guarantees execution at least in the desired power profile.
I'm not entirely sure of the relative levels of aggression, but I believe the SMU prioritizes them by index. E.g.:

#define WORKLOAD_PPLIB_DEFAULT_BIT 0
#define WORKLOAD_PPLIB_FULL_SCREEN_3D_BIT 1
#define WORKLOAD_PPLIB_POWER_SAVING_BIT 2
#define WORKLOAD_PPLIB_VIDEO_BIT 3
#define WORKLOAD_PPLIB_VR_BIT 4
#define WORKLOAD_PPLIB_COMPUTE_BIT 5
#define WORKLOAD_PPLIB_CUSTOM_BIT 6

3D < video < VR < compute < custom

VR and compute are the most aggressive. Custom takes preference because it's user-customizable.
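If the driver ever has to arbitrate itself, picking by bit index could be sketched like this. This is illustrative only (`pick_profile` is a made-up name), and the firmware's real internal priority may differ from the bit order:

```c
/* Sketch: pick the highest-priority profile from a workload mask, using
 * the WORKLOAD_PPLIB_* bit order (higher bit index = higher priority).
 * Illustrative only; real PMFW arbitration may carry workarounds. */
#define WORKLOAD_PPLIB_DEFAULT_BIT        0
#define WORKLOAD_PPLIB_FULL_SCREEN_3D_BIT 1
#define WORKLOAD_PPLIB_POWER_SAVING_BIT   2
#define WORKLOAD_PPLIB_VIDEO_BIT          3
#define WORKLOAD_PPLIB_VR_BIT             4
#define WORKLOAD_PPLIB_COMPUTE_BIT        5
#define WORKLOAD_PPLIB_CUSTOM_BIT         6

static int pick_profile(unsigned int mask)
{
    for (int bit = WORKLOAD_PPLIB_CUSTOM_BIT; bit >= 0; bit--)
        if (mask & (1u << bit))
            return bit;                 /* highest set bit wins */
    return WORKLOAD_PPLIB_DEFAULT_BIT;  /* nothing requested: default */
}
```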
Alex
Thanks, so this UAPI will guarantee the execution of the job in at least the requested power profile, or a more aggressive one.
Hi Shashank,
This is not how the API works in the driver PM subsystem. In the final interface with the PMFW, the driver sets only one profile bit and doesn't set any mask. So it doesn't work the way Felix explained.
I was not looking at the implementation but at the API:

int amdgpu_dpm_switch_power_profile(struct amdgpu_device *adev,
                                    enum PP_SMC_POWER_PROFILE type,
                                    bool en)

This API suggests that we can enable and disable individual profiles. E.g., disabling PP_SMC_POWER_PROFILE_VIDEO should not change whether PP_SMC_POWER_PROFILE_COMPUTE is enabled. What we actually send to the HW when multiple profiles are enabled through this API is a different question. We have to choose one profile or the other. This can happen in the driver or in the firmware; I don't care which.

But if disabling PP_SMC_POWER_PROFILE_VIDEO makes us forget that we ever enabled PP_SMC_POWER_PROFILE_COMPUTE, then this API is broken and useless as an abstraction.
Checked again. Here the driver decides the priority instead of the FW. So the API works as you mentioned (except that there is no refcounting done). Sorry for the confusion.
Thanks,
Lijo
Regards,
Felix
If there is more than one profile bit set, the PMFW looks at the mask and picks the one with the highest priority. Note that for each update of the workload mask, the PMFW should get a message.

The driver currently sets only one bit, as Alex explained earlier. For our current driver implementation, you can check this as an example -
https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c#L1753

Also, the PM layer already stores the current workload profile for a *get* API (which also means a new PM workload variable is not needed). But that API works only as long as the driver sets a single profile bit; that way the driver is sure of the current profile mode -
https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c#L1628

When there is more than one, the driver is not sure of the internal priority of the PMFW, though we can follow the bit order which Alex suggested (but sometimes the FW carries some workarounds inside, which means it doesn't necessarily follow the same order).

There is an existing sysfs interface which allows changing the profile mode and adding custom settings. In summary, any handling of the change from a single bit to a mask needs to be done at the lower layer.
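That lower-layer handling could be sketched roughly as follows: on every refcount change, recompute the mask and message the PMFW only when the mask actually changed. `update_workload_mask` and `send_smu_msg` are invented stand-ins, not the real SMU code:

```c
/* Sketch: send one message to the PMFW per effective change of the
 * workload mask. Names are illustrative only. */
static unsigned int last_sent_mask;     /* mask last messaged to PMFW */
static int messages_sent;               /* counter, for illustration */

static void send_smu_msg(unsigned int mask)
{
    (void)mask;                         /* stand-in for the real FW message */
    messages_sent++;
}

static void update_workload_mask(unsigned int new_mask)
{
    if (new_mask != last_sent_mask) {   /* only message PMFW on a change */
        send_smu_msg(new_mask);
        last_sent_mask = new_mask;
    }
}
```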
The problem is that this behavior has been there throughout all legacy ASICs. Not sure how much effort it takes and what all needs to be modified.
Thanks,
Lijo
I will do the one change required and send the updated one.
- Shashank
- Shashank
- Shashank
Or anything else?

Regards,
Shashank
Or you have multiple VCN contexts. When context1 finishes a job, it disables the VIDEO profile. But context2 still has a job on the other VCN engine and wants the VIDEO profile to still be enabled.
Regards,
Felix
Christian.
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 5e53a5293935..1caed319a448 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -34,6 +34,7 @@
 #include "amdgpu_ras.h"
 #include "amdgpu_umc.h"
 #include "amdgpu_reset.h"
+#include "amdgpu_ctx_workload.h"

 /* Total memory size in system memory and all GPU VRAM. Used to
  * estimate worst case amount of memory to reserve for page tables
@@ -703,9 +704,16 @@ int amdgpu_amdkfd_submit_ib(struct amdgpu_device *adev,
 void amdgpu_amdkfd_set_compute_idle(struct amdgpu_device *adev,
 				    bool idle)
 {
-	amdgpu_dpm_switch_power_profile(adev,
-					PP_SMC_POWER_PROFILE_COMPUTE,
-					!idle);
+	int ret;
+
+	if (idle)
+		ret = amdgpu_clear_workload_profile(adev, AMDGPU_CTX_WORKLOAD_HINT_COMPUTE);
+	else
+		ret = amdgpu_set_workload_profile(adev, AMDGPU_CTX_WORKLOAD_HINT_COMPUTE);
+
+	if (ret)
+		drm_warn(&adev->ddev, "Failed to %s power profile to compute mode\n",
+			 idle ? "reset" : "set");
 }

 bool amdgpu_amdkfd_is_kfd_vmid(struct amdgpu_device *adev, u32 vmid)