Hello Felix,
Thank for the review comments.
On 9/27/2022 4:48 PM, Felix Kuehling wrote:
Am 2022-09-27 um 02:12 schrieb Christian König:
Am 26.09.22 um 23:40 schrieb Shashank Sharma:
This patch switches the GPU workload mode to/from
compute mode, while submitting compute workload.
Signed-off-by: Alex Deucher <alexander.deucher@xxxxxxx>
Signed-off-by: Shashank Sharma <shashank.sharma@xxxxxxx>
Feel free to add my acked-by, but Felix should probably take a look as
well.
This look OK purely from a compute perspective. But I'm concerned about
the interaction of compute with graphics or multiple graphics contexts
submitting work concurrently. They would constantly override or disable
each other's workload hints.
For example, you have an amdgpu_ctx with
AMDGPU_CTX_WORKLOAD_HINT_COMPUTE (maybe Vulkan compute) and a KFD
process that also wants the compute profile. Those could be different
processes belonging to different users. Say, KFD enables the compute
profile first. Then the graphics context submits a job. At the start of
the job, the compute profile is enabled. That's a no-op because KFD
already enabled the compute profile. When the job finishes, it disables
the compute profile for everyone, including KFD. That's unexpected.
In this case, it will not disable the compute profile, as the reference
counter will not be zero. The reset_profile() will only act if the
reference counter is 0.
But I would be happy to get any inputs about a policy which can be more
sustainable and gets better outputs, for example:
- should we not allow a profile change, if a PP mode is already applied
and keep it Early bird basis ?
For example: Policy A
- Job A sets the profile to compute
- Job B tries to set profile to 3D, but we do not allow it as job A is
not finished it yet.
Or Policy B: Current one
- Job A sets the profile to compute
- Job B tries to set profile to 3D, and we allow it. Job A also runs in
PP 3D
- Job B finishes, but does not reset PP as reference count is not zero
due to compute
- Job A finishes, profile reset to NONE
Or anything else ?
REgards
Shashank
Or you have multiple VCN contexts. When context1 finishes a job, it
disables the VIDEO profile. But context2 still has a job on the other
VCN engine and wants the VIDEO profile to still be enabled.
Regards,
Felix
Christian.
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 5e53a5293935..1caed319a448 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -34,6 +34,7 @@
#include "amdgpu_ras.h"
#include "amdgpu_umc.h"
#include "amdgpu_reset.h"
+#include "amdgpu_ctx_workload.h"
/* Total memory size in system memory and all GPU VRAM. Used to
* estimate worst case amount of memory to reserve for page tables
@@ -703,9 +704,16 @@ int amdgpu_amdkfd_submit_ib(struct amdgpu_device
*adev,
void amdgpu_amdkfd_set_compute_idle(struct amdgpu_device *adev,
bool idle)
{
- amdgpu_dpm_switch_power_profile(adev,
- PP_SMC_POWER_PROFILE_COMPUTE,
- !idle);
+ int ret;
+
+ if (idle)
+ ret = amdgpu_clear_workload_profile(adev,
AMDGPU_CTX_WORKLOAD_HINT_COMPUTE);
+ else
+ ret = amdgpu_set_workload_profile(adev,
AMDGPU_CTX_WORKLOAD_HINT_COMPUTE);
+
+ if (ret)
+ drm_warn(&adev->ddev, "Failed to %s power profile to compute
mode\n",
+ idle ? "reset" : "set");
}
bool amdgpu_amdkfd_is_kfd_vmid(struct amdgpu_device *adev, u32 vmid)