On 2024-12-03 09:30, Yunxiang Li wrote:
When using MES, creating a pdd requires talking to the GPU to set up
the relevant context. The code here forgot to wake up the GPU in case it
was suspended; for a passthrough GPU, for example, this causes KVM to
fail with EFAULT. The issue can be masked if something else (e.g.
opening the KMS node) woke the GPU up first and it has not yet gone
back to sleep.

Fixes: cc009e613de6 ("drm/amdkfd: Add KFD support for soc21 v3")
Signed-off-by: Yunxiang Li <Yunxiang.Li@xxxxxxx>
---
v3: remove the cleanup in kfd_bind_process_to_device and document why
this issue doesn't always happen
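
The get/put pairing the fix relies on can be modeled in userspace as a
minimal sketch. Everything below (mock_dev, mock_resume_and_get,
mock_put_autosuspend, mock_talk_to_gpu, mock_create_pdd_ctx) is
hypothetical and is not kernel API. It only mimics two properties of the
real calls. First, pm_runtime_resume_and_get() drops its usage reference
itself when the resume fails, which is why the error path does not call a
put. Second, each successful get must be balanced by a put such as
pm_runtime_put_autosuspend(). The model also simplifies one thing: it
suspends immediately when the count reaches zero, while the kernel waits
for the autosuspend delay.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of runtime-PM reference counting.
 * None of these names exist in the kernel; they only mimic the
 * pm_runtime_resume_and_get() / pm_runtime_put_autosuspend() pairing.
 */
struct mock_dev {
	int usage_count;   /* references that keep the device awake */
	bool suspended;    /* true while the GPU is runtime-suspended */
	bool resume_fails; /* simulate a failed resume */
};

/* Take a reference and resume; on failure the reference is dropped again. */
static int mock_resume_and_get(struct mock_dev *d)
{
	d->usage_count++;
	if (d->resume_fails) {
		d->usage_count--;
		return -5; /* stand-in for an error code such as -EIO */
	}
	d->suspended = false;
	return 0;
}

/*
 * Drop a reference. The kernel only suspends after the autosuspend delay
 * expires; this model suspends immediately to keep things simple.
 */
static void mock_put_autosuspend(struct mock_dev *d)
{
	assert(d->usage_count > 0); /* every put must match a successful get */
	d->usage_count--;
	if (d->usage_count == 0)
		d->suspended = true;
}

/* Talking to the GPU only works while it is awake. */
static bool mock_talk_to_gpu(const struct mock_dev *d)
{
	return !d->suspended;
}

/* Mirrors the shape of the patched MES branch. */
static int mock_create_pdd_ctx(struct mock_dev *d)
{
	int ret = mock_resume_and_get(d);

	if (ret < 0)
		return ret; /* no put here: the get already undid itself */

	ret = mock_talk_to_gpu(d) ? 0 : -14; /* -14 stands in for -EFAULT */
	mock_put_autosuspend(d);
	return ret;
}
```

Calling mock_create_pdd_ctx() on a device that starts suspended returns 0
and leaves usage_count back at 0. Calling mock_talk_to_gpu() directly on
the suspended device, without the get/put pair, fails. That failure
corresponds to the missing wake-up this patch addresses.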
drivers/gpu/drm/amd/amdkfd/kfd_process.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 555a892fcf963..c81c020af75d1 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1635,12 +1635,19 @@ struct kfd_process_device *kfd_create_process_device_data(struct kfd_node *dev,
atomic64_set(&pdd->evict_duration_counter, 0);
if (dev->kfd->shared_resources.enable_mes) {
+ retval = pm_runtime_resume_and_get(bdev);
+ if (retval < 0) {
+ pr_err("failed to stop autosuspend\n");
+ goto err_free_pdd;
+ }
retval = amdgpu_amdkfd_alloc_gtt_mem(adev,
AMDGPU_MES_PROC_CTX_SIZE,
&pdd->proc_ctx_bo,
&pdd->proc_ctx_gpu_addr,
&pdd->proc_ctx_cpu_ptr,
false);

As far as I can see from grepping the code, this BO is never used. It is
allocated here and freed in kfd_process_destroy_pdds, and that's it.
I see a different proc_ctx_bo allocation in amdgpu_mes_create_process,
but I don't see that function being called anywhere. Either my grep-fu
is getting rusty, or there is some dead code and data structures
surrounding MES here.

So unless I'm missing something, we can just remove this proc_ctx_bo
completely.

Regards,
  Felix

+ pm_runtime_mark_last_busy(bdev);
+ pm_runtime_put_autosuspend(bdev);
if (retval) {
dev_err(bdev,
"failed to allocate process context bo\n");