[AMD Official Use Only - AMD Internal Distribution Only] > -----Original Message----- > From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of Yunxiang > Li > Sent: Thursday, October 10, 2024 12:18 PM > To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx > Cc: Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Koenig, Christian > <Christian.Koenig@xxxxxxx>; Gui, Jack <Jack.Gui@xxxxxxx>; Li, Yunxiang > (Teddy) <Yunxiang.Li@xxxxxxx> > Subject: [PATCH v2 2/2] drm/amdkfd: pause autosuspend when creating pdd > > When using MES creating a pdd will require talking to the GPU to setup the relevant > context. The code here forgot to wake up the GPU in case it was in suspend, this > causes KVM to EFAULT for passthrough GPU for example. > > Also, change the other place where we pause suspend to use the cleaner > pm_runtime_resume_and_get helper. > > Fixes: cc009e613de6 ("drm/amdkfd: Add KFD support for soc21 v3") > Signed-off-by: Yunxiang Li <Yunxiang.Li@xxxxxxx> > --- > > It is unclear to me if kfd_process_destroy_pdds also have this problem, or is freeing > gtt mem guaranteed to succeed even with the GPU in suspend. > > drivers/gpu/drm/amd/amdkfd/kfd_process.c | 13 +++++++++---- > 1 file changed, 9 insertions(+), 4 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c > b/drivers/gpu/drm/amd/amdkfd/kfd_process.c > index c9ce95bc3ec8e..c3d510f1dabfa 100644 > --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c > @@ -1633,12 +1633,19 @@ struct kfd_process_device > *kfd_create_process_device_data(struct kfd_node *dev, > atomic64_set(&pdd->evict_duration_counter, 0); > > if (dev->kfd->shared_resources.enable_mes) { > + retval = pm_runtime_resume_and_get(bdev); > + if (retval < 0) { > + pr_err("failed to stop autosuspend\n"); > + goto err_free_pdd; > + } I am not 100% sure if it makes sense to resume here and then put it back on suspend queue after the allocation. And then again call pm_runtime_resume in kfd_bind_process_to_device(). Maybe you can set pdd->runtime_inuse = true here itself and not call the pm_runtime_put_autosuspend here. I will probably let @Kuehling, Felix comment on this. Regards, Mukul > retval = amdgpu_amdkfd_alloc_gtt_mem(adev, > AMDGPU_MES_PROC_CTX_SIZE, > &pdd->proc_ctx_bo, > &pdd->proc_ctx_gpu_addr, > &pdd->proc_ctx_cpu_ptr, > false); > + pm_runtime_mark_last_busy(bdev); > + pm_runtime_put_autosuspend(bdev); > if (retval) { > dev_err(bdev, > "failed to allocate process context bo\n"); @@ - > 1768,11 +1775,9 @@ struct kfd_process_device > *kfd_bind_process_to_device(struct kfd_node *dev, > * pdd is destroyed. > */ > if (!pdd->runtime_inuse) { > - err = pm_runtime_get_sync(bdev); > - if (err < 0) { > - pm_runtime_put_autosuspend(adev_to_drm(dev->adev)- > >dev); > + err = pm_runtime_resume_and_get(bdev); > + if (err < 0) > return ERR_PTR(err); > - } > } > > /* > -- > 2.34.1